I have a 1gb csv file with Dates and according values. Now is the Dates are in "undefined Format" - so they are diplayed as numbers in Excel like this:
DATE FXVol.DKK.EUR,0.75,4
38719 0.21825
I cannot open the csv file and change it to the date format I like since I would lose data in this way.
If I now import the data to R and convert the Dates:
as.Date( workingfilereturns[,1], format = "%Y-%m-%d")
It always yields dates that are 70 years + so 2076 instead of 2006. I really have no idea what goes wrong or how to fix this issue.
(Note: I have added a note about some quirks in R when dealing with Excel data. You may want to skip directly to that at the bottom; what follows first is the original answer.)
Going by your sample data, 38719 appears to be the number of days which have elapsed since January 1, 1900. So you can just add this number of days to January 1, 1900 to arrive at the correct Date object which you want:
as.Date("1900-01-01") + workingfilereturns[,1]
or
as.Date("1900-01-01") + workingfilereturns$DATE
Example:
> as.Date("1900-01-01") + 38719
[1] "2006-01-04"
Update:
As @Roland correctly pointed out, you could also use as.Date.numeric while specifying an origin of January 1, 1900:
> as.Date.numeric(38719, origin="1900-01-01")
[1] "2006-01-04"
Bug warning:
As the asker @Methamortix pointed out, my solution, namely using January 1, 1900, as the origin, yields a date which is two days too late in R. There are two reasons for this:
as.Date.numeric(0, origin="1900-01-01") is January 1, 1900, in R, but Excel starts counting at 1, meaning that formatting the number 1 in Excel as a Date yields January 1, 1900. This explains why R is one day ahead of Excel.
As evidence of this, consider the following code:
> as.Date.numeric(57, origin="1900-01-01")
[1] "1900-02-27"
> as.Date.numeric(58, origin="1900-01-01")
[1] "1900-02-28"
> as.Date.numeric(59, origin="1900-01-01")
[1] "1900-03-01"
In other words, R's as.Date() correctly skipped over February 29th. But type the number 60 into a cell in Excel, format as date, and it will come back as February 29, 1900. My guess is that this has been reported somewhere, possibly on Stack Overflow or elsewhere, but let this serve as another reference point.
So, going back to the original question, the origin needs to be offset by 2 days when dealing with Excel dates in R, where the date is greater than February 28, 1900 (which is the case of the original problem). So he should use his date data frame in the following way:
as.Date.numeric(workingfilereturns$DATE - 2, origin="1900-01-01")
where the date column has been rolled back by two days to sync up with the values in Excel.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With