
Convert factor to date object R without NA

Question: how can I convert a factor to a date object without getting NA values?

Here's a similar post: Convert Factor to Date/Time in R

In that post, the user converted the factor to a character object before converting it to a date. I get NA values when I convert to a character object with as.character inside the as.Date function.

I have a column in the data frame that stores the date as a factor, with each date occurring a different number of times. Here is what the data frame contains.

> head(fraud, 5)
  TRANSACTION.DATE TRANSACTION.AMOUNT AIR.TRAVEL.DATE POSTING.DATE
1          2/27/14              25.00            <NA>      2/28/14
2          2/28/14              25.00            <NA>      2/28/14
3          2/27/14              25.00            <NA>      2/28/14
4          2/27/14              20.00         2/27/14      2/28/14
5          2/27/14              12.13            <NA>      2/28/14

> str(fraud$TRANSACTION.DATE)
 Factor w/ 519 levels "1/1/14","1/1/15",..: 228 230 228 228 228 230 226 228 230 228 ...

> summary(fraud$TRANSACTION.DATE, 5)
9/30/14 9/17/14 11/4/14 9/23/14 (Other) 
    197     187     171     160   19221 

Converting the factor to a date object resulted in NA values.

> fraud$TRANSACTION.DATE <- as.Date(as.character(fraud$TRANSACTION.DATE), 
+                                       format = "%m/%d/%Y")
> head(fraud$TRANSACTION.DATE, 5)
[1] NA NA NA NA NA

Checking if the as.character function worked.

> fraud$TRANSACTION.DATE <- as.character(fraud$TRANSACTION.DATE)
> head(fraud$TRANSACTION.DATE)
[1] NA NA NA NA NA NA
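
One possible reading of the NA result above, assuming the commands were run in the order shown: by this point the column had already been overwritten with the NA dates from the earlier as.Date call, so as.character was no longer looking at the original factor. A minimal sketch with made-up stand-in vectors (x and overwritten are illustrative names, not part of the original data):

```r
# Stand-in for the original factor column (values borrowed from the dput output).
x <- factor(c("2/27/14", "2/28/14"))

# On an untouched factor, as.character returns the labels.
labels <- as.character(x)
print(labels)

# Stand-in for a column that was already overwritten with failed (NA) dates.
overwritten <- as.Date(c(NA, NA))

# Converting that column to character can only give NA back.
after <- as.character(overwritten)
print(after)
```

If that is what happened, re-reading the data (or working on a copy of the column) before retrying would restore the factor.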

EDIT: I used the as.Date function directly on the factor, but the years came out wrong

> fraud$TRANSACTION.DATE <- as.Date(fraud$TRANSACTION.DATE, format = "%m/%d/%Y")
> str(fraud$TRANSACTION.DATE)
 Date[1:19936], format: "0014-02-27" "0014-02-28" "0014-02-27" "0014-02-27" "0014-02-27" ...
> head(fraud$TRANSACTION.DATE, 5)
[1] "0014-02-27" "0014-02-28" "0014-02-27" "0014-02-27" "0014-02-27"

EDIT 2: Here's the dput value

> dput(droplevels(head(fraud$TRANSACTION.DATE)))
structure(c(1L, 2L, 1L, 1L, 1L, 2L), .Label = c("2/27/14", "2/28/14"
), class = "factor")

Solution: using %y instead of %Y

> fraud$TRANSACTION.DATE <- as.Date(fraud$TRANSACTION.DATE, "%m/%d/%y")
> head(fraud$TRANSACTION.DATE, 5)
[1] "2014-02-27" "2014-02-28" "2014-02-27" "2014-02-27" "2014-02-27"
Scott Davis asked Dec 20 '25 20:12


1 Answer

The problem now is that your format string says the dates include the year with century (%Y), whereas your dates contain only the year without century. You need to use the %y placeholder, not the %Y one.

dates <- factor(c("2/27/14","2/28/14","2/27/14","2/27/14","2/27/14"))
as.Date(dates, format = "%m/%d/%y") # correct lowercase y
as.Date(dates, format = "%m/%d/%Y") # incorrect uppercase y

> as.Date(dates, format = "%m/%d/%y")
[1] "2014-02-27" "2014-02-28" "2014-02-27" "2014-02-27" "2014-02-27"
> as.Date(dates, format = "%m/%d/%Y")
[1] "14-02-27" "14-02-28" "14-02-27" "14-02-27" "14-02-27"

Notice that R gets it right when you use the correct placeholder, lowercase y.
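
Because %y has to guess the century, it may help to know how R fills it in. As described in the strptime documentation, two-digit years 00 to 68 are read as 20xx and 69 to 99 as 19xx. A small sketch with illustrative dates that are not from the question:

```r
# Two-digit years on either side of the documented 68/69 cutoff.
edge <- as.Date(c("1/1/68", "1/1/69"), format = "%m/%d/%y")

# Show only the four-digit year that R inferred for each value.
years <- format(edge, "%Y")
print(years)
```

The dates in the question (14 and 15) sit well inside the 20xx range, so this only matters for data that may contain years before 1969.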

What happens with %Y when the year has no century seems to be OS-dependent. As you can see, on Linux (Fedora 22) I get no padding of the year part, whereas you are seeing zero-padding.
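
Since a wrong placeholder can fail quietly (NA on one system, a year like 0014 on another), one option is to wrap the conversion in a small check. The following is only a sketch: parse_short_dates is a hypothetical helper, not something from base R or from the question, and it assumes the dates use the m/d/yy layout shown above.

```r
# Hypothetical helper: convert factor (or character) dates with a two-digit
# year, and stop if any non-missing input fails to parse.
parse_short_dates <- function(x, format = "%m/%d/%y") {
  chr <- as.character(x)                 # factor -> its text labels
  out <- as.Date(chr, format = format)   # parse with the given format
  bad <- !is.na(chr) & is.na(out)        # had text, but no date came out
  if (any(bad)) {
    stop("could not parse: ", paste(unique(chr[bad]), collapse = ", "))
  }
  out
}

# Same kind of values as in the question, plus a genuinely missing one.
dates <- factor(c("2/27/14", "2/28/14", NA, "2/27/14"))
parsed <- parse_short_dates(dates)
print(parsed)

# Quick sanity check on the inferred years.
print(unique(format(parsed[!is.na(parsed)], "%Y")))
```

The check catches the NA failure mode; the year printout at the end is what would reveal a 0014-style result if %Y were passed by mistake.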

Gavin Simpson answered Dec 23 '25 10:12


