A lot of people seem to have this issue, but I was not able to find a satisfying answer. If you will indulge me, I would like to be sure I understand what is happening.
I have dates in various formats in a data frame (also a common issue), so I built a small function to handle them for me:
dateHandler <- function(inputString){
  if(grepl("-", inputString) == T){
    lubridate::dmy(inputString, tz="GMT")
  }else{
    as.POSIXct(as.numeric(inputString)*60*60*24, origin="1899-12-30", tz="GMT")
  }
}
When using it on one element it works fine:
myExample <-c("18-Mar-11","42433")
> dateHandler(myExample[1])
[1] "2011-03-18 GMT"
> dateHandler(myExample[2])
[1] "2016-03-04 GMT"
However when using it on a whole column it does not work:
myDf <- as.data.frame(myExample)
> myDf <- myDf %>%
+ dplyr::mutate(dateClean=dateHandler(myExample))
Warning messages:
1: In if (grepl("-", inputString) == T) { :
the condition has length > 1 and only the first element will be used
2: 1 failed to parse.
From reading on the forum, my current understanding is that R passes a vector containing all the elements of myDf$myExample to the function, which is not built to handle vectors of length > 1. If that is correct, the next step is to understand what to do from there. Many people recommend using ifelse rather than if, but I do not understand how this would help me. I also read that ifelse returns something with the same shape and attributes as its test input, which would not give me dates in this case.
Thank you in advance for answering this question for the 10000th time.
Nicolas
You have two options from here. One is to apply your current function to each element with lapply. Note that lapply returns a list, so dateClean becomes a list column with one date per cell:
myDf$dateClean <- lapply(myDf$myExample, function(x) dateHandler(x))
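If a plain date-time column is preferred over a list column, the pieces returned by lapply can be combined afterwards. The following is only a sketch under a few assumptions: it redefines the dateHandler from the question so that it runs on its own, and the names asList and flat are made up for illustration. The time zone is set again at the end because c() can drop it in older R versions.

```r
library(lubridate)

# Scalar handler from the question, repeated so the sketch is self-contained
dateHandler <- function(inputString) {
  if (grepl("-", inputString)) {
    lubridate::dmy(inputString, tz = "GMT")
  } else {
    as.POSIXct(as.numeric(inputString) * 60 * 60 * 24,
               origin = "1899-12-30", tz = "GMT")
  }
}

myExample <- c("18-Mar-11", "42433")

# lapply() calls the handler once per element and returns a list
asList <- lapply(myExample, dateHandler)

# Combine the list elements into a single POSIXct vector
flat <- do.call(c, asList)

# Restore the time zone explicitly, since c() may not keep it
attr(flat, "tzone") <- "GMT"

print(flat)
```

The result can then be assigned with myDf$dateClean <- flat.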
The other option is to build a vectorized function that is designed to take a vector as an input rather than a single data point. Here is a simple example:
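dateHandlerVectorized <- function(inputVector){
  # Coerce to character so a factor column is not converted to its level codes
  inputVector <- as.character(inputVector)
  # Pre-allocate a POSIXct vector of missing values in GMT
  output <- as.POSIXct(rep(NA_real_, length(inputVector)), origin="1970-01-01", tz="GMT")
  # TRUE for text dates such as "18-Mar-11", FALSE for day-count serial numbers
  useLubridate <- grepl("-", inputVector)
  output[useLubridate] <- lubridate::dmy(inputVector[useLubridate], tz="GMT")
  output[!useLubridate] <- as.POSIXct(as.numeric(inputVector[!useLubridate])*60*60*24, origin="1899-12-30", tz="GMT")
  output
}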
myDf <- myDf %>% dplyr::mutate(dateClean=dateHandlerVectorized(myExample))
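On the ifelse point raised in the question: base ifelse() strips attributes, so a POSIXct result would come back as plain numbers. dplyr::if_else() keeps the type of its branches, so it may be a workable alternative. The sketch below is not a definitive implementation. It assumes dplyr and lubridate are installed, the helper column isText is a made-up name, and suppressWarnings() is used because both branches are evaluated on the whole column, so each branch produces NA (with a warning) for the rows meant for the other branch.

```r
library(dplyr)
library(lubridate)

myDf <- data.frame(myExample = c("18-Mar-11", "42433"),
                   stringsAsFactors = FALSE)

myDf <- myDf %>%
  dplyr::mutate(
    # TRUE for text dates, FALSE for day-count serial numbers
    isText = grepl("-", myExample),
    dateClean = dplyr::if_else(
      isText,
      # Branch for text dates; serial numbers fail to parse and become NA
      suppressWarnings(lubridate::dmy(myExample, tz = "GMT")),
      # Branch for serial numbers; text dates become NA in as.numeric()
      suppressWarnings(
        as.POSIXct(as.numeric(myExample) * 60 * 60 * 24,
                   origin = "1899-12-30", tz = "GMT")
      )
    )
  )

print(myDf)
```

The subset-and-assign approach in dateHandlerVectorized avoids the suppressed warnings, because each conversion only ever sees the rows it is meant for.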