
as.POSIXct gives different results from vector and single value arguments

Tags:

r

posixct

I have run into some unexpected behaviour in some fairly old code running in the latest release of R. The underlying issue seems to be that as.POSIXct does not correctly parse date/time strings in vectors under some circumstances. In this case, the presence of a POSIXct value at midnight in an input vector seems to corrupt the output of as.POSIXct.

To avoid possible issues with formatting, the example below checks the numeric values of the POSIXct variables rather than their string values, since the strings are only a rendering of the underlying datatype. Am I missing something here? Is there some reason for this behaviour, such as as.POSIXct not supporting vectors?

The comments in the script below are the output from running it. d1 is the value at midnight; d2 and d3 are different times during that same day.

d1 = as.POSIXct("2025-07-15 00:00:00")
d2 = as.POSIXct("2025-07-15 15:00:00")
d3 = as.POSIXct("2025-07-15 19:00:00")
 
str(d1)
# POSIXct[1:1], format: "2025-07-15"
str(d2)
# POSIXct[1:1], format: "2025-07-15 15:00:00"
str(d3)
# POSIXct[1:1], format: "2025-07-15 19:00:00"
 
numericd1 = as.numeric(d1)
numericd2 = as.numeric(d2)
numericd3 = as.numeric(d3)
 
print(paste(numericd1, numericd2, numericd3))
# [1] "1752501600 1752555600 1752570000"

print("### Mixed Values including midnight result in unexpected values")
# [1] "### Mixed Values including midnight result in unexpected values"
mixedValues = as.numeric(as.POSIXct(c(as.character(d2), as.character(d1), as.character(d3))))
print (mixedValues)
# [1] 1752501600 1752501600 1752501600
if(mixedValues[1]!= numericd2)
  print(paste("mismatch", mixedValues[1], numericd2))
# [1] "mismatch 1752501600 1752555600"
if(mixedValues[2]!= numericd1)
  print(paste("mismatch", mixedValues[2], numericd1))
if(mixedValues[3]!= numericd3)
  print(paste("mismatch", mixedValues[3], numericd3))
# [1] "mismatch 1752501600 1752570000"

print("### Changing the order of the mixed Values does not change the behaviour")
# [1] "### Changing the order of the mixed Values does not change the behaviour"
mixedValues = as.numeric(as.POSIXct(c(as.character(d2), as.character(d3), as.character(d1))))
print (mixedValues)
# [1] 1752501600 1752501600 1752501600
if(mixedValues[1]!= numericd2)
  print(paste("mismatch", mixedValues[1], numericd2))
# [1] "mismatch 1752501600 1752555600"
if(mixedValues[2]!= numericd3)
  print(paste("mismatch", mixedValues[2], numericd3))
# [1] "mismatch 1752501600 1752570000"
if(mixedValues[3]!= numericd1)
  print(paste("mismatch", mixedValues[3], numericd1))

print("### A vector of identical values, however, converts d2 correctly")
# [1] "### A vector of identical values, however, converts d2 correctly"
mixedValues = as.numeric(as.POSIXct(c(as.character(d2), as.character(d2), as.character(d2))))
 
print (mixedValues)
# [1] 1752555600 1752555600 1752555600
if(mixedValues[1]!= numericd2)
  print(paste("mismatch", mixedValues[1], numericd2))
if(mixedValues[2]!= numericd2)
  print(paste("mismatch", mixedValues[2], numericd2))
if(mixedValues[3]!= numericd2)
  print(paste("mismatch", mixedValues[3], numericd2))

print("### A vector with a single value produces a correct result")
# [1] "### A vector with a single value produces a correct result"

mixedValues = as.numeric(as.POSIXct(c(as.character(d2))))
if(mixedValues != numericd2)
  print(paste("mismatch", mixedValues, numericd2))

print("### Accessing the variable directly produces a correct result")
# [1] "### Accessing the variable directly produces a correct result"

mixedValues = as.numeric(as.POSIXct(as.character(d2)))
if(mixedValues != numericd2)
  print(paste("mismatch", mixedValues, numericd2))
 
print("### Converting POSIXct to POSIXct directly does not change the values")
# [1] "### Converting POSIXct to POSIXct directly does not change the values"
  
mixedValues = as.numeric(as.POSIXct(c(d2, d1, d3)))
if(mixedValues[1]!= numericd2)
  print(paste("mismatch", mixedValues[1], numericd2))
if(mixedValues[2]!= numericd1)
  print(paste("mismatch", mixedValues[2], numericd1))
if(mixedValues[3]!= numericd3)
  print(paste("mismatch", mixedValues[3], numericd3))

print("### A vector without the midnight value, however, converts d2 correctly")
# [1] "### A vector without the midnight value, however, converts d2 correctly"
mixedValues = as.numeric(as.POSIXct(c(as.character(d3), as.character(d2), as.character(d3))))
print (mixedValues)
# [1] 1752570000 1752555600 1752570000
if(mixedValues[1]!= numericd3)
  print(paste("mismatch", mixedValues[1], numericd3))
if(mixedValues[2]!= numericd2)
  print(paste("mismatch", mixedValues[2], numericd2))
if(mixedValues[3]!= numericd3)
  print(paste("mismatch", mixedValues[3], numericd3))



Greg Hunt asked Oct 15 '25 03:10


2 Answers

BLUF: your attempt to "compare numeric values" of parsed values falls prey to the phenomenon described below; by the time you get to as.numeric, the damage is already done. You cannot rely on round-trip POSIXt-to-character-to-"anything" equality.


The issue is that, beneath it all, as.POSIXlt.character (yes, lt, even though you're using ct) tries each of the candidate formats in its tryFormats argument in turn. The relevant portion of the function:

function (x, tz = "", format, tryFormats = c("%Y-%m-%d %H:%M:%OS", 
    "%Y/%m/%d %H:%M:%OS", "%Y-%m-%d %H:%M", "%Y/%m/%d %H:%M", 
    "%Y-%m-%d", "%Y/%m/%d"), optional = FALSE, ...) 
{
    x <- unclass(x)
    if (!missing(format)) {
      # ... not relevant here
    }
    xx <- x[!is.na(x)]
    if (!length(xx)) {
      # ... not relevant here
    } else for (f in tryFormats) if (all(!is.na(strptime(xx, f, tz = tz)))) {
        res <- strptime(x, f, tz = tz)
        if (nzchar(tz)) attr(res, "tzone") <- tz
        return(res)
    }
    # ...
}

(Side observation: really? Calling strptime twice is okay here? I may submit a PR to remove the double computation.)

The key takeaway is the use of all(!is.na(strptime(xx, f, tz=tz))).
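To make the selection rule concrete, here is a minimal sketch that reproduces only the format-choosing loop from the excerpt above. pick_try_format is a hypothetical helper, not part of base R: it returns the format that as.POSIXlt.character would settle on, or NA where the real function would signal an error. The tz = "UTC" default is an assumption to keep it deterministic.

```r
# Hypothetical helper mimicking the loop in as.POSIXlt.character:
# return the first candidate format under which EVERY non-NA element parses.
pick_try_format <- function(x, tz = "UTC",
                            tryFormats = c("%Y-%m-%d %H:%M:%OS", "%Y/%m/%d %H:%M:%OS",
                                           "%Y-%m-%d %H:%M", "%Y/%m/%d %H:%M",
                                           "%Y-%m-%d", "%Y/%m/%d")) {
  xx <- x[!is.na(x)]  # the real function treats an all-NA input separately
  for (f in tryFormats) {
    # strptime() gives NA for each element that does not match f
    if (all(!is.na(strptime(xx, f, tz = tz)))) return(f)
  }
  NA_character_  # as.POSIXlt.character stops with an error at this point
}

pick_try_format("2025-07-15 15:00:00")
# [1] "%Y-%m-%d %H:%M:%OS"
pick_try_format(c("2025-07-15 15:00:00", "2025-07-15"))
# [1] "%Y-%m-%d"
```

A single date-only string is enough to push the whole vector down to "%Y-%m-%d".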

If you look at your as.character(d#) variables, one of them is just the date component (due to R's default for rendering midnight), so the preferred "%Y-%m-%d %H:%M:%OS" format will not work for it.

as.character(d1)
# [1] "2025-07-15"
strptime(as.character(d1), format="%Y-%m-%d %H:%M:%OS")
# [1] NA

Because the function uses all(!is.na(.)), and the first string does not parse, the preferred date+time format is discarded. The same holds for all of the other formats that include %H, since as.character(d1) has no time component. We end up with "%Y-%m-%d", and strings that do include a time also pass this step, with the time silently ignored:

as.character(d2)
# [1] "2025-07-15 15:00:00"
strptime(as.character(d2), format="%Y-%m-%d")
# [1] "2025-07-15 EDT"
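Applied to the whole vector at once, this means every element is truncated to midnight. A minimal reproduction, assuming tz = "UTC" purely so the numbers are the same on any machine (your local timezone will give different epoch values):

```r
strs <- c("2025-07-15 15:00:00", "2025-07-15", "2025-07-15 19:00:00")

# One call: the only tryFormats entry that fits all three is "%Y-%m-%d"
together <- as.POSIXct(strs, tz = "UTC")
format(together, "%H:%M:%S")
# [1] "00:00:00" "00:00:00" "00:00:00"

# All three collapse to the same instant: 2025-07-15 00:00:00 UTC
as.numeric(together)
# [1] 1752537600 1752537600 1752537600
```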

For the curious, we can see how all three as.character(d#) variables do against all of the internal tryFormats (extracted from as.POSIXlt.character above):

outer(
  setNames(nm=c(as.character(d1), as.character(d2), as.character(d3))), 
  setNames(nm=c("%Y-%m-%d %H:%M:%OS", "%Y/%m/%d %H:%M:%OS", "%Y-%m-%d %H:%M", "%Y/%m/%d %H:%M", "%Y-%m-%d", "%Y/%m/%d")),
  function(tm, fmt) Map(strptime, tm, fmt)
)
#                     %Y-%m-%d %H:%M:%OS  %Y/%m/%d %H:%M:%OS %Y-%m-%d %H:%M      %Y/%m/%d %H:%M %Y-%m-%d   %Y/%m/%d
# 2025-07-15          NA                  NA                 NA                  NA             2025-07-15 NA      
# 2025-07-15 15:00:00 2025-07-15 15:00:00 NA                 2025-07-15 15:00:00 NA             2025-07-15 NA      
# 2025-07-15 19:00:00 2025-07-15 19:00:00 NA                 2025-07-15 19:00:00 NA             2025-07-15 NA      

Because of the all(!is.na(.)) component, the format chosen is the first column with no NA values, which is the date-only %Y-%m-%d (5th) column.

You can see more obvious results of this all(!is.na(.)) behavior if you pass a clearly-wrong string:

as.POSIXct(c("quux", as.character(d1), as.character(d2), as.character(d3)))
# Error in as.POSIXlt.character(x, tz, ...) : 
#   character string is not in a standard unambiguous format

### applying our "outer" view from above, all columns have at least one NA
outer(
  setNames(nm=c("quux", as.character(d1), as.character(d2), as.character(d3))), 
  setNames(nm=c("%Y-%m-%d %H:%M:%OS", "%Y/%m/%d %H:%M:%OS", "%Y-%m-%d %H:%M", "%Y/%m/%d %H:%M", "%Y-%m-%d", "%Y/%m/%d")),
  function(tm, fmt) Map(strptime, tm, fmt)
)
#                     %Y-%m-%d %H:%M:%OS  %Y/%m/%d %H:%M:%OS %Y-%m-%d %H:%M      %Y/%m/%d %H:%M %Y-%m-%d   %Y/%m/%d
# quux                NA                  NA                 NA                  NA             NA         NA      
# 2025-07-15          NA                  NA                 NA                  NA             2025-07-15 NA      
# 2025-07-15 15:00:00 2025-07-15 15:00:00 NA                 2025-07-15 15:00:00 NA             2025-07-15 NA      
# 2025-07-15 19:00:00 2025-07-15 19:00:00 NA                 2025-07-15 19:00:00 NA             2025-07-15 NA      

Because none of the columns are free of NA values, we get an error.
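If you would rather get NA than an error in that situation, the optional argument visible in the as.POSIXlt.character signature above does that. A short sketch (the tz = "UTC" choice is only for reproducibility; note that with optional = TRUE every element becomes NA, including the ones that would have parsed on their own):

```r
strs <- c("quux", "2025-07-15", "2025-07-15 15:00:00")

# Default behaviour: no candidate format parses every element, so it errors
msg <- tryCatch(as.POSIXct(strs, tz = "UTC"), error = conditionMessage)
msg
# [1] "character string is not in a standard unambiguous format"

# optional = TRUE is passed through to as.POSIXlt.character, which then
# returns an all-NA result instead of stopping
res <- as.POSIXlt(strs, tz = "UTC", optional = TRUE)
is.na(res)
# [1] TRUE TRUE TRUE
```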


Suggestion: never rely on R's rendering of POSIXt objects to be reversible. This is similar in notion (though not in cause) to why you cannot assume that the rendering of pi through as.character is accurate and reversible.
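When a trip through character is unavoidable, one way to keep it reversible is to pin the format explicitly on both sides rather than relying on as.character and tryFormats. A minimal sketch, assuming UTC and whole-second times (fractional seconds would need %OS plus a digits setting):

```r
fmt <- "%Y-%m-%d %H:%M:%S"   # one fixed format for both directions

# The same three instants as d2, d1, d3, expressed in UTC
d <- as.POSIXct(c(1752591600, 1752537600, 1752606000),
                origin = "1970-01-01", tz = "UTC")

s <- format(d, fmt)          # midnight keeps its "00:00:00"
s
# [1] "2025-07-15 15:00:00" "2025-07-15 00:00:00" "2025-07-15 19:00:00"

back <- as.POSIXct(s, format = fmt, tz = "UTC")  # no tryFormats guessing
identical(as.numeric(back), as.numeric(d))
# [1] TRUE
```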

r2evans answered Oct 16 '25 18:10


I think this has to do with how a POSIXct value at midnight, like d1, renders as a date without the time component. When as.POSIXct receives a character vector with (apparently) mixed formats, it falls back to the common date-only format, thus losing all the time components.

One workaround could be to define as.character2 <- \(x) format(x, "%Y-%m-%d %H:%M:%S") and use it wherever you currently use as.character.

> as.numeric(as.POSIXct(c(as.character(d1), as.character(d2), as.character(d3))))
[1] 1752562800 1752562800 1752562800
> as.numeric(as.POSIXct(c(as.character2(d1), as.character2(d2), as.character2(d3))))
[1] 1752562800 1752616800 1752631200
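If the strings already exist and you cannot change how they were produced, a different workaround (not part of this answer) is to parse each string on its own, so each one gets the first tryFormats entry that fits it, which matches the question's observation that single values parse correctly. parse_each is a hypothetical helper, and UTC is assumed for reproducibility; one as.POSIXct call per element is slow on large vectors.

```r
parse_each <- function(x, tz = "UTC") {
  # as.POSIXct() on a single string chooses a format for that string alone
  secs <- vapply(x, function(s) as.numeric(as.POSIXct(s, tz = tz)),
                 numeric(1), USE.NAMES = FALSE)
  as.POSIXct(secs, origin = "1970-01-01", tz = tz)
}

p <- parse_each(c("2025-07-15 15:00:00", "2025-07-15", "2025-07-15 19:00:00"))
as.numeric(p)
# [1] 1752591600 1752537600 1752606000
```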
Jon Spring answered Oct 16 '25 16:10