integer64 class in R data.table, sum() and by=.()

Question

I just noticed this issue with a column in a data.table that turned out to be of the integer64 class. I was reading the data with fread from a location on the internet and was not aware that the column in question was being interpreted as integer64, a class I am not familiar with. The issue is how this class behaves in a data.table when using sum() together with by. Similar behaviour has been mentioned in two other questions on here, but in the context of using the column as an ID value (Q1 and Q2).
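If the integer64 type is not wanted, fread can be told up front how to read such columns. The sketch below assumes bit64 is installed, and the inline CSV and the names `csv`, `dt_default` and `dt_double` are made up for illustration. fread's `integer64` argument accepts `"integer64"` (the default), `"double"`/`"numeric"` or `"character"`, and its default can be changed globally with `options(datatable.integer64 = ...)`. fread only switches a column to integer64 when a value falls outside the 32-bit integer range, which is why it is easy to miss.

```r
library(data.table)
library(bit64)

# Hypothetical data: 3000000000 does not fit in a 32-bit integer
csv <- "id,val\n1,3000000000\n2,-10\n"

# Default behaviour: the out-of-range column is read as integer64
dt_default <- fread(text = csv)
class(dt_default$val)

# Ask fread to read such columns as plain doubles instead
dt_double <- fread(text = csv, integer64 = "double")
class(dt_double$val)
```

Reading as double loses exactness for integers beyond 2^53, which is the trade-off integer64 exists to avoid.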

When performing a grouped sum() on this integer64 column, it does not behave as expected (i.e. like a numeric column) when there are negative values in the column. Why is this? Is it a bug?

library(data.table); library(bit64)

z <- data.table(
  group = c("A","A","A"),
  int64 = as.integer64(c(10,20,-10)),
  numeric = c(10,20,-10)
)
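Because an integer64 column prints like an ordinary number, it helps to check column classes explicitly. A minimal sketch rebuilding the same `z`; `col_classes` is an illustrative name, and `print(..., class = TRUE)` assumes a data.table version whose print method supports the `class` argument.

```r
library(data.table)
library(bit64)

z <- data.table(
  group = c("A","A","A"),
  int64 = as.integer64(c(10,20,-10)),
  numeric = c(10,20,-10)
)

# One class per column; integer64 columns report "integer64"
col_classes <- vapply(z, function(x) class(x)[1], character(1))
col_classes

# data.table can also print an abbreviated class row under the header
print(z, class = TRUE)
```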

To start, it works fine without the by argument:

z[, sum(int64)]                 # 20
z[, sum(int64, na.rm = TRUE)]   # 20

And outside data.table syntax:

sum(z$int64)                  # 20
sum(z$int64, na.rm = TRUE)    # 20

But once by is included, the results get fishy:

    z[, sum(int64, na.rm=FALSE), by=group] #only the negative value
    #group  V1
    #A     -10

    z[, sum(int64, na.rm=TRUE), by=group] #excluding the negative value
    #group  V1
    #A      30

    z[, sum(as.numeric(int64)), by=group] #expected answer
    #group  V1
    #A      20
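One way to catch this kind of silent discrepancy is to recompute the grouped sum with the column converted to double and compare the two. A minimal sketch rebuilding the same `z`; `by_i64`, `by_num` and `agree` are illustrative names. On a data.table version affected by the bug, `agree` would come out FALSE (the grouped integer64 sum gave -10 above); on a fixed version it is TRUE.

```r
library(data.table)
library(bit64)

z <- data.table(
  group = c("A","A","A"),
  int64 = as.integer64(c(10,20,-10)),
  numeric = c(10,20,-10)
)

# Grouped sum computed directly on the integer64 column
by_i64 <- z[, .(total = sum(int64)), by = group]
# Same grouping, converting to double before summing
by_num <- z[, .(total = sum(as.numeric(int64))), by = group]

# The two should agree as long as values stay well inside +/- 2^53
agree <- all(as.numeric(by_i64$total) == by_num$total)
agree
```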

This worries me because, on the surface, there is no reason to believe anything is wrong with the numbers in z$int64; I only noticed because there were very few rows.
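On an affected version, a defensive workaround is to convert every integer64 column to double before grouping. A minimal sketch, assuming all values are within +/- 2^53 so the conversion is exact; `i64_cols` and `res` are illustrative names. Since the ungrouped sum was correct, the bug appears tied to data.table's internal optimisation of grouped `sum()` (GForce), so lowering `options(datatable.optimize = 1L)` might be an alternative, but that is an untested assumption here.

```r
library(data.table)
library(bit64)

z <- data.table(
  group = c("A","A","A"),
  int64 = as.integer64(c(10,20,-10)),
  numeric = c(10,20,-10)
)

# Names of all integer64 columns
i64_cols <- names(z)[vapply(z, is.integer64, logical(1))]

# Convert them to double in place; exact only for |x| <= 2^53
z[, (i64_cols) := lapply(.SD, as.numeric), .SDcols = i64_cols]

# Grouped sum now runs on an ordinary numeric column
res <- z[, .(total = sum(int64)), by = group]
res
```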

Waldi · Accepted Answer

This has now been corrected; see https://github.com/Rdatatable/data.table/issues/1647. With a current version, the grouped sum on the integer64 column returns the expected result:

z[, sum(int64, na.rm=FALSE), by=group]
#    group    V1
#   <char> <i64>
#1:      A    20

z[, sum(int64, na.rm=TRUE), by=group]
#    group    V1
#   <char> <i64>
#1:      A    20

z[, sum(as.numeric(int64)), by=group]
#    group    V1
#   <char> <num>
#1:      A    20

Tags: r, data.table

moman822
