data.table

Question

I have noticed that the following two commands yield very different results, so I was wondering what the difference is.

TestData <- TestData[, .(totalCount = sum(count)),
                     keyby = c("group", "date")]

TestData <- TestData[, .(totalCount = sum(count)),
                     by = c("group", "date")]

According to the cheatsheet:

dt[, j, by = .(a)] – group rows by values in specified columns.

and

dt[, j, keyby = .(a)] – group and simultaneously sort rows by values in specified columns.

jangorecki · Accepted Answer

Using keyby instead of by orders the result rows by the columns you are grouping on and sets those columns as the key of the result. With by, the result keeps the groups in the order in which they first appear in the input data. Having ordered (keyed) data can speed up some further computations on it. On the other hand, the user might need the original order. In most cases keyby will be slightly faster than by.
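To make the difference concrete, here is a minimal sketch on made-up data. The values in TestData below are hypothetical and chosen only so that group "b" appears before group "a" in the input; the column names follow the question.

```r
library(data.table)

# Hypothetical toy data: group "b" appears before group "a" in the input.
TestData <- data.table(
  group = c("b", "a", "b", "a"),
  date  = as.Date(c("2020-01-02", "2020-01-01", "2020-01-01", "2020-01-01")),
  count = c(1L, 2L, 3L, 4L)
)

# by: groups are returned in order of first appearance; no key is set.
res_by <- TestData[, .(totalCount = sum(count)), by = c("group", "date")]

# keyby: same groups and sums, but rows are sorted by group, then date,
# and those columns become the key of the result.
res_keyby <- TestData[, .(totalCount = sum(count)), keyby = c("group", "date")]

print(res_by)
print(res_keyby)

key(res_by)     # NULL
key(res_keyby)  # "group" "date"

# A keyed result supports keyed subsetting (binary search on the key).
res_keyby[.("a")]
```

Both calls return the same groups with the same sums; only the row order and the key attribute differ. If the results look "very different", it is worth checking whether the rows are simply in a different order.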

data.table - summarizing data - difference between `by` and `keyby`?

Tags:

r

Nneka
