
data.table - summarizing data - difference between `by` and `keyby`?

Tags:

r

data.table

I have noticed that the following two commands yield very different results, so I was wondering what the difference is?

TestData <- TestData[, .(totalCount = sum(count)),
                     keyby = c("group", "date")]

TestData <- TestData[, .(totalCount = sum(count)),
                     by = c("group", "date")]

According to the cheatsheet:

dt[, j, by = .(a)] – group rows by values in specified columns.

and

dt[, j, keyby = .(a)] – group and simultaneously sort rows by values in specified columns.

Nneka asked Oct 21 '25 15:10


1 Answer

Using keyby rather than by orders the result rows by the columns you are grouping on. With by, the result rows keep the order in which the groups first appear in the input data. Ordered data can speed up some later computations on the result. On the other hand, the user may need the original order. In most cases keyby will be slightly faster than by.
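The difference is easiest to see on a small table whose groups are deliberately out of sorted order. The following is only a minimal sketch: the sample values and the names res_by and res_keyby are made up for illustration and are not from the question.

```r
library(data.table)

# Hypothetical sample data: groups appear out of sorted order on purpose
TestData <- data.table(
  group = c("b", "a", "b", "a"),
  date  = as.Date(c("2024-01-02", "2024-01-01", "2024-01-01", "2024-01-01")),
  count = c(1L, 2L, 3L, 4L)
)

# by: groups are returned in order of first appearance in the input
res_by <- TestData[, .(totalCount = sum(count)), by = c("group", "date")]

# keyby: same groups and sums, but sorted by group, then date,
# and the grouping columns are set as the key of the result
res_keyby <- TestData[, .(totalCount = sum(count)), keyby = c("group", "date")]

print(res_by)     # rows in order: (b, 2024-01-02), (a, 2024-01-01), (b, 2024-01-01)
print(res_keyby)  # rows in order: (a, 2024-01-01), (b, 2024-01-01), (b, 2024-01-02)

key(res_by)     # NULL
key(res_keyby)  # "group" "date"
```

The results are stored in separate variables here. In the question, the first command overwrites TestData, so the second command runs on the already aggregated table, which may account for part of the difference observed.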

jangorecki answered Oct 23 '25 04:10


