I have noticed that the following two commands yield very different results. What is the difference?
TestData <- TestData[, .(totalCount = sum(count)),
                     keyby = c("group", "date")]
TestData <- TestData[, .(totalCount = sum(count)),
                     by = c("group", "date")]
According to the data.table cheat sheet:
dt[, j, by = .(a)] – group rows by values in specified columns.
and
dt[, j, keyby = .(a)] – group and simultaneously sort rows by values in specified columns.
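The cheat sheet writes the grouping columns as .(a), while the question passes a character vector. The sketch below (assuming the data.table package is installed; this TestData is a made-up stand-in for the asker's table) shows that both forms group the same way, and that keyby sorts by every grouping column in turn: group first, then date within each group.

```r
library(data.table)

# Hypothetical stand-in for TestData with two grouping columns
TestData <- data.table(
  group = c("b", "a", "b", "a"),
  date  = as.Date(c("2020-01-02", "2020-01-02", "2020-01-01", "2020-01-01")),
  count = c(1L, 2L, 3L, 4L)
)

# Grouping columns can be given as .( ) or as a character vector
r1 <- TestData[, .(totalCount = sum(count)), by = .(group, date)]
r2 <- TestData[, .(totalCount = sum(count)), by = c("group", "date")]

# keyby sorts by group first, then by date within each group,
# and sets both columns as the key of the result
r3 <- TestData[, .(totalCount = sum(count)), keyby = c("group", "date")]
r3$group       # "a" "a" "b" "b"
r3$totalCount  # 4 2 3 1
```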
Using keyby rather than by orders the result rows by the columns you are grouping on, and also sets those columns as the key of the result. With by, the groups in the result keep the order in which they first appear in the input data.
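A small sketch of the ordering difference, assuming data.table is installed; the table dt and its values are hypothetical. The groups first appear in the order "b", "a", "c", so by keeps that order while keyby sorts them. The totals per group are the same either way; only the row order differs.

```r
library(data.table)

# Hypothetical toy data: groups first appear in the order "b", "a", "c"
dt <- data.table(group = c("b", "a", "b", "c", "a"),
                 count = c(1L, 2L, 3L, 4L, 5L))

# by: groups come out in order of first appearance in dt
res_by <- dt[, .(totalCount = sum(count)), by = group]
res_by$group       # "b" "a" "c"
res_by$totalCount  # 4 7 4

# keyby: same groups and totals, but sorted by group
res_keyby <- dt[, .(totalCount = sum(count)), keyby = group]
res_keyby$group       # "a" "b" "c"
res_keyby$totalCount  # 7 4 4
```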
Having ordered (keyed) data can speed up some further computations on it, such as joins or subsets on the key columns. On the other hand, the user may need the original order. In most cases keyby will be slightly faster than by.
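To illustrate what "keyed" buys you, here is a sketch (assuming data.table is installed; dt is the same kind of made-up toy table) showing that keyby gives the same rows as by followed by setkey() on the grouping column, and that the keyed result can then be queried with a keyed join in i.

```r
library(data.table)

# Hypothetical toy data standing in for TestData
dt <- data.table(group = c("b", "a", "b", "c", "a"),
                 count = c(1L, 2L, 3L, 4L, 5L))

res_by    <- dt[, .(totalCount = sum(count)), by = group]
res_keyby <- dt[, .(totalCount = sum(count)), keyby = group]

key(res_by)     # NULL: no key is set
key(res_keyby)  # "group": the result is keyed on the grouping column

# keyby matches by followed by setkey() on the grouping column
res_sorted <- setkey(copy(res_by), group)
identical(res_sorted$group, res_keyby$group)            # TRUE
identical(res_sorted$totalCount, res_keyby$totalCount)  # TRUE

# Because res_keyby is keyed, a lookup on group can be done as a keyed join
res_keyby[.("b"), totalCount]  # 4
```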