I am currently experimenting with data.table and looking for the 'optimal' way to do things.
What I want to do in the following example is take a character vector of column names and, for each of those columns, append ".d" for normal deltas and ".dP" for percentage deltas. (Bear in mind that the columns in that vector are only a subset of the full data.table, even though my example is limited to them.)
I want the code to be as compact and fast as possible, making the most of both R and data.table.
The solution that I have currently come up with is this:
# Percentage delta: relative change of newvalue with respect to basevalue
percentDelta <- function(newvalue, basevalue) {
  return((newvalue - basevalue) / basevalue)
}
# Normal delta: absolute difference between newvalue and basevalue
normalDelta <- function(newvalue, basevalue) {
  return(newvalue - basevalue)
}
library(data.table)
DT = data.table(price=c(2,3,4,5,6,7,8), otherMetric=c(3,4,5,1,3,4,5))
deltaColsNames = c("otherMetric")
deltaColsNewNames <- paste0(deltaColsNames,'.d')
percentColsNewNames <- paste0(deltaColsNames,'.dP')
DT[,eval(deltaColsNewNames) := lapply(DT[,eval(deltaColsNames),with=F],normalDelta,price)]
DT[,eval(percentColsNewNames) := lapply(DT[,eval(deltaColsNames),with=F],percentDelta,price)]
I am not quite sure whether this is the correct data.table call for generating multiple columns. Is using lapply with eval the way one would go about this?
EDIT: Should I avoid the use of "with=F", for example by indexing the columns like this instead?
DT[,eval(deltaColsNewNames) := lapply(DT[,which(names(DT) %in% deltaColsNames)],normalDelta,price)]
DT[,eval(percentColsNewNames) := lapply(DT[,which(names(DT) %in% deltaColsNames)],percentDelta,price)]
Issue #495 is now solved with this recent commit, so we can do this just fine:
require(data.table) # v1.9.7+
DT[, (deltaColsNewNames) := lapply(.SD, normalDelta, price), .SDcols=deltaColsNames]
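For completeness, the percentage-delta columns can be created the same way; this is just a sketch mirroring the line above, reusing the percentDelta function and percentColsNewNames vector defined in the question:
DT[, (percentColsNewNames) := lapply(.SD, percentDelta, price), .SDcols=deltaColsNames]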
To answer the question and incorporate the optimisations from the comments, the following scenarios emerged:
require(data.table) # version 1.9.5 from GitHub needed!
normalDelta <- function(newvalue, basevalue) {
  return(newvalue - basevalue)
}
DT = data.table(price=rep(c(3,4,5),each=200000000), otherMetric=sample(c(1,3,6),200000000,T))
deltaColsNames = c("otherMetric")
deltaColsNewNames <- paste0(deltaColsNames,'.d')
Scenario 1, using "eval" and "with=F":
system.time(DT[,(deltaColsNewNames) := lapply(DT[,eval(deltaColsNames),with=F],normalDelta,price)])
#   user  system elapsed
#  2.134   1.747   3.880
Scenario 2, using which(names(DT) %in% ...) to avoid strings as column indexes:
system.time(DT[,(deltaColsNewNames) := lapply(DT[,which(names(DT) %in% deltaColsNames)],normalDelta,price)])
#   user  system elapsed
#  1.652   1.105   2.756
Scenario 3, using ".SD" syntax and eval() in 1.9.5 (in 1.9.4, this was slower):
system.time(DT[,(deltaColsNewNames) := lapply(.SD[, eval(deltaColsNames),with=F], normalDelta, price)])
#   user  system elapsed
#  2.148   1.847   4.764
Scenario 4, using ".SD" syntax and which() in 1.9.5 (in 1.9.4, this was also slower):
system.time(DT[,(deltaColsNewNames) := lapply(.SD[, which(names(DT) %in% deltaColsNames)], normalDelta, price)])
#   user  system elapsed
#  1.701   1.117   2.817
Scenario 5, using mget():
system.time(DT[, (deltaColsNewNames) := lapply(mget(deltaColsNames), normalDelta, price)])
#   user  system elapsed
#  1.426   1.166   2.591
Scenario 6, combining mget() and .SD:
system.time(DT[, (deltaColsNewNames) := lapply(.SD[, mget(deltaColsNames)], normalDelta, price)])
#   user  system elapsed
#  2.149   1.788   4.974
UPDATE: After increasing the size of the dataset, scenarios 2, 4 and 5 come out clearly ahead. However, scenario 5 has a much higher memory footprint than 2 and 4; I ran into memory issues on my laptop when testing it with an even bigger dataset (see the updated results above).
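Putting it together, here is a minimal sketch (assuming data.table 1.9.5+ and the normalDelta/percentDelta functions from the question) that applies the scenario 5 (mget) pattern to both the plain and the percentage deltas in one chained call:
percentColsNewNames <- paste0(deltaColsNames, '.dP')
DT[, (deltaColsNewNames) := lapply(mget(deltaColsNames), normalDelta, price)
  ][, (percentColsNewNames) := lapply(mget(deltaColsNames), percentDelta, price)]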