I have a data.table
with columns 2 through 20 as strings with spaces (e.g., "Species Name"). I want to run str_replace()
on all those columns simultaneously so all the "Species Name" become "Species_Name". I can either do:
data.table(apply(as.data.frame(dt[,2:dim(dt)[2], with=F]), 2, function(x){ str_replace(x," ","_") }))
or if I keep it as a data.table
object, then I can do this one column at a time:
dt[,SpeciesName := str_replace(SpeciesName, " ", "_")
How do I do this for all columns 2 through the end similar to the one of the above?
Go to the Data tab > Data Tools group, click the What-If Analysis button, and then click Data Table… In the Data Table dialog window, click in the Column Input cell box (because our Investment values are in a column), and select the variable cell referenced in your formula.
A data table is a range of cells in which you can change values in some of the cells and come up with different answers to a problem. A good example of a data table employs the PMT function with different loan amounts and interest rates to calculate the affordable amount on a home mortgage loan.
Completely rewritten on 2015-11-24, to fix an error in previous versions.
Also added more modern options on 2019-09-27
You have a few options.
Process all of the target columns with an embedded call to lapply()
, using :=
to assign the modified values in place. This relies on :=
's very handy support for simultaneous assignment to several column named on its LHS.
Use a for
loop to run through the target columns one at a time, using set()
to modify the value of each one in turn.
Use a for
loop to iterate over multiple "naive" calls to [.data.table()
, each one of which modifies a single column.
These methods all seem about equally fast, so which one you use will be mostly a matter of taste. (1) is nicely compact and expressive. It's what I most often use, though you may find (2) easier to read. Because they process and modify the columns one at a time, (2) or (3) will have an advantage in the rare situation in which your data.table is so large that you are in danger of running up against limits imposed by your R session's available memory.
library(data.table) ## Create three identical 1000000-by-20 data.tables DT1 <- data.table(1:1e6, as.data.table(replicate(1e6, paste(sample(letters, nr, TRUE), sample(letters, nr, TRUE))))) cnames <- c("ID", paste0("X", 1:19)) setnames(DT1, cnames) DT2 <- copy(DT1); DT3 <- copy(DT1) ## Method 1 system.time({ DT1[, .SDcols=cnames[-1L], cnames[-1L] := lapply(.SD, function(x) gsub(" ", "_", x, fixed=TRUE)), ] }) ## user system elapsed ## 10.90 0.11 11.06 ## Method 2 system.time({ for(cname in cnames[-1]) { set(DT2, j=cname, value=gsub(" ", "_", DT2[[cname]], fixed=TRUE)) } }) ## user system elapsed ## 10.65 0.05 10.70 ## Method 3 system.time({ for(cname in cnames[-1]) { DT3[ , (cname) := gsub(" ", "_", get(cname), fixed=TRUE)] } }) ## user system elapsed ## 10.33 0.03 10.37
For more details on set()
and :=
, read their help page, gotten by typing ?set
or ?":="
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With