How do I run apply on a data.table?

Tags:

I have a data.table with columns 2 through 20 as strings with spaces (e.g., "Species Name"). I want to run str_replace() on all those columns simultaneously so all the "Species Name" become "Species_Name". I can either do:

data.table(apply(as.data.frame(dt[,2:dim(dt)[2], with=F]), 2,                                 function(x){ str_replace(x," ","_") }))

or if I keep it as a data.table object, then I can do this one column at a time:

dt[,SpeciesName := str_replace(SpeciesName, " ", "_")

How do I do this for all columns 2 through the end similar to the one of the above?

632

asked Feb 10 '12 23:02

Maiasaura

1 Answers

Completely rewritten on 2015-11-24, to fix an error in previous versions.

Also added more modern options on 2019-09-27

You have a few options.

Process all of the target columns with an embedded call to lapply(), using := to assign the modified values in place. This relies on :='s very handy support for simultaneous assignment to several column named on its LHS.
Use a for loop to run through the target columns one at a time, using set() to modify the value of each one in turn.
Use a for loop to iterate over multiple "naive" calls to [.data.table(), each one of which modifies a single column.

These methods all seem about equally fast, so which one you use will be mostly a matter of taste. (1) is nicely compact and expressive. It's what I most often use, though you may find (2) easier to read. Because they process and modify the columns one at a time, (2) or (3) will have an advantage in the rare situation in which your data.table is so large that you are in danger of running up against limits imposed by your R session's available memory.

library(data.table)  ## Create three identical 1000000-by-20 data.tables DT1 <- data.table(1:1e6,            as.data.table(replicate(1e6, paste(sample(letters, nr, TRUE),                                              sample(letters, nr, TRUE))))) cnames <- c("ID", paste0("X", 1:19)) setnames(DT1, cnames) DT2 <- copy(DT1); DT3 <- copy(DT1)  ## Method 1 system.time({ DT1[, .SDcols=cnames[-1L], cnames[-1L] :=    lapply(.SD, function(x) gsub(" ", "_", x, fixed=TRUE)), ] }) ##   user  system elapsed  ##  10.90    0.11   11.06   ## Method 2 system.time({     for(cname in cnames[-1]) {         set(DT2, j=cname, value=gsub(" ", "_", DT2[[cname]], fixed=TRUE))     } }) ##   user  system elapsed  ##  10.65    0.05   10.70   ## Method 3 system.time({     for(cname in cnames[-1]) {         DT3[ , (cname) := gsub(" ", "_", get(cname), fixed=TRUE)]     } }) ##   user  system elapsed  ##  10.33    0.03   10.37

For more details on set() and :=, read their help page, gotten by typing ?set or ?":=".

186

answered Oct 13 '22 17:10

Josh O'Brien

Related questions
                            
                                Reordering columns in a large dataframe
                            
                                Extract last word in string in R
                            
                                Refer to the last column in R
                            
                                Perform a Shapiro-Wilk Normality Test
                            
                                Why is R making a copy-on-modification after using str?
                            
                                Renaming RStudio project under version control
                            
                                Rscript: There is no package called ...?
                            
                                Why do I get "number of items to replace is not a multiple of replacement length"
                            
                                In R, what is the difference between unlink and file.remove?
                            
                                R time series modeling on weekly data using ts() object
                            
                                Automating version increase of R packages
                            
                                Call R (programming language) from .net
                            
                                Manually Downloading and Installing Packages in R
                            
                                How do I stop/end/halt a script in R?
                            
                                model.matrix generates fewer rows than original data.frame
                            
                                How to transform XML data into a data.frame?
                            
                                Simple frequency tables using data.table
                            
                                R: Is there a good replacement for plyr::rbind.fill in dplyr?
                            
                                What best practices do you use for programming in R? [closed]
                            
                                Anti-aliasing in R graphics under Windows (as per Mac)

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How do I run apply on a data.table?

Tags:

r

data.table

apply

Maiasaura

People also ask

1 Answers

Josh O'Brien

Recent Activity

Donate For Us