How do I remove columns based on values in a data.table in R
If I have a data.table row,
dt = data.table("col1" = "a", "col2" = "b", "col3" = "c",
"col4" = 'd', "col5" = "e", "col6" = 9, "col7" = 0, "col8" = 7,
"col9" = 0, "col10" = 99)
The first 5 columns are categorical, and the 6-10 columns are numerical. The numbers are repeated for all rows for the numerical columns.
I had two doubts
How do I remove the columns containing 0s? This column can vary based on inputs - ie sometimes col7 might be 0 sometimes col8 might be 0 etc
Once I remove the columns with 0 values, how do I concatenate the rest of the numbers into a single column - in this case, the new column will contain the number 9799
Is there a way to do this without removing the 0 value columns?
For the first part, I tried
cols_chosen = c("col6", "col7","col8","col9","col10")
condition = c(FALSE, dt[, lapply(.SD, function(x) sum(x)< 1), .SDcols = cols_chosen])
dt[, which(condition) := NULL]
While I am getting the correct value for the conditions (a list of 5 logical values), the last command is failing with the error
Error in which(condition) : argument to 'which' is not logical
I had taken the above statements from an earlier answer Remove columns of dataframe based on conditions in R
dt = data.table("col1" = "a", "col2" = "b", "col3" = "c",
"col4" = 'd', "col5" = "e", "col6" = 9, "col7" = 0, "col8" = 7,
"col9" = 0, "col10" = 99)
not0 = function(x) is.numeric(x) && !anyNA(x) && all(x!=0)
dt[, .(
## your categorical columns
col1, col2, col3, col4, col5,
## new column pasted from non-0 numeric columns
new = as.numeric(paste0(unlist(.SD), collapse=""))
),
## this filters columns to be provided in .SD column subset
.SDcols = not0,
## we group by each row so it will handle input of multiple rows
by = .(row=seq_len(nrow(dt)))
][, row:=NULL ## this removes extra grouping column
][] ## this prints
# col1 col2 col3 col4 col5 new
#1: a b c d e 9799
Alternatively if you want to update in place existing table
is0 = function(x) is.numeric(x) && !anyNA(x) && all(x==0)
## remove columns that has 0
dt[, which(sapply(dt, is0)) := NULL]
## add new column
dt[, new := as.numeric(
paste0(unlist(.SD), collapse="")
), .SDcols=is.numeric, by=.(row=seq_len(nrow(dt)))
][]
# col1 col2 col3 col4 col5 col6 col8 col10 new
#1: a b c d e 9 7 99 9799
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With