I need to drop one column from a data.frame containing a few hundred columns.
With a data.frame, I'd use subset to do this conveniently:
> dat <- data.table( data.frame(x=runif(10),y=rep(letters[1:5],2),z=runif(10)),key='y' )
> subset(dat,select=c(-z))
x y
1: 0.1969049 a
2: 0.7916696 a
3: 0.9095970 b
4: 0.3529506 b
5: 0.4923602 c
6: 0.5993034 c
7: 0.1559861 d
8: 0.9929333 d
9: 0.3980169 e
10: 0.1921226 e
Obviously this still works, but it seems like not a very data.table-like idiom. I could manually construct a list of the column names I wanted to keep, which seems a little more data.table-like:
> dat[,list(x,y)]
x y
1: 0.1969049 a
2: 0.7916696 a
3: 0.9095970 b
4: 0.3529506 b
5: 0.4923602 c
6: 0.5993034 c
7: 0.1559861 d
8: 0.9929333 d
9: 0.3980169 e
10: 0.1921226 e
But then I have to construct such a list, which is clunky.
Is subset the proper way to conveniently drop a column or two, or does it cause a performance hit? If not, what's the better way?
Edit
Benchmarks:
> dat <- data.table( data.frame(x=runif(10^7),y=rep(letters[1:10],10^6),z=runif(10^7)),key='y' )
> microbenchmark( subset(dat,select=c(-z)), dat[,list(x,y)] )
Unit: milliseconds
expr min lq median uq max
1 dat[, list(x, y)] 102.62826 167.86793 170.72847 199.89789 792.0207
2 subset(dat, select = c(-z)) 33.26356 52.55311 53.53934 55.00347 180.8740
But really where it may matter more is for memory if subset copies the whole data.table.
Right-click the column you want to delete and choose Delete Column from the shortcut menu. If the column participates in a relationship (FOREIGN KEY or PRIMARY KEY), a message prompts you to confirm the deletion of the selected columns and their relationships. Choose Yes.
The SQL ALTER TABLE statement is used to add, modify, or drop/delete columns in a table.
To physically drop a column you can use one of the following syntaxes, depending on whether you wish to drop a single or multiple columns. alter table table_name drop column column_name; alter table table_name drop (column_name1, column_name2);
SQL DELETE column using SSMS object explorer In the particular database, locate the particular table and expand the columns. It shows all the columns of the particular table. Right-click on the column we want to remove and click on Delete as shown in the following image.
If you are wanting to remove the column permanently use := NULL
dat[, z := NULL]
If you have your columns to drop as a character string use () to force evaluation as a character string, not as the character name.
toDrop <- c('z')
dat[, (toDrop) := NULL]
If you want to limit the availability of the columns in .SD, you can pass the .SDcols argument
dat[,lapply(.SD, somefunction) , .SDcols = setdiff(names(dat),'z')]
However, data.table inspects the j arguments and only gets the columns you use any way. See FAQ 1.12
When you write X[Y,sum(foo*bar)], data.table automatically inspects the j expression to see which columns it uses.
and doesn't try and load all the data for .SD (unless you have .SD within your call to j)
subset.data.table is processing the call and eventually evaluating dat[, c('x','y'), with=FALSE]
using := NULL should be basically instantaneous, howveer t does permanently delete the column.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With