R: Reference a data frame's column names as a variable, not a string (for subset)

Is there a way to reference a data frame's column names as a variable rather than as a string (in R)? Say I want the first column name of the data frame df. colnames returns a string:

> colnames(df)[[1]]
[1] "colname1" 
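For reference, base R can turn such a string into a symbol with as.name(), and eval() can then look that symbol up inside the data frame. A minimal sketch, assuming a small df with columns x and y (the variable name col_sym is purely illustrative):

```r
df <- data.frame(x = 1:3, y = 4:6)

# colnames() gives the string "x"; as.name() converts it to the symbol `x`
col_sym <- as.name(colnames(df)[[1]])
class(col_sym)       # "name"

# eval() with a data frame as its environment resolves `x` to the column df$x
eval(col_sym, df)    # 1 2 3
```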

The reason I ask is that I'm having a hard time generalizing subset to work with any data frame. Say I want to subset a data frame on a known condition, but I don't know the column name at runtime (only the column number). For example:

> df <- data.frame(x = c(1:3), y = c(4:6))
> df.sub <- subset(df, df$y > 5)
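Because subset evaluates its condition inside the data frame, the df$ prefix is not actually needed when the column name is known in advance; the bare name y refers to the column. A quick check, assuming the same two-column df:

```r
df <- data.frame(x = 1:3, y = 4:6)

# Inside subset(), `y` is looked up among df's columns first
with_prefix    <- subset(df, df$y > 5)
without_prefix <- subset(df, y > 5)

identical(with_prefix, without_prefix)  # TRUE: both keep only row 3
```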

But let's say I don't know the column name of df at runtime, only that it's column number 2. The function call

> df.sub <- subset(df, colnames(df)[[2]] > 5)

doesn't work, because colnames returns a string, and subset is 'smart' and looks inside df for the object name. Is there a good way around this? I could use the [ operator instead, but I suspect the problem would be the same.
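To see what actually goes wrong: the condition compares the string "y" with the number 5, which R coerces to the string "5". That yields a single logical value rather than one per row, and subset recycles it across all rows. A minimal sketch, assuming a typical locale where digits sort before letters (so "y" > "5" is TRUE):

```r
df <- data.frame(x = 1:3, y = 4:6)

# This compares the string "y" with "5", not the column df$y with 5
cond <- colnames(df)[[2]] > 5
cond            # a single TRUE, not a vector of length nrow(df)

# subset() recycles that single TRUE, so no rows are filtered out at all
res <- subset(df, colnames(df)[[2]] > 5)
nrow(res)       # 3
```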

asked Jan 30 '26 at 17:01 by tkg



1 Answer

You can use double square brackets with either the column name or the column index:

> subset(df, df[["y"]] > 5)
  x y
3 3 6
> subset(df, df[[2]] > 5)
  x y
3 3 6
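Since the column position is only known at runtime, it will usually be held in a variable. A sketch with a hypothetical variable col_idx; note that subset looks names up in df first, so this relies on df not having a column that is itself called col_idx:

```r
df <- data.frame(x = 1:3, y = 4:6)

col_idx <- 2  # hypothetical: column position determined at runtime

# df[[col_idx]] extracts column 2 (y), so this is equivalent to subset(df, y > 5)
res <- subset(df, df[[col_idx]] > 5)
res
```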

However, note the following from the help page for subset:

Warning

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.
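One concrete example of those unanticipated consequences: a variable in the calling environment is silently shadowed by a column with the same name. A minimal sketch, assuming a global y that the caller intends to use as a threshold:

```r
df <- data.frame(x = 1:3, y = 4:6)
y <- 0  # caller's threshold, which happens to share a name with a column

# subset() evaluates the condition inside df first, so `y` means df$y (4:6), not 0
masked <- subset(df, x > y)
nrow(masked)    # 0: x > df$y is FALSE for every row

# Standard [ indexing does no such lookup, so the global y = 0 is used
explicit <- df[df$x > y, ]
nrow(explicit)  # 3
```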


And, to give some bad advice, you could also use get:

> subset(df, get(colnames(df)[2]) > 5)
  x y
3 3 6
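A different approach, not from the answer above, is to build the subset call with the real column symbol spliced in, using base R's bquote() together with as.name(). This is a sketch only; the variable names col and expr are illustrative:

```r
df <- data.frame(x = 1:3, y = 4:6)

col  <- colnames(df)[2]                               # the string "y"
expr <- bquote(subset(df, .(as.name(col)) > 5))       # splice in the symbol `y`
expr            # subset(df, y > 5)

eval(expr)      # runs the constructed call; keeps only row 3
```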

As @Roland notes in the comments, most R users would actually use something along the lines of:

> df[df[[2]] > 5, ]
  x y
3 3 6
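Wrapping that idiom in a function makes it reusable for any data frame. A sketch with a hypothetical helper rows_where_gt() that accepts either a column name or a column index. One design choice worth noting: plain logical indexing with [ returns all-NA rows where the comparison is NA, so the sketch uses which() to drop them, which mirrors how subset discards NA rows:

```r
# Hypothetical helper: `col` may be a column name or a column index
rows_where_gt <- function(data, col, threshold) {
  keep <- data[[col]] > threshold
  # which() drops NA comparisons, so NA rows are excluded as subset() would do
  data[which(keep), , drop = FALSE]
}

df <- data.frame(x = 1:3, y = 4:6)
rows_where_gt(df, 2, 5)     # by index
rows_where_gt(df, "y", 5)   # by name, same result
```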
answered Feb 02 '26 at 11:02 by A5C1D2H2I1M1N2O1R2T1
