Is there a way to reference a data frame's column names as a variable, not a string (in R)? Say I want to get the first column name of data frame df. The colnames function returns a string:
> colnames(df)[[1]]
[1] "colname1"
The reason I ask is that I'm having a hard time generalizing the subset function to work with any data frame. Say I want to do a conditional subset on a data frame with a known condition, but at runtime I know only the column number, not the column name. For example:
> df<-data.frame( x=c(1:3), y=c(4:6))
> df.sub <- subset(df, df$y >5 )
But let's say I don't know the column name of df at runtime, only that it's column number 2. The function call
> df.sub <- subset(df, colnames(df)[[2]] >5 )
doesn't work because colnames returns a string, and subset is 'smart' and looks inside df for the object name. Is there a good way around this? I could use [ instead, but I feel the problem would be the same.
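To make the failure concrete, here is a minimal sketch of what that call evaluates, assuming the same df as above (the names cond and all_rows are only for illustration). The condition never touches the column. It compares the string "y" with 5, which R treats as a character comparison producing a single logical value, and that value is recycled across every row.

```r
df <- data.frame(x = 1:3, y = 4:6)

# colnames(df)[[2]] is the string "y", not the column itself.
# 5 is coerced to "5", so this is one string comparison (TRUE in typical locales).
cond <- colnames(df)[[2]] > 5

# subset() receives that single logical and recycles it over all rows,
# so nothing is filtered out and no error is raised.
all_rows <- subset(df, colnames(df)[[2]] > 5)
nrow(all_rows)
```

So the call runs silently and returns the whole data frame, which is easy to miss.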
You should be able to use double square brackets successfully for either name or index number:
> subset(df, df[["y"]] > 5)
  x y
3 3 6
> subset(df, df[[2]] > 5)
  x y
3 3 6
However, note the following from the help page to subset:
Warning
This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.
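One way the non-standard evaluation can bite, sketched under the assumption that a variable in the calling code happens to share a name with a column (the threshold y here is hypothetical):

```r
df <- data.frame(x = 1:3, y = 4:6)
y <- 5  # hypothetical threshold that shares its name with a column of df

# Inside subset(), both y's resolve to the column, so the condition
# compares the column with itself and keeps every row.
nse_result <- subset(df, y >= y)

# With [, the right-hand y is looked up in the calling environment as usual,
# so only rows where the column is at least 5 are kept.
std_result <- df[df[["y"]] >= y, ]
```

The first call returns all three rows without any warning, while the second returns the two rows that were intended.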
And, to give some bad advice, you could also use get:
> subset(df, get(colnames(df)[2]) > 5)
  x y
3 3 6
As @Roland notes in the comments, most R users would actually use something along the lines of:
> df[df[[2]] > 5, ]
  x y
3 3 6
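Since the goal was to generalize this to any data frame, the [ approach can be wrapped in a small function. This is only a sketch: filter_by_col and its arguments are hypothetical names, not part of base R, and it assumes a simple "greater than" condition as in the question.

```r
# filter_by_col is a hypothetical helper, not a base R function.
filter_by_col <- function(data, col, threshold) {
  # [[ accepts either a column name or a column position
  keep <- data[[col]] > threshold
  # Treat NA comparisons as FALSE, mirroring what subset() does
  data[!is.na(keep) & keep, , drop = FALSE]
}

df <- data.frame(x = 1:3, y = 4:6)
by_index <- filter_by_col(df, 2, 5)    # column known only by position
by_name  <- filter_by_col(df, "y", 5)  # same result when the name is known
```

The explicit !is.na(keep) matters because a bare df[df[[2]] > 5, ] returns an all-NA row for every NA in the comparison, whereas subset drops those rows.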