Apologies if this is something a more seasoned R user would know, but I just came across this and wanted to ask about proper usage.
It appears to be possible to classify a variable into ranges by using as.factor, so I could group observations into ranges. For example, if I were looking at visits by user, it looks like I could write an if/then statement to bin the users by the number of visits they had, then get summary statistics for each group.
Here is the link where I learned about this: http://programming-r-pro-bro.blogspot.com/2011/10/modelling-with-r-part-2.html
Now, while this approach looks easier than grouping data with plyr and ddply, it does not look powerful enough to break the variable into a given number of bins (for example, 10 for deciles). You would have to define those ranges yourself.
This leads to my question - Is one better than the other for grouping data, or are there just many ways to tackle grouping like this?
Thanks
I think cut is a better tool for this: it bins a numeric vector into intervals and returns a factor directly.
With some sample data:
set.seed(123)
age <- round(runif(10,20,50))
This is what I'd do:
> cut(age, c(0,30,40,Inf))
[1] (0,30] (40,Inf] (30,40] (40,Inf] (40,Inf] (0,30] (30,40] (40,Inf]
[9] (30,40] (30,40]
Levels: (0,30] (30,40] (40,Inf]
Optionally, setting the factor labels manually:
> cut(age, c(0,30,40,Inf), labels=c('0-30', '31-40', '40+'))
[1] 0-30 40+ 31-40 40+ 40+ 0-30 31-40 40+ 31-40 31-40
Levels: 0-30 31-40 40+
To contrast, the linked page suggests this:
> as.factor(ifelse(age<=30, '0-30', ifelse(age <= 40, '30-40', '40+')))
[1] 0-30 40+ 30-40 40+ 40+ 0-30 30-40 40+ 30-40 30-40
Levels: 0-30 30-40 40+
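For the decile case mentioned in the question, one possible approach is to let quantile compute the break points and pass them to cut. This is only a rough sketch: the visits vector below is made-up data standing in for visits per user, and tapply is just one of several ways to get per-group summaries.

```r
set.seed(123)
# Hypothetical data: visit counts for 100 users (not from the question)
visits <- round(runif(100, 1, 200))

# Decile break points taken from the data itself;
# unique() guards against duplicate breaks when quantiles tie
breaks <- unique(quantile(visits, probs = seq(0, 1, by = 0.1)))

# include.lowest = TRUE keeps the minimum value in the first bin
decile <- cut(visits, breaks = breaks, include.lowest = TRUE)

# Summary statistics per bin
table(decile)
tapply(visits, decile, mean)
```

Changing the by argument of seq gives other groupings, for example by = 0.25 for quartiles.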