I want to add a column to a data frame which will encode the specific levels of a factor. e.g.
subject rate
1 12
1 10
1 13
4 4
4 6
4 12
2 9
2 2
2 5
6 17
6 10
6 1
In the above data frame I wish to add a third column called "treatment", where each subject is assigned to one of two levels, "a" or "b", as shown below:
subject rate treatment
1 12 a
1 10 a
1 13 a
4 4 b
4 6 b
4 12 b
2 9 b
2 2 b
2 5 b
6 17 a
6 10 a
6 1 a
Thanks in advance for any help.
Here's another approach using the plyr package:
library(plyr)
#Make some fake data
set.seed(1)
dat <- data.frame(subject = rep(c(1,4,2,6), each = 3), rate = sample(1:20, 12, TRUE))
set.seed(1)
#Assign treatment based on the subject ID. This does not ensure that you will get
#at least one subject in each treatment group.
ddply(dat, "subject", transform, treatment = sample(letters[1:2], 1))
EDIT - to address your comment
Given that you want to specify which subject gets assigned to which treatment, Gavin's suggestion of merge is spot on. I would first make a new data.frame that contains one record for each unique subject, assign their treatment, and then merge them together:
treatments <- data.frame(subject = unique(dat$subject), treatment = c("a", "b", "b", "a"))
merge(dat, treatments)
Note that the order of unique(dat$subject) is 1, 4, 2, 6, which matches the order in which the subjects appear in the original data.frame, so the treatment vector must be given in that same order. If your real problem contains more than four subjects, you may want to consider a more automated way of assigning treatment groups. One approach I've used in the past is to assign a random number to each subject, and then assign groups based on a threshold of that random number. It is essentially the same as the random approach above, but splitting at the median gives groups of (near-)equal size, provided each subject has the same number of rows. For example:
dat <- ddply(dat, "subject", transform, treatment = runif(1))
dat <- within(dat, treatment <- ifelse(treatment < quantile(treatment, 0.5),"a", "b"))
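If exactly balanced groups matter, an alternative is to shuffle a pre-balanced vector of labels across the unique subjects and then merge. This is only a sketch in base R, not part of the answer above; the names labels, assignment and result are illustrative, and the rate values are copied from the question:

```r
# Fake data with the same layout as the question
dat <- data.frame(subject = rep(c(1, 4, 2, 6), each = 3),
                  rate = c(12, 10, 13, 4, 6, 12, 9, 2, 5, 17, 10, 1))

set.seed(1)
subj <- unique(dat$subject)

# One label per subject, alternating "a"/"b" so the counts are balanced
labels <- rep(c("a", "b"), length.out = length(subj))

# sample() with no size argument returns a random permutation of labels
assignment <- data.frame(subject = subj, treatment = sample(labels))

# Attach the treatment to every row of the matching subject
result <- merge(dat, assignment)
```

With an odd number of subjects the two groups would differ in size by one.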
If you want to assign treatments at random, this will do it:
## subject IDs
subj <- with(dat, unique(subject))
## how many treatment levels?
ntreat <- 2
## sample an identifier for the treatments
set.seed(47)
treats <- sample(letters[seq_len(ntreat)], length(subj), replace = TRUE)
## stick this into a subject/treatment data frame
## (no cbind() here: it would coerce the numeric subject IDs to character)
Treat <- data.frame(subject = subj, treatment = treats)
This gives:
R> Treat
subject treatment
1 1 b
2 4 a
3 2 b
4 6 b
Edit:
If the treatments have been pre-assigned, then just create the Treat data frame by hand:
Treat <- data.frame(subject = c(1,4,2,6), treatment = c("a","b","b","a"))
If you have lots of these to do you can use functions like seq() and rep(), plus the inbuilt letters constant to speed up the "data entry".
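As a rough illustration of that shortcut, the sketch below builds the lookup table with rep() and letters. The alternating and block patterns are assumptions chosen for the example (they are not the a, b, b, a assignment from the question), and the name TreatBlock is hypothetical:

```r
# Subject IDs, in the order they appear in the question
subj <- c(1, 4, 2, 6)

# Alternate "a", "b", "a", "b", ... across however many subjects there are
Treat <- data.frame(subject = subj,
                    treatment = rep(letters[1:2], length.out = length(subj)))

# Or assign in blocks: first half "a", second half "b"
# (assumes an even number of subjects)
TreatBlock <- data.frame(subject = subj,
                         treatment = rep(letters[1:2], each = length(subj) / 2))
```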
End edit
We can now merge this data frame with the original data, using merge(), to insert the treatment for each subject:
R> merge(dat, Treat)
subject rate treatment
1 1 12 b
2 1 10 b
3 1 13 b
4 2 9 b
5 2 2 b
6 2 5 b
7 4 4 a
8 4 6 a
9 4 12 a
10 6 17 b
11 6 10 b
12 6 1 b
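Note that merge() sorts the result on the key column by default, which is why the subjects come out as 1, 2, 4, 6 rather than in their original order. If the original row order matters, one possible alternative is a lookup with match(). This is a sketch using the pre-assigned treatments from the question, and the name idx is illustrative:

```r
dat <- data.frame(subject = rep(c(1, 4, 2, 6), each = 3),
                  rate = c(12, 10, 13, 4, 6, 12, 9, 2, 5, 17, 10, 1))
Treat <- data.frame(subject = c(1, 4, 2, 6),
                    treatment = c("a", "b", "b", "a"))

# For each row of dat, find the position of its subject in Treat
idx <- match(dat$subject, Treat$subject)

# Index the treatment column with those positions; row order is untouched
dat$treatment <- Treat$treatment[idx]
```

Subjects in dat that are missing from Treat would get NA here, whereas merge() would drop those rows by default.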