assigning a factor to a data frame

Question

I want to add a column to a data frame that encodes the levels of a factor. For example:

subject  rate
1          12
1          10 
1          13
4          4
4          6
4          12
2          9
2          2
2          5
6          17
6          10
6          1

In the above data frame I wish to add a third column called "treatment", where each subject is assigned to one of two levels, "a" or "b", as shown below:

subject  rate  treatment
1          12      a
1          10      a
1          13      a
4          4       b
4          6       b
4          12      b
2          9       b
2          2       b
2          5       b 
6          17      a
6          10      a
6          1       a

Thanks in advance for any help.

Chase · Accepted Answer

Here's another approach using the plyr package:

library(plyr)

#Make some fake data
set.seed(1)
dat <- data.frame(subject = rep(c(1,4,2,6), each = 3), rate = sample(1:20, 12, TRUE))

set.seed(1)
#Assign one randomly drawn treatment per subject ID. This does not ensure that
#you will get at least one subject in each treatment group.
ddply(dat, "subject", transform, treatment = sample(letters[1:2], 1))

EDIT - to address your comment

Given that you want to specify which subject gets assigned to which treatment, Gavin's suggestion of merge is spot on. I would first make a new data.frame that contains one record for each unique subject, assign their treatment, and then merge them together:

treatments <- data.frame(subject = unique(dat$subject), treatment = c("a", "b", "b", "a"))
merge(dat, treatments)

Note that the order of unique(dat$subject) is 1,4,2,6, which corresponds to the order of the values in the original data.frame.

If your real problem contains more than four subjects, you may want to consider a more automated way of assigning treatment groups. One approach I've used in the past is to assign a random number to each respondent, and then assign groups based on a given threshold of that random number. It is essentially the same as the approach above, but it can give you roughly equal numbers in each group. For example:

dat <- ddply(dat, "subject", transform, treatment = runif(1))
dat <- within(dat, treatment <- ifelse(treatment < quantile(treatment, 0.5),"a", "b"))
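If the groups must be exactly the same size, one alternative is to shuffle a balanced vector of labels instead of thresholding random numbers. The following is only a sketch in base R, not part of the original answer; the names `demo`, `grp`, and `balanced` and the seed value are made up for illustration.

```r
# Illustrative data with the same shape as `dat` above
set.seed(1)
demo <- data.frame(subject = rep(c(1, 4, 2, 6), each = 3),
                   rate = sample(1:20, 12, TRUE))

# One entry per subject
subj <- unique(demo$subject)

# Build a balanced vector of labels (a, b, a, b, ...) and shuffle it
set.seed(42)  # arbitrary seed, only for reproducibility
grp <- sample(rep(c("a", "b"), length.out = length(subj)))

# Subject/treatment lookup table, with treatment stored as a factor
balanced <- data.frame(subject = subj, treatment = factor(grp))

# Attach the treatment to every row of the data
demo_balanced <- merge(demo, balanced)
table(balanced$treatment)
```

With an odd number of subjects the group sizes differ by one, because rep(..., length.out = ) simply truncates the repeating pattern.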

Gavin Simpson · Answer

If you want to assign treatments at random, this will do it:

## subject IDs
subj <- with(dat, unique(subject))

## how many treatment levels?
ntreat <- 2

## sample an identifier for the treatments
set.seed(47)
treats <- sample(letters[seq_len(ntreat)], length(subj), replace = TRUE)

## stick this into a subject/treatment data frame
## (no cbind(), which would coerce subject to character)
Treat <- data.frame(subject = subj, treatment = treats)

This gives:

R> Treat
  subject treatment
1       1         b
2       4         a
3       2         b
4       6         b

Edit:

If the treatments have been pre-assigned, then just create the Treat data frame by hand:

Treat <- data.frame(subject = c(1,4,2,6), treatment = c("a","b","b","a"))

If you have lots of these to do, you can use functions like seq() and rep(), plus the built-in letters constant, to speed up the "data entry".
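As a rough illustration of that kind of shortcut, the sketch below builds a lookup table for a hypothetical larger design. The design itself (12 subjects, 3 treatments, 4 subjects each) and the names `n_subj`, `n_treat`, and `Treat_big` are assumptions made up for the example.

```r
# Hypothetical design: 12 subjects, 3 treatments, 4 consecutive subjects each
n_subj <- 12
n_treat <- 3

Treat_big <- data.frame(
  subject = seq_len(n_subj),
  # letters[1:3] gives "a", "b", "c"; each = 4 repeats every label 4 times
  treatment = factor(rep(letters[seq_len(n_treat)], each = n_subj / n_treat))
)
head(Treat_big)
```

Using rep(..., length.out = n_subj) without each = would instead alternate the labels (a, b, c, a, b, c, ...).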

End edit

We can now merge this data frame with the original data to attach each subject's treatment, using merge():

R> merge(dat, Treat)
   subject rate treatment
1        1   12         b
2        1   10         b
3        1   13         b
4        2    9         b
5        2    2         b
6        2    5         b
7        4    4         a
8        4    6         a
9        4   12         a
10       6   17         b
11       6   10         b
12       6    1         b
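By default merge() sorts the result by the key column, which is why the subjects appear as 1, 2, 4, 6 above rather than in their original order. If the row order matters, one option is a match() lookup, which adds the column in place. This is a sketch using the question's data and the pre-assigned treatments; it also stores the new column as a factor, as the question asks.

```r
# Data from the question
dat <- data.frame(subject = rep(c(1, 4, 2, 6), each = 3),
                  rate = c(12, 10, 13, 4, 6, 12, 9, 2, 5, 17, 10, 1))

# Pre-assigned treatments, one row per subject
Treat <- data.frame(subject = c(1, 4, 2, 6),
                    treatment = c("a", "b", "b", "a"))

# For each row of dat, find the position of its subject in Treat,
# then pull the treatment at that position; row order is untouched
dat$treatment <- factor(Treat$treatment[match(dat$subject, Treat$subject)])
dat
```

A subject that is missing from Treat gets NA here, whereas merge() with its default arguments would drop those rows.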
