
assigning a factor to a data frame

Tags:

r

I want to add a column to a data frame that encodes the levels of a factor. For example:

subject  rate
1          12
1          10 
1          13
4          4
4          6
4          12
2          9
2          2
2          5
6          17
6          10
6          1

In the data frame above, I wish to add a third column called "treatment", where each subject is assigned to one of two levels, "a" or "b", as below:

subject  rate  treatment
1          12      a
1          10      a
1          13      a
4          4       b
4          6       b
4          12      b
2          9       b
2          2       b
2          5       b 
6          17      a
6          10      a
6          1       a  

Thanks in advance for any help.

asked Nov 18 '25 14:11 by ThallyHo


2 Answers

Here's another approach using the plyr package:

library(plyr)

#Make some fake data
set.seed(1)
dat <- data.frame(subject = rep(c(1,4,2,6), each = 3), rate = sample(1:20, 12, TRUE))

set.seed(1)
#Assign treatment based on the subject ID. This does not ensure that you will get
#at least one subject in each treatment group.
ddply(dat, "subject", transform, treatment = sample(letters[1:2], 1))

EDIT - to address your comment

Given that you want to specify which subject gets assigned to which treatment, Gavin's suggestion of merge is spot on. I would first make a new data.frame that contains one record for each unique subject, assign their treatment, and then merge them together:

treatments <- data.frame(subject = unique(dat$subject), treats = c("a", "b", "b", "a"))
merge(dat, treatments)
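
One thing to be aware of: by default merge() sorts the result by the key column, so the rows come back ordered by subject (1, 2, 4, 6) rather than in their original order. If the original row order matters, a lookup with match() is one alternative. The following is a minimal sketch using the rate values from the question; converting the new column to a factor is an assumption about what the asker wants.

```r
# Data from the question
dat <- data.frame(subject = rep(c(1, 4, 2, 6), each = 3),
                  rate = c(12, 10, 13, 4, 6, 12, 9, 2, 5, 17, 10, 1))

# One row per subject with its treatment
treatments <- data.frame(subject = c(1, 4, 2, 6),
                         treats = c("a", "b", "b", "a"))

# For each row of dat, find the matching row in treatments
idx <- match(dat$subject, treatments$subject)

# Add the column as a factor without reordering dat
dat$treatment <- factor(treatments$treats[idx], levels = c("a", "b"))
```

Unlike merge(), this leaves dat in its original row order and only adds the new column.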

Note that the order of unique(dat$subject) is 1, 4, 2, 6, which is the order in which the subjects first appear in the original data.frame, so the treatments must be listed in that same order. If your real problem contains more than four subjects, you may want to consider a more automated way of assigning treatment groups. One approach I've used in the past is to assign a random number to each subject, and then assign groups based on a given threshold of that random number. It is essentially the same as the approach above, but gives roughly equal numbers in each group. For example:

dat <- ddply(dat, "subject", transform, treatment = runif(1))
dat <- within(dat, treatment <- ifelse(treatment < quantile(treatment, 0.5),"a", "b"))
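
If exactly balanced groups are required, another option is to build a vector containing the right number of each label and shuffle it. This is a minimal sketch in base R, not part of the approach above; the subject IDs are those from the question and the object names are illustrative.

```r
set.seed(1)

# Subject IDs in order of first appearance
subj <- c(1, 4, 2, 6)

# Repeat the labels until there is one per subject: "a", "b", "a", "b"
labels <- rep(c("a", "b"), length.out = length(subj))

# Shuffle the labels so the assignment is random but the counts are fixed
treatments <- data.frame(subject = subj, treatment = sample(labels))

# Each treatment appears exactly twice
table(treatments$treatment)
```

With an odd number of subjects, the group sizes would differ by one. The resulting data frame can be passed to merge() in the same way as a hand-made one.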
answered Nov 20 '25 04:11 by Chase


If you want to assign treatments at random, this will do it:

## subject IDs
subj <- with(dat, unique(subject))

## how many treatment levels?
ntreat <- 2

## sample an identifier for the treatments
set.seed(47)
treats <- sample(letters[seq_len(ntreat)], length(subj), replace = TRUE)

## stick this into a subject/treatment data frame
Treat <- data.frame(subject = subj, treatment = treats)

This gives:

R> Treat
  subject treatment
1       1         b
2       4         a
3       2         b
4       6         b

Edit:

If the treatments have been pre-assigned, then just create the Treat data frame by hand:

Treat <- data.frame(subject = c(1,4,2,6), treatment = c("a","b","b","a"))

If you have lots of these to do you can use functions like seq() and rep(), plus the inbuilt letters constant to speed up the "data entry".
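
As a rough illustration of that shortcut, here is a sketch with eight hypothetical subjects numbered 1 to 8. The IDs and the assignment patterns are made up for the example.

```r
# Hypothetical: subjects 1-4 get "a", subjects 5-8 get "b"
Treat <- data.frame(subject = seq(1, 8),
                    treatment = rep(letters[1:2], each = 4))

# Hypothetical: alternate "a", "b", "a", "b", ... across the subjects
TreatAlt <- data.frame(subject = seq(1, 8),
                       treatment = rep(letters[1:2], times = 4))
```

Here each = 4 repeats every label four times in a row, while times = 4 repeats the whole a, b pattern four times.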

End edit

We can now merge this data frame with the original data, using merge(), to attach the treatment to each subject's rows:

R> merge(dat, Treat)
   subject rate treatment
1        1   12         b
2        1   10         b
3        1   13         b
4        2    9         b
5        2    2         b
6        2    5         b
7        4    4         a
8        4    6         a
9        4   12         a
10       6   17         b
11       6   10         b
12       6    1         b
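
A caveat when the lookup is built by hand: with its default arguments, merge() drops any rows of dat whose subject does not appear in Treat. Passing all.x = TRUE keeps those rows with NA in the treatment column, which makes a missing subject easy to spot. This is a minimal sketch using the rate values from the question and a deliberately incomplete lookup.

```r
# Data from the question
dat <- data.frame(subject = rep(c(1, 4, 2, 6), each = 3),
                  rate = c(12, 10, 13, 4, 6, 12, 9, 2, 5, 17, 10, 1))

# Deliberately incomplete lookup: subject 6 has been left out
Treat <- data.frame(subject = c(1, 4, 2),
                    treatment = c("a", "b", "b"))

# Default merge: the three rows for subject 6 are silently dropped
inner <- merge(dat, Treat)

# Keep every row of dat; unmatched subjects get NA for treatment
left <- merge(dat, Treat, all.x = TRUE)

nrow(inner)                   # 9
nrow(left)                    # 12
sum(is.na(left$treatment))    # 3
```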
answered Nov 20 '25 04:11 by Gavin Simpson


