 

Replacing values in data frame in R

Tags:

dataframe

r

I'm trying to do a somewhat complicated task in R.

I have a data frame with (for simplicity's sake) three columns.

Column 1 is a string.
Column 2 is an integer.
Column 3 is an integer.

I want to take all the observations which contain a certain substring in Column 1 AND that have an exact value for Column 2, and replace the third Column with the number 1.

That is, I have the following dataframe:

x <- data.frame(x1 = c("bob","jane","bob","bobby","bob","jane","bobby","bob","jane","bob"),
                x2 = c(1,1,1,1,1,2,2,2,2,2),
                x3 = c(13,22,3,34,10,23,53,42,13,35))

I want to select the observations where Column 1 contains "bob" and Column 2 == 1, and change the third column to 1, so that I end up with:

y1 <- c("bob","jane","bob","bobby","bob","jane","bobby","bob","jane","bob")
y2 <- c(1,1,1,1,1,2,2,2,2,2)
y3 <- c(1,22,1,1,1,23,53,42,13,35)
y <- data.frame(y1,y2,y3)

I want to do this across a really, really big dataset. It is not feasible to split up the dataset and put it back together. I have tried using grep, but it's not working when I try to do both matches at once. Also, I have tried subsetting, but then I'd have to split apart the dataframe and put it back together. Thanks very much in advance.

ejn asked Dec 18 '25 20:12



2 Answers

With R's capacity for logical indexing using the [<- function, this is really quite easy:

> x$x3[ grepl("bob", x$x1) & x$x2 == 1] <- 1
> x
      x1 x2 x3
1    bob  1  1
2   jane  1 22
3    bob  1  1
4  bobby  1  1
5    bob  1  1
6   jane  2 23
7  bobby  2 53
8    bob  2 42
9   jane  2 13
10   bob  2 35

To read the code, see it as: "for every row of x where column 'x1' contains 'bob' and column 'x2' equals 1, assign the value 1 to column 'x3'." If you want a new object with the modified values instead, make a copy of x with y <- x and work on that.
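A minimal base-R sketch of the same idea split into steps, assuming the question's example data: build the logical mask once, copy x to y so the original is untouched, then assign through the mask. The variable name `mask` is only illustrative.

```r
# Example data from the question
x <- data.frame(x1 = c("bob","jane","bob","bobby","bob","jane","bobby","bob","jane","bob"),
                x2 = c(1,1,1,1,1,2,2,2,2,2),
                x3 = c(13,22,3,34,10,23,53,42,13,35))

# Logical mask: TRUE where x1 contains "bob" AND x2 is exactly 1
mask <- grepl("bob", x$x1) & x$x2 == 1

# Copy first, so the original x keeps its values
y <- x

# Assign 1 only at the TRUE positions of the mask; no splitting or rebinding needed
y$x3[mask] <- 1

print(y)
```

Because the assignment goes through `[<-` on a single column, the rest of the data frame is left as it is, which is what makes this approach workable on a large dataset without splitting it apart.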

IRTFM answered Dec 20 '25 13:12



There is a nice answer from user akrun using the dplyr package to a similar problem here and a faster variant from user docendo discimus here. In your case, the code would be as follows (using grepl() rather than == so that "bobby" also counts as containing "bob"):

library(dplyr)

x %>% mutate(x3 = replace(x3, grepl('bob', x1) & x2 == 1, 1))

or

x %>% mutate(x3 = replace(x3, which(grepl('bob', x1) & x2 == 1), 1))
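Both forms rest on base R's replace(x, list, values), which returns a modified copy of x; list may be either a logical vector or integer positions, and which() converts the logical condition into positions. A small base-R check (no dplyr required), assuming the question's data, that the two forms give the same column:

```r
# Example data from the question
x <- data.frame(x1 = c("bob","jane","bob","bobby","bob","jane","bobby","bob","jane","bob"),
                x2 = c(1,1,1,1,1,2,2,2,2,2),
                x3 = c(13,22,3,34,10,23,53,42,13,35))

# Logical condition: one TRUE/FALSE per row
cond <- grepl("bob", x$x1) & x$x2 == 1

# replace() with a logical index
by_logical <- replace(x$x3, cond, 1)

# replace() with integer positions of the TRUE rows (rows 1, 3, 4 and 5 here)
by_position <- replace(x$x3, which(cond), 1)

# Both variants produce the same vector
stopifnot(identical(by_logical, by_position))
```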

If you want to update x directly, you could combine this with the %<>% assignment pipe from the magrittr package:

library(magrittr)

x %<>% mutate(x3 = replace(x3, grepl('bob', x1) & x2 == 1, 1))
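One caveat worth checking on the question's data: a condition written as x1 == 'bob' is an exact, whole-string comparison, so it skips "bobby" even though the question asks for rows that contain "bob". A short base-R comparison of the two masks (the names `exact`, `contains` and `literal` are illustrative):

```r
x1 <- c("bob","jane","bob","bobby","bob","jane","bobby","bob","jane","bob")
x2 <- c(1,1,1,1,1,2,2,2,2,2)

exact    <- x1 == "bob" & x2 == 1                      # whole-string equality
contains <- grepl("bob", x1) & x2 == 1                 # substring (regex) match
literal  <- grepl("bob", x1, fixed = TRUE) & x2 == 1   # pattern taken literally, not as a regex

which(exact)     # rows 1, 3, 5
which(contains)  # rows 1, 3, 4, 5 ("bobby" in row 4 is included)
```

For a plain word like "bob" the regex and fixed = TRUE forms select the same rows; fixed = TRUE just avoids any special meaning of regex metacharacters in the pattern.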

meriops answered Dec 20 '25 13:12



