I'm trying to do a somewhat complicated task in R.
I have a data frame with (for simplicity's sake) three columns.
Column 1 is a string.
Column 2 is an integer.
Column 3 is an integer.
I want to take all the observations that contain a certain substring in Column 1 AND have an exact value in Column 2, and replace their value in the third column with the number 1.
For example, I have the following data frame:
x <- data.frame(x1 = c("bob","jane","bob","bobby","bob","jane","bobby","bob","jane","bob"),
                x2 = c(1,1,1,1,1,2,2,2,2,2),
                x3 = c(13,22,3,34,10,23,53,42,13,35))
I want to select the observations where Column 1 contains "bob" and Column 2 == 1, and set the third column to 1 for those rows, so that I end up with:
y1 <- c("bob","jane","bob","bobby","bob","jane","bobby","bob","jane","bob")
y2 <- c(1,1,1,1,1,2,2,2,2,2)
y3 <- c(1,22,1,1,1,23,53,42,13,35)
y <- data.frame(y1,y2,y3)
I need to do this on a very large dataset, so it is not feasible to split it up and put it back together.
I have tried using grep, but it doesn't work when I try to apply both conditions at once. I have also tried subsetting, but then I would have to split the data frame apart and put it back together.
Thanks very much in advance.
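A likely reason grep() gave trouble when combining both conditions: grep() returns the positions of the matching elements, not a TRUE/FALSE vector with one entry per row, so combining it with & against x$x2 == 1 does not line up row by row. A minimal base-R sketch of two ways around that, using the example x from the question (pos, hit, rows_logical and rows_index are illustrative names only):

```r
x <- data.frame(x1 = c("bob","jane","bob","bobby","bob","jane","bobby","bob","jane","bob"),
                x2 = c(1,1,1,1,1,2,2,2,2,2),
                x3 = c(13,22,3,34,10,23,53,42,13,35))

# grep() returns the integer positions of the matches
pos <- grep("bob", x$x1)
print(pos)  # 1 3 4 5 7 8 10

# grepl() returns one TRUE/FALSE per row instead
hit <- grepl("bob", x$x1)
print(hit)

# Option 1: logical vectors of equal length combine element-wise with &
rows_logical <- hit & x$x2 == 1

# Option 2: keep grep() and intersect its positions with which()
rows_index <- intersect(grep("bob", x$x1), which(x$x2 == 1))

# Both approaches select the same rows
stopifnot(identical(which(rows_logical), rows_index))
```

Either index can then go on the left-hand side of an assignment; the logical form is usually the simpler one to read.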
With R's logical indexing and the [<- replacement function, this is quite easy:
> x$x3[grepl("bob", x$x1) & x$x2 == 1] <- 1
> x
      x1 x2 x3
1    bob  1  1
2   jane  1 22
3    bob  1  1
4  bobby  1  1
5    bob  1  1
6   jane  2 23
7  bobby  2 53
8    bob  2 42
9   jane  2 13
10   bob  2 35
To read the code, think of it as: "for every row of x where column 'x1' contains 'bob' and column 'x2' is equal to 1, assign the value 1 to column 'x3'." If you want to keep the original and put the result in a new object, make a copy with y <- x and work on that instead.
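The copy approach described above can be sketched in base R as follows, reusing the question's example data. Storing the condition in its own variable (rows is just an illustrative name) also makes it easy to check which rows will change before modifying anything:

```r
x <- data.frame(x1 = c("bob","jane","bob","bobby","bob","jane","bobby","bob","jane","bob"),
                x2 = c(1,1,1,1,1,2,2,2,2,2),
                x3 = c(13,22,3,34,10,23,53,42,13,35))

# Work on a copy so the original x stays untouched
y <- x

# One TRUE/FALSE per row: x1 contains "bob" AND x2 equals 1
rows <- grepl("bob", y$x1) & y$x2 == 1
print(which(rows))  # 1 3 4 5

# Logical indexing on the left-hand side replaces only the selected rows
y$x3[rows] <- 1

# The result matches the desired third column from the question,
# and the original x is unchanged
stopifnot(identical(y$x3, c(1, 22, 1, 1, 1, 23, 53, 42, 13, 35)))
stopifnot(identical(x$x3, c(13, 22, 3, 34, 10, 23, 53, 42, 13, 35)))
```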
There is a nice answer from user akrun to a similar problem using the dplyr package, and a faster variant from user docendo discimus. In your case, the code would be:
library(dplyr)
x %>% mutate(x3 = replace(x3, grepl("bob", x1) & x2 == 1, 1))  # grepl() also matches "bobby"
or
x %>% mutate(x3 = replace(x3, which(grepl("bob", x1) & x2 == 1), 1))
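The two calls above differ only in the index passed to replace(): a logical vector in the first, integer positions from which() in the second. A small base-R sketch of what which() produces, including how it treats NA, which would matter if x2 could contain missing values (an assumption; the example data has none). The vectors cond and v are made up for illustration:

```r
cond <- c(TRUE, NA, FALSE, TRUE)
v <- c(10, 20, 30, 40)

# which() returns the positions of the TRUE values and drops NA
idx <- which(cond)
print(idx)  # 1 4

# replace() with integer positions changes elements 1 and 4 only
print(replace(v, idx, 1))  # 1 20 30 1

# With a single replacement value, R also accepts the logical index
# directly; the NA position is simply not selected
print(replace(v, cond, 1))  # 1 20 30 1
```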
If you want to update x directly, you can use the %<>% assignment pipe from the magrittr package:
library(magrittr)
x %<>% mutate(x3 = replace(x3, grepl("bob", x1) & x2 == 1, 1))
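Since the question mentions a very large dataset: "bob" is a plain string rather than a regular expression, so grepl() can be told to match it literally with fixed = TRUE. This skips regex interpretation and may be faster on long character vectors (worth benchmarking on your own data). A base-R sketch on the question's example; note that with fixed = TRUE, characters such as "." are matched literally rather than as regex metacharacters:

```r
x <- data.frame(x1 = c("bob","jane","bob","bobby","bob","jane","bobby","bob","jane","bob"),
                x2 = c(1,1,1,1,1,2,2,2,2,2),
                x3 = c(13,22,3,34,10,23,53,42,13,35))

# fixed = TRUE: treat "bob" as a literal substring, not a regular expression
rows <- grepl("bob", x$x1, fixed = TRUE) & x$x2 == 1

# Same logical-index assignment as before, applied in place on x
x$x3[rows] <- 1
print(x)
```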