id first middle last Age
1 Carol Jenny Smith 15
2 Sarah Carol Roberts 20
3 Josh David Richardson 22
I am trying find a specific name in ANY of the name columns (first, middle, last). For example, if I found anyone with a name Carol (doesn't matter if it's the first/middle/last name), I want to mutate a column 'Carol' and give 1. So what I want is the following
id first middle last Age Carol
1 Carol Jenny Smith 15 1
2 Sarah Carol Roberts 20 1
3 Josh David Richardson 22 0
I have been trying ifelse(c(first, middle,last) == "Carol" , 1, 0 ) or "Carol" %in% first...etc but for some reason I can only work on one column instead of multiple columns.. Could anyone help me please? Thank you in advance!
We can use rowSums
df$Carol <- as.integer(rowSums(df[2:4] == "Carol") > 0)
df
# id first middle last Age Carol
#1 1 Carol Jenny Smith 15 1
#2 2 Sarah Carol Roberts 20 1
#3 3 Josh David Richardson 22 0
If we need it as a function
fun <- function(df, value) {
as.integer(rowSums(df[2:4] == value) > 0)
}
fun(df, "Carol")
#[1] 1 1 0
fun(df, "Sarah")
#[1] 0 1 0
but this assumes the columns you want to search are at position 2:4.
To give more flexibility with column position
fun <- function(df, cols, value) {
as.integer(rowSums(df[cols] == value) > 0)
}
fun(df, c("first", "last","middle"), "Carol")
#[1] 1 1 0
fun(df, c("first", "last","middle"), "Sarah")
#[1] 0 1 0
Here's a tidyverse option. We first reshape the data to long format, group by id, and find levels of id that have the desired name in at least one row. Then we reshape back to wide format.
library(tidyverse)
df %>%
gather(key, value, first:last) %>%
group_by(id) %>%
mutate(Carol = as.numeric(any(value=="Carol"))) %>%
spread(key, value)
id Age Carol first last middle 1 1 15 1 Carol Smith Jenny 2 2 20 1 Sarah Roberts Carol 3 3 22 0 Josh Richardson David
Or, as a function:
find.target = function(data, target) {
data %>%
gather(key, value, first:last) %>%
group_by(id) %>%
mutate(!!target := as.numeric(any(value==target))) %>%
spread(key, value) %>%
# Move new target column to end
select(-target, target)
}
find.target(df, "Carol")
find.target(df, "Sarah")
You could also do several at once. For example:
map(c("Sarah", "Carol", "David"), ~ find.target(df, .x)) %>%
reduce(left_join)
id Age first last middle Sarah Carol David 1 1 15 Carol Smith Jenny 0 1 0 2 2 20 Sarah Roberts Carol 1 1 0 3 3 22 Josh Richardson David 0 0 1
Using tidyverse
library(tidyverse)
f1 <- function(data, wordToCompare, colsToCompare) {
wordToCompare <- enquo(wordToCompare)
data %>%
select(colsToCompare) %>%
mutate(!! wordToCompare := map(., ~
.x == as_label(wordToCompare)) %>%
reduce(`|`) %>%
as.integer)
}
f1(df1, Carol, c("first", 'middle', 'last'))
# first middle last Carol
#1 Carol Jenny Smith 1
#2 Sarah Carol Roberts 1
#3 Josh David Richardson 0
f1(df1, Sarah, c("first", 'middle', 'last'))
# first middle last Sarah
#1 Carol Jenny Smith 0
#2 Sarah Carol Roberts 1
#3 Josh David Richardson 0
Or this can also be done with pmap
df1 %>%
mutate(Carol = pmap_int(.[c('first', 'middle', 'last')],
~ +('Carol' %in% c(...))))
# id first middle last Age Carol
#1 1 Carol Jenny Smith 15 1
#2 2 Sarah Carol Roberts 20 1
#3 3 Josh David Richardson 22 0
which can be wrapped into a function
f2 <- function(data, wordToCompare, colsToCompare) {
wordToCompare <- enquo(wordToCompare)
data %>%
mutate(!! wordToCompare := pmap_int(.[colsToCompare],
~ +(as_label(wordToCompare) %in% c(...))))
}
f2(df1, Carol, c("first", 'middle', 'last'))
# id first middle last Age Carol
#1 1 Carol Jenny Smith 15 1
#2 2 Sarah Carol Roberts 20 1
#3 3 Josh David Richardson 22 0
NOTE: Both the tidyverse methods doesn't require any reshaping
With base R, we can loop through the 'first', 'middle', 'last' column and use == for comparison to get a list of logical vectors, which we Reduce to a single logical vector with | and coerce it to binary with +
df1$Carol <- +(Reduce(`|`, lapply(df1[2:4], `==`, 'Carol')))
df1
# id first middle last Age Carol
#1 1 Carol Jenny Smith 15 1
#2 2 Sarah Carol Roberts 20 1
#3 3 Josh David Richardson 22 0
NOTE: There are dupes for this post. For e.g. here
df1 <- structure(list(id = 1:3, first = c("Carol", "Sarah", "Josh"),
middle = c("Jenny", "Carol", "David"), last = c("Smith",
"Roberts", "Richardson"), Age = c(15L, 20L, 22L)),
class = "data.frame", row.names = c(NA,
-3L))
A solution using apply family
df$Carol = lapply(1:nrow(df), function(x) any(df[x,]=="Carol))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With