I have explored various options using quosures, symbols, and evaluation, but I can't seem to get the right syntax. Here is an example dataframe.
x <- data.frame("A" = letters[1:4], "B" = letters[26:23], "C" = letters[c(1,3,5,7)], "D" = letters[c(2,4,6,8)], "pastecols" = c("B, C","B, D", "B, C, D", NA))
  A B C D pastecols
1 a z a b      B, C
2 b y c d      B, D
3 c x e f   B, C, D
4 d w g h      <NA>
Now suppose I want to paste values from different columns based on the lookup string in pastecols, and I always want to include column A. This is my desired result:
  A B C D pastecols  result
1 a z a b      B, C   a z a
2 b y c d      B, D   b y d
3 c x e f   B, C, D c x e f
4 d w g h      <NA>       d
Ideally this could be done in dplyr. This is the closest I have gotten:
x %>% mutate(result = lapply(lapply(str_split(pastecols, ", "), c, "A"), na.omit))
  A B C D pastecols     result
1 a z a b      B, C    B, C, A
2 b y c d      B, D    B, D, A
3 c x e f   B, C, D B, C, D, A
4 d w g h      <NA>          A
Here's one way using pmap to do a similar thing. pmap can be used to effectively work on dataframes by row by capturing each row as a named vector; you can then get the desired column names for indexing as cols by selecting them with ["pastecols"].
Most of the anonymous function syntax is not tidyverse stuff, but just basic R stuff. To walk through it:
1. Pass the dataframe itself (the . placeholder) as the .l argument of pmap_chr. Remember that dataframes are lists of columns!
2. Collect the ... arguments with c(...). Basically we are passing each row of the dataframe as arguments to the function; now row is a named vector containing the row. Note that if you have list-columns this will break (but so will a lot of other things here, so I assume there aren't any...).
3. Get the columns of row that we want from row["pastecols"], but we need to turn (say) "B, C" into c("A", "B", "C") to do that. This next line just adds the "A", replaces missing values with "A", splits into pieces if there are any, and then indexes back down into the list. The `[[`(1) part is just how you do list[[1]] in a pipe chain; it's the prefix form of the operator. You need this because str_split returns a list and we just want the vector.
4. Use the cols vector to get the desired values from row and return them, collapsed into a length 1 character vector!

library(tidyverse)
tbl <- tibble("A" = letters[1:4], "B" = letters[26:23], "C" = letters[c(1,3,5,7)], "D" = letters[c(2,4,6,8)], "pastecols" = c("B, C","B, D", "B, C, D", NA))
tbl %>%
  mutate(result = pmap_chr(
    .l = .,
    .f = function(...){
      row <- c(...)
      cols <- row["pastecols"] %>% str_c("A, ", .) %>% replace_na("A") %>% str_split(", ") %>% `[[`(1)
      vals <- row[cols] %>% str_c(collapse = ", ")
      return(vals)
    }
  ))
#> # A tibble: 4 x 6
#>   A     B     C     D     pastecols result
#>   <chr> <chr> <chr> <chr> <chr>     <chr>
#> 1 a     z     a     b     B, C      a, z, a
#> 2 b     y     c     d     B, D      b, y, d
#> 3 c     x     e     f     B, C, D   c, x, e, f
#> 4 d     w     g     h     <NA>      d
Created on 2018-12-03 by the reprex package (v0.2.0).
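To see what the anonymous function does at each step, here is a rough sketch that runs the same logic by hand on a single row. It is only an illustration: the row is typed in as a named character vector (which is what c(...) produces inside pmap_chr), and base R strsplit() and paste() stand in for str_split() and str_c() so it runs without loading any packages.

```r
# One row of the example data as a named character vector,
# i.e. what c(...) yields inside the function passed to pmap_chr
row <- c(A = "a", B = "z", C = "a", D = "b", pastecols = "B, C")

# Build the vector of column names to pull, always starting with "A".
# A missing lookup string falls back to just "A".
spec <- row[["pastecols"]]
cols <- if (is.na(spec)) "A" else c("A", strsplit(spec, ", ", fixed = TRUE)[[1]])

# Index the row by name and collapse to a length 1 character vector
vals <- paste(row[cols], collapse = ", ")

print(cols)
print(vals)
```

Swapping in pastecols = NA for the row gives cols equal to "A" alone, which is why the last row of the result only contains the value from column A.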
Not the most elegant solution but gets the job done with just base R. If column A never shows up in pastecols you can remove unique() from the code.
for(r in seq_len(nrow(df))) {
  df$result[r] <- paste(
    df[r, na.omit(unique(c("A", unlist(strsplit(df$pastecols[r], ", ")))))],
    collapse = " "
  )
}
df
  A B C D pastecols  result
1 a z a b      B, C   a z a
2 b y c d      B, D   b y d
3 c x e f   B, C, D c x e f
4 d w g h      <NA>       d
Data -
df <- data.frame(
  "A" = letters[1:4],
  "B" = letters[26:23],
  "C" = letters[c(1,3,5,7)],
  "D" = letters[c(2,4,6,8)],
  "pastecols" = c("B, C","B, D", "B, C, D", NA),
  stringsAsFactors = FALSE
)
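If you would rather avoid the explicit loop, the same idea can probably be written with vapply(). This is a sketch of a variation, not the code above: paste_row() is a hypothetical helper name made up for illustration, and it handles the NA case with an explicit is.na() check instead of na.omit().

```r
df <- data.frame(
  A = letters[1:4],
  B = letters[26:23],
  C = letters[c(1, 3, 5, 7)],
  D = letters[c(2, 4, 6, 8)],
  pastecols = c("B, C", "B, D", "B, C, D", NA),
  stringsAsFactors = FALSE
)

# paste_row() is a hypothetical helper, not part of any package.
# It pastes the values of row r from the `always` column plus
# whatever columns are named in that row's pastecols string.
paste_row <- function(r, data, always = "A", sep = " ") {
  spec <- data$pastecols[r]
  extra <- if (is.na(spec)) character(0) else strsplit(spec, ", ", fixed = TRUE)[[1]]
  cols <- unique(c(always, extra))
  paste(unlist(data[r, cols, drop = FALSE]), collapse = sep)
}

# vapply guarantees one string per row
df$result <- vapply(seq_len(nrow(df)), paste_row, character(1), data = df)
df
```

Using vapply() with character(1) means a row that somehow produced zero or several strings would raise an error rather than silently misalign the result column.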