Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

use mutate() to add column to data frame as a function of arbitrary set of other columns

I have a tibble. I need to add a new column in which each value is a function of the corresponding values in several other columns. Here is an example:

library(tibble)
tmp <- tribble(
  ~ID,     ~x1,     ~x2,
    1,   "200",     NA,
    2,   "300",   "400")

I want to add a new column, new, that is TRUE if and only if any of the corresponding values in x1 and x2 start with "3". That is, I want

# A tibble: 2 x 4
     ID x1    x2    new  
  <dbl> <chr> <chr> <lgl>
1     1 200   <NA>  NA   
2     2 300   400   TRUE 

In this example, new is a function of only x1 and x2. But there may be many of these "x" columns, and I won't always be able to write out their names. They will always start with "x", though, so this is one solution:

tmp %>%
  mutate(
    new = select(., starts_with("x")) %>%
      apply(., 1, function (x) any(substr(x, 1, 1)=="3"))
  )

But this solution is pretty clunky. Is there a more elegant way?

There are many related questions on Stack Overflow, but they generally speak to cases in which (a) the names of all columns in the original dataset are known and can be written out, or (b) the new variable is a function of all other columns in the data frame. (Here is one example.)

like image 558
user697473 Avatar asked Jan 19 '26 15:01

user697473


2 Answers

If you want to stay in tidyverse, we can use pmap for a row-wise operation :

library(dplyr)
library(purrr)

tmp %>% 
   mutate(new = pmap_lgl(select(., starts_with('x')), 
                ~any(startsWith(c(...), '3'), na.rm = TRUE)))

#     ID x1    x2    new  
#  <dbl> <chr> <chr> <lgl>
#1     1 200   NA    FALSE
#2     2 300   400   TRUE 

In base R, we can use row-wise apply

tmp$new <- apply(tmp[grep('x', names(tmp))], 1, function(x) 
                 any(startsWith(x, '3'), na.rm = TRUE))
like image 193
Ronak Shah Avatar answered Jan 22 '26 06:01

Ronak Shah


Here is an option with pivot_longer where we reshape into 'long' format with pivot_longer, do a group by 'ID' to check if there are any value that have 3 as the first digit and do a join with the original dataset

library(dplyr)
library(tidyr)
library(stringr)
tmp %>% 
   pivot_longer(cols = -ID, values_drop_na = TRUE) %>% 
   group_by(ID) %>%
   summarise(new = any(str_detect(value, '^3'))) %>% 
   right_join(tmp)
# A tibble: 2 x 4
#     ID new   x1    x2   
#* <dbl> <lgl> <chr> <chr>
#1     1 FALSE 200   <NA> 
#2     2 TRUE  300   400  

Or using base R, we can concatenate by row with paste and use grepl. Should be more efficient

grepl("(^|,)3", do.call(paste, c(tmp[-1], sep=",")))
#[1] FALSE  TRUE
like image 34
akrun Avatar answered Jan 22 '26 07:01

akrun