
Add a column in R based on other columns values [closed]

Tags: r, dplyr, tidyverse

I'm relatively new to R. I have a table of data consisting of an id, plus 3 values.

library(dplyr)

df <- tibble(id = c(1, 2, 3), val_a = c(13, 25, 42), val_b = c(25, 30, 0), val_c = c(7, 27, 21))
df
# A tibble: 3 × 4
     id val_a val_b val_c
  <dbl> <dbl> <dbl> <dbl>
1     1    13    25     7
2     2    25    30    27
3     3    42     0    21

I want to append another column containing a code that shows which of val_a, val_b, and val_c are 20 or greater. I did it this way:

df_1 <- df |>
  mutate(val_code = paste0(
    ifelse(val_a >= 20, "a", ""),
    ifelse(val_b >= 20, "b", ""),
    ifelse(val_c >= 20, "c", "")
  ))

df_1
# A tibble: 3 × 5
     id val_a val_b val_c val_code
  <dbl> <dbl> <dbl> <dbl> <chr>   
1     1    13    25     7 b       
2     2    25    30    27 abc     
3     3    42     0    21 ac

My method yields the desired result (for id = 1, only b is >= 20; for id = 2, all of a, b, and c are >= 20; for id = 3, only a and c are >= 20), but it seems like there might be a more elegant way of accomplishing the same task. Any ideas?

Buckaroo Banzai asked Oct 23 '25 15:10


2 Answers

Longer, but hopefully self-explanatory: reshape the data to long format, keep the values >= 20, and for each id collapse the remaining column names (with val_ removed) into a single string. Then join that summary back onto the original data.

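library(tidyverse)
left_join(df, df |>
  pivot_longer(-id) |>      # one row per id/column pair, in columns name and value
  filter(value >= 20) |>    # keep only values at or above the threshold
  # strip the val_ prefix and collapse the remaining names per id
  summarize(val_code = paste0(name |> str_remove("val_"), collapse = ""), .by = id))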

Result

Joining with `by = join_by(id)`
# A tibble: 3 × 5
     id val_a val_b val_c val_code
  <dbl> <dbl> <dbl> <dbl> <chr>   
1     1    13    25     7 b       
2     2    25    30    27 abc     
3     3    42     0    21 ac  
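
One caveat with the join approach: an id with no value at or above 20 is dropped by filter(), so left_join() leaves its val_code as NA rather than the empty string that the paste0() version in the question would give. Below is a minimal sketch of one way to handle that, assuming an empty string is what you want. The fourth row and the names df_na, codes, and result are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical data: id 4 has no value at or above 20
df_na <- tibble(id = c(1, 2, 3, 4),
                val_a = c(13, 25, 42, 1),
                val_b = c(25, 30, 0, 2),
                val_c = c(7, 27, 21, 3))

# Same reshape/filter/summarize steps as above, stored separately
codes <- df_na |>
  pivot_longer(-id) |>
  filter(value >= 20) |>
  summarize(val_code = str_c(str_remove(name, "val_"), collapse = ""), .by = id)

result <- df_na |>
  left_join(codes, by = "id") |>
  # ids missing from codes come back as NA; turn those into ""
  mutate(val_code = coalesce(val_code, ""))

result$val_code
```

Passing by = "id" explicitly also silences the "Joining with" message.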
Jon Spring answered Oct 25 '25 04:10


Update

Given a lookup table

library(tidyverse)
lu <- tibble(var_name = c("val_a", "val_b", "val_c"), var_code = c("X", "Y", "Z"))

you can try

df %>%
    mutate(val_code = Reduce(
        str_c,
        across(
            !id, ~ if_else(
                .x >= 20,
                with(lu, var_code[match(cur_column(), var_name)]),
                ""
            )
        )
    ))

such that

# A tibble: 3 × 5
     id val_a val_b val_c val_code
  <dbl> <dbl> <dbl> <dbl> <chr>
1     1    13    25     7 Y
2     2    25    30    27 XYZ
3     3    42     0    21 XZ
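
The lookup step inside across() may be easier to follow in isolation. Here is a small sketch of what with(lu, var_code[match(cur_column(), var_name)]) evaluates to for a single column name; the variables pos, code, and missing_code, and the column name val_d, are illustrative only.

```r
library(tibble)

lu <- tibble(var_name = c("val_a", "val_b", "val_c"), var_code = c("X", "Y", "Z"))

# match() returns the position of the column name in the lookup table
pos <- match("val_b", lu$var_name)

# indexing var_code by that position gives the code for the column
code <- lu$var_code[pos]

# a column name that is not in the table yields NA instead of a code
missing_code <- lu$var_code[match("val_d", lu$var_name)]
```

Because a missing name yields NA, it is probably worth making sure every column selected by !id has an entry in the lookup table.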

Older

You don't have to write out the code one column at a time; you can instead accumulate the values iteratively, column by column.

You can try Reduce + across like below

library(tidyverse)
df %>%
    mutate(val_code = Reduce(
        str_c,
        across(!id, ~ if_else(
            .x >= 20,
            sub(".*_", "", cur_column()),
            ""
        ))
    ))

which gives

# A tibble: 3 × 5
     id val_a val_b val_c val_code
  <dbl> <dbl> <dbl> <dbl> <chr>
1     1    13    25     7 b
2     2    25    30    27 abc     
3     3    42     0    21 ac
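
To see what Reduce is doing here, it may help to strip away across(). Reduce folds a two-argument function over a list from left to right, and because the pasting function is vectorized, each step pastes a whole column onto the running result. Below is a small base R sketch with the per-column codes written out by hand; it uses paste0 in place of str_c, and the names flags, codes, and steps are illustrative only.

```r
# Per-column codes for the three rows of df, as if_else() would produce them
flags <- list(
  val_a = c("",  "a", "a"),
  val_b = c("b", "b", ""),
  val_c = c("",  "c", "c")
)

# Reduce(f, list(x, y, z)) computes f(f(x, y), z); paste0 is vectorized,
# so each step concatenates element-wise across the rows
codes <- Reduce(paste0, flags)

# accumulate = TRUE keeps every intermediate result as a list element
steps <- Reduce(paste0, flags, accumulate = TRUE)
```

Here codes holds the final val_code values, and steps[[2]] shows the state after val_a and val_b have been combined.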
ThomasIsCoding answered Oct 25 '25 04:10


