How to pull column name into a new column based on column contents in R

Question

I need to do three actions:

1. Count the rowwise non NA's values in a table and sum them (in a single column "check_na")

[I put my solution to this below, interested if someone can figure out how to do this with map though. I have already checked https://stackoverflow.com/questions/50680413/count-na-in-given-columns-by-rows for answers for this]

2. For those values that are not NA, create a column that concatenates these the unique values in a new column "block detail".

[I don't know how to do this]

3. If "check_na" has a value then pull in the column name(s) and concatenate them in a new column ("block type")

[I don't know how to do this]

This is what the final product should look like. note that in line 2 even though "b" appears twice, it only shows up once in "block detail" yet the columns that have it are individually listed "y|z"

      w x     y     z     na_check block_detail block_type
  <dbl> <chr> <chr> <chr>    <int> <chr>        <chr>     
1    NA a     NA    NA           1 a            x         
2    NA NA    b     b            2 b            y|z       
3    NA NA    b     c            2 b|c          y|z       
4    NA NA    NA    NA           0 NA           NA        
5    NA NA    NA    b            1 b            z

below is sample data and my solution to part 1:


#sample data
df <- tibble(w=rep(NA_real_,5),
       x=c(1,rep(NA_real_,4)),
       y=c(NA_real_,1,rep(NA_real_,3)),
       z=c(NA_real_,1,rep(NA_real_,2),1)
       )

#my solution to the first part, interested if someone can do this more efficiently or can do this with map as I have 100s columns that I need to do this with

df_na_check <- df %>% 
  mutate(across(everything(),
                list(na_check=~!is.na(.)),
                .names="{.col}_{.fn}")) %>% 
  rowwise() %>% 
mutate(na_check=sum(c_across(contains("na_check")))) %>% 
  select(w:z,na_check)

I appreciate any help. Ideally if the solution can use tidyverse but open to other methods (data.table or base r)

AndrewGB · Accepted Answer

We can first use rowSums to get the number of columns that are not NA. Then, we can use purrr to collapse the unique characters that are no NA into block_detail. Then, we can use apply to go over each row to get the column names where there is not an NA for block_type.

library(tidyverse)

df %>% 
  mutate(na_check = rowSums(!is.na(.), na.rm = T),
         block_detail = pmap_chr(select(., -na_check), ~paste0(unique(na.omit(c(...))), collapse = "|")),
         block_type = apply(df, 1, \(x) paste0(names(df)[which(!is.na(x))], collapse = "|")))

Output

   w    x    y    z na_check block_detail block_type
1 NA    a <NA> <NA>        1            a          x
2 NA <NA>    b    b        2            b        y|z
3 NA <NA>    b    c        2          b|c        y|z
4 NA <NA> <NA> <NA>        0                        
5 NA <NA> <NA>    b        1            b          z

Or using purrr instead of apply:

df %>% 
  mutate(na_check = rowSums(!is.na(.), na.rm = T),
         block_detail = pmap_chr(select(., -na_check), ~str_c(unique(na.omit(c(...))), collapse = "|"))) %>% 
  mutate(block_type = pmap_chr(select(., -c(na_check, block_detail)), ~str_c(names(c(...))[!is.na(c(...))], collapse="|")))

Data

df <- structure(list(w = c(NA, NA, NA, NA, NA), x = c("a", NA, NA, 
NA, NA), y = c(NA, "b", "b", NA, NA), z = c(NA, "b", "c", NA, 
"b")), class = "data.frame", row.names = c(NA, -5L))

benson23 · Answer

This answer mainly uses apply to go through the dataframe row-wise, and finally use unite to combine colnames that is not NA.

library(tidyverse)

df %>% 
  mutate(na_check = apply(df, 1, function(x) sum(!is.na(x))),
         across(-na_check, ~ifelse(is.na(.x), NA, cur_column()), .names = "{.col}_colname"),
         block_detail = apply(df, 1, function(x) paste(unique(na.omit(x)), collapse = '|'))) %>% 
  unite(col = "block_type", ends_with("_colname"), na.rm = T, sep = "|")

   w    x    y    z na_check block_type block_detail
1 NA    a <NA> <NA>        1          x            a
2 NA <NA>    b    b        2        y|z            b
3 NA <NA>    b    c        2        y|z          b|c
4 NA <NA> <NA> <NA>        0                        
5 NA <NA> <NA>    b        1          z            b

Data

Credit to @AndrewGB for the dput +1!

df <- structure(list(w = c(NA, NA, NA, NA, NA), x = c("a", NA, NA, 
NA, NA), y = c(NA, "b", "b", NA, NA), z = c(NA, "b", "c", NA, 
"b")), class = "data.frame", row.names = c(NA, -5L))

How to pull column name into a new column based on column contents in R

Tags:

r

data.table

dplyr

purrr

tidyr

alejandro_hagan

2 Answers

AndrewGB

Data

benson23

Recent Activity

Donate For Us

How to pull column name into a new column based on column contents in R

Tags:

r

data.table

dplyr

purrr

tidyr

alejandro_hagan

2 Answers

AndrewGB

Data

benson23

Related questions

Recent Activity

Donate For Us