Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to pull column name into a new column based on column contents in R

I need to do three actions:

1. Count the rowwise non NA's values in a table and sum them (in a single column "check_na")

[I put my solution to this below, interested if someone can figure out how to do this with map though. I have already checked https://stackoverflow.com/questions/50680413/count-na-in-given-columns-by-rows for answers for this]

2. For those values that are not NA, create a column that concatenates these the unique values in a new column "block detail".

[I don't know how to do this]

3. If "check_na" has a value then pull in the column name(s) and concatenate them in a new column ("block type")

[I don't know how to do this]

This is what the final product should look like. note that in line 2 even though "b" appears twice, it only shows up once in "block detail" yet the columns that have it are individually listed "y|z"

      w x     y     z     na_check block_detail block_type
  <dbl> <chr> <chr> <chr>    <int> <chr>        <chr>     
1    NA a     NA    NA           1 a            x         
2    NA NA    b     b            2 b            y|z       
3    NA NA    b     c            2 b|c          y|z       
4    NA NA    NA    NA           0 NA           NA        
5    NA NA    NA    b            1 b            z 

below is sample data and my solution to part 1:


#sample data
df <- tibble(w=rep(NA_real_,5),
       x=c(1,rep(NA_real_,4)),
       y=c(NA_real_,1,rep(NA_real_,3)),
       z=c(NA_real_,1,rep(NA_real_,2),1)
       )

#my solution to the first part, interested if someone can do this more efficiently or can do this with map as I have 100s columns that I need to do this with

df_na_check <- df %>% 
  mutate(across(everything(),
                list(na_check=~!is.na(.)),
                .names="{.col}_{.fn}")) %>% 
  rowwise() %>% 
mutate(na_check=sum(c_across(contains("na_check")))) %>% 
  select(w:z,na_check)

I appreciate any help. Ideally if the solution can use tidyverse but open to other methods (data.table or base r)

like image 646
alejandro_hagan Avatar asked Dec 10 '25 21:12

alejandro_hagan


2 Answers

We can first use rowSums to get the number of columns that are not NA. Then, we can use purrr to collapse the unique characters that are no NA into block_detail. Then, we can use apply to go over each row to get the column names where there is not an NA for block_type.

library(tidyverse)

df %>% 
  mutate(na_check = rowSums(!is.na(.), na.rm = T),
         block_detail = pmap_chr(select(., -na_check), ~paste0(unique(na.omit(c(...))), collapse = "|")),
         block_type = apply(df, 1, \(x) paste0(names(df)[which(!is.na(x))], collapse = "|")))

Output

   w    x    y    z na_check block_detail block_type
1 NA    a <NA> <NA>        1            a          x
2 NA <NA>    b    b        2            b        y|z
3 NA <NA>    b    c        2          b|c        y|z
4 NA <NA> <NA> <NA>        0                        
5 NA <NA> <NA>    b        1            b          z

Or using purrr instead of apply:

df %>% 
  mutate(na_check = rowSums(!is.na(.), na.rm = T),
         block_detail = pmap_chr(select(., -na_check), ~str_c(unique(na.omit(c(...))), collapse = "|"))) %>% 
  mutate(block_type = pmap_chr(select(., -c(na_check, block_detail)), ~str_c(names(c(...))[!is.na(c(...))], collapse="|")))

Data

df <- structure(list(w = c(NA, NA, NA, NA, NA), x = c("a", NA, NA, 
NA, NA), y = c(NA, "b", "b", NA, NA), z = c(NA, "b", "c", NA, 
"b")), class = "data.frame", row.names = c(NA, -5L))
like image 141
AndrewGB Avatar answered Dec 13 '25 09:12

AndrewGB


This answer mainly uses apply to go through the dataframe row-wise, and finally use unite to combine colnames that is not NA.

library(tidyverse)

df %>% 
  mutate(na_check = apply(df, 1, function(x) sum(!is.na(x))),
         across(-na_check, ~ifelse(is.na(.x), NA, cur_column()), .names = "{.col}_colname"),
         block_detail = apply(df, 1, function(x) paste(unique(na.omit(x)), collapse = '|'))) %>% 
  unite(col = "block_type", ends_with("_colname"), na.rm = T, sep = "|")

   w    x    y    z na_check block_type block_detail
1 NA    a <NA> <NA>        1          x            a
2 NA <NA>    b    b        2        y|z            b
3 NA <NA>    b    c        2        y|z          b|c
4 NA <NA> <NA> <NA>        0                        
5 NA <NA> <NA>    b        1          z            b

Data

Credit to @AndrewGB for the dput +1!

df <- structure(list(w = c(NA, NA, NA, NA, NA), x = c("a", NA, NA, 
NA, NA), y = c(NA, "b", "b", NA, NA), z = c(NA, "b", "c", NA, 
"b")), class = "data.frame", row.names = c(NA, -5L))
like image 22
benson23 Avatar answered Dec 13 '25 10:12

benson23



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!