Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pivotting a list of data frames and merging them

I have a list of 3 dataframes that share some rows and columns

DATA

all_data <- list(questionnaireA = structure(list(name = structure(2:1, .Label = c("James", 
          "Shawn"), class = "factor"), banana = c(1, 0), grapes = c(1, 
          1), orange = c("AB", 1)), class = "data.frame", row.names = c(NA, 
          -2L)), questionnaireB = structure(list(name = structure(2:1, .Label = c("Chris", 
          "James"), class = "factor"), orange = c(1, 0), banana = c(1, 
          0)), class = "data.frame", row.names = c(NA, -2L)), questionnaireC = structure(list(
          name = structure(3:1, .Label = c("Donald", "James", "Shawn"
          ), class = "factor"), banana = c(1, 0, 0), raisins = c(1, 
          1, 1), grapes = c(1, 1, 0), cake = c(0, 1, 0)), class = "data.frame", row.names = c(NA, -3L)))
$questionnaireA
   name banana grapes orange
1 Shawn      1      1     AB
2 James      0      1      1

$questionnaireB
   name orange banana
1 James      1      1
2 Chris      0      0

$questionnaireC
    name banana raisins grapes cake
1  Shawn      1       1      1    0
2  James      0       1      1    1
3 Donald      0       1      0    0
library(tidyverse)
map(all_data, ~ .x %>%
    pivot_longer(cols=-name, names_to="fruit"))
  1. Am not sure how to rename the value to the name of the dataframe.
  2. I don't know how to join the values and union the name fruit pair.

Any help would be appreciated!

like image 777
Shahin Avatar asked Nov 29 '25 22:11

Shahin


1 Answers

If we follow the similar option as in OP's attempted method, i.e. reshape to 'long' format each of the datasets in the list, then loop with imap, create the name of the list1 element as new column, use pivot_longer to reshape into 'long' format, then create a sequence column by group and reshape into 'wide' format with pivot_wider

library(dplyr)
library(tidyr)
library(purrr)
imap_dfr(all_data, ~
                   .x %>% 
                      mutate(grp = .y) %>%
                      pivot_longer(cols = -c(name, grp), 
                         names_to = "fruit", values_to = "Value")) %>% 
   #group_by(name, grp, fruit) %>% 
   #mutate(rn = row_number()) %>%
   pivot_wider(names_from = grp, values_from = Value)

Or do this more efficiently, by binding all the datasets to a single data with bind_cols, do the pivot_longer while removing the missing values with value_drop_na = TRUE and then do the same as in the above solution

bind_rows(all_data, .id = 'grp') %>%
    pivot_longer(cols = c(-name, -grp), names_to = "fruit", 
         values_to = "Value", values_drop_na = TRUE) %>%
    # sequence column creation is not really required for the example
    # as there are no duplicates
    #group_by(name, grp, fruit) %>% 
    #mutate(rn = row_number()) %>%
    pivot_wider(names_from = grp, values_from = Value)

Update

Based on the new data with some mix in column types, if we need to have the values like "AB" remain as such, it needs to be converted to character class

imap_dfr(all_data, ~
               .x %>% 
                  mutate_at(-1, as.character) %>% 
                  mutate(grp = .y) %>% 
                  pivot_longer(cols = -c(name, grp), names_to = "fruit",
                    values_to = "Value")) %>% 
       pivot_wider(names_from = grp, values_from = Value)

Or an efficient approach similar to the bind_rows earlier (but it can't be done here because the column types are different)

map_dfr(all_data, ~
         .x %>%
            mutate_at(-1, as.character), .id = 'grp') %>%  
    pivot_longer(cols = c(-name, -grp), names_to = "fruit", 
          values_to = "Value", values_drop_na = TRUE)  %>%
    pivot_wider(names_from = grp, values_from = Value)
like image 99
akrun Avatar answered Dec 02 '25 14:12

akrun



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!