Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does pivot_wider either read single values as duplicates or create a wide-and-long tibble (without merging rows)?

I browsed through most of the related questions posted here, but none seemed to be the same issue that I am facing. From what I've read, the issues already posted here are related to duplicate values in long-form data (lacking unique identifiers) which result in wide-form data with list-cols, and this is usually fixed by creating a dummy variable column which is a string of unique numbers. I've tried all the different solutions that I saw, but none of them solved my issue, which is why I decided to post this question.

The Data

I have a long-form table of various plant species (and their counts and layer) found in different plots:

> rep_example[1:15,]
   Point   Species Number Layer
1    P03 Lari_deci     21     C
2    P03 Quer_rope     17     C
3    P03 Pinu_sylv      5     C
4    P03 Sorb_aucu      3     U
5    P03 Betu_pend      1     C
6    P03 Acer_pseu      1     U
7    P06 Quer_rope     28     C
8    P06 Pinu_sylv     28     C
9    P06 Popu_trem      6     C
10   P06 Lari_deci      3     C
11   P07 Fagu_sylv    110     C
12   P07 Pinu_sylv     20     C
13   P07 Pice_abie      5     C
14   P07 Quer_rope      3     C
15   P07 Betu_pend      1     C
> dput(rep_example[1:15,])
structure(list(Point = c("P03", "P03", "P03", "P03", "P03", "P03", 
"P06", "P06", "P06", "P06", "P07", "P07", "P07", "P07", "P07"
), Species = c("Lari_deci", "Quer_rope", "Pinu_sylv", "Sorb_aucu", 
"Betu_pend", "Acer_pseu", "Quer_rope", "Pinu_sylv", "Popu_trem", 
"Lari_deci", "Fagu_sylv", "Pinu_sylv", "Pice_abie", "Quer_rope", 
"Betu_pend"), Number = c("21", "17", "5", "3", "1", "1", "28", 
"28", "6", "3", "110", "20", "5", "3", "1"), Layer = c("C", "C", 
"C", "U", "C", "U", "C", "C", "C", "C", "C", "C", "C", "C", "C"
)), row.names = c(NA, 15L), class = "data.frame")

The Ideal Result

I wish to create a wide form of this table by having the different Species names as columns and having just one row per Layer per Point:

> rep_example_ideal
  Point Layer Lari_deci Quer_rope Pinu_sylv Sorb_aucu Betu_pend Acer_pseu
1   P03     C        21        17         5         0         1         0
2   P03     U         0         0         0         3         0         1
3   P06     C         3        28        28         0         0         0
4   P06     U         0         0         0         0         0         0
5   P07     C         0         3        20         1         1         0
6   P07     U         0         0         0         0         0         0
> dput(rep_example_ideal)
structure(list(Point = c("P03", "P03", "P06", "P06", "P07", "P07"
), Layer = c("C", "U", "C", "U", "C", "U"), Lari_deci = c("21", 
"0", "3", "0", "0", "0"), Quer_rope = c("17", "0", "28", "0", 
"3", "0"), Pinu_sylv = c("5", "0", "28", "0", "20", "0"), Sorb_aucu = c("0", 
"3", "0", "0", "1", "0"), Betu_pend = c("1", "0", "0", "0", "1", 
"0"), Acer_pseu = c("0", "1", "0", "0", "0", "0")), class = "data.frame", row.names = c(NA, 
-6L))

The Problem Code

This is the code I am using to create the wide table:

rep_example %>% group_by(Point, Layer) %>% 
  mutate(Number = as.numeric(Number)) %>% 
  distinct() %>% 
  mutate(rn = 1:n()) %>% 
  pivot_wider(id_cols = c(Point, Layer, rn), names_from = Species, values_from = Number)
# A tibble: 172 x 17
# Groups:   Point, Layer [57]
   Point Layer    rn Lari_deci Quer_rope Pinu_sylv Sorb_aucu Betu_pend Acer_pseu Popu_trem
   <chr> <chr> <int>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
 1 P03   C         1        21        NA        NA        NA        NA        NA        NA
 2 P03   C         2        NA        17        NA        NA        NA        NA        NA
 3 P03   C         3        NA        NA         5        NA        NA        NA        NA
 4 P03   U         1        NA        NA        NA         3        NA        NA        NA
 5 P03   C         4        NA        NA        NA        NA         1        NA        NA
 6 P03   U         2        NA        NA        NA        NA        NA         1        NA
 7 P06   C         1        NA        28        NA        NA        NA        NA        NA
 8 P06   C         2        NA        NA        28        NA        NA        NA        NA
 9 P06   C         3        NA        NA        NA        NA        NA        NA         6
10 P06   C         4         3        NA        NA        NA        NA        NA        NA
# ... with 162 more rows, and 7 more variables: Fagu_sylv <dbl>, Pice_abie <dbl>,
#   Abie_alba <dbl>, Fran_alnu <dbl>, Tili_cord <dbl>, Alnu_glut <dbl>, Quer_rubr <dbl>
  1. I am using mutate(rn = 1:n()) to create a dummy variable rn, in order to ensure unique identifiers. Rows with the same value for Week and Point aren't being merged, and instead show up as separate rows. I tried using different forms of group_by() but these don't make a difference, while explicitly stating id_cols in the pivot_wider() leads to issue #2 below.

  2. When I do not use mutate(rn = 1:n()), the wide data produced consists of list-cols even though the list length is 1 (all other questions posted here resulted in longer list-cols, i.e., duplicates) and the combination of Week and Point provides a unique ID. However, the above problem of rows not being merged is fixed in this method.

rep_example %>% group_by(Point, Layer) %>% 
  mutate(Number = as.numeric(Number)) %>% 
  pivot_wider(id_cols = c(Point, Layer), names_from = Species, values_from = Number)
# A tibble: 57 x 16
# Groups:   Point, Layer [57]
   Point Layer Lari_deci Quer_rope Pinu_sylv Sorb_aucu Betu_pend Acer_pseu Popu_trem Fagu_sylv
   <chr> <chr> <list>    <list>    <list>    <list>    <list>    <list>    <list>    <list>   
 1 P03   C     <dbl [1]> <dbl [1]> <dbl [1]> <NULL>    <dbl [1]> <NULL>    <NULL>    <NULL>   
 2 P03   U     <NULL>    <NULL>    <NULL>    <dbl [1]> <NULL>    <dbl [1]> <NULL>    <NULL>   
 3 P06   C     <dbl [1]> <dbl [1]> <dbl [1]> <NULL>    <NULL>    <NULL>    <dbl [1]> <NULL>   
 4 P07   C     <NULL>    <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]> <NULL>    <NULL>    <dbl [1]>
 5 P07   U     <NULL>    <NULL>    <NULL>    <NULL>    <NULL>    <NULL>    <NULL>    <NULL>   
 6 P08   C     <NULL>    <dbl [1]> <dbl [1]> <NULL>    <NULL>    <NULL>    <dbl [1]> <NULL>   
 7 P08   U     <NULL>    <NULL>    <NULL>    <NULL>    <NULL>    <NULL>    <NULL>    <NULL>   
 8 P10   U     <NULL>    <dbl [1]> <NULL>    <NULL>    <NULL>    <NULL>    <NULL>    <NULL>   
 9 P10   C     <NULL>    <dbl [1]> <dbl [1]> <NULL>    <dbl [1]> <NULL>    <NULL>    <NULL>   
10 P11   C     <NULL>    <dbl [1]> <dbl [1]> <NULL>    <NULL>    <NULL>    <NULL>    <NULL>   
# ... with 47 more rows, and 6 more variables: Pice_abie <list>, Abie_alba <list>,
#   Fran_alnu <list>, Tili_cord <list>, Alnu_glut <list>, Quer_rubr <list>
Warning message:
Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates 

The pivot_wider is working in some other code I tried with a different part of the data. This particular issue has remained unresolved, and I'd highly appreciate any sort of help!!!

Thanks!

like image 552
Karthik Thrikkadeeri Avatar asked Dec 22 '25 00:12

Karthik Thrikkadeeri


1 Answers

We could create a sequence column with rowid

library(dplyr)
library(tidyr)
library(data.table)
df1 %>% 
  mutate(rn = rowid(Point, Species)) %>%
  pivot_wider(names_from = Species, values_from = Number, 
       values_fill = list(Number = '0'))

If we want all the combinations, use complete

df1 %>% 
   complete(Point, Layer, fill = list(Number = '0')) %>%
   fill(Species) %>%
   pivot_wider(names_from = Species, values_from = Number,  
         values_fill = list(Number = '0'))
# A tibble: 6 x 11
#  Point Layer Lari_deci Quer_rope Pinu_sylv Betu_pend Sorb_aucu Acer_pseu Popu_trem Fagu_sylv Pice_abie
#  <chr> <chr> <chr>     <chr>     <chr>     <chr>     <chr>     <chr>     <chr>     <chr>     <chr>    
#1 P03   C     21        17        5         1         0         0         0         0         0        
#2 P03   U     0         0         0         0         3         1         0         0         0        
#3 P06   C     3         28        28        0         0         0         6         0         0        
#4 P06   U     0         0         0         0         0         0         0         0         0        
#5 P07   C     0         3         20        1         0         0         0         110       5        
#6 P07   U     0         0         0         0         0         0         0         0         0        
like image 164
akrun Avatar answered Dec 23 '25 17:12

akrun