Why does pivot_wider either read single values as duplicates or create a wide-and-long tibble (without merging rows)?

Question

I browsed through most of the related questions posted here, but none seemed to be the same issue that I am facing. From what I've read, the issues already posted here are related to duplicate values in long-form data (lacking unique identifiers) which result in wide-form data with list-cols, and this is usually fixed by creating a dummy variable column which is a string of unique numbers. I've tried all the different solutions that I saw, but none of them solved my issue, which is why I decided to post this question.

The Data

I have a long-form table of various plant species (and their counts and layer) found in different plots:

> rep_example[1:15,]
   Point   Species Number Layer
1    P03 Lari_deci     21     C
2    P03 Quer_rope     17     C
3    P03 Pinu_sylv      5     C
4    P03 Sorb_aucu      3     U
5    P03 Betu_pend      1     C
6    P03 Acer_pseu      1     U
7    P06 Quer_rope     28     C
8    P06 Pinu_sylv     28     C
9    P06 Popu_trem      6     C
10   P06 Lari_deci      3     C
11   P07 Fagu_sylv    110     C
12   P07 Pinu_sylv     20     C
13   P07 Pice_abie      5     C
14   P07 Quer_rope      3     C
15   P07 Betu_pend      1     C

> dput(rep_example[1:15,])
structure(list(Point = c("P03", "P03", "P03", "P03", "P03", "P03", 
"P06", "P06", "P06", "P06", "P07", "P07", "P07", "P07", "P07"
), Species = c("Lari_deci", "Quer_rope", "Pinu_sylv", "Sorb_aucu", 
"Betu_pend", "Acer_pseu", "Quer_rope", "Pinu_sylv", "Popu_trem", 
"Lari_deci", "Fagu_sylv", "Pinu_sylv", "Pice_abie", "Quer_rope", 
"Betu_pend"), Number = c("21", "17", "5", "3", "1", "1", "28", 
"28", "6", "3", "110", "20", "5", "3", "1"), Layer = c("C", "C", 
"C", "U", "C", "U", "C", "C", "C", "C", "C", "C", "C", "C", "C"
)), row.names = c(NA, 15L), class = "data.frame")

The Ideal Result

I wish to create a wide form of this table by having the different Species names as columns and having just one row per Layer per Point:

> rep_example_ideal
  Point Layer Lari_deci Quer_rope Pinu_sylv Sorb_aucu Betu_pend Acer_pseu
1   P03     C        21        17         5         0         1         0
2   P03     U         0         0         0         3         0         1
3   P06     C         3        28        28         0         0         0
4   P06     U         0         0         0         0         0         0
5   P07     C         0         3        20         1         1         0
6   P07     U         0         0         0         0         0         0

> dput(rep_example_ideal)
structure(list(Point = c("P03", "P03", "P06", "P06", "P07", "P07"
), Layer = c("C", "U", "C", "U", "C", "U"), Lari_deci = c("21", 
"0", "3", "0", "0", "0"), Quer_rope = c("17", "0", "28", "0", 
"3", "0"), Pinu_sylv = c("5", "0", "28", "0", "20", "0"), Sorb_aucu = c("0", 
"3", "0", "0", "1", "0"), Betu_pend = c("1", "0", "0", "0", "1", 
"0"), Acer_pseu = c("0", "1", "0", "0", "0", "0")), class = "data.frame", row.names = c(NA, 
-6L))

The Problem Code

This is the code I am using to create the wide table:

rep_example %>% group_by(Point, Layer) %>% 
  mutate(Number = as.numeric(Number)) %>% 
  distinct() %>% 
  mutate(rn = 1:n()) %>% 
  pivot_wider(id_cols = c(Point, Layer, rn), names_from = Species, values_from = Number)

# A tibble: 172 x 17
# Groups:   Point, Layer [57]
   Point Layer    rn Lari_deci Quer_rope Pinu_sylv Sorb_aucu Betu_pend Acer_pseu Popu_trem
   <chr> <chr> <int>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
 1 P03   C         1        21        NA        NA        NA        NA        NA        NA
 2 P03   C         2        NA        17        NA        NA        NA        NA        NA
 3 P03   C         3        NA        NA         5        NA        NA        NA        NA
 4 P03   U         1        NA        NA        NA         3        NA        NA        NA
 5 P03   C         4        NA        NA        NA        NA         1        NA        NA
 6 P03   U         2        NA        NA        NA        NA        NA         1        NA
 7 P06   C         1        NA        28        NA        NA        NA        NA        NA
 8 P06   C         2        NA        NA        28        NA        NA        NA        NA
 9 P06   C         3        NA        NA        NA        NA        NA        NA         6
10 P06   C         4         3        NA        NA        NA        NA        NA        NA
# ... with 162 more rows, and 7 more variables: Fagu_sylv <dbl>, Pice_abie <dbl>,
#   Abie_alba <dbl>, Fran_alnu <dbl>, Tili_cord <dbl>, Alnu_glut <dbl>, Quer_rubr <dbl>

I am using mutate(rn = 1:n()) to create a dummy variable rn, in order to ensure unique identifiers. Rows with the same value for Week and Point aren't being merged, and instead show up as separate rows. I tried using different forms of group_by() but these don't make a difference, while explicitly stating id_cols in the pivot_wider() leads to issue #2 below.
When I do not use mutate(rn = 1:n()), the wide data produced consists of list-cols even though the list length is 1 (all other questions posted here resulted in longer list-cols, i.e., duplicates) and the combination of Week and Point provides a unique ID. However, the above problem of rows not being merged is fixed in this method.

rep_example %>% group_by(Point, Layer) %>% 
  mutate(Number = as.numeric(Number)) %>% 
  pivot_wider(id_cols = c(Point, Layer), names_from = Species, values_from = Number)

# A tibble: 57 x 16
# Groups:   Point, Layer [57]
   Point Layer Lari_deci Quer_rope Pinu_sylv Sorb_aucu Betu_pend Acer_pseu Popu_trem Fagu_sylv
   <chr> <chr> <list>    <list>    <list>    <list>    <list>    <list>    <list>    <list>   
 1 P03   C     <dbl [1]> <dbl [1]> <dbl [1]> <NULL>    <dbl [1]> <NULL>    <NULL>    <NULL>   
 2 P03   U     <NULL>    <NULL>    <NULL>    <dbl [1]> <NULL>    <dbl [1]> <NULL>    <NULL>   
 3 P06   C     <dbl [1]> <dbl [1]> <dbl [1]> <NULL>    <NULL>    <NULL>    <dbl [1]> <NULL>   
 4 P07   C     <NULL>    <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]> <NULL>    <NULL>    <dbl [1]>
 5 P07   U     <NULL>    <NULL>    <NULL>    <NULL>    <NULL>    <NULL>    <NULL>    <NULL>   
 6 P08   C     <NULL>    <dbl [1]> <dbl [1]> <NULL>    <NULL>    <NULL>    <dbl [1]> <NULL>   
 7 P08   U     <NULL>    <NULL>    <NULL>    <NULL>    <NULL>    <NULL>    <NULL>    <NULL>   
 8 P10   U     <NULL>    <dbl [1]> <NULL>    <NULL>    <NULL>    <NULL>    <NULL>    <NULL>   
 9 P10   C     <NULL>    <dbl [1]> <dbl [1]> <NULL>    <dbl [1]> <NULL>    <NULL>    <NULL>   
10 P11   C     <NULL>    <dbl [1]> <dbl [1]> <NULL>    <NULL>    <NULL>    <NULL>    <NULL>   
# ... with 47 more rows, and 6 more variables: Pice_abie <list>, Abie_alba <list>,
#   Fran_alnu <list>, Tili_cord <list>, Alnu_glut <list>, Quer_rubr <list>
Warning message:
Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates

The pivot_wider is working in some other code I tried with a different part of the data. This particular issue has remained unresolved, and I'd highly appreciate any sort of help!!!

Thanks!

akrun · Accepted Answer

We could create a sequence column with rowid

library(dplyr)
library(tidyr)
library(data.table)
df1 %>% 
  mutate(rn = rowid(Point, Species)) %>%
  pivot_wider(names_from = Species, values_from = Number, 
       values_fill = list(Number = '0'))

If we want all the combinations, use complete

df1 %>% 
   complete(Point, Layer, fill = list(Number = '0')) %>%
   fill(Species) %>%
   pivot_wider(names_from = Species, values_from = Number,  
         values_fill = list(Number = '0'))
# A tibble: 6 x 11
#  Point Layer Lari_deci Quer_rope Pinu_sylv Betu_pend Sorb_aucu Acer_pseu Popu_trem Fagu_sylv Pice_abie
#  <chr> <chr> <chr>     <chr>     <chr>     <chr>     <chr>     <chr>     <chr>     <chr>     <chr>    
#1 P03   C     21        17        5         1         0         0         0         0         0        
#2 P03   U     0         0         0         0         3         1         0         0         0        
#3 P06   C     3         28        28        0         0         0         6         0         0        
#4 P06   U     0         0         0         0         0         0         0         0         0        
#5 P07   C     0         3         20        1         0         0         0         110       5        
#6 P07   U     0         0         0         0         0         0         0         0         0

Why does pivot_wider either read single values as duplicates or create a wide-and-long tibble (without merging rows)?

Tags:

dataframe

r

duplicates

dplyr

tidyverse

The Data

The Ideal Result

The Problem Code

Karthik Thrikkadeeri

1 Answers

akrun

Recent Activity

Donate For Us

Why does pivot_wider either read single values as duplicates or create a wide-and-long tibble (without merging rows)?

Tags:

dataframe

r

duplicates

dplyr

tidyverse

The Data

The Ideal Result

The Problem Code

Karthik Thrikkadeeri

1 Answers

akrun

Related questions

Recent Activity

Donate For Us