Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R: Combining lapply and left_join to conditionally merge dataframes

I'm hoping someone out there might be able to help me get to the root of a frustrating problem I've been having with my code in R. I have a list consisting of dataframes, and I want to left join each element on one of two OTHER dataframes (call them A and B). Which of these secondary dataframes to join on depends on the element's position in the list. For my purposes, I'd like every odd element to be left-joined to A, and every even element to be left-joined to B.

library(dplyr)
DF <- data.frame(Num = c("1","2"), Let = c("a","b"), stringsAsFactors = FALSE)
A <- data.frame(Let = c("a","b"), Col = c("Yellow","Red"), stringsAsFactors = FALSE)
B <- data.frame(Let = c("a","b"), Col = c("Green","Blue"), stringsAsFactors = FALSE)
LIST <- list(DF, DF)

So far, I've tried to do this in two different ways. The first approach involved an if-else statement. If I apply such a statement to assign an integer value based on position, I obtain the expected result. Similarly, when I do away with the if-else statement and simply perform a series of left-joins on the list elements, everything works as expected.

lapply(seq_along(LIST), function(x, {ifelse((x %% 2)==0, y[[x]] <- 1, y[[x]] <- 2)}, y = LIST)
lapply(seq_along(LIST), function(x, {left_join(y[[x]], A, by = c("Let"))}, y = LIST)

Where I run into problems is when I attempt to combine the if-else statement and the left join. In particular, I end up with a list consisting of lists, each of which retains only the first column of the original corresponding dataframe.

lapply(seq_along(LIST), function(x, y) {ifelse((x %% 2)==0, left_join(y[[x]], A, by = c("Let")), left_join(y[[x]], B, by = c("Let")))}, y = LIST)

Here is the output I would LIKE to obtain:

[[1]]
  Let Num    Col
1   a   1 Yellow
2   b   2    Red

[[2]]
  Let Num   Col
1   a   1 Green
2   b   2  Blue

I'm certain there's an absurdly simple solution to the problem. Can anyone see it?

Thanks in advance! Matthew

P.S.: I also attempted a second approach, applying subsetting rather than an if-else statement. Again, however, I run into problems. The first line below works as expected, but the second returns an error, as though R doesn't recognize the list indices:

lapply(seq_along(LIST), function(x, y) {left_join(y[[x > 0]], A, by = c("Let"))}, y = LIST)
lapply(seq_along(LIST), function(x, y) {left_join(y[[x == 1]], A, by = c("Let"))}, y = LIST)

Error in y[[x == 1]] : attempt to select less than one element in integerOneIndex 
like image 615
M. Stargardter Avatar asked Nov 04 '25 22:11

M. Stargardter


2 Answers

I'm not entirely sure I understand your issue.

The following solution is based on a reproduction of the output of lapply(seq_along(LIST), function(x, y) {left_join(y[[x > 0]], A, by = c("Let"))}, y = LIST) from your postscript. Note that the other lapply lines throw errors.

library(tidyverse);
map(list(A, B), function(x) left_join(DF, x))
#Joining, by = "Let"
#Joining, by = "Let"
#[[1]]
#  Num Let    Col
#1   1   a Yellow
#2   2   b    Red
#
#[[2]]
#  Num Let   Col
#1   1   a Green
#2   2   b  Blue

We use purrr:map with dplyr::left_join to join A and B with DF.


The same can be achieved in base R using Map and merge:

mapply(function(x) merge(DF, x, by = "Let"), list(A, B), SIMPLIFY = F)
#[[1]]
#  Let Num    Col
#1   a   1 Yellow
#2   b   2    Red
#
#[[2]]
#  Let Num   Col
#1   a   1 Green
#2   b   2  Blue
like image 127
Maurits Evers Avatar answered Nov 07 '25 09:11

Maurits Evers


Overview

Use base::mapply() to return a list of data frames which were conditionally merged. Here, I supply two inputs:

  1. seq.along( along.with = LIST ) to obtain the number of elements in LIST; and
  2. LIST itself.

The FUN argument is an anonymous function that takes in two inputs - i and j - and tests whether the current element in LIST is even or odd numbered prior to performing a left-join using base::merge().

If the result of the modulus operator for the ith element in seq.along( along.with = LIST ) is equal to zero, then left-join B onto the jth element in LIST; if it does not equal zero, then perform the left-join A onto the jth element in LIST.

# load data
DF <- data.frame(Num = c("1","2"), Let = c("a","b"), stringsAsFactors = FALSE)
A <- data.frame(Let = c("a","b"), Col = c("Yellow","Red"), stringsAsFactors = FALSE)
B <- data.frame(Let = c("a","b"), Col = c("Green","Blue"), stringsAsFactors = FALSE)
LIST <- list(DF, DF)

# goal: left join all odd elements in LIST[[j]]
#       to `A` and all even elements to `B`
merged.list <- 
  mapply( FUN = function( i, j )
          if( i %% 2 == 0 ){
            merge( x = j
                   , y = B
                   , by = "Let"
                   , all.x = TRUE )
          } else{
            merge( x = j
                   , y = A
                   , by = "Let"
                   , all.x = TRUE )
          }
        , seq_along( along.with = LIST )
        , LIST
        , SIMPLIFY = FALSE )

# view results
merged.list
# [[1]]
# Let Num    Col
# 1   a   1 Yellow
# 2   b   2    Red
# 
# [[2]]
# Let Num   Col
# 1   a   1 Green
# 2   b   2  Blue

# end of script #

Tidyverse approach

The results are replicated below using functions from the purrr and dplyr packages.

library( dplyr )
library( purrr )

merged.list <-
  map2( .x = seq_along( along.with = LIST )
        , .y = LIST
        , .f = function( i, j )
          if( i %% 2 == 0 ){
            left_join( x = j
                       , y = B
                       , by = "Let" )
          } else{
            left_join( x = j
                       , y = A
                       , by = "Let" )
          })

# view results
merged.list
# [[1]]
# Num Let    Col
# 1   1   a Yellow
# 2   2   b    Red
# 
# [[2]]
# Num Let   Col
# 1   1   a Green
# 2   2   b  Blue

# end of script #
like image 26
Cristian E. Nuno Avatar answered Nov 07 '25 08:11

Cristian E. Nuno



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!