
Is there a way to use dplyr::bind_rows without collecting data frames from the database?

Tags: mysql, r, dplyr

Is there a way to use bind_rows() on a set of data frames without first collecting them from the database?

Say I've defined a couple of dplyr query tables:

mydatabase  <- src_mysql('database')
table1  <- tbl(mydatabase,"table1")
table2  <- tbl(mydatabase,"table3")

foo  <-  table1 %>% filter(id > 10) %>% select(id)
bar  <-  table2 %>% select(id)

I'd like to be able to combine foo and bar. In essence, I'd like to perform a union on the two subqueries without having to drop to SQL. However, when I try that, I get an error because I'm trying to bind two tbl_sql objects rather than real data frames:

unioned_data_frame  <- bind_rows(foo,bar)

Error: incompatible sizes (1 != 8)

Any suggestions? In this toy example, writing the whole query in SQL wouldn't be a problem, but of course, in real life, foo and bar are often significantly more complicated.

crazybilly asked Oct 15 '25 16:10


1 Answer

dplyr::union() performs a SQL UNION on the two lazy queries without collecting them. Note that, like SQL UNION, dplyr::union() removes duplicate rows. dplyr::union_all() keeps duplicate rows, as bind_rows() does.

Unfortunately, there isn't a way to get the other benefits of bind_rows() on database tables, particularly the very useful .id argument.
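Here is a minimal sketch of both set operations, plus one possible manual stand-in for .id: tagging each query with a literal column before stacking. It assumes an in-memory SQLite database (via DBI and RSQLite) in place of the MySQL source, with made-up rows in table1 and table3; the column name source is hypothetical. I'd expect the same verbs to translate for MySQL, but I have only reasoned through the SQLite case.

```r
library(dplyr)
library(dbplyr)

# Assumption: in-memory SQLite stands in for the MySQL database in the question
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "table1", data.frame(id = c(5L, 11L, 12L)))  # made-up data
DBI::dbWriteTable(con, "table3", data.frame(id = c(11L, 20L)))      # made-up data

table1 <- tbl(con, "table1")
table2 <- tbl(con, "table3")

foo <- table1 %>% filter(id > 10) %>% select(id)
bar <- table2 %>% select(id)

# Both results are still lazy queries; nothing has been pulled into R yet
deduped <- union(foo, bar)      # SQL UNION: duplicate rows removed
stacked <- union_all(foo, bar)  # SQL UNION ALL: duplicate rows kept

# Manual stand-in for bind_rows(.id = "source"):
# add a constant label to each query, then stack them
tagged <- union_all(
  foo %>% mutate(source = "foo"),
  bar %>% mutate(source = "bar")
)

# Inspect the generated SQL before running anything
show_query(tagged)

# collect() is the only step that brings rows into R
deduped_rows <- collect(deduped)
stacked_rows <- collect(stacked)
result <- collect(tagged)

DBI::dbDisconnect(con)
```

The labels have to be written out by hand for each query, so this is less convenient than .id, but the whole operation stays in the database until collect().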

crazybilly answered Oct 18 '25 07:10


