
How to subtract two comma separated columns in R?

I have a small problem that I can't seem to solve. Given two columns:

dt <- data.table(ColumnA = c("A,B,C,A,A,A", "A,B,C"), ColumnB = c("A,C,A", "C"))

I would like to "subtract" ColumnB from ColumnA, which should result in:

data.table(Result = c("B,A,A", "A,B"))

How can I achieve this without first transforming the columns into lists and then subtracting the lists? In addition, since the dataset is quite big, it cannot be done with a for loop in R.

Every item in the comma-separated string should be treated as a single item, and it should be subtracted once for each time it occurs in ColumnB. Hence not all A's are gone in the first row.

Snowflake asked Dec 10 '19 21:12


1 Answer

Another option leveraging the function vecsets::vsetdiff which doesn't remove duplicates:

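library(dplyr)
library(purrr)
library(vecsets)

# Split each string into a character vector, then take the
# multiset difference row by row (duplicates are preserved).
dt %>% 
  mutate(x = strsplit(ColumnA, ","),
         y = strsplit(ColumnB, ",")) %>% 
  mutate(z = map2(x, y, vecsets::vsetdiff))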

      ColumnA ColumnB                x       y       z
1 A,B,C,A,A,A   A,C,A A, B, C, A, A, A A, C, A B, A, A
2       A,B,C       C          A, B, C       C    A, B

Note that you end up with list columns here (which I created on purpose for this to work), but the data might be easier to work with that way anyway.
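If you need the comma-separated string from the question rather than a list column, here is a minimal base R sketch of the same idea. It is not the vecsets implementation: `multiset_diff` and `subtract_csv` are hypothetical helper names, and the approach assumes items never contain a comma. It numbers repeated items so that each occurrence in the second vector cancels exactly one occurrence in the first.

```r
# Hypothetical helper: remove one occurrence from x for each occurrence in y.
multiset_diff <- function(x, y) {
  # Number the repeats of each value so duplicates get distinct keys,
  # e.g. c("A", "B", "A") becomes "A 1", "B 1", "A 2".
  key_x <- paste(x, ave(seq_along(x), x, FUN = seq_along))
  key_y <- paste(y, ave(seq_along(y), y, FUN = seq_along))
  x[!key_x %in% key_y]
}

# Hypothetical wrapper: split both columns, subtract row by row,
# and collapse each result back into a comma-separated string.
subtract_csv <- function(a, b) {
  mapply(
    function(x, y) paste(multiset_diff(x, y), collapse = ","),
    strsplit(a, ",", fixed = TRUE),
    strsplit(b, ",", fixed = TRUE),
    USE.NAMES = FALSE
  )
}

# Same values as the question's two columns.
col_a <- c("A,B,C,A,A,A", "A,B,C")
col_b <- c("A,C,A", "C")

result <- subtract_csv(col_a, col_b)
print(result)
# "B,A,A" "A,B"
```

With the question's data.table, the same call could be assigned as a new column with dt[, Result := subtract_csv(ColumnA, ColumnB)]. Note that mapply still iterates row by row internally, so this is worth timing on the real data before relying on it.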

joran answered Sep 19 '22 05:09