I have a small problem that I can't seem to solve. Given two columns:
library(data.table)
dt <- data.table(ColumnA = c("A,B,C,A,A,A", "A,B,C"), ColumnB = c("A,C,A", "C"))
I would like to "subtract" columnB from columnA, which will result in:
data.table(Result = c("B,A,A", "A,B"))
How can I achieve this without first converting the strings to lists and then subtracting the lists? Also, since the dataset is quite big, a for loop in R is not an option.
Every item in the comma-separated string should be treated as a single item, and each item in ColumnB should remove only one matching occurrence from ColumnA. Hence not all A's are removed in the first row.
Another option is the function vecsets::vsetdiff, which, unlike base setdiff, doesn't remove duplicates:
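library(dplyr)
library(purrr)
library(vecsets)

dt %>%
  # split each comma-separated string into a character vector (list columns)
  mutate(x = strsplit(ColumnA, ","),
         y = strsplit(ColumnB, ",")) %>%
  # row-wise set difference that keeps duplicates
  mutate(z = map2(x, y, vecsets::vsetdiff))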
      ColumnA ColumnB                x       y       z
1 A,B,C,A,A,A   A,C,A A, B, C, A, A, A A, C, A B, A, A
2       A,B,C       C          A, B, C       C    A, B
Note that you end up with list columns here (which I created on purpose for this to work), but the data might be easier to work with that way anyway.
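If you need plain comma-separated strings back rather than list columns, here is a minimal base R sketch of the same "subtract once per occurrence" idea. The helper name subtract_once is hypothetical (it is not part of vecsets or any other package), and the sketch uses a plain data.frame so it runs without extra libraries. It still splits each string internally and iterates over rows via mapply, so treat it as an illustration of the semantics rather than a tuned solution for very large data.

```r
# Hypothetical helper: remove one occurrence from `a` for each item in `b`.
# Both arguments are single comma-separated strings.
subtract_once <- function(a, b) {
  x <- strsplit(a, ",", fixed = TRUE)[[1]]
  y <- strsplit(b, ",", fixed = TRUE)[[1]]
  for (item in y) {
    hit <- match(item, x)            # position of first remaining match, NA if none
    if (!is.na(hit)) x <- x[-hit]    # drop only that single occurrence
  }
  paste(x, collapse = ",")
}

# Same example data as in the question, as a plain data.frame
df <- data.frame(ColumnA = c("A,B,C,A,A,A", "A,B,C"),
                 ColumnB = c("A,C,A", "C"),
                 stringsAsFactors = FALSE)

# Apply the helper row by row; the result is a character vector
df$Result <- mapply(subtract_once, df$ColumnA, df$ColumnB, USE.NAMES = FALSE)
print(df$Result)
```

For the two example rows this should give "B,A,A" and "A,B", matching the result asked for in the question.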