I've a dataset in a csv file that looks as follows:
 X               Colour Orange Red White Violet Black Yellow Blue
1 1          Orange, Red     NA  NA    NA     NA    NA     NA   NA
2 2                  Red     NA  NA    NA     NA    NA     NA   NA
3 3         White, Black     NA  NA    NA     NA    NA     NA   NA
4 4               Yellow     NA  NA    NA     NA    NA     NA   NA
5 5 Blue, Orange, Violet     NA  NA    NA     NA    NA     NA   NA
I'm trying to add 0s and 1s for every row-column match that occurs. The expected out put is:
      Colour     Orange Red White   Violet  Black   Yellow  Blue
1   Orange,Red   1       1    0        0      0        0      0
2   Red          0       1    0        0      0        0      0
3   White,Black  0       0    1        0      1        0      0
4   Yellow       0       0    0        0      0        1      0
5   Blue,Orange, 1       0    0        1      0        0      1
    Violet
How to achieve this in R?
Loop across the column names, and check if they're in the pattern using grepl:
dat[-(1:2)] <-  sapply( colnames(dat[-(1:2)]), grepl, x=dat$Colour  ) + 0
#  X               Colour Orange Red White Violet Black Yellow Blue
#1 1          Orange, Red      1   1     0      0     0      0    0
#2 2                  Red      0   1     0      0     0      0    0
#3 3         White, Black      0   0     1      0     1      0    0
#4 4               Yellow      0   0     0      0     0      1    0
#5 5 Blue, Orange, Violet      1   0     0      1     0      0    1
Not sure whether you added the NA columns or not. Even without having any identifier NA columns, we can use strsplit to split the "Colour" column, apply mtabulate on the list output and if needed, rearrange the output based on the column names of 'dat'
library(qdapTools)
cbind(dat[1:2], mtabulate(strsplit(dat$Colour, ', ')))[names(dat)]
#   X               Colour Orange Red White Violet Black Yellow Blue
#1 1          Orange, Red      1   1     0      0     0      0    0
#2 2                  Red      0   1     0      0     0      0    0
#3 3         White, Black      0   0     1      0     1      0    0
#4 4               Yellow      0   0     0      0     0      1    0
#5 5 Blue, Orange, Violet      1   0     0      1     0      0    1
or a similar approach would be to use cSplit_e from splitstackshape
library(splitstackshape)
cSplit_e(dat[1:2], 'Colour', type='character', fill=0)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With