Find unique pairs of words ignoring their order in two columns in R

Tags: r, unique

I have a data frame that contains duplicated values in two columns.

   dat <- data.frame(V1 = c("home", "cat", "fire", "sofa", "kitchen", "sofa"),
                     V2 = c("cat", "home", "water", "TV", "knife", "TV"),
                     V3 = c("date1", "date1", "date2", "date3", "date4", "date3"))

       V1    V2    V3
1    home   cat date1
2     cat  home date1
3    fire water date2
4    sofa    TV date3
5 kitchen knife date4
6    sofa    TV date3

I would like to keep only the unique pairs from this data frame, ignoring the order in which the two words appear across V1 and V2.

This is the result I would like to obtain:

       V1    V2    V3
1    home   cat date1
2    fire water date2
3    sofa    TV date3
4 kitchen knife date4
CafféSospeso asked Oct 22 '25 04:10


1 Answer

dat[!duplicated(t(apply(dat, 1, sort))),]

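apply(dat, 1, sort) sorts the values within each row. Because apply returns each row's result as a column, t() transposes the matrix back so that each row corresponds to a row of dat. duplicated() then returns a logical vector that is TRUE for every row repeating an earlier one, and negating it with ! keeps only the first occurrence of each pair. Note that this sorts all three columns, V3 included, so the comparison also takes V3 into account.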
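If the pairs should be compared on V1 and V2 only, regardless of V3, one possible variant is to sort just those two columns. The following is a minimal sketch of that idea, not part of the original answer; the names `pair_key` and `result` are illustrative, and `stringsAsFactors = FALSE` is set only so the columns are character vectors on older R versions as well.

```r
# Example data from the question
dat <- data.frame(
  V1 = c("home", "cat", "fire", "sofa", "kitchen", "sofa"),
  V2 = c("cat", "home", "water", "TV", "knife", "TV"),
  V3 = c("date1", "date1", "date2", "date3", "date4", "date3"),
  stringsAsFactors = FALSE
)

# Sort the two words within each row so that ("cat", "home") and
# ("home", "cat") produce the same key; t() restores one row per pair
pair_key <- t(apply(dat[, c("V1", "V2")], 1, sort))

# Keep the first occurrence of each order-independent pair
result <- dat[!duplicated(pair_key), ]
rownames(result) <- NULL  # renumber rows 1..n (optional)

print(result)
```

With the example data this keeps one row per pair, which is the four-row result asked for in the question. Unlike the one-liner above, rows that share a pair but differ in V3 would also be collapsed to the first occurrence.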

sumshyftw answered Oct 23 '25 18:10


