
How to find row indices where duplicated pairs exist

Tags: r

I have a list of data.frames; each data.frame is a separate output in the same format. I can merge all of them into a single master data.frame with

master <- do.call("rbind", list)

There are two columns in particular (1 and 3) in which I want to find duplicates, which I do with

unique.pairs <- unique(master[duplicated(master[,c(1,3)]),c(1,3)])

This gives me a data.frame of the unique column 1, 3 pairings that occur more than once.

Now I want to find, for each unique pair, all the rows where that pair occurs. The ideal output would be a data.frame with one column containing the pairing (the two values can be concatenated into one string) and another containing the row names from master where the pair occurs (easily obtainable from the row indices I'm trying to find).

A dummy example (stripped down to the two columns of interest):

master <- data.frame(A=c(1,1,2,2,3,3,4,4,5,5), B=c(1,2,3,3,4,5,6,6,7,8))
unique.pairs <- unique(master[duplicated(master[,c(1,2)]),c(1,2)])

Now I want to be able to make a data.frame as such:

results <- data.frame(instance=c("2->3","4->6"), indices=c("3,4","7,8"))

I think the approach is to iterate through each pair in unique.pairs and find where that pair exists in master, but I can't figure out the syntax.

TomNash asked Dec 04 '25 21:12


1 Answer

With data.table, you can do...

library(data.table)
setDT(master)

master[master[, .N, by=names(master)][ N > 1L ], on=names(master), 
  .(N, locs = .(.I)), by=.EACHI]

#    A B N locs
# 1: 2 3 2  3,4
# 2: 4 6 2  7,8

Note that we don't even need to construct an object like unique.pairs.

N is the number of repetitions. You can omit it from output by just using .(locs = .(.I)).
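For comparison, here is a minimal base R sketch that builds a data.frame shaped like the results object in the question. It is not part of the data.table approach above. It assumes the two columns can safely be pasted into a single key with "->" as the separator, and that the default integer row names are what is wanted; the names key and locs are only illustrative.

```r
# Dummy data from the question
master <- data.frame(A = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
                     B = c(1, 2, 3, 3, 4, 5, 6, 6, 7, 8))

# One string key per row, built from the two columns of interest
key <- paste(master$A, master$B, sep = "->")

# Group the row positions by key, then keep keys that occur more than once
locs <- split(seq_len(nrow(master)), key)
locs <- locs[lengths(locs) > 1L]

# Collapse each group of row positions into a comma-separated string
results <- data.frame(
  instance = names(locs),
  indices = unname(vapply(locs, paste, character(1), collapse = ",")),
  row.names = NULL,
  stringsAsFactors = FALSE
)

print(results)
```

If master has meaningful row names, replacing seq_len(nrow(master)) with rownames(master) should report those instead of positions.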

Frank answered Dec 06 '25 11:12


