How to find row indices where duplicated pairs exist

Question

I have a list of data.frames; each data.frame is a separate output in the same format. I can merge all of them into a single master data.frame with

master <- do.call("rbind", list)

I want to find duplicates in two columns in particular (1 and 3), which I do with

unique.pairs <- unique(master[duplicated(master[,c(1,3)]),c(1,3)])

This will give me a data.frame of unique column 1, 3 pairings.
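To make the behaviour of that expression concrete, here is a minimal sketch on a hypothetical toy data.frame (the object toy and its column names are made up for illustration and are not part of the real data). It shows that duplicated() flags only the second and later occurrences of a pair, and that combining it with fromLast = TRUE flags every row belonging to a repeated pair.

```r
# Hypothetical toy data: columns 1 and 3 form the pair, column 2 is ignored
toy <- data.frame(id     = c("a", "a", "b", "a", "c"),
                  note   = c("x", "y", "z", "w", "v"),
                  target = c("p", "p", "q", "p", "r"),
                  stringsAsFactors = FALSE)

# TRUE for the second and later occurrences of each (id, target) pair
dup <- duplicated(toy[, c(1, 3)])

# Repeated pairs, collapsed to one row each
unique.pairs <- unique(toy[dup, c(1, 3)])

# Every row that belongs to a repeated pair, first occurrence included
all.dup <- dup | duplicated(toy[, c(1, 3)], fromLast = TRUE)
which(all.dup)
```

Here the pair ("a", "p") sits in rows 1, 2 and 4, so dup marks rows 2 and 4 only, while all.dup marks all three.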

Now I want to find, for each unique pair, where the other instances of that pair are. The ideal output would be a data.frame with one column containing the pairing (the two values can be concatenated into one string) and another containing the row names from master where the duplicates occur (easily obtainable from the indices of the duplicates I'm trying to find).

A dummy example (stripped down to the two columns of interest):

master <- data.frame(A=c(1,1,2,2,3,3,4,4,5,5), B=c(1,2,3,3,4,5,6,6,7,8))
unique.pairs <- unique(master[duplicated(master[,c(1,2)]),c(1,2)])

Now I want to be able to make a data.frame as such:

results <- data.frame(instance=c("2->3","4->6"), indices=c("3,4","7,8"))

I am thinking of iterating over each pair in unique.pairs and finding where that pair occurs in master, but I can't figure out the syntax.

Frank · Accepted Answer

With data.table, you can do...

library(data.table)
setDT(master)

master[master[, .N, by=names(master)][ N > 1L ], on=names(master), 
  .(N, locs = .(.I)), by=.EACHI]

#    A B N locs
# 1: 2 3 2  3,4
# 2: 4 6 2  7,8

Note that we don't even need to construct an object like unique.pairs.

N is the number of repetitions. You can omit it from output by just using .(locs = .(.I)).
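If you would rather not depend on data.table, the following is a rough base R sketch that builds a data.frame in the shape the question asks for. It is an alternative, not the approach above: the helper objects key and locs are names chosen here for illustration, and it assumes the "->" separator never appears inside the values themselves.

```r
# Dummy data from the question
master <- data.frame(A = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
                     B = c(1, 2, 3, 3, 4, 5, 6, 6, 7, 8))

# One key string per row, e.g. "2->3"
key <- paste(master$A, master$B, sep = "->")

# Row positions grouped by key; the factor levels keep first-appearance order
locs <- split(seq_len(nrow(master)), factor(key, levels = unique(key)))

# Keep only the pairs that occur more than once
locs <- locs[lengths(locs) > 1L]

# Collapse each group of positions into a comma-separated string
results <- data.frame(
  instance = names(locs),
  indices  = vapply(locs, paste, character(1), collapse = ",", USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
```

On the dummy data this gives the two rows "2->3" with "3,4" and "4->6" with "7,8". To report row names instead of positions, split rownames(master) rather than seq_len(nrow(master)).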

Tags:

r

TomNash

