I have a list of data.frames; each data.frame is a separate output in the same format. I can merge all of them into a master data.frame with
master <- do.call("rbind", list)
There are two columns in particular (1 and 3) in which I want to find duplicates, which I do with
unique.pairs <- unique(master[duplicated(master[,c(1,3)]),c(1,3)])
This gives me a data.frame of the unique (column 1, column 3) pairings that occur more than once.
Now I want to find, for each unique pair, where the other instances of that pair are. The ideal output would be a data.frame with one column containing the pairing (the two values can be concatenated into one string) and another containing the row names from master where the duplicates occur (easily obtainable from the indices of the duplicates I'm trying to find).
A dummy example (stripped down to the two columns of interest):
master <- data.frame(A=c(1,1,2,2,3,3,4,4,5,5), B=c(1,2,3,3,4,5,6,6,7,8))
unique.pairs <- unique(master[duplicated(master[,c(1,2)]),c(1,2)])
Now I want to be able to make a data.frame as such:
results <- data.frame(instance=c("2->3","4->6"), indices=c("3,4","7,8"))
I think I need to iterate over each pair in unique.pairs and find where that pair occurs in master, but I can't figure out the syntax.
With data.table, you can do...
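library(data.table)
setDT(master)  # convert master to a data.table by reference
# Count rows per (A, B) group, keep the groups seen more than once,
# then join back to master to collect the matching row numbers.
master[master[, .N, by=names(master)][ N > 1L ], on=names(master),
  .(N, locs = .(.I)), by=.EACHI]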
# A B N locs
# 1: 2 3 2 3,4
# 2: 4 6 2 7,8
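Note that we don't even need to construct an object like unique.pairs. Here names(master) covers both columns of the stripped-down example; for the full master, pass the names of columns 1 and 3 to by and on instead.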
N is the number of repetitions. You can omit it from output by just using .(locs = .(.I)).
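If you would rather stay in base R and get exactly the results layout from the question, something along these lines should work. This is only a sketch against the dummy data: the names key and locs are made up for illustration, and it assumes the pair can be safely pasted into a single "A->B" string.

```r
# Dummy data from the question
master <- data.frame(A = c(1,1,2,2,3,3,4,4,5,5), B = c(1,2,3,3,4,5,6,6,7,8))

# Build one string key per row, e.g. "2->3" (key is an illustrative name)
key <- paste(master$A, master$B, sep = "->")

# Group the row names of master by key, then keep keys seen more than once
locs <- split(rownames(master), key)
locs <- locs[lengths(locs) > 1L]

# One row per duplicated pair, with its row names collapsed into a string
results <- data.frame(
  instance = names(locs),
  indices  = unname(vapply(locs, paste, character(1), collapse = ",")),
  stringsAsFactors = FALSE
)
results
#   instance indices
# 1     2->3     3,4
# 2     4->6     7,8
```

split() orders the groups by the sorted key strings, so the rows of results may not follow the order in which the pairs first appear in master.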