I have been struggling with how to select ONLY duplicated rows of data.frame in R. For Instance, my data.frame is:
age=18:29
height=c(76.1,77,78.1,78.2,78.8,79.7,79.9,81.1,81.2,81.8,82.8,83.5)
Names=c("John","John","John", "Harry", "Paul", "Paul", "Paul", "Khan", "Khan", "Khan", "Sam", "Joe")
village <- data.frame(Names, age, height)
Names age height
John 18 76.1
John 19 77.0
John 20 78.1
Harry 21 78.2
Paul 22 78.8
Paul 23 79.7
Paul 24 79.9
Khan 25 81.1
Khan 26 81.2
Khan 27 81.8
Sam 28 82.8
Joe 29 83.5
I want to see the result as following:
Names age height
John 18 76.1
John 19 77.0
John 20 78.1
Paul 22 78.8
Paul 23 79.7
Paul 24 79.9
Khan 25 81.1
Khan 26 81.2
Khan 27 81.8
Thanks for your time...
A solution using duplicated
twice:
village[duplicated(village$Names) | duplicated(village$Names, fromLast = TRUE), ]
Names age height
1 John 18 76.1
2 John 19 77.0
3 John 20 78.1
5 Paul 22 78.8
6 Paul 23 79.7
7 Paul 24 79.9
8 Khan 25 81.1
9 Khan 26 81.2
10 Khan 27 81.8
An alternative solution with by
:
village[unlist(by(seq(nrow(village)), village$Names,
function(x) if(length(x)-1) x)), ]
I find @Sven's answer using duplicated the "tidiest", but you can also do this many other ways. Here are two more:
Use table()
and subset by matching the names where the tabulation is > 1 with the names present in the first column:
village[village$Names %in% names(which(table(village$Names) > 1)), ]
Use ave()
to "tabulate" in a little different manner, but subset in the same way:
village[with(village, ave(as.numeric(Names), Names, FUN = length) > 1), ]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With