I've got a data frame with a text column name and a factor column city. It is sorted alphabetically, first by city and then by name. Now I need a data frame that contains only the nth row within each city, keeping this ordering. How can this be done cleanly, without explicit loops?
I have:
name city
John Atlanta
Josh Atlanta
Matt Atlanta
Bob Boston
Kate Boston
Lily Boston
Matt Boston
I want a function that returns the nth element within each city. For example, for n = 3:
name city
Matt Atlanta
Lily Boston
If n is out of range for a city, name should be NULL (missing) for that city. For example, for n = 4:
name city
NULL Atlanta
Matt Boston
Can this be done using only base R?
In base R, using by():
Set up some test data, including an extra city, Seattle, with only two rows, so that n = 3 is out of range there:
test <- read.table(text="name city
John Atlanta
Josh Atlanta
Matt Atlanta
Bob Boston
Kate Boston
Lily Boston
Matt Boston
Bob Seattle
Kate Seattle",header=TRUE)
Get the 3rd row in each city:
do.call(rbind,by(test,test$city,function(x) x[3,]))
Result:
name city
Atlanta Matt Atlanta
Boston Lily Boston
Seattle <NA> <NA>
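The <NA> row for Seattle comes from how data frame row indexing works in R: asking for a row number past the end does not raise an error but returns a single row of NAs. A minimal illustration, using a made-up two-row data frame named seattle:

```r
# A two-row data frame, like the Seattle group in the test data
seattle <- data.frame(name = c("Bob", "Kate"),
                      city = c("Seattle", "Seattle"),
                      stringsAsFactors = FALSE)

# Row 3 does not exist, but R does not raise an error:
# it returns one row in which every column is NA
third <- seattle[3, ]
print(third)
```

This is also why the city column is NA in that row, which the function below patches up.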
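To get exactly the output you asked for, with the city name filled in for groups that are too short, here is a small function:
nthrow <- function(dset, splitvar, n) {
  # take row n of each group; a group with fewer than n rows yields an all-NA row
  result <- do.call(rbind, by(dset, dset[splitvar], function(x) x[n, ]))
  # rbind() names each row after its group, so use those names to restore the missing group labels
  na_rows <- is.na(result[, splitvar])
  result[, splitvar][na_rows] <- row.names(result)[na_rows]
  row.names(result) <- NULL
  return(result)
}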
Call it like:
nthrow(test,"city",3)
Result:
name city
1 Matt Atlanta
2 Lily Boston
3 <NA> Seattle
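A different base R sketch (not part of the original answer) avoids building one small data frame per city: it numbers the rows within each group with ave() and keeps the rows numbered n, then uses match() so that every city still appears. The helper name nth_by_group is hypothetical, and it assumes the data are already sorted by city and name as described in the question.

```r
# Test data from above, built inline so the snippet runs on its own
test <- data.frame(
  name = c("John", "Josh", "Matt", "Bob", "Kate", "Lily", "Matt", "Bob", "Kate"),
  city = c(rep("Atlanta", 3), rep("Boston", 4), rep("Seattle", 2)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: nth row of each group, NA-filled when a group is too short.
# Assumes dset is already sorted by splitvar (and by name within each group).
nth_by_group <- function(dset, splitvar, n) {
  g <- dset[[splitvar]]
  # position of each row within its group: 1, 2, 3, ... in the current row order
  pos <- ave(seq_len(nrow(dset)), g, FUN = seq_along)
  hits <- dset[pos == n, , drop = FALSE]
  # one entry per group, in order of first appearance
  groups <- unique(g)
  # match() gives NA for groups without an nth row; indexing by NA yields an NA row
  out <- hits[match(groups, hits[[splitvar]]), , drop = FALSE]
  out[[splitvar]] <- groups
  row.names(out) <- NULL
  out
}

nth_by_group(test, "city", 3)
nth_by_group(test, "city", 4)
```

The grouping work is done by ave() on a single integer vector, so only one subset of the original data frame is taken rather than one per city.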
A data.table solution
library(data.table)
DT <- data.table(test)
# return all columns from the subset data.table
n <- 4
DT[,.SD[n,] ,by = city]
## city name
## 1: Atlanta NA
## 2: Boston Matt
## 3: Seattle NA
# if you just want the nth element of `name`
# (excluding other columns that might be there)
# any of the following would work
DT[,.SD[n,] ,by = city, .SDcols = 'name']
DT[, .SD[n, list(name)], by = city]
DT[, list(name = name[n]), by = city]
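Since the question asks for base R, here is a rough base R counterpart to the last data.table line, using tapply(). It is a sketch; the variable names nth and res are illustrative. Note that tapply() orders groups by their sorted values (factor levels), which coincides with the alphabetical ordering of the data here.

```r
# Test data from above, built inline so the snippet runs on its own
test <- data.frame(
  name = c("John", "Josh", "Matt", "Bob", "Kate", "Lily", "Matt", "Bob", "Kate"),
  city = c(rep("Atlanta", 3), rep("Boston", 4), rep("Seattle", 2)),
  stringsAsFactors = FALSE
)

n <- 4
# v[n] is NA_character_ when a city has fewer than n names
nth <- tapply(test$name, test$city, function(v) v[n])
# one row per city, groups in sorted order
res <- data.frame(city = names(nth), name = as.vector(nth),
                  stringsAsFactors = FALSE)
res
```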