I have a data frame with a number of fixed columns, followed by a variable number of columns, depending on the input.
dd <- data.frame(key1=c("NED", "LTU", "LAT", "ITA"), h=c(18, 3, 2, 59)) # Two columns.
dd <- rbind(dd, dd, dd) # Example dataframe.
# Append a random number of auxiliary columns, ak1, ak2, ...
set.seed(2024); n <- sample(4:8, size=1) # Number of aux. columns.
akeys <- matrix(sample(x=c(1,2,3), size=nrow(dd) * n, replace=TRUE),
ncol=n
) # With random values 1..3.
colnames(akeys) <- paste0("ak", seq(n)) # Column names: ak1, ak2, ...
d2 <- cbind(dd, akeys) # Append n columns for sorting.
# Assuming n = 5.
id3 <- order(d2$key1, d2$h, -d2$ak1, -d2$ak2, -d2$ak3, -d2$ak4, -d2$ak5) # Sort all columns by name.
id4 <- order(d2[,1], d2[,2], -d2[,3], -d2[,4], -d2[,5], -d2[,6], -d2[, 7]) # Sort by column number.
Question: How to sort, in decreasing or increasing order, the data frame by (a selection of) all columns without knowing all the columns beforehand.
For example:
# Output d3
d2[id3, ]
# key1 h ak1 ak2 ak3 ak4 ak5
# 8 ITA 59 2 1 3 3 3
# 4 ITA 59 1 2 3 1 1
# 12 ITA 59 1 2 1 1 1
# 3 LAT 2 3 2 2 1 3
# 11 LAT 2 3 1 3 2 3
# 7 LAT 2 2 1 2 1 1
# 10 LTU 3 1 3 1 1 2
# 2 LTU 3 1 2 3 1 1
# 6 LTU 3 1 2 1 2 2
# 9 NED 18 3 3 1 1 1
# 5 NED 18 1 2 3 2 2
# 1 NED 18 1 2 1 2 3
Edit (@jay.sf). Sort data frame by (a selection of) all columns in ascending or descending order:
nc <- ncol(d2) # Number of columns in d2.
bycols <- subset(d2, select = c(key1, tail(seq(nc), n)) ) # Select columns by name and index in one line.
bysort <- c(FALSE, rep(TRUE, nc-1) ) # Decending, asscending ...
idx <- do.call('order', # New sort index.
c(bycols, # List of sort keys.
decreasing=list(bysort), # Sort direction.
method='radix')
)
identical(id3, idx)
d6 <- d2[idx, ]; d6
Using order with do.call.
> d2[do.call('order', d2), ]
key1 h ak1 ak2 ak3 ak4 ak5
12 ITA 59 1 2 1 1 1
4 ITA 59 1 2 3 1 1
8 ITA 59 2 1 3 3 3
7 LAT 2 2 1 2 1 1
11 LAT 2 3 1 3 2 3
3 LAT 2 3 2 2 1 3
6 LTU 3 1 2 1 2 2
2 LTU 3 1 2 3 1 1
10 LTU 3 1 3 1 1 2
1 NED 18 1 2 1 2 3
5 NED 18 1 2 3 2 2
9 NED 18 3 3 1 1 1
> d2[do.call('order', c(d2, decreasing=TRUE)), ]
key1 h ak1 ak2 ak3 ak4 ak5
9 NED 18 3 3 1 1 1
5 NED 18 1 2 3 2 2
1 NED 18 1 2 1 2 3
10 LTU 3 1 3 1 1 2
2 LTU 3 1 2 3 1 1
6 LTU 3 1 2 1 2 2
3 LAT 2 3 2 2 1 3
11 LAT 2 3 1 3 2 3
7 LAT 2 2 1 2 1 1
8 ITA 59 2 1 3 3 3
4 ITA 59 1 2 3 1 1
12 ITA 59 1 2 1 1 1
Try adding decreasing= argument for each column.
> d2[
+ do.call('order',
+ c(d2, decreasing=list(c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)),
+ method='radix')),
+ ]
key1 h ak1 ak2 ak3 ak4 ak5
12 ITA 59 1 2 1 1 1
4 ITA 59 1 2 3 1 1
8 ITA 59 2 1 3 3 3
7 LAT 2 2 1 2 1 1
3 LAT 2 3 2 2 1 3
11 LAT 2 3 1 3 2 3
10 LTU 3 1 3 1 1 2
6 LTU 3 1 2 1 2 2
2 LTU 3 1 2 3 1 1
1 NED 18 1 2 1 2 3
5 NED 18 1 2 3 2 2
9 NED 18 3 3 1 1 1
You can run the following code to sort using all columns. Note that the sorting is by reference. Note that this solution is not based on built-in functions though.
data.table::setorderv(d2)
key1 h ak1 ak2 ak3 ak4 ak5
12 ITA 59 1 2 1 1 1
4 ITA 59 1 2 3 1 1
8 ITA 59 2 1 3 3 3
7 LAT 2 2 1 2 1 1
11 LAT 2 3 1 3 2 3
3 LAT 2 3 2 2 1 3
6 LTU 3 1 2 1 2 2
2 LTU 3 1 2 3 1 1
10 LTU 3 1 3 1 1 2
1 NED 18 1 2 1 2 3
5 NED 18 1 2 3 2 2
9 NED 18 3 3 1 1 1
Note that this function has two arguments to customize how the sorting is done and what columns to use. Argument cols specifies the columns to use (all columns by default) and argument order specifies how each column is used (increasing or decreasing order); In order, 1 is used for increase and -1 for decrease order, and it can have length equal to the number of columns used for sorting.
To sort a data.frame by ascending order based on the first column and descending order for the remaining columns, the code below can be used:
data.table::setorderv(d2, order=c(1, rep(-1, length(d2)-1)))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With