How to sort a data frame (in R base) by all columns, without knowing all the columns beforehand

Question

I have a data frame with a number of fixed columns, followed by a variable number of columns, depending on the input.

dd <- data.frame(key1=c("NED", "LTU", "LAT", "ITA"), h=c(18, 3, 2, 59))      # Two columns.
dd <- rbind(dd, dd, dd)                                                      # Example dataframe.

# Append a random number of auxiliary columns, ak1, ak2, ...
set.seed(2024); n <- sample(4:8, size=1)                                     # Number of aux. columns.
akeys <- matrix(sample(x=c(1,2,3), size=nrow(dd) * n, replace=TRUE),
                ncol=n
               )                                                             # With random values 1..3.
colnames(akeys) <- paste0("ak", seq(n))                                      # Column names: ak1, ak2, ...

d2 <- cbind(dd, akeys)                                                       # Append n columns for sorting.
                                                                             # Assuming n = 5.
id3 <- order(d2$key1, d2$h, -d2$ak1, -d2$ak2, -d2$ak3, -d2$ak4, -d2$ak5)     # Sort all columns by name.
id4 <- order(d2[,1], d2[,2], -d2[,3], -d2[,4], -d2[,5], -d2[,6], -d2[, 7])   # Sort by column number.

Question: How to sort, in decreasing or increasing order, the data frame by (a selection of) all columns without knowing all the columns beforehand.

For example:

sort the data frame by all columns.
sort the data frame by key1 (ascending), and then by the last n columns (descending).

# Output d3
d2[id3, ]
#    key1  h ak1 ak2 ak3 ak4 ak5
# 8   ITA 59   2   1   3   3   3
# 4   ITA 59   1   2   3   1   1
# 12  ITA 59   1   2   1   1   1
# 3   LAT  2   3   2   2   1   3
# 11  LAT  2   3   1   3   2   3
# 7   LAT  2   2   1   2   1   1
# 10  LTU  3   1   3   1   1   2
# 2   LTU  3   1   2   3   1   1
# 6   LTU  3   1   2   1   2   2
# 9   NED 18   3   3   1   1   1
# 5   NED 18   1   2   3   2   2
# 1   NED 18   1   2   1   2   3

Edit (@jay.sf). Sort data frame by (a selection of) all columns in ascending or descending order:

nc     <- ncol(d2)                                        # Number of columns in d2.
bycols <- subset(d2, select = c(key1, tail(seq(nc), n)) ) # Select columns by name and index in one line.
bysort <- c(FALSE, rep(TRUE, nc-1) )                      # Decending, asscending ...
idx    <- do.call('order',                                # New sort index.
                  c(bycols,                               # List of sort keys.
                    decreasing=list(bysort),              # Sort direction.
                    method='radix')
                   )
identical(id3, idx)
d6 <- d2[idx, ]; d6

jay.sf · Accepted Answer

Using order with do.call.

> d2[do.call('order', d2), ]
   key1  h ak1 ak2 ak3 ak4 ak5
12  ITA 59   1   2   1   1   1
4   ITA 59   1   2   3   1   1
8   ITA 59   2   1   3   3   3
7   LAT  2   2   1   2   1   1
11  LAT  2   3   1   3   2   3
3   LAT  2   3   2   2   1   3
6   LTU  3   1   2   1   2   2
2   LTU  3   1   2   3   1   1
10  LTU  3   1   3   1   1   2
1   NED 18   1   2   1   2   3
5   NED 18   1   2   3   2   2
9   NED 18   3   3   1   1   1

> d2[do.call('order', c(d2, decreasing=TRUE)), ]
   key1  h ak1 ak2 ak3 ak4 ak5
9   NED 18   3   3   1   1   1
5   NED 18   1   2   3   2   2
1   NED 18   1   2   1   2   3
10  LTU  3   1   3   1   1   2
2   LTU  3   1   2   3   1   1
6   LTU  3   1   2   1   2   2
3   LAT  2   3   2   2   1   3
11  LAT  2   3   1   3   2   3
7   LAT  2   2   1   2   1   1
8   ITA 59   2   1   3   3   3
4   ITA 59   1   2   3   1   1
12  ITA 59   1   2   1   1   1

Try adding decreasing= argument for each column.

> d2[
+   do.call('order', 
+           c(d2, decreasing=list(c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)),
+             method='radix')),
+ ]
   key1  h ak1 ak2 ak3 ak4 ak5
12  ITA 59   1   2   1   1   1
4   ITA 59   1   2   3   1   1
8   ITA 59   2   1   3   3   3
7   LAT  2   2   1   2   1   1
3   LAT  2   3   2   2   1   3
11  LAT  2   3   1   3   2   3
10  LTU  3   1   3   1   1   2
6   LTU  3   1   2   1   2   2
2   LTU  3   1   2   3   1   1
1   NED 18   1   2   1   2   3
5   NED 18   1   2   3   2   2
9   NED 18   3   3   1   1   1

B. Christian Kamgang · Answer

You can run the following code to sort using all columns. Note that the sorting is by reference. Note that this solution is not based on built-in functions though.

data.table::setorderv(d2)

   key1  h ak1 ak2 ak3 ak4 ak5
12  ITA 59   1   2   1   1   1
4   ITA 59   1   2   3   1   1
8   ITA 59   2   1   3   3   3
7   LAT  2   2   1   2   1   1
11  LAT  2   3   1   3   2   3
3   LAT  2   3   2   2   1   3
6   LTU  3   1   2   1   2   2
2   LTU  3   1   2   3   1   1
10  LTU  3   1   3   1   1   2
1   NED 18   1   2   1   2   3
5   NED 18   1   2   3   2   2
9   NED 18   3   3   1   1   1

Note that this function has two arguments to customize how the sorting is done and what columns to use. Argument cols specifies the columns to use (all columns by default) and argument order specifies how each column is used (increasing or decreasing order); In order, 1 is used for increase and -1 for decrease order, and it can have length equal to the number of columns used for sorting. To sort a data.frame by ascending order based on the first column and descending order for the remaining columns, the code below can be used:

data.table::setorderv(d2, order=c(1, rep(-1, length(d2)-1)))

How to sort a data frame (in R base) by all columns, without knowing all the columns beforehand

Tags:

sorting

dataframe

r

clp

2 Answers

jay.sf

B. Christian Kamgang

Recent Activity

Donate For Us

How to sort a data frame (in R base) by all columns, without knowing all the columns beforehand

Tags:

sorting

dataframe

r

clp

2 Answers

jay.sf

B. Christian Kamgang

Related questions

Recent Activity

Donate For Us