Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to sort a data frame (in R base) by all columns, without knowing all the columns beforehand

I have a data frame with a number of fixed columns, followed by a variable number of columns, depending on the input.

dd <- data.frame(key1=c("NED", "LTU", "LAT", "ITA"), h=c(18, 3, 2, 59))      # Two columns.
dd <- rbind(dd, dd, dd)                                                      # Example dataframe.

# Append a random number of auxiliary columns, ak1, ak2, ...
set.seed(2024); n <- sample(4:8, size=1)                                     # Number of aux. columns.
akeys <- matrix(sample(x=c(1,2,3), size=nrow(dd) * n, replace=TRUE),
                ncol=n
               )                                                             # With random values 1..3.
colnames(akeys) <- paste0("ak", seq(n))                                      # Column names: ak1, ak2, ...

d2 <- cbind(dd, akeys)                                                       # Append n columns for sorting.
                                                                             # Assuming n = 5.
id3 <- order(d2$key1, d2$h, -d2$ak1, -d2$ak2, -d2$ak3, -d2$ak4, -d2$ak5)     # Sort all columns by name.
id4 <- order(d2[,1], d2[,2], -d2[,3], -d2[,4], -d2[,5], -d2[,6], -d2[, 7])   # Sort by column number.

Question: How to sort, in decreasing or increasing order, the data frame by (a selection of) all columns without knowing all the columns beforehand.

For example:

  • sort the data frame by all columns.
  • sort the data frame by key1 (ascending), and then by the last n columns (descending).
# Output d3
d2[id3, ]
#    key1  h ak1 ak2 ak3 ak4 ak5
# 8   ITA 59   2   1   3   3   3
# 4   ITA 59   1   2   3   1   1
# 12  ITA 59   1   2   1   1   1
# 3   LAT  2   3   2   2   1   3
# 11  LAT  2   3   1   3   2   3
# 7   LAT  2   2   1   2   1   1
# 10  LTU  3   1   3   1   1   2
# 2   LTU  3   1   2   3   1   1
# 6   LTU  3   1   2   1   2   2
# 9   NED 18   3   3   1   1   1
# 5   NED 18   1   2   3   2   2
# 1   NED 18   1   2   1   2   3

Edit (@jay.sf). Sort data frame by (a selection of) all columns in ascending or descending order:

nc     <- ncol(d2)                                        # Number of columns in d2.
bycols <- subset(d2, select = c(key1, tail(seq(nc), n)) ) # Select columns by name and index in one line.
bysort <- c(FALSE, rep(TRUE, nc-1) )                      # Decending, asscending ...
idx    <- do.call('order',                                # New sort index.
                  c(bycols,                               # List of sort keys.
                    decreasing=list(bysort),              # Sort direction.
                    method='radix')
                   )
identical(id3, idx)
d6 <- d2[idx, ]; d6

like image 795
clp Avatar asked Dec 08 '25 08:12

clp


2 Answers

Using order with do.call.

> d2[do.call('order', d2), ]
   key1  h ak1 ak2 ak3 ak4 ak5
12  ITA 59   1   2   1   1   1
4   ITA 59   1   2   3   1   1
8   ITA 59   2   1   3   3   3
7   LAT  2   2   1   2   1   1
11  LAT  2   3   1   3   2   3
3   LAT  2   3   2   2   1   3
6   LTU  3   1   2   1   2   2
2   LTU  3   1   2   3   1   1
10  LTU  3   1   3   1   1   2
1   NED 18   1   2   1   2   3
5   NED 18   1   2   3   2   2
9   NED 18   3   3   1   1   1

> d2[do.call('order', c(d2, decreasing=TRUE)), ]
   key1  h ak1 ak2 ak3 ak4 ak5
9   NED 18   3   3   1   1   1
5   NED 18   1   2   3   2   2
1   NED 18   1   2   1   2   3
10  LTU  3   1   3   1   1   2
2   LTU  3   1   2   3   1   1
6   LTU  3   1   2   1   2   2
3   LAT  2   3   2   2   1   3
11  LAT  2   3   1   3   2   3
7   LAT  2   2   1   2   1   1
8   ITA 59   2   1   3   3   3
4   ITA 59   1   2   3   1   1
12  ITA 59   1   2   1   1   1

Try adding decreasing= argument for each column.

> d2[
+   do.call('order', 
+           c(d2, decreasing=list(c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)),
+             method='radix')),
+ ]
   key1  h ak1 ak2 ak3 ak4 ak5
12  ITA 59   1   2   1   1   1
4   ITA 59   1   2   3   1   1
8   ITA 59   2   1   3   3   3
7   LAT  2   2   1   2   1   1
3   LAT  2   3   2   2   1   3
11  LAT  2   3   1   3   2   3
10  LTU  3   1   3   1   1   2
6   LTU  3   1   2   1   2   2
2   LTU  3   1   2   3   1   1
1   NED 18   1   2   1   2   3
5   NED 18   1   2   3   2   2
9   NED 18   3   3   1   1   1
like image 78
jay.sf Avatar answered Dec 10 '25 00:12

jay.sf


You can run the following code to sort using all columns. Note that the sorting is by reference. Note that this solution is not based on built-in functions though.

data.table::setorderv(d2)

   key1  h ak1 ak2 ak3 ak4 ak5
12  ITA 59   1   2   1   1   1
4   ITA 59   1   2   3   1   1
8   ITA 59   2   1   3   3   3
7   LAT  2   2   1   2   1   1
11  LAT  2   3   1   3   2   3
3   LAT  2   3   2   2   1   3
6   LTU  3   1   2   1   2   2
2   LTU  3   1   2   3   1   1
10  LTU  3   1   3   1   1   2
1   NED 18   1   2   1   2   3
5   NED 18   1   2   3   2   2
9   NED 18   3   3   1   1   1

Note that this function has two arguments to customize how the sorting is done and what columns to use. Argument cols specifies the columns to use (all columns by default) and argument order specifies how each column is used (increasing or decreasing order); In order, 1 is used for increase and -1 for decrease order, and it can have length equal to the number of columns used for sorting. To sort a data.frame by ascending order based on the first column and descending order for the remaining columns, the code below can be used:

data.table::setorderv(d2, order=c(1, rep(-1, length(d2)-1)))
like image 24
B. Christian Kamgang Avatar answered Dec 10 '25 00:12

B. Christian Kamgang



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!