
An R tidyverse way to select columns in order based on data type

Tags: dataframe, r, dplyr

 library(tidyverse)
 #> Warning: package 'tidyverse' was built under R version 3.4.4
 #> Warning: package 'forcats' was built under R version 3.4.4

 example <- tibble(
     num1 = sample(1:100, 10),
     categ1 = as.factor(c(sample(letters, 10))),
     num2 = sample(1:100, 10),
     categ2 = as.factor(c(sample(letters, 10)))
 )

 head(example)
 #> # A tibble: 6 x 4
 #>    num1 categ1  num2 categ2
 #>   <int> <fct>  <int> <fct> 
 #> 1     4 c          5 l     
 #> 2    86 u         64 b     
 #> 3    38 z         18 r     
 #> 4    95 e         44 j     
 #> 5    77 w         35 u     
 #> 6    84 y         14 i

Created on 2018-06-19 by the reprex package (v0.2.0).

The example above is a basic data frame with integer and factor columns. In a case this small, it is easy to set the column order by hand with dplyr's select(example, categ1, categ2, num1, num2).

But what if you have many columns of mixed data types and want all of the factors selected first, followed by everything else (or any other order based on data type)?

Typing out each column name, or using select() helpers like contains(), quickly becomes tedious with a large number of columns. I would prefer a tidyverse solution, but I am also interested in how this could be done in base R.

Josh Goldberg asked Oct 24 '25 10:10


2 Answers

Example data with columns of 3 classes

library(tidyverse)
example <- tibble(
     num1 = as.character(sample(1:100, 10)),
     categ1 = as.factor(c(sample(letters, 10))),
     num2 = sample(1:100, 10),
     categ2 = as.factor(c(sample(letters, 10)))
 )

Let's say you want the columns arranged by class in this order:

my.order <- c('factor', 'integer', 'character')

i.e. factors, then integers, then characters

You can do

example %>% 
  select(sapply(., class) %>% .[order(match(., my.order))] %>% names)

# # A tibble: 10 x 4
#    categ1 categ2  num2 num1 
#    <fct>  <fct>  <int> <chr>
#  1 y      e         94 46   
#  2 t      b         52 31   
#  3 w      c         32 57   
#  4 k      i         27 89   
#  5 n      d         76 14   
#  6 x      g         67 40   
#  7 c      v         16 20   
#  8 e      z          6 95   
#  9 i      t         70 13   
# 10 g      w         57 42 
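
The one-liner above packs several steps into a single pipe. As a rough sketch of what it does, here are the same steps unpacked. The intermediate names (col_classes, class_rank, ordered_names) and the set.seed() call are my own additions for illustration, not part of the answer.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the example is reproducible

example <- tibble(
  num1 = as.character(sample(1:100, 10)),
  categ1 = as.factor(sample(letters, 10)),
  num2 = sample(1:100, 10),
  categ2 = as.factor(sample(letters, 10))
)

my.order <- c("factor", "integer", "character")

# 1. Class of each column, as a named character vector
col_classes <- sapply(example, class)

# 2. Position of each column's class within my.order (1 = first group)
class_rank <- match(col_classes, my.order)

# 3. Column names sorted by that position; order() keeps the original
#    left-to-right order for columns that share a class
ordered_names <- names(col_classes)[order(class_rank)]

# 4. Select the columns in the new order
result <- select(example, all_of(ordered_names))
names(result)
```

This relies on every column having exactly one class, so that sapply() returns a character vector rather than a list.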

As a function (same output)

order_cols <- function(df, col.order){
  df %>% 
    select(sapply(., class) %>% .[order(match(., col.order))] %>% names)
}

example %>% 
  order_cols(c('factor', 'integer', 'character'))
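
One caveat: sapply(., class) returns a list instead of a character vector when a column has more than one class (an ordered factor is c("ordered", "factor"), for example), and match() then no longer lines up. Below is a possible variant that handles this, assuming you want such columns grouped under whichever listed class they inherit from. The name order_cols_safe is hypothetical, and columns whose class is not listed are placed last.

```r
# Hypothetical helper (not from the answer above): order columns by the first
# entry of col.order that each column inherits from.
order_cols_safe <- function(df, col.order) {
  class_rank <- vapply(df, function(x) {
    # inherits(which = TRUE) returns, for each entry of col.order,
    # its position in class(x), or 0 if x does not inherit from it
    hits <- which(inherits(x, col.order, which = TRUE) > 0)
    if (length(hits) == 0) NA_integer_ else hits[1]
  }, integer(1))
  # order() puts NA last, so unlisted classes end up on the right
  df[order(class_rank)]
}

df <- data.frame(
  a = 1:3,
  b = factor(c("x", "y", "z"), ordered = TRUE),
  c = c("p", "q", "r"),
  d = as.Date("2020-01-01") + 0:2,
  stringsAsFactors = FALSE
)

result <- order_cols_safe(df, c("factor", "integer", "character"))
names(result)
```

This version uses only base R subsetting, so it works on plain data frames as well as tibbles.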
IceCreamToucan answered Oct 25 '25 23:10


Posting a tidyverse approach that I figured out.

library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 3.4.4
#> Warning: package 'forcats' was built under R version 3.4.4

example <- tibble(
  num1 = sample(1:100, 10),
  categ1 = as.factor(c(sample(letters, 10))),
  num2 = sample(1:100, 10),
  categ2 = as.factor(c(sample(letters, 10)))
  )

head(example)
#> # A tibble: 6 x 4
#>    num1 categ1  num2 categ2
#>   <int> <fct>  <int> <fct> 
#> 1    33 h         94 s     
#> 2    78 x          6 k     
#> 3    82 s         84 i     
#> 4    11 k         20 o     
#> 5    51 v         11 q     
#> 6     5 w         51 b

# Use select_if() to pick columns by data type, then pull their names into the outer select().
# intersect() is only needed if you previously filtered out
# some columns and do not want those factors (in this case) to creep back in
# with the select_if() call; here both name vectors come from the same data,
# so it changes nothing
example_arranged <- example %>%
  select(intersect(names(select_if(., is.factor)), names(.)), everything())

head(example_arranged)
#> # A tibble: 6 x 4
#>   categ1 categ2  num1  num2
#>   <fct>  <fct>  <int> <int>
#> 1 h      s         33    94
#> 2 x      k         78     6
#> 3 s      i         82    84
#> 4 k      o         11    20
#> 5 v      q         51    11
#> 6 w      b          5    51

Created on 2018-06-19 by the reprex package (v0.2.0).
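
For reference, two alternative sketches of the same "factors first" ordering. The first assumes dplyr 1.0.0 or later, where where() can stand in for select_if(). The second is a base R version, since the question asked about one. The variable names (tidy_result, is_fct, base_result) and the set.seed() call are illustrative assumptions.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the example is reproducible

example <- tibble(
  num1 = sample(1:100, 10),
  categ1 = as.factor(sample(letters, 10)),
  num2 = sample(1:100, 10),
  categ2 = as.factor(sample(letters, 10))
)

# tidyverse (assumes dplyr >= 1.0.0): factors first, then the remaining columns
tidy_result <- example %>%
  select(where(is.factor), everything())

# base R: flag the factor columns, then sort so flagged columns come first.
# FALSE sorts before TRUE, hence the negation.
is_fct <- vapply(example, is.factor, logical(1))
base_result <- example[order(!is_fct)]

names(tidy_result)
names(base_result)
```

Both keep the original left-to-right order within each group, because everything() and order() preserve the existing position of ties.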

Josh Goldberg answered Oct 26 '25 01:10