
Count combinations by column, order doesn't matter

Tags: r, combinations

dat <- data.frame(A = c("r","t","y","g","r"),
                  B = c("g","r","r","t","y"),
                  C = c("t","g","t","r","t"))

  A B C
1 r g t
2 t r g
3 y r t
4 g t r
5 r y t

I would like to count the sets of characters that occur together in a row across the three columns, ignoring order. For example:

Combinations  Freq
r t g         3
y t r         2

How could I also add a frequency count for each level of a nominal variable (e.g. gender)? For example, given:

dat <- data.frame(A = c("r","t","y","g","r"),
                  B = c("g","r","r","t","y"),
                  C = c("t","g","t","r","t"),
             Gender = c("male", "female", "female", "male", "male"))

dat

  A B C Gender
1 r g t   male
2 t r g female
3 y r t female
4 g t r   male
5 r y t   male

To get this:

Combinations  Freq   Male   Female
r t g         3      2       1
y t r         2      1       1

SeekingData asked Oct 25 '25 22:10


1 Answer

You could do...

# sort the values within each row so order no longer matters, then tabulate the pasted keys
data.frame(table(combo = sapply(split(as.matrix(dat), row(dat)),
  function(x) paste(sort(x), collapse=" "))))

  combo Freq
1 g r t    3
2 r t y    2

For readability, I'd suggest doing it in multiple lines and/or using magrittr:

d = as.matrix(dat)
library(magrittr)

d %>% split(., row(.)) %>% sapply(
  . %>% sort %>% paste(collapse = " ")
) %>% table(combo = .) %>% data.frame

  combo Freq
1 g r t    3
2 r t y    2
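
The multi-line suggestion can also be sketched in base R alone. This version is only an illustration: it uses apply() over rows rather than the split()/sapply() pairing above, and the names m, row_key and res are made up for the example.

```r
# Example data from the question (columns A, B, C only)
dat <- data.frame(A = c("r","t","y","g","r"),
                  B = c("g","r","r","t","y"),
                  C = c("t","g","t","r","t"))

# Step 1: convert to a character matrix so each row is a plain character vector
m <- as.matrix(dat)

# Step 2: build an order-independent key per row by sorting its values
# (row_key is a hypothetical helper name, not part of the original answer)
row_key <- function(x) paste(sort(x), collapse = " ")
combo <- apply(m, 1, row_key)

# Step 3: tabulate the keys and convert the table to a data frame
res <- as.data.frame(table(combo = combo), stringsAsFactors = FALSE)
res
```

Sorting within each row is what makes "r g t" and "t r g" collapse to the same key, so the tabulation step never sees the original column order.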

For the edit (the new question about gender), I'd take a somewhat different approach using data.table, for example:

# new example data
dat <- data.frame(A = c("r","t","y","g","r"),
                  B = c("g","r","r","t","y"),
                  C = c("t","g","t","r","t"),
             Gender = c("male", "female", "female", "male", "male"))

library(data.table)
library(magrittr)  # provides %>%, used in the calls below
setDT(dat)

dat[, combo := sapply(transpose(.SD), 
  . %>% sort %>% paste(collapse = " ")), .SDcols=A:C]

dat[, c(
  n = .N, 
  Gender %>% factor(levels=c("male", "female")) %>% table %>% as.list
), by=combo]

   combo n male female
1: g r t 3    2      1
2: r t y 2    1      1
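
If data.table is not available, roughly the same result can be sketched in base R with a two-way table. This is an alternative illustration, not the approach above; the names gender, tab and res are made up for the example, and the output column names simply mirror the data.table result.

```r
# Example data from the edited question
dat <- data.frame(A = c("r","t","y","g","r"),
                  B = c("g","r","r","t","y"),
                  C = c("t","g","t","r","t"),
             Gender = c("male", "female", "female", "male", "male"))

# Order-independent key built from columns A, B and C only
combo <- apply(as.matrix(dat[, c("A", "B", "C")]), 1,
               function(x) paste(sort(x), collapse = " "))

# Fix the levels so both gender columns exist even if one has zero counts
gender <- factor(dat$Gender, levels = c("male", "female"))

# Two-way contingency table: rows are combos, columns are genders
tab <- table(combo, gender)

# Put the row totals and the per-gender counts side by side
res <- data.frame(combo  = rownames(tab),
                  n      = as.vector(rowSums(tab)),
                  male   = as.vector(tab[, "male"]),
                  female = as.vector(tab[, "female"]))
res
```

Setting the factor levels explicitly plays the same role as factor(levels=...) in the data.table version: a gender with no rows in some combo still gets a column with a zero.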

Frank answered Oct 27 '25 10:10