Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

group_by, get most frequent and second most frequent

I have the following dataset:

a   b

1   a           
1   a           
1   a           
1   none            
2   none            
2   none            
2   b           
3   a           
3   c           
3   c
3   d
4   a

I want to get the most frequent value in b for any a and the second most frequent value of b for any a. in case two values in b have the same frequency I m indifferent about any of the two being considered the "first" or the "second".

in this case the expected output would be:

d2:

a    first    second 
1    a        none
2    none     b
3    c        a(or d, doesn't matter)
4    a        NA

as you can see a=4 has just one value in b, thus I expect a NA in the output column "second" as there is no second most frequent value.

data:

a <- c(1,1,1,1,2,2,2,3,3,3,3,4)
b<- c("a","a", "a", "none", "none", "none", "b", "a", "c" , "c",  "d","a")  
d <-  data.frame(a,b)

what I tried at the moment is the following

d1 <-  d %>% group_by(a) %>% summarize ( first =names(which.max(table(b)))  , second= names(which.max(table(b)[-which.max(table(b))] )))

but it doesn't work properly, any idea on how to do this?

like image 882
Carbo Avatar asked Nov 16 '25 03:11

Carbo


1 Answers

You can count number of rows for a and b combination and for each value of a select 1st and 2nd value in summarise.

library(dplyr)
 d %>%
  count(a, b, sort = TRUE) %>%
  group_by(a) %>%
  summarise(first = b[1],second = b[2])


# A tibble: 4 x 3
#      a first second
#  <dbl> <chr> <chr> 
#1     1 a     none  
#2     2 none  b     
#3     3 c     a     
#4     4 a     NA    
like image 76
Ronak Shah Avatar answered Nov 17 '25 16:11

Ronak Shah



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!