
Select the n most frequent values in a variable

I would like to find the most common values in a column in a data frame. I assume using table would be the best way to do this? I then want to filter/subset my data frame to only include these top-n values.

An example of my data frame is as follows. Here I want to find e.g. the top 2 IDs.

ID    col
A     blue
A     purple
A     green
B     green
B     red
C     red
C     blue
C     yellow
C     orange

I therefore want to output the following:

Top 2 values of ID are:
A and C

I will then select the rows corresponding to ID A and C:

ID    col
A     blue
A     purple
A     green
C     red
C     blue
C     yellow
C     orange
Fiona asked Nov 24 '25 05:11


1 Answer

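You can try a tidyverse approach. Use add_count() to append the number of rows per ID as a column n, then filter for the top two (using < 3) or the top ten (using < 11). Because dense_rank() gives equal counts the same rank, IDs tied on count are kept together, so ties can return more than two IDs: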

library(tidyverse)
d %>% 
  add_count(ID) %>%             # append n = number of rows for each ID
  filter(dense_rank(-n) < 3)    # keep IDs whose count ranks first or second
# A tibble: 7 x 3
  ID    col        n
  <fct> <fct>  <int>
1 A     blue       3
2 A     purple     3
3 A     green      3
4 C     red        4
5 C     blue       4
6 C     yellow     4
7 C     orange     4
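
If exactly n IDs are needed even when counts are tied, one possible variant is to count the IDs first and then join back to the original rows. This is only a sketch, not part of the answer above: it assumes dplyr 1.0.0 or later (for slice_max()), and the names n_top, top_ids and d_top are made up for illustration.

```r
library(dplyr)

# Example data from the question
d <- data.frame(
  ID  = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  col = c("blue", "purple", "green", "green", "red",
          "red", "blue", "yellow", "orange")
)

n_top <- 2  # hypothetical parameter: how many IDs to keep

# One row per ID with its count n, then keep the n_top largest counts.
# with_ties = FALSE returns exactly n_top rows even if counts are equal.
top_ids <- d %>%
  count(ID) %>%
  slice_max(n, n = n_top, with_ties = FALSE)

# Keep only the rows of d whose ID appears in top_ids
d_top <- d %>%
  semi_join(top_ids, by = "ID")
```

With with_ties = FALSE, which of several tied IDs is kept depends on row order, so the dense_rank() version may be preferable when ties should all be retained.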

Data

d <- read.table(text = "ID    col
A     blue
A     purple
A     green
B     green
B     red
C     red
C     blue
C     yellow
C     orange", header = TRUE)
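
Since the question mentions table, a base R sketch of the same idea may also help. It additionally prints the names of the top IDs, which the question asks for. The variable names (n_top, counts, top_ids, d_top) are assumptions for illustration, and the data frame is rebuilt here so the block runs on its own.

```r
# Example data from the question
d <- data.frame(
  ID  = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  col = c("blue", "purple", "green", "green", "red",
          "red", "blue", "yellow", "orange")
)

n_top <- 2  # hypothetical parameter: how many IDs to keep

# Frequency of each ID, most frequent first
counts <- sort(table(d$ID), decreasing = TRUE)

# Names of the n_top most frequent IDs (fewer if there are not enough IDs)
top_ids <- names(counts)[seq_len(min(n_top, length(counts)))]

# Report the top IDs, ordered from most to least frequent
cat("Top", n_top, "values of ID are:",
    paste(top_ids, collapse = " and "), "\n")

# Subset the data frame to the rows with those IDs
d_top <- d[d$ID %in% top_ids, ]
```

Unlike the dense_rank() filter, this always keeps at most n_top IDs, so tied counts are cut off at the limit.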
Roman answered Nov 25 '25 19:11


