Combine result from top_n with an "Other" category in dplyr

Question

I have a data frame dat1:

   Country Count
1      AUS     1
2       NZ     2
3       NZ     1
4      USA     3
5      AUS     1
6      IND     2
7      AUS     4
8      USA     2
9      JPN     5
10      CN     2

First I want to sum "Count" per "Country". Then the top 3 total counts per country should be combined with an additional row "Others", which is the sum of countries which are not part of top 3.

The expected outcome therefore would be:

  Country Count
1     AUS     6
2     JPN     5
3     USA     5
4  Others     7

I have tried the code below, but could not figure out how to add the "Others" row.

dat1 %>%
    group_by(Country) %>%
    summarise(Count = sum(Count)) %>%
    arrange(desc(Count)) %>%
    top_n(3)

This code currently gives:

    Country Count
1     AUS     6
2     JPN     5
3     USA     5

Any help would be greatly appreciated.

dat1 <- structure(list(Country = structure(c(1L, 5L, 5L, 6L, 1L, 3L, 
    1L, 6L, 4L, 2L), .Label = c("AUS", "CN", "IND", "JPN", "NZ", 
    "USA"), class = "factor"), Count = c(1L, 2L, 1L, 3L, 1L, 2L, 
    4L, 2L, 5L, 2L)), .Names = c("Country", "Count"), class = "data.frame",
    row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))

Henrik · Accepted Answer

Instead of top_n, this seems like a good case for the convenience function tally. It uses summarise, sum and arrange under the hood.
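As a rough illustration of that equivalence, the sketch below compares tally with the spelled-out summarise/arrange pipeline on the question's data. It assumes dplyr is installed; the names short and long are only illustrative, and Country is stored as a character column here for simplicity.

```r
library(dplyr)

# Sample data from the question, with Country as character (an assumption
# made for this sketch; the original dat1 stores it as a factor)
dat1 <- data.frame(
  Country = c("AUS", "NZ", "NZ", "USA", "AUS", "IND", "AUS", "USA", "JPN", "CN"),
  Count   = c(1L, 2L, 1L, 3L, 1L, 2L, 4L, 2L, 5L, 2L),
  stringsAsFactors = FALSE
)

# Shorthand: weighted tally, sorted by descending total
short <- dat1 %>%
  group_by(Country) %>%
  tally(Count, sort = TRUE)

# Spelled out: sum per group, then sort by descending total
long <- dat1 %>%
  group_by(Country) %>%
  summarise(n = sum(Count)) %>%
  arrange(desc(n))

# Compare the two results as plain data frames
same <- isTRUE(all.equal(as.data.frame(short), as.data.frame(long)))
```

Both pipelines return one row per country with the total in a column named n.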

Then use factor to create an "Other" category. Use the levels argument to set "Other" as the last level. "Other" will then be placed last in the table (and in any subsequent plot of the result).

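If "Country" is a factor in your original data (as it is in dat1), Country[1:3] needs to be wrapped in as.character, so that the labels rather than the underlying integer codes are combined with "Other".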

group_by(dat1, Country) %>%
  tally(Count, sort = TRUE) %>%
  # as.character() keeps the country labels when Country is a factor
  group_by(Country = factor(c(as.character(Country[1:3]), rep("Other", n() - 3)),
                            levels = c(as.character(Country[1:3]), "Other"))) %>%
  tally(n)

#  Country     n
#   (fctr) (int)
#1     AUS     6
#2     JPN     5
#3     USA     5
#4   Other     7
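For comparison, here is a minimal self-contained sketch of the same idea that relabels rows by position instead of building the factor inside group_by. It is a variation, not the code above: top_k and res are illustrative names, and Country is stored as character for simplicity.

```r
library(dplyr)

# Sample data from the question, with Country as character (assumption)
dat1 <- data.frame(
  Country = c("AUS", "NZ", "NZ", "USA", "AUS", "IND", "AUS", "USA", "JPN", "CN"),
  Count   = c(1L, 2L, 1L, 3L, 1L, 2L, 4L, 2L, 5L, 2L),
  stringsAsFactors = FALSE
)

top_k <- 3  # hypothetical parameter: how many countries to keep by name

res <- dat1 %>%
  group_by(Country) %>%
  summarise(Count = sum(Count)) %>%   # total per country
  arrange(desc(Count)) %>%            # largest totals first
  # everything after the first top_k rows is relabelled "Other"
  mutate(Country = if_else(row_number() <= top_k,
                           as.character(Country), "Other")) %>%
  # factor levels follow row order, so "Other" ends up last
  group_by(Country = factor(Country, levels = unique(Country))) %>%
  summarise(Count = sum(Count))
```

Like the factor approach, this relies on the rows being sorted before the top rows are picked, so ties at the cut-off are broken by row order rather than kept together.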

deradelo · Answer

You can use fct_lump from the forcats package:

dat1 %>%
  # naming the grouping expression keeps the column called Country
  group_by(Country = fct_lump(Country, n = 3, w = Count)) %>%
  summarize(Count = sum(Count))

This should do it. You can also change the "Other" label with the other_level argument of fct_lump.
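A short sketch of the other_level argument, using the label "Others" from the question's expected output. It assumes forcats 0.5.0 or later, where fct_lump_n is the variant of fct_lump that takes n; the name res is illustrative.

```r
library(dplyr)
library(forcats)

# Sample data from the question, with Country as character (assumption);
# fct_lump_n converts it to a factor internally
dat1 <- data.frame(
  Country = c("AUS", "NZ", "NZ", "USA", "AUS", "IND", "AUS", "USA", "JPN", "CN"),
  Count   = c(1L, 2L, 1L, 3L, 1L, 2L, 4L, 2L, 5L, 2L),
  stringsAsFactors = FALSE
)

res <- dat1 %>%
  # keep the 3 countries with the largest weighted totals, lump the rest
  group_by(Country = fct_lump_n(Country, n = 3, w = Count,
                                other_level = "Others")) %>%
  summarise(Count = sum(Count))
```

The rows come back in factor-level order, with the lumped level last. That order is not sorted by total, so add arrange(desc(Count)) if the ranking matters.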

Tags:

r

dplyr

abhy3
