I have a data frame dat1:
Country Count
1 AUS 1
2 NZ 2
3 NZ 1
4 USA 3
5 AUS 1
6 IND 2
7 AUS 4
8 USA 2
9 JPN 5
10 CN 2
First I want to sum "Count" per "Country". Then I want to keep the three countries with the largest totals and add one extra row, "Others", holding the summed totals of all the remaining countries.
The expected outcome therefore would be:
Country Count
1 AUS 6
2 JPN 5
3 USA 5
4 Others 7
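As a quick check of the arithmetic behind the expected table, here is a base R sketch. It uses tapply() and sort() rather than dplyr, and the variable names (totals, top3, others, result) are illustrative. It rebuilds dat1 with data.frame() instead of the dput() output given further down, which should produce the same factor levels.

```r
# Rebuild the question's data (Country as a factor, as in the dput output)
dat1 <- data.frame(
  Country = factor(c("AUS", "NZ", "NZ", "USA", "AUS", "IND",
                     "AUS", "USA", "JPN", "CN")),
  Count = c(1L, 2L, 1L, 3L, 1L, 2L, 4L, 2L, 5L, 2L)
)

# Step 1: total Count per Country
totals <- tapply(dat1$Count, dat1$Country, sum)

# Step 2: sort largest first; the first three entries are the "top 3"
totals <- sort(totals, decreasing = TRUE)
top3 <- totals[1:3]

# Step 3: every remaining country is summed into "Others"
others <- sum(totals[-(1:3)])

result <- c(top3, Others = others)
print(result)
```

NZ (3), IND (2) and CN (2) fall outside the top three, so Others = 3 + 2 + 2 = 7.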
I tried the code below, but could not figure out how to add the "Others" row:
library(dplyr)

dat1 %>%
  group_by(Country) %>%
  summarise(Count = sum(Count)) %>%
  arrange(desc(Count)) %>%
  top_n(3, Count)
This code currently gives:
Country Count
1 AUS 6
2 JPN 5
3 USA 5
Any help would be greatly appreciated.
dat1 <- structure(list(Country = structure(c(1L, 5L, 5L, 6L, 1L, 3L,
1L, 6L, 4L, 2L), .Label = c("AUS", "CN", "IND", "JPN", "NZ",
"USA"), class = "factor"), Count = c(1L, 2L, 1L, 3L, 1L, 2L,
4L, 2L, 5L, 2L)), .Names = c("Country", "Count"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
Instead of top_n(), this seems like a good case for the convenience function tally(). With a weight column and sort = TRUE, tally(Count, sort = TRUE) does the work of summarise(n = sum(Count)) followed by arrange(desc(n)).
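A small sketch of that equivalence, assuming dplyr 1.0 or later: both pipelines below should return the same Country and n columns in the same order. dat1 is rebuilt with data.frame() so the block runs on its own.

```r
library(dplyr)

dat1 <- data.frame(
  Country = factor(c("AUS", "NZ", "NZ", "USA", "AUS", "IND",
                     "AUS", "USA", "JPN", "CN")),
  Count = c(1L, 2L, 1L, 3L, 1L, 2L, 4L, 2L, 5L, 2L)
)

# tally() with Count as the weight column, sorted largest first ...
via_tally <- dat1 %>%
  group_by(Country) %>%
  tally(Count, sort = TRUE)

# ... and the explicit summarise() + arrange() pipeline it stands in for
via_summarise <- dat1 %>%
  group_by(Country) %>%
  summarise(n = sum(Count)) %>%
  arrange(desc(n))

print(via_tally)
print(via_summarise)
```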
Then use factor() to create an "Other" category. Use its levels argument to make "Other" the last level; "Other" will then be placed last in the table (and in any subsequent plot of the result).
Because "Country" is a factor in dat1, Country[1:3] should be wrapped in as.character(); otherwise c() can combine the factor's integer codes, rather than its labels, with "Other".
library(dplyr)

group_by(dat1, Country) %>%
  tally(Count, sort = TRUE) %>%
  group_by(Country = factor(c(as.character(Country[1:3]), rep("Other", n() - 3)),
                            levels = c(as.character(Country[1:3]), "Other"))) %>%
  tally(n)
# Country n
# (fctr) (int)
#1 AUS 6
#2 JPN 5
#3 USA 5
#4 Other 7
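The relabelling step can also be tried on its own in base R. The sketch below types in the per-country totals from the question by hand (already sorted, with Country stored as a factor) and uses the hypothetical names totals, top, lumped and sums; as.character() keeps the labels, and listing "Other" last in levels fixes where it appears.

```r
# Per-country totals from the question, sorted largest first; Country is a factor
totals <- data.frame(
  Country = factor(c("AUS", "JPN", "USA", "NZ", "CN", "IND")),
  n = c(6L, 5L, 5L, 3L, 2L, 2L)
)

top <- as.character(totals$Country[1:3])  # labels, not factor codes

# Rows 4 onwards become "Other"; putting "Other" last in levels fixes its position
lumped <- factor(c(top, rep("Other", nrow(totals) - 3)),
                 levels = c(top, "Other"))

# Sum n within the new groups; tapply() returns results in factor-level order
sums <- tapply(totals$n, lumped, sum)
print(sums)  # AUS 6, JPN 5, USA 5, Other 7
```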
You can use fct_lump() from the forcats package:
library(dplyr)
library(forcats)

dat1 %>%
  group_by(Country = fct_lump(Country, n = 3, w = Count)) %>%
  summarize(Count = sum(Count))
This should do it. You can also change the "Other" label with the other_level argument of fct_lump(), for example other_level = "Others".
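A sketch of the same approach with the label changed to match the expected output. It assumes forcats 0.5.0 or later, where fct_lump_n() is the variant of fct_lump() that keeps the n most frequent levels; with w = Count, "most frequent" means the largest summed Count. The lumped level is placed at the end of the levels, so "Others" should print last without an extra arrange().

```r
library(dplyr)
library(forcats)

dat1 <- data.frame(
  Country = factor(c("AUS", "NZ", "NZ", "USA", "AUS", "IND",
                     "AUS", "USA", "JPN", "CN")),
  Count = c(1L, 2L, 1L, 3L, 1L, 2L, 4L, 2L, 5L, 2L)
)

result <- dat1 %>%
  # Keep the 3 countries with the largest weighted totals, lump the rest as "Others"
  group_by(Country = fct_lump_n(Country, n = 3, w = Count,
                                other_level = "Others")) %>%
  summarise(Count = sum(Count))
print(result)
```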