I'm currently learning the robust and efficient data.table package, but I can't figure out how to do the following. I want to group by multiple columns (manufacturer and carrier), count the number of flights in each group, arrange the counts in descending order, and then plot the top 10 manufacturer/carrier combinations with ggplot. I would do this in the tidyverse as follows:
library(nycflights13)
library(tidyverse)
flights %>% 
  left_join(planes, by = "tailnum") %>% 
  group_by(manufacturer, carrier) %>% 
  summarise(N = n()) %>% 
  arrange(desc(N)) %>% 
  top_n(10, N) %>% 
  ggplot(aes(carrier, N, fill = manufacturer)) + geom_col() + guides(fill = "none")
Here is what I've tried (I set the question aside for several minutes to try to solve it myself, but failed):
library(data.table)
fly<-copy(nycflights13::flights)
setDT(fly)
setkey(fly,tailnum)
planes1 <- copy(planes)
setDT(planes1)
setkey(planes1, tailnum)
#head(planes1,2)
Merged <- merge(fly, planes1, by = "tailnum")
#Group by manufacturer
Merged[, .N, by = .(manufacturer,carrier)] #[, order(manufacturer, carrier)]
The problem is that I can't get it to return the ordered data, and I don't know how to "chain" to ggplot without first saving the ordered merge as an object.
You can chain operations in data.table by placing square brackets one after another, as in DT[...][...]. Furthermore, you can execute a ggplot call inside the j part of the data.table syntax:
nms <- setdiff(names(planes1), "tailnum")
fly[planes1, on = .(tailnum), (nms) := mget(nms)
    ][, .N, by = .(manufacturer,carrier)
      ][order(-N)
        ][1:10
          ][, ggplot(.SD, aes(carrier, N, fill = manufacturer)) +
              geom_col() +
              guides(fill = "none")]
which gives a bar chart of flight counts per carrier, with the bars filled by manufacturer.
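To see the chaining pattern in isolation, here is a minimal sketch on a small made-up table. The table toy, its values, and the name top_groups are hypothetical and only stand in for the joined flights/planes data; the top 2 rows are kept instead of the top 10 because the toy table has only three groups.

```r
library(data.table)

# Hypothetical toy data standing in for the joined flights/planes table
toy <- data.table(
  manufacturer = c("BOEING", "BOEING", "AIRBUS", "AIRBUS", "AIRBUS", "EMBRAER"),
  carrier      = c("UA", "UA", "B6", "B6", "B6", "EV")
)

# Each pair of brackets operates on the result of the previous one
top_groups <- toy[, .N, by = .(manufacturer, carrier)  # count rows per group
  ][order(-N)                                          # largest counts first
  ][1:2]                                               # keep the top 2 rows overall

print(top_groups)
```

Because each bracket returns a data.table, a further bracket with a ggplot call in j could be appended in the same way, without saving an intermediate object.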
