dplyr is fast and I would like to use the %.% piping operator a lot. I want to use a table-like function (count rows per value) that preserves the column name and returns a data.frame.
How can I achieve the same result as the code below using only dplyr functions? (Imagine a huge data.table, BIGiris, with 6M rows.)
> out<-as.data.frame(table(iris$Species))
> names(out)[1]<-'Species'
> names(out)[2]<-'my_cnt1'
> out
The output is shown below. Notice that I have to rename column 1 back to 'Species'. Also, in the dplyr mutate or other call, I would like to be able to specify the name of my new count column.
Species my_cnt1
1 setosa 50
2 versicolor 50
3 virginica 50
Imagine joining to a table like this (assume the iris data.frame has 6M rows and Species is more like a "species_ID"):
> habitat<-data.frame(Species=c('setosa'),lives_in='sea')
The final join and its output (for joining, I need to preserve column names at all times):
> left_join(out,habitat)
Joining by: "Species"
Species my_cnt1 lives_in
1 setosa 50 sea
2 versicolor 50 <NA>
3 virginica 50 <NA>
For the first part you can use dplyr like this:
library(dplyr)
out <- iris %>% group_by(Species) %>% summarise(my_cnt1 = n())
out
Source: local data frame [3 x 2]
Species my_cnt1
1 setosa 50
2 versicolor 50
3 virginica 50
To continue in one chain do this:
out <- iris %>% group_by(Species) %>% summarise(my_cnt1 = n()) %>% left_join(habitat)
out
Source: local data frame [3 x 3]
Species my_cnt1 lives_in
1 setosa 50 sea
2 versicolor 50 NA
3 virginica 50 NA
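A more compact variant of the same chain is sketched below. It assumes a dplyr version in which count() accepts a name argument (believed to be 0.8.0 or later); on older versions the group_by/summarise form above is the safer choice. The as.character() step and the final as.data.frame() are optional additions: the first keeps the join key the same type on both sides, and the second returns a plain data.frame as the question asks.

```r
library(dplyr)

# Lookup table from the question; stringsAsFactors = FALSE keeps Species as character
habitat <- data.frame(Species = "setosa", lives_in = "sea",
                      stringsAsFactors = FALSE)

out <- iris %>%
  # count() groups by Species and tallies rows; `name` sets the count column
  # (assumption: a dplyr version where count() has the `name` argument)
  count(Species, name = "my_cnt1") %>%
  # make the join key character on both sides
  mutate(Species = as.character(Species)) %>%
  # naming the key explicitly avoids the "Joining by" message
  left_join(habitat, by = "Species") %>%
  # return a plain data.frame rather than a dplyr table
  as.data.frame()

print(out)
```

Passing by = "Species" makes the join key explicit, which helps when the two tables happen to share other column names.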
By the way, dplyr now uses %>% in place of %.%. It does the same thing and is also part of the magrittr package.