I am trying to use a data.frame twice in a dplyr chain. Here is a simple example that gives an error:
library(dplyr)
df <- data.frame(Value = 1:10, Type = rep(c("A", "B"), 5))
df %>%
  group_by(Type) %>%
  summarize(X = n()) %>%
  mutate(df %>% filter(Value > 2) %>%
    group_by(Type) %>%
    summarize(Y = sum(Value)))
Error: cannot handle
The idea is that a data.frame is first created with two columns: Value, which is just some data, and Type, which indicates which group each value belongs to.
I then use summarize to get the number of rows in each group, and then mutate, using the original data.frame again, to get the sum of the values after the data has been filtered. However, I get Error: cannot handle. Any ideas what is happening here?
Desired Output:
Type X Y
A 5 24
B 5 28
You could try the following
df %>%
  group_by(Type) %>%
  summarise(X = n(), Y = sum(Value[Value > 2]))
# Source: local data frame [2 x 3]
#
# Type X Y
# 1 A 5 24
# 2 B 5 28
The idea is to filter only Value by the desired condition, instead of the whole data set.
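If you really do want to use the data.frame twice, as in the question, one possible sketch is to build the two summaries separately and join them on Type. The names counts, sums and result are just illustrative and not from the original post.

```r
library(dplyr)

df <- data.frame(Value = 1:10, Type = rep(c("A", "B"), 5))

# First pass: number of rows per group, using the full data
counts <- df %>%
  group_by(Type) %>%
  summarise(X = n())

# Second pass: sum of Value per group, using only rows with Value > 2
sums <- df %>%
  filter(Value > 2) %>%
  group_by(Type) %>%
  summarise(Y = sum(Value))

# Combine the two summaries on the grouping column
result <- left_join(counts, sums, by = "Type")
```

One difference to keep in mind: if a group has no rows passing the filter, the join leaves Y as NA for that group, whereas sum(Value[Value > 2]) returns 0.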
And a bonus solution
library(data.table)
setDT(df)[, .(X = .N, Y = sum(Value[Value > 2])), by = Type]
# Type X Y
# 1: A 5 24
# 2: B 5 28
I was going to suggest that to @nongkrong, but he deleted his answer. With base R we could also do
aggregate(Value ~ Type, df, function(x) c(length(x), sum(x[x>2])))
# Type Value.1 Value.2
# 1 A 5 24
# 2 B 5 28
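Note that aggregate stores the two results in a single matrix column called Value here, so the printed Value.1 and Value.2 are not separate columns yet. A small sketch of one way to flatten it and get the X and Y names from the question (res and flat are illustrative names):

```r
df <- data.frame(Value = 1:10, Type = rep(c("A", "B"), 5))

res <- aggregate(Value ~ Type, df, function(x) c(length(x), sum(x[x > 2])))

# res has two columns: Type and a two-column matrix named Value.
# do.call(data.frame, ...) expands the matrix into ordinary columns.
flat <- do.call(data.frame, res)
names(flat) <- c("Type", "X", "Y")
```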