df <- data.frame(category=c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
value=c(NA,2,3,4,5,NA,7,8))
I'd like to add a new column to the above dataframe which takes the cumulative mean of the value column, not taking into account NAs. Is it possible to do this with dplyr? I've tried
df <- df %>% group_by(category) %>% mutate(new_col=cummean(value))
but cummean just doesn't know what to do with NAs.
EDIT: I do not want to count NAs as 0.
You could use ifelse to treat NAs as 0 for the cummean call:
library(dplyr)
df <- data.frame(category=c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
value=c(NA,2,3,4,5,NA,7,8))
df %>%
group_by(category) %>%
mutate(new_col = cummean(ifelse(is.na(value), 0, value)))
Output:
# A tibble: 8 x 3
# Groups: category [2]
category value new_col
<fct> <dbl> <dbl>
1 cat1 NA 0.
2 cat1 2. 1.00
3 cat2 3. 3.00
4 cat1 4. 2.00
5 cat2 5. 4.00
6 cat2 NA 2.67
7 cat1 7. 3.25
8 cat2 8. 4.00
EDIT: Now I see this isn't the same as ignoring NAs.
Try this one instead. I group by a column which specifies if the value is NA or not, meaning cummean can run without encountering any NAs:
library(dplyr)
df <- data.frame(category=c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
value=c(NA,2,3,4,5,NA,7,8))
df %>%
group_by(category, isna = is.na(value)) %>%
mutate(new_col = ifelse(isna, NA, cummean(value)))
Output:
# A tibble: 8 x 4
# Groups: category, isna [4]
category value isna new_col
<fct> <dbl> <lgl> <dbl>
1 cat1 NA TRUE NA
2 cat1 2. FALSE 2.00
3 cat2 3. FALSE 3.00
4 cat1 4. FALSE 3.00
5 cat2 5. FALSE 4.00
6 cat2 NA TRUE NA
7 cat1 7. FALSE 4.33
8 cat2 8. FALSE 5.33
An option is to remove value before calculating cummean. In this method rows with NA value will not be accounted for cummean calculation. Not sure if OP wants to consider NA value as 0 in calculation.
df %>% mutate(rn = row_number()) %>%
filter(!is.na(value)) %>%
group_by(category) %>%
mutate(new_col = cummean(value)) %>%
ungroup() %>%
right_join(mutate(df, rn = row_number()), by="rn") %>%
select(category = category.y, value = value.y, new_col) %>%
as.data.frame()
# category value new_col
# 1 cat1 NA NA
# 2 cat1 2 2.000000
# 3 cat2 3 3.000000
# 4 cat1 4 3.000000
# 5 cat2 5 4.000000
# 6 cat2 NA NA
# 7 cat1 7 4.333333
# 8 cat2 8 5.333333
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With