df <- data.frame(category=c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
value=c(NA,2,3,4,5,NA,7,8))
I'd like to add a new column to the above dataframe which takes the cumulative mean of the value
column, not taking into account NAs. Is it possible to do this with dplyr
? I've tried
df <- df %>% group_by(category) %>% mutate(new_col=cummean(value))
but cummean
just doesn't know what to do with NAs.
EDIT: I do not want to count NAs as 0.
You could use ifelse
to treat NA
s as 0
for the cummean
call:
library(dplyr)
df <- data.frame(category=c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
value=c(NA,2,3,4,5,NA,7,8))
df %>%
group_by(category) %>%
mutate(new_col = cummean(ifelse(is.na(value), 0, value)))
Output:
# A tibble: 8 x 3
# Groups: category [2]
category value new_col
<fct> <dbl> <dbl>
1 cat1 NA 0.
2 cat1 2. 1.00
3 cat2 3. 3.00
4 cat1 4. 2.00
5 cat2 5. 4.00
6 cat2 NA 2.67
7 cat1 7. 3.25
8 cat2 8. 4.00
EDIT: Now I see this isn't the same as ignoring NAs.
Try this one instead. I group by a column which specifies if the value is NA
or not, meaning cummean
can run without encountering any NAs:
library(dplyr)
df <- data.frame(category=c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
value=c(NA,2,3,4,5,NA,7,8))
df %>%
group_by(category, isna = is.na(value)) %>%
mutate(new_col = ifelse(isna, NA, cummean(value)))
Output:
# A tibble: 8 x 4
# Groups: category, isna [4]
category value isna new_col
<fct> <dbl> <lgl> <dbl>
1 cat1 NA TRUE NA
2 cat1 2. FALSE 2.00
3 cat2 3. FALSE 3.00
4 cat1 4. FALSE 3.00
5 cat2 5. FALSE 4.00
6 cat2 NA TRUE NA
7 cat1 7. FALSE 4.33
8 cat2 8. FALSE 5.33
An option is to remove value before calculating cummean
. In this method rows with NA
value will not be accounted for cummean
calculation. Not sure if OP wants to consider NA
value as 0
in calculation.
df %>% mutate(rn = row_number()) %>%
filter(!is.na(value)) %>%
group_by(category) %>%
mutate(new_col = cummean(value)) %>%
ungroup() %>%
right_join(mutate(df, rn = row_number()), by="rn") %>%
select(category = category.y, value = value.y, new_col) %>%
as.data.frame()
# category value new_col
# 1 cat1 NA NA
# 2 cat1 2 2.000000
# 3 cat2 3 3.000000
# 4 cat1 4 3.000000
# 5 cat2 5 4.000000
# 6 cat2 NA NA
# 7 cat1 7 4.333333
# 8 cat2 8 5.333333
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With