I have the following tibble and want to replace the NA values with the sum of the "children" (e.g. the value for "c" is equal to the sum of "d" and "e". The value for "a" is then the sum of "b" and "c". So the problem is how to summarise and keep at the same time the original values.
mydata <- tibble(id = c("a", "b", "c", "d", "e"),
value = c(NA, 1, NA, 2, 3),
parent = c(NA, "a", "a", "c", "c"),
level = c(1,2,2,3,3))
# A tibble: 5 x 4
id value parent level
<chr> <dbl> <chr> <dbl>
1 a NA NA 1
2 b 1 a 2
3 c NA a 2
4 d 2 c 3
5 e 3 c 3
Final result should be
id value parent level
<chr> <dbl> <chr> <dbl>
1 a 6 NA 1
2 b 1 a 2
3 c 5 a 2
4 d 2 c 3
5 e 3 c 3
I have tried several approaches but the only one that work is lengthy and rather clumsy. I have the feeling there should be an easy solution in tidyverse. Any ideas ? (in a loop? the original problem has 5 levels).
Cheers
Renger
You can use a while() loop to calculate value iteratively.
library(dplyr)
mydata %>%
mutate(value = {
while(anyNA(value)) {
sub_id <- id[is.na(value)]
ind <- parent %in% sub_id
value[is.na(value)] <- tapply(value[ind], parent[ind], sum)[sub_id]
}
value
})
# # A tibble: 5 × 4
# id value parent level
# <chr> <dbl> <chr> <dbl>
# 1 a 6 NA 1
# 2 b 1 a 2
# 3 c 5 a 2
# 4 d 2 c 3
# 5 e 3 c 3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With