Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Summarising and keeping the original values

I have the following tibble and want to replace the NA values with the sum of the "children" (e.g. the value for "c" is equal to the sum of "d" and "e". The value for "a" is then the sum of "b" and "c". So the problem is how to summarise and keep at the same time the original values.

mydata <- tibble(id = c("a", "b", "c", "d", "e"), 
                 value = c(NA, 1, NA, 2, 3),
                 parent = c(NA, "a", "a", "c", "c"), 
                 level = c(1,2,2,3,3))

# A tibble: 5 x 4
  id    value parent level
  <chr> <dbl> <chr>  <dbl>
1 a        NA NA         1
2 b         1 a          2
3 c        NA a          2
4 d         2 c          3
5 e         3 c          3

Final result should be

  id    value parent level
  <chr> <dbl> <chr>  <dbl>
1 a         6 NA         1
2 b         1 a          2
3 c         5 a          2
4 d         2 c          3
5 e         3 c          3

I have tried several approaches but the only one that work is lengthy and rather clumsy. I have the feeling there should be an easy solution in tidyverse. Any ideas ? (in a loop? the original problem has 5 levels).

Cheers

Renger

like image 908
arnyeinstein Avatar asked Nov 01 '25 23:11

arnyeinstein


1 Answers

You can use a while() loop to calculate value iteratively.

library(dplyr)

mydata %>%
  mutate(value = {
    while(anyNA(value)) {
      sub_id <- id[is.na(value)]
      ind <- parent %in% sub_id
      value[is.na(value)] <- tapply(value[ind], parent[ind], sum)[sub_id]
    }
    value
  })

# # A tibble: 5 × 4
#   id    value parent level
#   <chr> <dbl> <chr>  <dbl>
# 1 a         6 NA         1
# 2 b         1 a          2
# 3 c         5 a          2
# 4 d         2 c          3
# 5 e         3 c          3
like image 71
Darren Tsai Avatar answered Nov 03 '25 17:11

Darren Tsai



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!