Coming from Stata, I still sometimes struggle with R's different programming approach, in particular when it comes to avoiding for loops.
In the example below, I have written two functions that overwrite the original values of ex$status1 and ex$status2. For every id, the original values of the two variables should be replaced with x if x occurs anywhere within the respective id.
The function myfunc2 is perfectly capable of performing this task for several variables (in the example below: status1 and status2).
My problem, however, occurs when I try to impose a sequential order on replacing the initial values. The order is given as c(1,5,3,7). That is, if the value 1 is observed for a given id, all values of the variable for this id should be replaced by 1. The procedure should then be repeated on the updated data for the remaining values of c(1,5,3,7). I accomplished this task with a for loop, but failed to do it using one of purrr's map functions, because these functions were always executed on the original tibble and did not update the tibble sequentially (see code below). Can anyone show me how to obtain the desired result with a map function (or simply without using the for loop)?
library(tidyverse)  # attaches tibble, dplyr and purrr, which the code below uses
ex <- tibble(id = c(1,1,1,1,2,2,2),
status1 = c(3,3,5,7,1,5,7),
status2 = c(3,3,3,7,7,5,7))
ex
myfunc <- function(df, id, var, val) {
df <- df %>%
group_by(id) %>%
mutate({{var}} := case_when(any({{var}} == val) ~ val,
TRUE ~ {{var}})) %>%
ungroup() %>%
select({{var}})
return(df)
}
myfunc(ex, id, status1, 1)
myfunc2 <- function(df, id, var, val) {
map_dfc(var,
~myfunc(df, id, !!sym(.x), val)) %>%
add_column(id = df$id, .before = 1)
}
myfunc2(ex, id, c("status1", "status2"), 1)
# this works
for (i in c(1,5,3,7)) {
ex <- myfunc2(ex, id, c("status1", "status2"), i)
}
# this does not work
c(1,5,3,7) %>%
map_dfc(function(x) {ex <- myfunc2(ex, id, c("status1", "status2"), x)})
# original data
# A tibble: 7 x 3
id status1 status2
<dbl> <dbl> <dbl>
1 1 3 3
2 1 3 3
3 1 5 3
4 1 7 7
5 2 1 7
6 2 5 5
7 2 7 7
# Data after executing the for-loop
# A tibble: 7 x 3
id status1 status2
<dbl> <dbl> <dbl>
1 1 5 3
2 1 5 3
3 1 5 3
4 1 5 3
5 2 1 5
6 2 1 5
7 2 1 5
lapply and map loop over each input element and return the output, but they do not update the original object recursively the way a for loop does. To get that behaviour, we would have to assign into the enclosing scope with <<-, which may not be the best option. I would recommend the for loop.
library(dplyr)
library(purrr)
library(tibble)  # add_column(), used inside myfunc2, comes from tibble
c(1,5,3,7) %>%
map_dfc(function(x) {
ex <<- myfunc2(ex, id, c("status1", "status2"), x)
})
Now we check the object ex:
ex
# A tibble: 7 x 3
# id status1 status2
# <dbl> <dbl> <dbl>
#1 1 5 3
#2 1 5 3
#3 1 5 3
#4 1 5 3
#5 2 1 5
#6 2 1 5
#7 2 1 5
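To see why plain map does not accumulate, here is a minimal sketch on a single vector (the status1 values for id 1). The helper replace_if_present is a hypothetical stand-in for myfunc, not code from the question, and it assumes there are no NA values.

```r
library(purrr)

# Hypothetical helper (an assumption for illustration): overwrite every
# element of x with val if val occurs anywhere in x, otherwise return x.
replace_if_present <- function(x, val) {
  if (any(x == val)) rep(val, length(x)) else x
}

x <- c(3, 3, 5, 7)     # status1 for id 1 in the example data
vals <- c(1, 5, 3, 7)  # replacement order

# map(): every call starts from the original x, so the four results
# are independent of each other.
independent <- map(vals, function(v) replace_if_present(x, v))

# for loop: every iteration starts from the result of the previous one.
sequential <- x
for (v in vals) {
  sequential <- replace_if_present(sequential, v)
}

independent
sequential
```

The third element of independent is all 3s because the call for 3 still sees the original x, whereas in the loop x has already been overwritten with 5 by the time 3 is tried.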
With the tidyverse, we could use reduce instead of map and <<-:
reduce(list(1, 5, 3, 7),
~myfunc2(.x, id, c("status1", "status2"), .y), .init = ex)
# A tibble: 7 x 3
# id status1 status2
# <dbl> <dbl> <dbl>
#1 1 5 3
#2 1 5 3
#3 1 5 3
#4 1 5 3
#5 2 1 5
#6 2 1 5
#7 2 1 5
This is similar to base R's Reduce:
Reduce(function(x, y) myfunc2(x, id, c("status1", "status2"), y),
list(1, 5, 3, 7), init = ex)
# A tibble: 7 x 3
# id status1 status2
# <dbl> <dbl> <dbl>
#1 1 5 3
#2 1 5 3
#3 1 5 3
#4 1 5 3
#5 2 1 5
#6 2 1 5
#7 2 1 5
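If the intermediate states are of interest, purrr's accumulate threads the result through in the same way but keeps every step. A minimal sketch on a single vector, again with a hypothetical helper replace_if_present that is not part of the original code and assumes no NA values:

```r
library(purrr)

# Hypothetical helper (an assumption for illustration): overwrite every
# element of x with val if val occurs anywhere in x, otherwise return x.
replace_if_present <- function(x, val) {
  if (any(x == val)) rep(val, length(x)) else x
}

# accumulate() passes the previous result as the first argument and the
# current value as the second; with .init the output has one more
# element than the input, the first being the starting vector.
steps <- accumulate(
  c(1, 5, 3, 7),
  function(acc, v) replace_if_present(acc, v),
  .init = c(3, 3, 5, 7)
)

steps                   # starting vector plus the state after each value
steps[[length(steps)]]  # final state, the same value reduce() would return
```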
One advantage of these approaches is that they avoid the side effect, i.e. we don't have to update the original object:
ex
# A tibble: 7 x 3
# id status1 status2
# <dbl> <dbl> <dbl>
#1 1 3 3
#2 1 3 3
#3 1 5 3
#4 1 7 7
#5 2 1 7
#6 2 5 5
#7 2 7 7
However, considering the simplicity of the for loop (both to understand and to execute), the for loop may be the better choice (a subjective opinion).
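As a possible alternative that needs neither a loop nor reduce: the sequential replacement appears to be equivalent to taking, per id and per variable, the first value of c(1,5,3,7) that occurs in the group, and leaving the group unchanged if none occurs. The sketch below relies on that reasoning and on there being no NA values; the helper first_present and the names ord and res are hypothetical, not part of the original code.

```r
library(dplyr)

ex <- tibble(id      = c(1, 1, 1, 1, 2, 2, 2),
             status1 = c(3, 3, 5, 7, 1, 5, 7),
             status2 = c(3, 3, 3, 7, 7, 5, 7))

ord <- c(1, 5, 3, 7)  # replacement order from the question

# Hypothetical helper (an assumption for illustration): return the first
# value of ord that occurs in x, repeated to the length of x; if no value
# of ord occurs in x, return x unchanged.
first_present <- function(x, ord) {
  hit <- ord[ord %in% x]
  if (length(hit) > 0) rep(hit[1], length(x)) else x
}

res <- ex %>%
  group_by(id) %>%
  mutate(across(c(status1, status2), ~ first_present(.x, ord))) %>%
  ungroup()

res
```

This works because once a group has been overwritten with one value, no later value in the order can still be found in it, so only the first match matters.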