Is there a way to configure furrr::future_map so that it supports a nested use case? Consider the following code:
library(furrr)
library(tictoc)
# The problem is easier to reason about if you take N
# smaller than your number of cores, and M big.
N = 2
M = 100
plan(sequential)
tic()
x = future_map(1:N, function(i) {
  furrr::future_map(1:M, function(j) {
    Sys.sleep(1 / M)
    return(1)
  })
})
toc() # 2sec + overhead
plan(multisession) # originally plan(multiprocess), which is now deprecated
tic()
x = future_map(1:N, function(i) {
  furrr::future_map(1:M, function(j) {
    Sys.sleep(1 / M)
    return(1)
  })
})
toc() # one sec + overhead !!
The first one should take a little more than 2 seconds. This is OK. But, even on a thousand-core machine, is there a way to make the second one take less than 1 second?
My use case is the following: some sub-tasks take longer than others to complete, and once the shorter ones have finished, the freed cores could be used to further dispatch the longer tasks.
But furrr does not do that by default, and each longer-running task ends up on a single core. The problem is equivalent to the one shown in the code above: is there a way to have furrr re-dispatch inner tasks if some cores are free?
Is this simply impossible, or did I miss a parameter in the furrr/future calls?
Edit: Thanks to the comment from henrikb, multiprocess has been changed to multisession, because multiprocess has been deprecated since July 2023.
As described in "A Future for R: Future Topologies" (mentioned by Axeman), you can pass a list of strategies to future::plan and configure each one with future::tweak. Each element of the list corresponds to one level of nesting. So if you provide two plans, the parallelization also applies to your nested furrr::future_map, e.g.:
future::plan(
  list(
    future::tweak(
      future::multisession,
      workers = 2),
    future::tweak(
      future::multisession,
      workers = 4)
  )
)
This example is sized for 8 cores, since each of the 2 outer workers gets 4 inner workers of its own.
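To connect this back to the question, here is a minimal sketch that applies a two-level plan to the original timing code. It is only an illustration under some assumptions: the name inner_workers and the way it is derived from future::availableCores() are my own choices, not part of the original answer. As far as I know, recent versions of future/parallelly may refuse to start more inner workers than the single core they believe a nested process has, and wrapping the number in I() is the way to say "I really want this many workers", so the sketch uses that.

```r
library(furrr)   # also attaches future, which provides plan(), tweak(), availableCores()
library(tictoc)

N = 2
M = 100

# Assumption: split the available cores evenly between the N outer workers.
# inner_workers is a hypothetical helper variable introduced for this sketch.
inner_workers = max(1, availableCores() %/% N)

# Two-level topology: the first element applies to the outer future_map,
# the second element to the future_map calls nested inside it.
plan(list(
  tweak(multisession, workers = N),
  # I() marks the value as deliberate, so the nested level does not
  # cap or reject it based on its own (reduced) core count.
  tweak(multisession, workers = I(inner_workers))
))

tic()
x = future_map(1:N, function(i) {
  furrr::future_map(1:M, function(j) {
    Sys.sleep(1 / M)
    return(1)
  })
})
toc() # elapsed time depends on the machine and on worker start-up cost

plan(sequential) # shut down the background workers again
```

Note that starting the inner multisession workers has a fixed cost of its own, so with tasks this short the gain may be smaller than the core count suggests. The approach should pay off more when the inner tasks are long-running, as in the use case described in the question.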