How come that Solution 2 is more efficient than Solution 1?
(The time is the average of 100 runs, and the total folders they go through is 13217)
// Solution 1 (2608,9ms)
let rec folderCollector path =
async { let! dirs = Directory.AsyncGetDirectories path
do! [for z in dirs -> folderCollector z]
|> Async.Parallel |> Async.Ignore }
// Solution 2 (2510,9ms)
let rec folderCollector path =
let dirs = Directory.GetDirectories path
for z in dirs do folderCollector z
I would have thought that Solution 1 would be faster because it's async, and that I run it in Parallel. What am I'm missing?
As Daniel and Brian already clearly explained, your solution is probably creating too many short-lived asynchronous computations (so the overhead is more than the gains from parallelism). The AsyncGetDirectories operation also probably isn't really non-blocking as it is not doing much work. I don't see a truly async version of this operation anywhere - how is it defined?
Anyway, using the ordinary GetDirectories, I tried the following version (which creates only a small number of parallel asyncs):
// Synchronous version
let rec folderCollectorSync path =
let dirs = Directory.GetDirectories path
for z in dirs do folderCollectorSync z
// Asynchronous version that uses synchronous when 'nesting <= 0'
let rec folderCollector path nesting =
async { if nesting <= 0 then return folderCollectorSync path
else let dirs = Directory.GetDirectories path
do! [for z in dirs -> folderCollector z (nesting - 1) ]
|> Async.Parallel |> Async.Ignore }
Calling a simple synchronous version after certain number of recursive calls is a common trick - it is used when parallelizing any tree-like structure that is very deep. Using folderCollector path 2, this will start only tens of parallel tasks (as opposed to thousands), so it will be more efficient.
On a sample directory I used (with 4800 sub-dirs and 27000 files), I get:
folderCollectorSync path takes 1 secondfolderCollector path 2 takes takes 600ms (result is similar for any nesting between 1 and 4)From the comments:
Your function incurs the cost of async without any of the benefits because
asyncs for the short amount of work to be doneIf you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With