Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Recursive sync faster than Recursive async

Tags:

f#

How come that Solution 2 is more efficient than Solution 1?

(The time is the average of 100 runs, and the total folders they go through is 13217)

// Solution 1 (2608,9ms)
let rec folderCollector path =
  async { let! dirs = Directory.AsyncGetDirectories path 
          do! [for z in dirs -> folderCollector z] 
              |> Async.Parallel |> Async.Ignore }

// Solution 2 (2510,9ms)
let rec folderCollector path =
  let dirs = Directory.GetDirectories path 
  for z in dirs do folderCollector z

I would have thought that Solution 1 would be faster because it's async, and that I run it in Parallel. What am I'm missing?

like image 371
ebb Avatar asked Jun 06 '26 04:06

ebb


2 Answers

As Daniel and Brian already clearly explained, your solution is probably creating too many short-lived asynchronous computations (so the overhead is more than the gains from parallelism). The AsyncGetDirectories operation also probably isn't really non-blocking as it is not doing much work. I don't see a truly async version of this operation anywhere - how is it defined?

Anyway, using the ordinary GetDirectories, I tried the following version (which creates only a small number of parallel asyncs):

// Synchronous version
let rec folderCollectorSync path =
    let dirs = Directory.GetDirectories path 
    for z in dirs do folderCollectorSync z

// Asynchronous version that uses synchronous when 'nesting <= 0'
let rec folderCollector path nesting =
    async { if nesting <= 0 then return folderCollectorSync path 
            else let dirs = Directory.GetDirectories path 
                 do! [for z in dirs -> folderCollector z (nesting - 1) ] 
                     |> Async.Parallel |> Async.Ignore }

Calling a simple synchronous version after certain number of recursive calls is a common trick - it is used when parallelizing any tree-like structure that is very deep. Using folderCollector path 2, this will start only tens of parallel tasks (as opposed to thousands), so it will be more efficient.

On a sample directory I used (with 4800 sub-dirs and 27000 files), I get:

  • folderCollectorSync path takes 1 second
  • folderCollector path 2 takes takes 600ms (result is similar for any nesting between 1 and 4)
like image 193
Tomas Petricek Avatar answered Jun 08 '26 00:06

Tomas Petricek


From the comments:

Your function incurs the cost of async without any of the benefits because

  1. you're creating too many asyncs for the short amount of work to be done
  2. your function is not CPU, but rather IO, bound
like image 23
Daniel Avatar answered Jun 08 '26 00:06

Daniel