Why does the parallel implementation run slower than the serial one in the following Julia code?
using Distributed
@everywhere function ext(i::Int64)
   callmop = `awk '{ sum += $1 } END { print sum }' infile_$(i)`
   run(callmop)
end
function fpar()
   @sync @distributed for i = 1:10
      ext(i)
   end
end
function fnopar()
   for i = 1:10
      ext(i)
   end
end
val, t_par, bytes, gctime, memallocs = @timed fpar()
val, t_nopar, bytes, gctime, memallocs = @timed fnopar()
println("Parallel: $(t_par) s. Serial: $(t_nopar) s")  
# Parallel: 0.448290379 s. Serial: 0.028704802 s
The files infile_$(i) contain a single column of real numbers. After some research I came across this post and this other post, which deal with similar problems. They seem a bit dated though, considering the speed at which Julia is being developed. Is there any way to improve this parallel section? Thank you very much in advance.
Your code is correct but you measure the performance incorrectly.
Note that for this use case (calling external processes) you should be fine with green threads (Julia tasks) - no need to distribute the load at all!
When a Julia function is executed for the first time, it is compiled. When you execute it on several parallel processes, each of them needs to compile the same piece of code.
On top of that, the first run of the @distributed macro also takes a long time to compile.
Hence, before using @timed you should call both the fpar and fnopar functions once.
Last but not least, there is no addprocs in your code, so I assume that you used the -p Julia option to add the worker processes to your Julia master process. By the way, you did not mention how many worker processes you have.
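As a minimal sketch of the alternative to the -p option, workers can also be added from inside the script with addprocs, and nprocs/nworkers report how many processes are available. The worker count of 2 is an arbitrary choice for illustration, not something taken from the question.

```julia
using Distributed

n_before = nprocs()      # 1 when Julia was started without -p
pids = addprocs(2)       # assumption: 2 local workers, chosen only for illustration
n_after = nprocs()       # master process plus the workers just added
println("processes: $(n_after), workers: $(nworkers())")

rmprocs(pids)            # shut the extra workers down again
```

Functions marked with @everywhere must be defined after the workers have been added, otherwise the new workers will not know them.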
I usually test code like this:
@time fpar()
@time fpar()
@time fnopar()
@time fnopar()
The first measurement of each pair includes the compile time; the second shows the actual running time.
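The same warm-up effect can be seen in isolation with a small sketch. The function work below is a hypothetical stand-in (it is not part of the question's code); the first timed call includes JIT compilation, the second one does not.

```julia
# Hypothetical stand-in for fpar/fnopar, used only to illustrate warm-up.
work(n) = sum(sqrt(i) for i in 1:n)

t_first = @elapsed work(10_000)    # first call: compilation + execution
t_second = @elapsed work(10_000)   # second call: execution only
println("first call: $(t_first) s, second call: $(t_second) s")
```

On a typical run the first number is noticeably larger than the second, which is why only the second measurement should be compared between implementations.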
It is also worth having a look at the BenchmarkTools package and the @btime macro.
Regarding performance, @distributed has a significant communication overhead. In some scenarios this can be mitigated by using SharedArrays, in others by using Threads.@threads. However, in your case the fastest code would be the one using green threads:
function ffast()
   @sync for i = 1:10
      @async ext(i)
   end
end
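To see why tasks help here, the following sketch replaces the external awk call with sleep, which yields to the scheduler much like waiting for an external process does. The names fake_ext, run_serial and run_tasks are hypothetical and only serve the illustration.

```julia
# Hypothetical stand-in for ext(i): blocks for 0.2 s while yielding to other tasks.
fake_ext(i) = (sleep(0.2); i)

function run_serial(n)
    for i in 1:n
        fake_ext(i)          # each call waits for the previous one to finish
    end
end

function run_tasks(n)
    @sync for i in 1:n
        @async fake_ext(i)   # all tasks wait concurrently on one thread
    end
end

run_serial(1); run_tasks(1)  # warm up so compilation is not measured

t_serial = @elapsed run_serial(5)
t_tasks = @elapsed run_tasks(5)
println("serial: $(t_serial) s, tasks: $(t_tasks) s")
```

The serial loop waits for each call in turn, while the task version overlaps the waiting, so no worker processes are needed when the real work happens outside Julia.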