I am learning Erlang and am quite impressed how easy it is to parallelize work. To practice a bit I dug up the good old Fibanocci sequence. In the following code I try to take advantage of parallelization by computing the expensive products three at a time.
-module (fib4).
-export ( [main/1] ).
main (N) ->
    fib (list_to_integer (atom_to_list (hd (N) ) ) ),
    halt (0).
path (1, Acc) -> Acc;
path (N, Acc) when N rem 2 =:= 0 ->
    path (N - 1, [step | Acc] );
path (N, Acc) ->
    path ( (N - 1) div 2, [jump | Acc] ).
fib (N) -> fib (1, 1, path (N, [] ) ).
fib (N, Nplus1, [Last] ) ->
    case Last of
        step -> Nplus1;
        jump -> N * N + Nplus1 * Nplus1
    end;
fib (N, Nplus1, [jump | T] ) ->
    Pid = self (),
    spawn (fun () -> Pid ! {n1sq, Nplus1 * Nplus1} end),
    spawn (fun () -> Pid ! {mul, 2 * N * Nplus1} end),
    spawn (fun () -> Pid ! {nsq, N * N} end),
    {Nsq, N1sq, Mul} = loop (0, 0, 0),
    fib (Nsq + N1sq, N1sq + Mul, T);
fib (N, Nplus1, [step | T] ) ->
    fib (Nplus1, N + Nplus1, T).
loop (Nsq, N1sq, Mul) ->
    receive
        {nsq, Val} ->
            if
                N1sq > 0 andalso Mul > 0 -> {Val, N1sq, Mul};
                true -> loop (Val, N1sq, Mul)
            end;
        {n1sq, Val} ->
            if
                Mul > 0 andalso Nsq > 0 -> {Nsq, Val, Mul};
                true -> loop (Nsq, Val, Mul)
            end;
        {mul, Val} ->
            if
                N1sq > 0 andalso Nsq > 0 -> {Nsq, N1sq, Val};
                true -> loop (Nsq, N1sq, Val)
            end
    end.
I am running this code on a Phenom X4 and during the minute it takes on my machine to calculate fib(10000000) only one to two cores are working and the others are idling around.

My questions are:
Erlang's default behavior has historically been to run one scheduler, which is basically a native OS thread, which chooses Erlang tasks to run from a queue.  With the advent of multi-core and multi-processor systems, the runtime was extended to take advantage.  Starting the runtime with -smp enabled will cause the runtime to create multiple schedulers, usually one per logical CPU.  You can manually specify the number of schedulers with the -S flag e.g. -S 16.
This is documented in the Erlang Run-Time System Reference Manual.
A deeper discussion of SMP support can be found in this discussion thread.
EDIT
I should also point out that, as of R12B, SMP is enabled by default on platforms that support it (equivalent to the -smp auto flag).  If you're curious about your own runtime, the following quote from the discussion thread will be of interest:
You can see what was chosen at the first line of printout from the "erl" command. E.g. Erlang (BEAM) emulator version 5.6.4 [source] [smp:4] [asynch-threads:0] .....
The "[smp:4]" above tells that the SMP VM is run and with 4 schedulers.
The reason you see so little parallelism is that your program is basically sequential. All the work is being done in one process in the fib/3 function. The processes you spawn all just send a message and then die and the spawning process synchronously waits for these messages so there is no real concurrency. You could just as well just call the loop/3 function directly with these values.
Otherwise it is as others have mentioned that Erlang automatically uses all the multiple cores available and distributes processes across these where possible. In your case however there is little need to do this and no gain so the system does not do it.
This is actually one of the more difficult things in writing concurrent applications. It is not enough to just spread things into many processes, you actually have to make sure that these processes actually run concurrently. It means rethinking your algorithms, which can be difficult.
Erlang does not use threads in the traditional sense. The Erlang VM creates one system thread for each hardware core of the CPU. When you start a thread in Erlang, you are really creating a "task", which is different from a system thread. Erlang manages these tasks inside of the VM.
Depending on VM and it's configuration, these tasks may or may not be mapped to individual CPU cores, which I believe is what you are seeing here.
There is an interesting blog article you might like here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With