Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Correct way to do multithreaded computations in SBCL

Context

I need to do computations using multi-threading. I use SBCL and portability is not a concern. I am aware that bordeaux-threads and lparallel exist but I want to implement something at the relatively low level provided by the specific SBCL threading implementation. I need maximal speed, even at the expense of readability/programming effort.

Example of computation intensive operation

We can define a sufficiently computation-intensive function that will benefit from multi-threading.

(defun intensive-sqrt (x)
  "Dummy calculation for intensive algorithm.
Approx 50 ms for 1e6 iterations."
  (let ((y x))
    (dotimes (it 1000000 t)
      (if (> y 1.01d0)
          (setf y (sqrt y))
          (setf y (* y y y))))
    y))

Mapping each computation to a thread and execute

Given a list of argument-lists llarg and a function fun, we want to compute nthreads results and return the list of results res-list. Here is what I came up with using the resources I found (see below).

(defmacro splice-arglist-help (fun arglist)
  "Helper macro.
   Splices a list 'arglist' (arg1 arg2 ...) into the function call of 'fun'
   Returns (funcall fun arg1 arg2 ...)"
  `(funcall ,fun ,@arglist))

(defun splice-arglist (fun arglist)
  (eval `(splice-arglist-help ,fun ,arglist)))

(defun maplist-fun-multi (fun llarg nthreads)
  "Maps 'fun' over list of argument lists 'llarg' using multithreading.
   Breaks up llarg and feeds it to each thread.
   Appends all the result lists at the end."
  (let ((thread-list nil)
        (res-list nil))
    ;; Create and run threads
    (dotimes (it nthreads t)
      (let ((larg-temp (elt llarg it)))
        (setf thread-list (append thread-list
                                  (list (sb-thread:make-thread
                                         (lambda ()
                                           (splice-arglist fun larg-temp))))))))
    ;; Join threads
    ;; Threads are joined in order, not optimal for speed.
    ;; Should be joined when finished ?
    (dotimes (it (list-length thread-list) t)
      (setf res-list (append res-list (list (sb-thread:join-thread (elt thread-list it))))))
    res-list))

nthreads does not necessarily match the length of llarg, but I avoid the extra book-keeping just for the example simplicity's sake. I also omitted the various declare used for optimization.

We can test the multi-threading and compare timings using :

(defparameter *test-args-sqrt-long* nil)
(dotimes (it 10000 t)
  (push (list (+ 3d0 it)) *test-args-sqrt-long*))

(time (intensive-sqrt 5d0))
(time (maplist-fun-multi #'intensive-sqrt *test-args-sqrt-long* 100))

The number of threads is quite high. I think the optimum would be to use as many threads as the CPU has, but I noticed the performance drop-off is barely noticeable in terms of time/operations. Doing more operations would involve breaking up the input lists into smaller pieces.

The above code outputs, on a 2 cores/4 threads machine :

Evaluation took:
  0.029 seconds of real time
  0.015625 seconds of total run time (0.015625 user, 0.000000 system)
  55.17% CPU
  71,972,879 processor cycles
  22,151,168 bytes consed

Evaluation took:
  1.415 seconds of real time
  4.703125 seconds of total run time (4.437500 user, 0.265625 system)
  [ Run times consist of 0.205 seconds GC time, and 4.499 seconds non-GC time. ]
  332.37% CPU
  3,530,632,834 processor cycles
  2,215,345,584 bytes consed

What's bugging me

The example I've given works very well and is robust (ie results don't get mixed up between threads, and I experience no crash). The speed gain is also there and the computations do use several cores/threads on the machines I've tested this code on. But there are a few things that I'd like an opinion/help on :

  1. The use of the argument list llarg and larg-temp. Is this really necessary ? Is there any way to avoid manipulating potentially huge lists ?
  2. Threads are joined in the order in which they are stored in the thread-list. I imagine this would not be optimal if operations each took a different time to complete. Is there a way to join each thread when it is finished, instead of waiting ?

The answers should be in the resources I already found, but I find the more advanced stuff hard to grapple with.

Resources found so far

  • http://www.sbcl.org/manual/#Threading
  • http://cl-cookbook.sourceforge.net/process.html
  • https://lispcookbook.github.io/cl-cookbook/process.html
like image 409
Frank Jefferson Avatar asked Oct 14 '25 14:10

Frank Jefferson


1 Answers

Stylistic issues

  • The splice-arglist helpers are not needed at all (so I'll also skip details in them). Use apply in your thread function instead:

    (lambda ()
      (apply fun larg-temp))
    
  • You don't need to (and should not) index into a list, because that is O(n) for each lookup—your loops are quadratic. Use dolist for simple side-effective loops, or loop when you have e. g. parallel iteration:

    (loop :repeat nthreads
          :for args :in llarg
          :collect (sb-thread:make-thread (lambda () (apply fun args))))
    
  • For going over a list while creating a new list of the same length where each element is calculated from the corresponding element in the source list, use mapcar:

    (mapcar #'sb-thread:join-thread threads)
    

Your function thus becomes:

(defun map-args-parallel (fun arglists nthreads)
  (let ((threads (loop :repeat nthreads
                       :for args :in arglists
                       :collect (sb-thread:make-thread
                                 (lambda ()
                                   (apply fun args))))))
    (mapcar #'sb-thread:join-thread threads)))

Performance

You are right that one usually creates only as many threads as ca. the number of cores available. If you test performance by always creating n threads, then joining them, then going to the next batch, you will indeed have not much difference in performance. That is because the inefficiency lies in creating the threads. A thread is about as resource intensive as a process.

What one usually does is to create a thread pool where the threads do not get joined, but instead reused. For that, you need some other mechanism to communicate arguments and results, e. g. channels (e. g. from chanl).

Note however that e. g. lparallel already provides a pmap function, and it does things right. The purpose of such wrapper libraries is not only to give the user (programmer) a nice interface, but also to think really hard about the problems and optimize sensibly. I am quite confident that pmap will be significantly faster than your attempt.

like image 86
Svante Avatar answered Oct 17 '25 10:10

Svante



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!