 

SciKit Learn Parallel Processing 0.17 to 0.18 (Python 2.7)

For some reason the code below is using all available cores even though I have set n_jobs to 1. Have I missed something, or should I open an issue with scikit-learn?

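import numpy as np
from sklearn import linear_model

# n_jobs=1 should limit scikit-learn itself to a single job
liReg = linear_model.LinearRegression(n_jobs=1)

a = np.random.rand(10000, 20)
b = np.random.rand(10000)

# fit and predict repeatedly so that CPU usage is easy to observe
for i in range(1000):
    liReg.fit(a, b)
    liReg.predict(a)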

I have two identical servers, one running scikit-learn 0.17 and the other 0.18. This only happens with 0.18.

Here is the output of time python example.py:

Using 0.17 - just uses one core:

real    0m8.381s
user    0m6.387s
sys     0m1.677s

Using 0.18 - uses all cores:

real    0m32.308s # I guess it takes longer due to the overhead of managing parallel work
user    2m53.612s
sys     20m48.285s
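One way to read these numbers: when user + sys time is much larger than real time, the process kept several cores busy at once. A minimal sketch of the same check done from inside Python, assuming a Unix system where os.times() reports per-process CPU time; the helper name cpu_to_wall_ratio is hypothetical.

```python
import os
import time


def cpu_to_wall_ratio(func, *args, **kwargs):
    """Run func and return (result, CPU time / wall-clock time).

    A ratio near 1.0 suggests a single busy core; a ratio well above 1.0
    suggests the work was spread over several threads (e.g. inside BLAS).
    """
    cpu_start = os.times()
    wall_start = time.time()
    result = func(*args, **kwargs)
    cpu_end = os.times()
    wall = time.time() - wall_start
    # index 0 is user time and index 1 is system time for this process
    cpu = (cpu_end[0] - cpu_start[0]) + (cpu_end[1] - cpu_start[1])
    ratio = cpu / wall if wall > 0 else float("nan")
    return result, ratio
```

Applied to the timings above, 0.17 gives (6.387 + 1.677) / 8.381 ≈ 0.96, while 0.18 gives (173.612 + 1248.285) / 32.308 ≈ 44, so the 0.18 run had many threads working (and, judging by the large sys time, contending) at the same time.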
asked Jan 19 '26 13:01 by Alexander Morley


1 Answer

From @GaelVaroquaux on Github: https://github.com/scikit-learn/scikit-learn/issues/8883#issuecomment-301567818

Most likely you are using a parallel-enabled linear algebra library (like MKL or openBLAS). Hence, it is not scikit-learn that is doing parallel computing, and it cannot control it (it is a component that is used inside scikit-learn). You need to find out how to control the corresponding computing brick.
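To find out which linear algebra library is in play, NumPy can print its own build configuration; look for openblas or mkl in the BLAS/LAPACK sections. A minimal sketch (the exact output format differs between NumPy versions, and SciPy, which scikit-learn also uses, has its own scipy.show_config() that may report a different library):

```python
import numpy as np

# Print NumPy's build configuration, including the BLAS/LAPACK
# libraries (e.g. openblas, mkl) it was linked against.
np.show_config()
```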

In my case I was using OpenBLAS on Fedora Linux, so I simply added the line export OPENBLAS_NUM_THREADS=1 to my .bashrc, which disables multithreading inside the linear algebra calls.
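If you would rather not change the shell environment globally, a sketch of the same idea from inside the script follows. It assumes the variables are set before NumPy is first imported, since the BLAS library reads them when it initializes; setting all three is a belt-and-braces assumption, because which one is honored depends on how your BLAS was built (OPENBLAS_NUM_THREADS for OpenBLAS, MKL_NUM_THREADS for MKL, OMP_NUM_THREADS for OpenMP builds).

```python
import os

# Must happen before NumPy (and therefore BLAS) is imported for the
# first time in this process; later changes are typically ignored.
for var in ("OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS", "OMP_NUM_THREADS"):
    os.environ[var] = "1"

import numpy as np

a = np.random.rand(10000, 20)
b = np.random.rand(10000)

# A least-squares solve of the kind LinearRegression.fit relies on;
# with the variables above it should stay on a single thread.
coef = np.linalg.lstsq(a, b, rcond=None)[0]
```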

answered Jan 21 '26 03:01 by Alexander Morley