SciKit Learn Parallel Processing 0.17 to 0.18 (Python 2.7)

Question

For some reason the code below is using all available cores even though I have set n_jobs equal to 1. Have I missed something, or should I submit an issue to scikit-learn?

import numpy as np
from sklearn import linear_model

liReg = linear_model.LinearRegression(n_jobs=1)

a = np.random.rand(10000,20)
b = np.random.rand(10000)

for i in range(1000):
    liReg.fit(a, b)
    liReg.predict(a)

I have two identical servers, one running scikit-learn v0.18 and the other v0.17. This only happens when using 0.18.

Here is the output of time python example.py:

Using 0.17 - just uses one core:

real    0m8.381s
user    0m6.387s
sys     0m1.677s

Using 0.18 - uses all cores:

real    0m32.308s # I guess longer due to overhead of parallel process management
user    2m53.612s
sys     20m48.285s

Alexander Morley · Accepted Answer

From @GaelVaroquaux on Github: https://github.com/scikit-learn/scikit-learn/issues/8883#issuecomment-301567818

Most likely you are using a parallel-enabled linear algebra library (like MKL or openBLAS). Hence, it is not scikit-learn that is doing parallel computing, and it cannot control it (it is a component that is used inside scikit-learn). You need to find out how to control the corresponding computing brick.

In my case I was using OpenBLAS on Fedora Linux, so I simply added export OPENBLAS_NUM_THREADS=1 to my .bashrc to disable multithreading within the linear algebra calls.
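If editing .bashrc is not convenient, the same limit can probably be applied from inside the script. This is a minimal sketch, not something from the GitHub thread. It assumes the BLAS backend reads its thread-count variable when the library is first loaded, so the variables have to be set before numpy is imported. The helper names limit_blas_threads and run_regression are hypothetical. MKL_NUM_THREADS and OMP_NUM_THREADS are included as an assumption for setups that use MKL or an OpenMP build rather than OpenBLAS.

```python
import os

# Thread-count variables read by common BLAS/OpenMP backends.
# Only OPENBLAS_NUM_THREADS comes from the answer above; the other two
# are assumptions covering MKL and OpenMP-based builds.
BLAS_THREAD_VARS = ("OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS", "OMP_NUM_THREADS")


def limit_blas_threads(n_threads=1):
    """Set the thread-count variables; call this before importing numpy."""
    for var in BLAS_THREAD_VARS:
        os.environ[var] = str(n_threads)


# Must run before numpy (and therefore scikit-learn) is imported.
limit_blas_threads(1)


def run_regression(n_iter=10):
    """Repeat the fit/predict loop from the question with the limit in place."""
    # Imported here, after the environment variables have been set.
    import numpy as np
    from sklearn import linear_model

    li_reg = linear_model.LinearRegression(n_jobs=1)
    a = np.random.rand(10000, 20)
    b = np.random.rand(10000)
    for _ in range(n_iter):
        li_reg.fit(a, b)
        li_reg.predict(a)
    return li_reg
```

Calling run_regression() and watching the process in top should show whether the limit took effect. If numpy was already imported elsewhere in the process before limit_blas_threads ran, the setting may be ignored.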
