My code looks something like this:
from glob import glob
from joblib import Parallel, delayed
# prediction model - 10s of megabytes on disk
LARGE_MODEL = load_model('path/to/model')
file_paths = glob('path/to/files/*')
def do_thing(file_path):
  pred = LARGE_MODEL.predict(load_image(file_path))
  return pred
Parallel(n_jobs=2)(delayed(do_thing)(fp) for fp in file_paths)
My question is whether LARGE_MODEL will be pickled/unpickled with each iteration of the loop. And if so, how can I make sure each worker caches it instead (if that's possible)?
joblib is essentially a wrapper library that delegates to other libraries to run code in parallel. It also lets us choose between multi-threading and multi-processing. joblib is a good fit when you have a loop in which each iteration calls a function that takes a while to complete.
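As a rough illustration of the thread/process choice described above, the sketch below runs the same loop with both settings through the `prefer` argument of `Parallel`. The function `slow_square` and the helper `run` are made-up names for this example, not anything from the question.

```python
from joblib import Parallel, delayed


def slow_square(x):
    # Stand-in for a function that takes a while to complete.
    return x * x


def run(prefer):
    # prefer="threads" hints at a thread-based backend,
    # prefer="processes" at a process-based one.
    return Parallel(n_jobs=2, prefer=prefer)(
        delayed(slow_square)(i) for i in range(5)
    )


if __name__ == "__main__":
    print(run("threads"))
    print(run("processes"))
```

Threads avoid pickling arguments altogether, which can matter when the function mostly releases the GIL (as many numerical libraries do); processes sidestep the GIL but have to ship arguments to the workers.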
The delayed function is a simple trick for creating a tuple (function, args, kwargs) using function-call syntax. Under Windows, using multiprocessing.Pool with joblib requires protecting the main loop of the code to avoid recursive spawning of subprocesses.
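A small sketch of both points, assuming a toy function `add` invented for this example: it unpacks what `delayed` returns and keeps the parallel call behind an `if __name__ == "__main__":` guard.

```python
from joblib import Parallel, delayed


def add(a, b=0):
    # Toy function used only for illustration.
    return a + b


def main():
    # delayed(add)(1, b=2) does not call add; it only records the call.
    func, args, kwargs = delayed(add)(1, b=2)
    print(func.__name__, args, kwargs)

    # Parallel consumes those (function, args, kwargs) tuples.
    return Parallel(n_jobs=2)(delayed(add)(i, b=10) for i in range(3))


if __name__ == "__main__":
    # Guarding the entry point keeps child processes from re-running it.
    print(main())
```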
Joblib is a set of tools to provide lightweight pipelining in Python. In particular: transparent disk-caching of functions and lazy re-evaluation (memoize pattern), and easy, simple parallel computing.
TLDR
The parent process pickles LARGE_MODEL once. That can be made more performant by ensuring LARGE_MODEL is a numpy array backed by a memory-mapped file. Workers can load_temporary_memmap much faster than they can load from disk.
Your job is parallelized and is likely using joblib._parallel_backends.LokyBackend.
In joblib.parallel.Parallel.__call__, joblib tries to initialize the backend to use LokyBackend when n_jobs is set to a count greater than 1.
LokyBackend uses a shared temporary folder for the same Parallel object. This is relevant for reducers that modify the default pickling behavior.
LokyBackend then configures a MemmappingExecutor that shares this folder with the reducers.
If you have numpy installed and your model is a clean numpy array larger than the max_nbytes threshold of Parallel (1 MB by default), it is dumped once as a memmapped file by the ArrayMemmapForwardReducer and passed from the parent to the child processes.
Otherwise it is pickled with the default pickling, as a bytes object.
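One way to check which path an argument takes is to ask the worker what it received. The sketch below assumes the default process-based backend is available and uses two plain arrays as stand-ins for a model. The helper `is_memmap` is hypothetical.

```python
import numpy as np
from joblib import Parallel, delayed


def is_memmap(arr):
    # Runs in the worker: True if the array arrived as a memory map.
    return isinstance(arr, np.memmap)


def main():
    small = np.ones(100)        # about 800 bytes, below the threshold
    large = np.ones(1_000_000)  # about 8 MB, above the threshold
    # max_nbytes is spelled out here; "1M" is also the default.
    return Parallel(n_jobs=2, max_nbytes="1M")(
        delayed(is_memmap)(a) for a in (small, large)
    )


if __name__ == "__main__":
    print(main())
```

A model object that is not itself a numpy array (for example a framework-specific class) would take the ordinary pickling path unless its arrays are exposed to joblib directly.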
You can see how your model is pickled in the parent process by reading joblib's debug logs.
Each worker unpickles LARGE_MODEL, so there is really no point in caching it there.
The only thing you can improve is the source from which the workers load the pickled model, by backing the model with a memory-mapped file.
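A minimal sketch of backing the data with a memory-mapped file yourself, assuming the model's weights can be represented as a numpy array. The array `weights`, the file name, and `predict_row` are placeholders for this example. It uses `joblib.dump` and `joblib.load` with `mmap_mode="r"`.

```python
import os
import shutil
import tempfile

import numpy as np
from joblib import Parallel, delayed, dump, load


def predict_row(weights, i):
    # Placeholder for real inference: sums one row of the shared array.
    return float(weights[i].sum())


def main():
    # Hypothetical stand-in for model weights.
    weights = np.arange(12, dtype=np.float64).reshape(3, 4)
    folder = tempfile.mkdtemp()
    try:
        path = os.path.join(folder, "weights.joblib")
        dump(weights, path)
        # Re-open the dumped array as a read-only memory map; workers
        # receive a reference to the file instead of a copy of the data.
        shared = load(path, mmap_mode="r")
        return Parallel(n_jobs=2)(
            delayed(predict_row)(shared, i) for i in range(3)
        )
    finally:
        shutil.rmtree(folder, ignore_errors=True)


if __name__ == "__main__":
    print(main())
```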