I'm working through someone else's poorly documented code (it uses tf-idf to find clusters of documents), and I came across this:
from sklearn.externals import joblib
#joblib.dump(km, 'doc_cluster.pkl')
km = joblib.load('doc_cluster.pkl')
clusters = km.labels_.tolist()
It's supposed to unpickle doc_cluster.pkl, but when I run it, I get a DeprecationWarning saying that the file was generated with a joblib version older than 0.10 and asking me to regenerate the file. However, I can't do that, because I didn't create doc_cluster.pkl. So is it OK to just move forward and ignore the warning, or will that mess things up down the line?
A deprecation warning is just a warning, and loading succeeds. The pickle file is still being loaded and supported, at least in this version of sklearn (which bundles the 3rd party joblib project). A future version of joblib may stop supporting that specific format, but that hasn't happened yet.
You can re-create the pickle file with the current version, simply by dumping the same object back to disk:
km = joblib.load('doc_cluster.pkl')
joblib.dump(km, 'doc_cluster.pkl', compress=True)
Also see the joblib persistence documentation.
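Because the re-dump above overwrites the only copy of a file you can't regenerate, it may be worth keeping a backup first. The following is only a sketch: `resave_with_backup` is a hypothetical helper, and the demo uses the standard-library `pickle` module as a stand-in so that it runs without sklearn or joblib installed.

```python
import os
import pickle
import shutil
import tempfile


def resave_with_backup(path, load, dump):
    """Copy `path` to `path + '.bak'`, then load the object and dump it back."""
    backup = path + ".bak"
    shutil.copy2(path, backup)  # keep the original bytes in case the re-dump goes wrong
    obj = load(path)
    dump(obj, path)
    return backup


# Stand-ins with the same call shape as joblib.load(path) and joblib.dump(obj, path).
def pickle_load(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def pickle_dump(obj, path):
    with open(path, "wb") as f:
        pickle.dump(obj, f)


# Demo on a throwaway file; the list is a placeholder for the real model object.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "doc_cluster.pkl")
pickle_dump([0, 1, 1, 2], demo_path)
backup_path = resave_with_backup(demo_path, pickle_load, pickle_dump)
```

With the real file you would presumably pass `joblib.load` and `joblib.dump` (or a small wrapper that adds `compress=True`) in place of the pickle stand-ins.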
Alternatively, you could suppress the warning with a warning filter. You can set filters in the PYTHONWARNINGS environment variable, with the -W command-line switch (I'd use the string ignore::DeprecationWarning:sklearn.externals.joblib), or by using the warnings module directly:
import warnings
warnings.filterwarnings(
"ignore", category=DeprecationWarning,
module=r'sklearn\.externals\.joblib'
)
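If you would rather not silence the warning for the whole process, the filter can be scoped to just the load call with `warnings.catch_warnings()`. This is a minimal sketch: `load_legacy_pickle` is a hypothetical stand-in for `joblib.load` that merely emits a similar warning, and `load_quietly` is an illustrative helper name, not part of any library.

```python
import warnings


def load_legacy_pickle(path):
    # Hypothetical stand-in for joblib.load(path): it only emits the kind of
    # warning described in the question and returns a dummy object.
    warnings.warn("file generated with an old joblib version", DeprecationWarning)
    return {"path": path}


def load_quietly(loader, path):
    # The filter change is undone automatically when the with-block exits,
    # so DeprecationWarnings raised elsewhere are still reported.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=DeprecationWarning)
        return loader(path)


# Demo: record warnings to show that only the unscoped call produces one.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = load_quietly(load_legacy_pickle, "doc_cluster.pkl")
    suppressed_count = len(caught)
    load_legacy_pickle("doc_cluster.pkl")
    unsuppressed_count = len(caught)
```

In the original code this would amount to something like `km = load_quietly(joblib.load, 'doc_cluster.pkl')`.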