I am building a neural net to make predictions on new data in the future. I first preprocess the training data using sklearn.preprocessing, then train the model, make some predictions, and close the program. In the future, when new data comes in, I have to use the same preprocessing scales to transform the new data before feeding it into the model. Currently, I have to load all of the old data, fit the preprocessors, and then transform the new data with them. Is there a way to save the preprocessing objects (like sklearn.preprocessing.StandardScaler) so that I can just load them later rather than having to remake them?
I think that besides pickle, you can also use joblib to do this. As stated in scikit-learn's manual, 3.4. Model persistence:
In the specific case of scikit-learn, it may be better to use joblib’s replacement of pickle (dump & load), which is more efficient on objects that carry large numpy arrays internally as is often the case for fitted scikit-learn estimators, but can only pickle to the disk and not to a string:
from joblib import dump, load
dump(clf, 'filename.joblib')
Later you can load back the pickled model (possibly in another Python process) with:
clf = load('filename.joblib')
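Applied to the scaler from the question, a minimal sketch might look like this (the file name and the training data here are only illustrative):

import numpy as np
from joblib import dump, load
from sklearn.preprocessing import StandardScaler

# Fit the scaler once on the original training data, then persist it.
X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])  # illustrative data
scaler = StandardScaler().fit(X_train)
dump(scaler, 'scaler.joblib')

# Later (possibly in another Python process): load the fitted scaler and
# transform the new data using the original training statistics.
scaler = load('scaler.joblib')
X_new = np.array([[1.5, 250.0]])
X_new_scaled = scaler.transform(X_new)

Because the fitted mean and standard deviation are stored inside the scaler object, you no longer need the old training data at prediction time.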
For more information, refer to these related posts: Saving StandardScaler() model for use on new datasets, and Save MinMaxScaler model in sklearn.
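If you prefer the standard library, plain pickle works the same way for a fitted scaler. A hedged sketch (file name and data are illustrative):

import pickle
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])  # illustrative data
scaler = StandardScaler().fit(X_train)

# Save the fitted scaler to disk.
with open('scaler.pkl', 'wb') as f:
    pickle.dump(scaler, f)

# Later: load it back and transform new data.
with open('scaler.pkl', 'rb') as f:
    scaler = pickle.load(f)
X_new_scaled = scaler.transform(np.array([[1.5, 250.0]]))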