When training ML models with sklearn, I typically use the StandardScaler built into sklearn: first fitting the scaler to the training data, then transforming the training data, and finally using the same StandardScaler object to transform the testing data with its fit parameters from the training dataset.
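from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)  # learn mean/std from the training data, then scale it
X_test = sc.transform(X_test)  # reuse the training mean/std on the test data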
However, I've recently realized that any attempt to pickle and re-deploy the ML model in a different environment would also REQUIRE pickling the StandardScaler; otherwise new input data would have no way to be transformed before being fed into the model. Is this a mistake on my part, or is there something I'm simply missing? Will I have to pickle BOTH the ML model and the StandardScaler every time I deploy them elsewhere? It just seems odd that this was never mentioned in the scikit-learn Model Persistence documentation.
import joblib
joblib.dump(model, 'pickledModel.joblib')
joblib.dump(sc, 'pickledScaler.joblib')
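One way to avoid keeping two files in sync is to wrap the scaler and the model in a single sklearn Pipeline and persist that instead. The snippet below is only a minimal sketch of that idea: the toy data, the choice of LogisticRegression, and the file name pickledPipeline.joblib are assumptions made for illustration, not anything taken from the question.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data, purely illustrative (assumption: 3 numeric features, binary target).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
y_train = (X_train[:, 0] > 5.0).astype(int)
X_test = rng.normal(loc=5.0, scale=2.0, size=(10, 3))

# fit() fits the scaler on X_train, scales X_train, then fits the model on the result.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)

# A temporary directory stands in for wherever the artifact would really live.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'pickledPipeline.joblib')  # hypothetical file name
    joblib.dump(pipe, path)      # one file holds the fitted scaler AND the fitted model
    loaded = joblib.load(path)   # what the deployment environment would do

# The loaded pipeline accepts raw, unscaled inputs and scales them internally.
predictions = loaded.predict(X_test)
```

With this layout the deployed code never has to call the scaler itself, so there is no way to forget it or to pair a model with the wrong scaler.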
For deployment you might also want to serialize your model to bytes so that it can be stored in a database. With joblib this is a bit tricky, because joblib.dump writes to a file path or file object rather than returning bytes. The workaround is to dump into an in-memory BytesIO buffer and read the bytes back out of it.
from io import BytesIO
import joblib
def serialize(obj) -> bytes:
    # Dump into an in-memory buffer instead of a file on disk.
    container = BytesIO()
    joblib.dump(obj, container)
    return container.getvalue()
def deserialize(obj: bytes):
    # Wrap the raw bytes in a buffer that joblib can read from.
    container = BytesIO(obj)
    return joblib.load(container)
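As a quick check that the bytes round trip preserves the fitted parameters, here is a small self-contained sketch. It restates the two helpers in compact form so it runs on its own; the three-row training array and the variable names blob and restored are made up for illustration.

```python
from io import BytesIO

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler


def serialize(obj) -> bytes:
    # Dump into an in-memory buffer and return its contents as bytes.
    container = BytesIO()
    joblib.dump(obj, container)
    return container.getvalue()


def deserialize(data: bytes):
    # Wrap the bytes in a buffer so joblib can read from it like a file.
    return joblib.load(BytesIO(data))


# Illustrative training data: column means are 2.0 and 20.0.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
sc = StandardScaler().fit(X_train)

blob = serialize(sc)          # bytes, ready to store in a binary database column
restored = deserialize(blob)  # a new StandardScaler with the same fitted state

# A new row equal to the training means should scale to zeros in both objects.
X_new = np.array([[2.0, 20.0]])
scaled = restored.transform(X_new)
```

The same pair of helpers works unchanged for a fitted model or a whole Pipeline, since joblib does not care which estimator it is given.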