I’m developing a Python script that collects “snapshots” of my data at different points in time and saves them into JSON files for later analysis. I want to store each snapshot as a single line in a JSONL file.
I run this script using Python 3.10/3.11 inside Streamlit on Git Bash, and some snapshots are not written correctly because numpy data types like int64 are not recognized by json.dumps. This causes the JSON file to become invalid, and prevents me from loading the data later.
Here’s a simplified snippet of my code:
import json
import numpy as np
import logging

logger = logging.getLogger(__name__)

def is_json_serializable(obj):
    try:
        json.dumps(obj)
        return True
    except:
        return False

try:
    with open('het_q2_snapshots.json', 'w', encoding='utf-8') as f:
        for i, snapshot in enumerate(snapshots):
            if is_json_serializable(snapshot):
                json_line = json.dumps(snapshot, ensure_ascii=False, separators=(',', ':'))
                f.write(json_line + '\n')
            else:
                logger.error(f"Snapshot {i} is not serializable: {snapshot}")
                # conversion attempt
                safe_snapshot = {k: int(v) if isinstance(v, np.int64) else v
                                 for k, v in snapshot.items()}
                json_line = json.dumps(safe_snapshot, ensure_ascii=False, separators=(',', ':'))
                f.write(json_line + '\n')
except Exception as e:
    logger.error(f"Error saving snapshots: {e}")
When I run this, I get the following errors:
ERROR - Snapshot loading error: Expecting value: line 4 column 14 (char 57)
ERROR - Unexpected error: Object of type int64 is not JSON serializable
I already tried converting all numpy.int64 values to Python int, using default=str in json.dumps, and checking for non-serializable fields, but the problem persists.
Question: What’s the best way to ensure that all numpy.int64 (or any non-native types) are properly converted before serializing, especially when the data can be nested in dictionaries/lists?
Thanks a lot!
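A custom json.JSONEncoder subclass like this should solve it. json.dumps calls the encoder's default() method for every object it cannot serialize on its own, at any nesting depth, so values inside nested dictionaries and lists are converted too: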
import json
import numpy as np

class NumpyEncoder(json.JSONEncoder):
    """Custom encoder for numpy data types"""

    def default(self, obj):
        # Called by json only for objects it cannot serialize itself.
        if isinstance(
            obj,
            (
                np.int_,
                np.intc,
                np.intp,
                np.int8,
                np.int16,
                np.int32,
                np.int64,
                np.uint8,
                np.uint16,
                np.uint32,
                np.uint64,
            ),
        ):
            return int(obj)
        elif isinstance(obj, (np.float16, np.float32, np.float64)):
            return float(obj)
        # isinstance() takes a single type or a tuple of types
        elif isinstance(obj, (np.complex64, np.complex128)):
            # cast the parts so they are plain Python floats
            return {"real": float(obj.real), "imag": float(obj.imag)}
        elif isinstance(obj, np.ndarray):
            return obj.tolist()
        elif isinstance(obj, np.bool_):
            return bool(obj)
        elif isinstance(obj, np.void):
            return None
        # Anything else: let the base class raise TypeError
        return super().default(obj)
That can be used very easily:
json.dumps(variable, cls=NumpyEncoder)
Credits to hmallen:
https://github.com/hmallen/numpyencoder/blob/master/numpyencoder/numpyencoder.py
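If you would rather not list every dtype, here is a minimal sketch of the same idea using the default= hook of json.dumps instead of an encoder class. It assumes that NumPy scalars and arrays both expose a .tolist() method returning plain Python values; to_builtin, write_jsonl and read_jsonl are hypothetical helper names, not part of json or NumPy. Each snapshot is serialized before the file is opened, so a snapshot that still cannot be converted raises an error instead of leaving a half-written file behind.

```python
import json


def to_builtin(obj):
    """Fallback for json.dumps: called only for objects json cannot serialize.

    Assumption: NumPy scalars and arrays expose .tolist(), which returns
    plain Python values (int, float, bool, or nested lists of them).
    """
    if hasattr(obj, "tolist"):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def write_jsonl(snapshots, path):
    """Write one compact JSON document per line; return the number of lines."""
    # Serialize everything first so a failure cannot truncate the output file.
    lines = [
        json.dumps(s, default=to_builtin, ensure_ascii=False, separators=(",", ":"))
        for s in snapshots
    ]
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
    return len(lines)


def read_jsonl(path):
    """Load the file back, one snapshot per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

You would call it as write_jsonl(snapshots, 'het_q2_snapshots.jsonl') and check the result with read_jsonl. One caveat that applies to the encoder class as well: json only invokes default() for values, not for dictionary keys, so a dict keyed by np.int64 still needs its keys converted (for example with int(k)) before dumping.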