I'm trying to load a large file (2GB in size) filled with JSON strings, delimited by newlines. Ex:

    { "key11": value11, "key12": value12, }
    { "key21": value21, "key22": value22, }
    …

The way I'm importing it now is:

    content = open(file_path, "r").read()
    j_content = json.loads("[" + content.replace("}\n{", "},\n{") + "]")

Which seems like a hack (adding commas between each JSON string and also a beginning and ending square bracket to make it a proper list).
Is there a better way to specify the JSON delimiter (newline \n instead of comma ,)?
Also, Python can't seem to properly allocate memory for an object built from 2GB of data. Is there a way to construct each JSON object as I'm reading the file line by line? Thanks!
Python read JSON file line by line

Step 1: Import the json module.
Step 3: Read the JSON file using open() and store it in the file variable.
Step 4: Convert each item from JSON to Python using loads() and store the result in the db variable.
Step 5: Append db to the empty list lineByLine.
To load big JSON files in a memory-efficient and fast way with Python, we can use the ijson library. We call ijson.parse to parse the file opened by open. Then we print the key prefix, the data type of the JSON value (stored in the_type), and the value of the entry with the given key prefix.
Line-delimited JSON can be read by a parser that can handle concatenated JSON. Concatenated JSON that contains newlines within a JSON object can't be read by a line-delimited JSON parser. The terms "line-delimited JSON" and "newline-delimited JSON" are often used without clarifying if embedded newlines are supported.
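To illustrate the difference, here is a minimal sketch of a concatenated-JSON reader built on the standard library's `json.JSONDecoder.raw_decode`, which returns the parsed value together with the index where it stopped. The helper name `iter_concatenated` and the sample string are assumptions for the example. Unlike a line-by-line reader, it copes with objects that span several lines or that follow each other with no newline at all.

```python
import json


def iter_concatenated(text):
    """Yield each JSON value found back to back in text."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos == end:
            break
        # raw_decode returns the value and the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Second object spans three lines; third follows with no separator.
sample = '{"a": 1}\n{"b": {\n  "c": 2\n}}{"d": 3}'
for obj in iter_concatenated(sample):
    print(obj)
```

Note that this sketch works on a string that is already in memory, so on its own it does not solve the 2GB problem; it only shows why a concatenated-JSON parser accepts strictly more inputs than a line-delimited one.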
Just read each line and construct a json object at this time:
    import json

    with open(file_path) as f:
        for line in f:
            # Each line holds one complete JSON object.
            j_content = json.loads(line)

This way, you load a proper, complete JSON object each time (provided there is no raw newline inside a JSON value or in the middle of an object), and you avoid the memory issue because each object is created only when needed.
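A slightly more defensive variant of the loop above could be wrapped in a generator, so the caller can process records one at a time. This is only a sketch: the name `iter_json_lines`, the blank-line skipping, and the line number in the error message are additions that are not part of the original answer.

```python
import io
import json


def iter_json_lines(lines):
    """Yield one parsed object per non-empty line of an iterable of strings."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline at EOF
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which line was malformed instead of a bare parse error.
            raise ValueError(f"line {line_number}: {exc}") from exc


# Hypothetical in-memory sample; a real file object from open() works the same.
sample = io.StringIO('{"key11": 1, "key12": 2}\n\n{"key21": 3, "key22": 4}\n')
for record in iter_json_lines(sample):
    print(record)
```

Because a file object is itself an iterable of lines, `iter_json_lines(open(file_path))` reads the file lazily and never holds more than one line and one parsed object at a time.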
There is also this answer:
https://stackoverflow.com/a/7795029/671543
    import json
    with open(file_path, "r") as f:
        contents = f.read()
    data = [json.loads(item) for item in contents.strip().split('\n')]