
Write JSON column to file in polars-rs

I have a dataframe with a string column that is valid JSON. I'd like to transform the contents of this column into a file that is a JSON array where each row becomes an object. I'm looking for a good way to implement this.

In Python, I'd use polars to do this:

import polars as pl
import json

df = df.with_columns(
    pl.col("my_str_column").str.json_decode()
)

column_as_list = df.select(pl.col("my_str_column")).to_series().to_list()

with open('target_file.json', 'w') as w:
    w.write(json.dumps(column_as_list))

This approach might not be optimal, but it works fine. Now I'd like to translate it into polars-rs. Here's how far I got:

let df = df.lazy()
    .with_column(col("my_str_column").str().json_decode(None, Some(100)))
    .collect()?;

This turns the "my_str_column" column into a column with a Struct datatype.

I then tried to convert it to a series and write it to a file (inspired by these SO posts: How to write a vector to a json file, How to get a Vec from polars Series or ChunkedArray).

let binding: Series = df.select("my_str_column").pop().unwrap();

// let test: Vec<_> = binding.struct_().iter().collect(); // this would panic

So I removed the json_decode() call, leaving the column type as str, and used a snippet I found here (How do I change the string element of vector to json in Rust) to implement this:

use std::fs::File;
use std::io::{BufWriter, Write};
use serde_json::Value;

// hacky way to convert the desired column into a series
let binding = df.select("my_str_column").pop().unwrap();
let series_collect: Vec<&str> = binding.str().unwrap().into_no_null_iter().collect();

// Convert input into Vec<Value>, by parsing each string
let elems = series_collect.into_iter()
    .map(serde_json::from_str)
    .collect::<Result<Vec<Value>,_>>()
    .unwrap();

let json_val = Value::Array(elems);

// now write this json_val into disk
let mut path = std::env::current_dir()?;
path.push("target_file.json");

let file = File::create(path)?;
let mut writer = BufWriter::new(file);
serde_json::to_writer(&mut writer, &json_val).unwrap();
writer.flush()?;

This works, but I'm not sure if it's a good way to write a column to a JSON file. Any help to improve this code would be appreciated.
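For reference, stripped of the polars plumbing, the whole parse-and-dump step is small; a stdlib-only Python sketch of the same round trip (the sample row strings here are made up):

```python
import json

# hypothetical sample values of "my_str_column": each entry is a valid JSON string
rows = ['{"a": 1}', '{"a": 2}', '{"a": 3}']

# parse every string, then serialize the list as a single JSON array
parsed = [json.loads(s) for s in rows]
with open("target_file.json", "w") as f:
    json.dump(parsed, f)
```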

asked Nov 18 '25 15:11 by willsbit
1 Answer

As for your py-polars approach, you could .unnest() and use .write_json() instead.

e.g. with row_oriented=True

df = pl.DataFrame({
   "my_str_column": [
      """{"a": 1}""",
      """{"a": 2}""",
      """{"a": 3}"""
   ]
})

(df.select(pl.col("my_str_column").str.json_decode())  
   .unnest("my_str_column")
   .write_json(row_oriented=True)
)   
'[{"a":1},{"a":2},{"a":3}]'
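The row-oriented output above can be cross-checked with nothing but the stdlib: decode each string, then dump the list with compact separators (that polars emits compact separators is an assumption here):

```python
import json

rows = ['{"a": 1}', '{"a": 2}', '{"a": 3}']
# decode each row string, then dump the list as one compact JSON array
out = json.dumps([json.loads(s) for s in rows], separators=(",", ":"))
print(out)  # prints [{"a":1},{"a":2},{"a":3}]
```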

We can see how write_json() is implemented here:

  • https://github.com/pola-rs/polars/blob/dd883420f0a44af061f36e088f807ad99a9f1a3d/py-polars/src/dataframe.rs#L505-L507
JsonWriter::new(file)
   .with_json_format(JsonFormat::Json)
   .finish(&mut self.df)

Meaning you should be able to:

let mut out = df.lazy().with_column(...).unnest(&["my_str_column"]).collect()?;

JsonWriter::new(writer)
    .with_json_format(JsonFormat::Json)
    .finish(&mut out)?;
answered Nov 21 '25 09:11 by jqurious

