
Why can't I write to a BytesIO buffer directly in polars?

In pandas I can do something like:

buffer = BytesIO()
df.to_parquet(buffer)  # pandas spells this to_parquet; write_parquet is the polars name

# later, I can read the bytes
buffer.read()

But when I do this in polars the buffer is empty. Why does this happen?

For context, I'd like to do this because I enjoy working with the cloudpathlib library, and this way I'd be able to write bytes directly into, for example, S3. I am aware that there are other ways to achieve this, such as using s3fs, but I'm wondering why writing to the buffer directly does not work.

alex23ro asked Nov 04 '25 17:11



1 Answer

The buffer isn't empty; you're just at the end of it. Writing leaves the stream position right after the last byte written, so a subsequent read() has nothing left to return. It's like writing a few paragraphs on a page in a notebook and then turning the page: when you look down, the page in front of you is blank. The notebook still holds what you wrote; it's just behind where you're looking.
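The same behaviour can be seen with nothing but the standard library. This is a minimal sketch, with illustrative variable names, showing how the stream position moves on write and how seek(0) rewinds it:

```python
from io import BytesIO

buffer = BytesIO()
buffer.write(b"hello")                 # writing advances the stream position

position_after_write = buffer.tell()   # the cursor now sits at the end of the data
read_at_end = buffer.read()            # reading from the end yields no bytes

buffer.seek(0)                         # rewind to the start of the buffer
read_after_seek = buffer.read()        # now the full content comes back

print(position_after_write, read_at_end, read_after_seek)
```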

For instance, this works:

import polars as pl
from io import BytesIO

buffer = BytesIO()
df = pl.DataFrame({'a': [1, 2, 3]})
df.write_parquet(buffer)
buffer.seek(0)  # Rewind to the beginning so the next read starts at the first byte.
print(pl.read_parquet(buffer))
shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
└─────┘
Dean MacGregor answered Nov 06 '25 07:11



