In pandas I can do something like:
buffer = BytesIO()
df.to_parquet(buffer)
# later, I can read the bytes
buffer.read()
But when I do this in polars the buffer is empty. Why does this happen?
For context, I'd like to do this because I enjoy working with the cloudpathlib library, and this way I'd be able to write bytes directly into, for example, S3. I am aware that there are other ways to achieve this, such as using s3fs, but I'm wondering why writing to the buffer directly does not work.
The buffer isn't empty; you're just at the end of it. Writing advances the buffer's position, and read() returns only what comes after the current position. It's like writing a few paragraphs on a page of a notebook and then turning the page: when you look down, the page in front of you is blank. The notebook did "store" what you wrote; it's just behind where you're looking.
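The same cursor behaviour can be seen with a plain BytesIO and no polars at all. This is only a minimal sketch using the standard library; the variable names are made up for illustration.

```python
from io import BytesIO

buffer = BytesIO()
buffer.write(b"hello")            # writing moves the cursor forward

pos_after_write = buffer.tell()   # cursor now sits at the end of the data
tail = buffer.read()              # reads from the cursor onward, so nothing is left

buffer.seek(0)                    # rewind to the beginning
rewound = buffer.read()           # now the full contents come back

print(pos_after_write, tail, rewound)
```

The data was in the buffer the whole time; only the read position changed.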
For instance, this works:
import polars as pl
from io import BytesIO

buffer = BytesIO()
df = pl.DataFrame({'a': [1, 2, 3]})
df.write_parquet(buffer)
buffer.seek(0)  # Rewind to the start so the next read begins at the first byte.
print(pl.read_parquet(buffer))
shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
└─────┘