I recently had to take a dataframe and prepare it to output to an Excel file. However, I didn't want to save it to the local system, but rather pass the prepared data to a separate function that saves to the cloud based on a URI. After searching through a number of ExcelWriter examples, I couldn't find what I was looking for.
The goal is to take the dataframe, e.g.:
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6})
And temporarily store it as bytes in a variable, e.g.:
processed_data = <bytes representing the excel output>
The solution I came up with is provided in the answers and hopefully will help someone else. Would love to see others' solutions as well!
In my case, I created an io module that allows you to use URIs to specify different cloud destinations. For example, "paths" starting with gs:// get sent to Google Storage (using gsutils-like syntax). I process the data as my first step, and then pass that processed data to a "save" function, which itself filters to determine the right path.
df.to_csv() actually works with no path and automatically returns a string (at least in recent versions), so this is my solution to allow to_excel() to do the same.
Works like the common examples, but instead of specifying the file in ExcelWriter, it uses the standard library's BytesIO to store in a variable (processed_data):
from io import BytesIO
import pandas as pd
df = pd.DataFrame({
"a": [1, 2, 3],
"b": [4, 5, 6]
})
output = BytesIO()
writer = pd.ExcelWriter(output)
df.to_excel(writer) # plus any **kwargs
writer.save()
processed_data = output.getvalue()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With