How can I force a pandas DataFrame to retain None values, even when using astype()?
Since the pd.DataFrame constructor offers no compound dtype parameter, I fix the types (required for to_parquet()) with the following function:
def _typed_dataframe(data: list) -> pd.DataFrame:
    typing = {
        'name': str,
        'value': np.float64,
        'info': str,
        'scale': np.int8,
    }
    result = pd.DataFrame(data)
    for label in result.keys():
        result[label] = result[label].astype(typing[label])
    return result
Unfortunately, result['info'] = result['info'].astype(str) transforms all None values in info to "None" strings. How can I prevent this, i.e. retain None values?
To be more precise: None values in data become np.nan in the result DataFrame, which become "nan" by astype(str), which become "None" when extracted from result.
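A minimal way to inspect this on your own installation is sketched below. The sample rows are hypothetical. On the pandas 1.x and 2.x releases I am aware of, the missing entry comes back as a plain string after the cast rather than as a missing value; other releases may behave differently, so the snippet only prints what it finds.

```python
import pandas as pd

# Hypothetical sample rows; only the 'info' column matters here.
data = [{'info': 'x'}, {'info': None}]
frame = pd.DataFrame(data)

# Which entries count as missing before the cast?
before_missing = frame['info'].isna().tolist()

# Cast to str and look at the raw values afterwards.
after = frame['info'].astype(str).tolist()

print(before_missing)
print([repr(value) for value in after])
```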
Following @frosty's comment, we can use the alternative
typing = {
    'name': str,
    'value': np.float64,
    'info': pd.StringDtype(),
    'scale': np.int8,
}
However, this requires pandas 1.0.0 or later, where pd.StringDtype was introduced.
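As a small sketch of what the pd.StringDtype() cast does to a single column (the sample values are made up for illustration): the missing entry stays missing and is represented as pd.NA instead of being turned into a string.

```python
import pandas as pd

# Hypothetical column with one real string and one missing entry.
info = pd.Series(['first', None], dtype=object)

# Cast to the nullable string dtype instead of plain str.
typed_info = info.astype(pd.StringDtype())

# The missing entry is still reported as missing after the cast.
print(typed_info.isna().tolist())
```

Note that the missing value is pd.NA rather than None, so code that checks `value is None` needs pd.isna(value) instead.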
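As a better solution, you can replace
    for label in result.keys():
        result[label] = result[label].astype(typing[label])
by
    result = result.astype(typing)
Note that a bare result.astype(typing) appears to have no effect: astype accepts a mapping from column names to dtypes, but it returns a new DataFrame instead of modifying result in place, so the return value has to be assigned.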
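Putting the pieces together, the function could look roughly like the sketch below. It assumes pandas 1.0.0 or later; the sample rows and the filtering of the mapping to the columns actually present are my additions, not part of the original function. The filter is there because, as far as I know, astype raises a KeyError when the mapping names a column the frame does not have.

```python
import numpy as np
import pandas as pd

# Column name -> dtype. pd.StringDtype() keeps missing entries missing.
TYPING = {
    'name': str,
    'value': np.float64,
    'info': pd.StringDtype(),
    'scale': np.int8,
}


def typed_dataframe(data: list) -> pd.DataFrame:
    result = pd.DataFrame(data)
    # Only pass dtypes for columns that exist in this frame (assumption:
    # some inputs may lack a column).
    present = {label: TYPING[label] for label in result.columns if label in TYPING}
    # astype returns a new DataFrame, so return (or assign) its result.
    return result.astype(present)


# Hypothetical input rows for illustration.
rows = [
    {'name': 'a', 'value': 1.5, 'info': 'first', 'scale': 1},
    {'name': 'b', 'value': 2.5, 'info': None, 'scale': 2},
]
typed = typed_dataframe(rows)
print(typed.dtypes)
```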