I use pandas GroupBy and GroupBy.agg with functions such as sum, max and min on my numerical columns, but I noticed that the data types I previously set on my columns (such as np.int8, np.int16, np.int32) are not preserved after the GroupBy aggregation: every column is upcast to int64.
I'm on pandas version 1.1.5.
My current solution is to downcast again once the groupby aggregation has finished. Is this a known issue, and/or is there a nicer solution?
I don't get the same result. With max, the dtypes are preserved.
import pandas as pd
import numpy as np
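# Three identical integer columns, cast to progressively wider integer dtypes
df = pd.DataFrame(dict(a=[1, 2, 3, 4, 5], b=[1, 2, 3, 4, 5], c=[1, 2, 3, 4, 5]))
df = df.astype({'a': np.int8, 'b': np.int16, 'c': np.int32})
# Group on 'c' (it becomes the index) and take the max of the remaining columns
new_df = df.groupby(by='c').max()
print(new_df.dtypes)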
""" Output - dtypes are conserved.
a int8
b int16
dtype: object
"""
Maybe you used an aggregator that combined multiple columns. If you were to aggregate a + b (int8 + int16), the result would be promoted to the wider of the two types, int16:
new_df = df.groupby(by='c').apply(lambda x: x['a'] + x['b'])
print(new_df.dtypes)
# Output : int16
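As for the downcasting step mentioned in the question, here is a minimal sketch of one way to do it after the aggregation. It assumes the aggregated columns are all integers. The frame and the names summed and downcast_df are made up for illustration. Whether sum widens the dtype in the first place may depend on your pandas version, so the first print is only there for you to check on your own setup.

```python
import numpy as np
import pandas as pd

# Illustrative data: 'c' has repeated keys so that sum actually combines rows
df = pd.DataFrame(dict(a=[1, 2, 3, 4, 5], b=[1, 2, 3, 4, 5], c=[1, 1, 2, 2, 3]))
df = df.astype({'a': np.int8, 'b': np.int16, 'c': np.int32})

# Aggregate; inspect the dtypes to see what your pandas version returns
summed = df.groupby(by='c').sum()
print(summed.dtypes)

# Shrink each column to the smallest integer dtype that can hold its values
downcast_df = summed.apply(pd.to_numeric, downcast='integer')
print(downcast_df.dtypes)
```

Using pd.to_numeric with downcast='integer' picks the dtype from the aggregated values rather than forcing the original dtype back with astype, so a sum that no longer fits in int8 is not silently wrapped around.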