Pandas GroupBy aggregation does not preserve dtypes

Tags: pandas

I use Pandas GroupBy and GroupBy.agg with functions such as sum, max and min on my numerical columns, but I noticed that the data types I previously assigned to those columns (such as np.int8, np.int16, np.int32) are not preserved after the aggregation: every column is upcast to int64. I am on pandas version 1.1.5.

My current workaround is to downcast again once the groupby aggregation has finished. Is this a known issue, and is there a nicer solution?

Guido asked Oct 25 '25 19:10



1 Answer

Tested on pandas version 1.1.5

I can't reproduce this. In my test the dtypes are preserved.

import pandas as pd
import numpy as np

df = pd.DataFrame(dict(a=[1,2,3,4,5], b=[1,2,3,4,5], c=[1,2,3,4,5]))
df = df.astype({'a': np.int8, 'b': np.int16, 'c': np.int32})
new_df = df.groupby(by='c').max()
print(new_df.dtypes)

""" Output - dtypes are conserved.
a     int8
b    int16
dtype: object
"""

Maybe you used an aggregation that combines multiple columns. For example, if you add a (int8) and b (int16), the result is int16, because the sum is promoted to the wider of the two types:

new_df = df.groupby(by='c').apply(lambda x: x['a'] + x['b'])
print(new_df.dtypes)
# Output : int16
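
If the aggregation you actually run does widen the dtypes on your pandas version, the re-downcast mentioned in the question can be wrapped in a small helper. The following is only a sketch under assumptions: `restore_dtypes` is a hypothetical name (not a pandas function), the grouping column values are changed from the example above so that groups contain more than one row, and the helper only casts back when the aggregated values fit the original integer range, since a sum can overflow a narrow type.

```python
import numpy as np
import pandas as pd


def restore_dtypes(result, source):
    """Hypothetical helper: cast integer columns of `result` back to the
    dtype they had in `source`, but only when every value still fits."""
    out = result.copy()
    for col in out.columns:
        if col not in source.columns:
            continue
        target = source[col].dtype
        # Only handle plain NumPy integer columns in this sketch.
        if not (np.issubdtype(target, np.integer)
                and np.issubdtype(out[col].dtype, np.integer)):
            continue
        info = np.iinfo(target)
        if out[col].min() >= info.min and out[col].max() <= info.max:
            out[col] = out[col].astype(target)
    return out


df = pd.DataFrame(dict(a=[1, 2, 3, 4, 5], b=[1, 2, 3, 4, 5], c=[1, 1, 2, 2, 3]))
df = df.astype({'a': np.int8, 'b': np.int16, 'c': np.int32})

agg = df.groupby(by='c').agg({'a': 'sum', 'b': 'max'})
print(agg.dtypes)        # inspect what your pandas version returns

restored = restore_dtypes(agg, df)
print(restored.dtypes)   # a: int8, b: int16
```

Columns whose aggregated values no longer fit the original type are left as pandas returned them, so the helper never wraps values silently.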
Florian Fasmeyer answered Oct 28 '25 03:10



