Pandas GroupBy aggregation does not preserve dtypes

Question

I use pandas GroupBy and GroupBy.agg with functions such as sum, max and min on my numerical columns. I noticed that the data types I previously imposed on my columns (np.int8, np.int16, np.int32) are not preserved after the GroupBy aggregation: every column is upcast to int64. I am on pandas version 1.1.5.

My current solution is to downcast again once the groupby aggregation has finished. Is this a known issue, and is there a nicer solution?

Florian Fasmeyer · Accepted Answer

Tested on pandas version 1.1.5

I don't get the same result. With max, the dtypes are preserved.

import pandas as pd
import numpy as np

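# Three identical integer columns, each cast to a different width
df = pd.DataFrame(dict(a=[1,2,3,4,5], b=[1,2,3,4,5], c=[1,2,3,4,5]))
df = df.astype({'a': np.int8, 'b': np.int16, 'c': np.int32})
# Group on 'c' (it becomes the index) and take the max of 'a' and 'b'
new_df = df.groupby(by='c').max()
print(new_df.dtypes)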

""" Output - dtypes are conserved.
a     int8
b    int16
dtype: object
"""

Maybe you used an aggregator that went through multiple columns. If you were to add a + b, for example, the result would be promoted to the wider of the two types, int16:

new_df = df.groupby(by='c').apply(lambda x: x['a'] + x['b'])
print(new_df.dtypes)
# Output : int16
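
To check which aggregations change the dtypes on your own installation, and to automate the re-downcast mentioned in the question, something like the following could work. It is only a sketch: `dtype_report` and `restore_int_dtypes` are hypothetical helper names, not pandas functions, and the second helper only handles plain NumPy integer columns. It casts a column back only when every value fits the original type, because a sum can exceed the range of a narrow type such as int8.

```python
import numpy as np
import pandas as pd


def dtype_report(df, by, funcs=("sum", "min", "max", "mean")):
    """Hypothetical helper: one column of result dtypes per aggregation."""
    grouped = df.groupby(by)
    report = {"original": df.drop(columns=by).dtypes}
    for name in funcs:
        report[name] = grouped.agg(name).dtypes
    return pd.DataFrame(report)


def restore_int_dtypes(result, original):
    """Hypothetical helper: cast integer columns of `result` back to the
    dtype they had in `original`, but only when all values fit."""
    out = result.copy()
    for col in out.columns:
        if col not in original.columns:
            continue
        target = original[col].dtype
        current = out[col].dtype
        # Only handle plain NumPy signed/unsigned integer dtypes.
        if not (isinstance(target, np.dtype) and target.kind in "iu"):
            continue
        if not (isinstance(current, np.dtype) and current.kind in "iu"):
            continue
        info = np.iinfo(target)
        # Skip the cast when a value falls outside the narrower type's range.
        if out[col].min() >= info.min and out[col].max() <= info.max:
            out[col] = out[col].astype(target)
    return out


df = pd.DataFrame(dict(a=[1, 2, 3, 4, 5], b=[1, 2, 3, 4, 5], c=[1, 2, 3, 4, 5]))
df = df.astype({"a": np.int8, "b": np.int16, "c": np.int32})

# Compare the dtypes produced by each aggregation with the originals.
print(dtype_report(df, "c"))

# Aggregate, then restore the narrow dtypes where it is safe to do so.
summed = restore_int_dtypes(df.groupby("c").sum(), df)
print(summed.dtypes)
```

The report makes it easy to see whether the difference comes from the aggregation function itself (the answer above tests max, while the question also mentions sum) rather than from the pandas version. Columns whose aggregated values no longer fit the original type are left at whatever dtype the aggregation returned.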

Tags:

pandas

Guido
