
Appending to Pandas DataFrame with categorical columns

Tags: python, pandas

How do I append to a pandas DataFrame that has a predefined column of categorical dtype?

df=pd.DataFrame([],columns=['a','b'])
df['a']=pd.Categorical([],categories=[0,1])

new_df=pd.DataFrame.from_dict({'a':[1],'b':[0]})
df.append(new_df)

The above throws an error:

ValueError: all the input arrays must have same number of dimensions

Update: if the categories are strings as opposed to ints, appending seems to work:

df['a']=pd.Categorical([],categories=['Left','Right'])

new_df=pd.DataFrame.from_dict({'a':['Left'],'b':[0]})
df.append(new_df)

So, how do I append to DataFrames with categories of int values? Secondly, I presumed that with binary values (0/1), storing the column as Categorical instead of numeric datatype would be more efficient or faster. Is this true? If not, I may not even bother to convert my columns to Categorical type.

wenhoo asked Dec 30 '25 08:12


1 Answer

You have to keep both data frames consistent. Since column a of the first data frame is categorical, convert column a of the second data frame to the same categorical dtype (same categories) before appending. You can do it as follows:

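import pandas as pd

# Empty frame whose column 'a' is categorical with categories 0 and 1
df = pd.DataFrame([], columns=['a', 'b'])
df['a'] = pd.Categorical([], categories=[0, 1])

new_df = pd.DataFrame.from_dict({'a': [0, 1, 1, 1, 0, 0], 'b': [1, 1, 8, 4, 0, 0]})
# Give new_df['a'] the same categories so both 'a' columns share one dtype
new_df['a'] = pd.Categorical(new_df['a'], categories=[0, 1])

# DataFrame.append was removed in pandas 2.0; pd.concat replaces it.
# Both return a new frame rather than modifying df, so assign the result.
df = pd.concat([df, new_df], ignore_index=True)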
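If there are several categorical columns, converting each one by hand gets tedious. A minimal sketch, assuming both frames have the same column names: cast the incoming rows to the existing frame's dtypes with `astype` (which accepts a column-to-dtype mapping, including `CategoricalDtype`) and then concatenate with `pd.concat`, since `DataFrame.append` no longer exists in pandas 2.0 and later. The helper name `append_matching_dtypes` is hypothetical, and note that any value not among the existing categories becomes NaN after the cast.

```python
import pandas as pd


def append_matching_dtypes(base, new_rows):
    """Hypothetical helper: cast new_rows to base's dtypes, then concatenate.

    Assumes new_rows has the same column names as base. Values in a
    categorical column that are not among base's categories become NaN.
    """
    # Map each column name to base's dtype (CategoricalDtype included),
    # so the categorical columns end up with identical categories.
    aligned = new_rows.astype(base.dtypes.to_dict())
    return pd.concat([base, aligned], ignore_index=True)


# Empty frame with an explicit dtype per column (int categories, as in the question)
df = pd.DataFrame({'a': pd.Categorical([], categories=[0, 1]),
                   'b': pd.Series([], dtype='int64')})
new_df = pd.DataFrame({'a': [0, 1, 1], 'b': [5, 6, 7]})

df = append_matching_dtypes(df, new_df)
print(df.dtypes)
```

Because every column already has the same dtype on both sides, `pd.concat` keeps column `a` as `category` instead of falling back to a plain numeric or object column.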

Hope this helps.
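On the second question (whether Categorical is more efficient for 0/1 values), one way to check is to measure the memory footprint directly. The sketch below uses an arbitrary one million random binary values and measures memory only, not speed. A categorical with two categories stores its values as `int8` codes plus a tiny categories index, so its footprint ends up close to a plain `int8` column; downcasting to `int8` gives a similar saving without requiring matching categories when you concatenate.

```python
import numpy as np
import pandas as pd

# One million binary (0/1) values; the size is an arbitrary choice for illustration.
values = np.random.default_rng(0).integers(0, 2, size=1_000_000)

as_int64 = pd.Series(values, dtype='int64')
as_int8 = pd.Series(values, dtype='int8')
as_category = pd.Series(values).astype('category')

# memory_usage(index=False) counts only the column data, not the index.
for name, s in [('int64', as_int64), ('int8', as_int8), ('category', as_category)]:
    print(f"{name:>8}: {s.memory_usage(index=False):,} bytes")

# The categorical column keeps its values as small integer codes.
print(as_category.cat.codes.dtype)
```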

Anwar Shaikh answered Jan 01 '26 20:01