
pd.concat returning more rows than in the base dataframes

I want to create a dataframe using columns from two different dataframes. I was using pd.concat, but it returns more rows than either input dataframe has.

However, if I first stack the columns in a NumPy array and build the dataframe from that, I get the expected result.

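import numpy as np
import pandas as pd

# df1 and df2 are the existing single-column dataframes (user_id and prob)
print(df1.shape)
print(df2.shape)
result1 = pd.concat([df1, df2], axis=1)
result2 = pd.DataFrame(np.column_stack([df1.user_id, df2.prob]),
                       columns=["user_id", "prob"])
print(result1.shape)
print(result2.shape)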

Output:

(221471, 1)
(221471, 1)
(221515, 2)
(221471, 2)

Can anyone please help me understand why concat returns more rows than the input dataframes have?

Kunal Kishore Singh asked Oct 20 '25 09:10


1 Answer

The linked answer that the comments point to is not complete. Immediately before the concat operation, you need to run:

df1.reset_index(drop=True, inplace=True)
df2.reset_index(drop=True, inplace=True)

as seen in "pandas concat ignore_index doesn't work". I would have left this as a comment on that answer, but I don't have enough reputation.
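To illustrate why resetting the index helps: pd.concat with axis=1 pairs rows by index label, not by position, and by default keeps the union of the labels from both inputs. If the two dataframes carry different index labels (for example after filtering or a train/test split), the result gets extra rows filled with NaN. The question does not show the actual indexes, so the small frames below are hypothetical and only meant as a sketch of the likely cause:

```python
import pandas as pd

# Hypothetical toy frames: same length, but different index labels,
# which is an assumption about what the asker's data looks like.
df1 = pd.DataFrame({"user_id": [10, 11, 12]}, index=[0, 1, 2])
df2 = pd.DataFrame({"prob": [0.1, 0.2, 0.3]}, index=[1, 2, 3])

# axis=1 aligns on index labels, so the result has the union of
# labels {0, 1, 2, 3} and NaN wherever one frame lacks a label.
misaligned = pd.concat([df1, df2], axis=1)
print(misaligned.shape)

# Resetting both indexes to 0..n-1 makes rows pair up by position.
aligned = pd.concat(
    [df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1
)
print(aligned.shape)
```

Comparing the indexes with df1.index.equals(df2.index) before concatenating is a quick way to check whether this is what is happening. Note that resetting the index only gives a meaningful result when the rows of both dataframes are already in the same order.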

Nikos H. answered Oct 22 '25 05:10