I want to create a DataFrame from columns of two different DataFrames. I was using pd.concat, but it returned more rows than either input has.
However, if I first stack the columns into a NumPy array and build the DataFrame from that, I get the expected result.
import numpy as np
import pandas as pd

print(df1.shape)
print(df2.shape)
result1 = pd.concat([df1, df2], axis=1)
result2 = pd.DataFrame(np.column_stack([df1.user_id, df2.prob]),
                       columns=["user_id", "prob"])
print(result1.shape)
print(result2.shape)
Output:
(221471, 1)
(221471, 1)
(221515, 2)
(221471, 2)
Can anyone please help me understand why concat returns more rows?
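A small reproduction, assuming df1 and df2 carry different index labels (for example because one of them was filtered or sampled earlier). With axis=1, pd.concat aligns rows on the index and, with the default join="outer", keeps the union of both indexes, padding missing cells with NaN. The frames below are hypothetical stand-ins for df1 and df2:

```python
import pandas as pd

# Hypothetical stand-ins: both have 3 rows, but the index labels only partly overlap
df1 = pd.DataFrame({"user_id": [10, 11, 12]}, index=[0, 1, 2])
df2 = pd.DataFrame({"prob": [0.1, 0.2, 0.3]}, index=[1, 2, 3])

print(df1.index.equals(df2.index))  # False: the rows will not line up by label

# axis=1 aligns on the index; the default join="outer" keeps the union {0, 1, 2, 3}
result = pd.concat([df1, df2], axis=1)
print(result.shape)  # (4, 2): one more row than either input
```

Each unmatched label adds a row, which would explain 221515 rows from two 221471-row frames if their indexes differ in some labels.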
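Actually, the linked answer that the comments point to is not complete. With axis=1, pd.concat aligns rows on the index and keeps the union of both indexes, so if df1 and df2 have different index labels, the result gets extra rows padded with NaN. You need to reset both indexes right before the concat operation: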
df1.reset_index(drop=True, inplace=True)
df2.reset_index(drop=True, inplace=True)
as described in "pandas concat ignore_index doesn't work". I would have commented on the answer, but I don't have enough reputation.
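A minimal sketch of the fix on the same kind of mismatch, again using hypothetical small frames. Here reset_index(drop=True) is called without inplace, so it returns new frames and leaves df1 and df2 untouched. The second concat shows why ignore_index=True alone does not help: along axis=1 it relabels the columns, not the rows.

```python
import pandas as pd

# Hypothetical stand-ins with misaligned index labels
df1 = pd.DataFrame({"user_id": [10, 11, 12]}, index=[0, 1, 2])
df2 = pd.DataFrame({"prob": [0.1, 0.2, 0.3]}, index=[1, 2, 3])

# Drop the old labels so both frames share the index 0..n-1 and pair up by position
result = pd.concat(
    [df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1
)
print(result.shape)  # (3, 2)

# ignore_index=True with axis=1 renames the columns to 0, 1; rows are still aligned by label
bad = pd.concat([df1, df2], axis=1, ignore_index=True)
print(bad.shape)           # (4, 2)
print(list(bad.columns))   # [0, 1]
```

This mirrors what np.column_stack does in the question: both pair values by position and ignore the original index.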