pd.concat returning more rows than in the base dataframes

Question

I want to create a dataframe from columns of two different dataframes. I was using pd.concat, but it returns more rows than either input has.

If I instead stack the columns in a NumPy array first and build the dataframe from that, I get the expected result.

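import numpy as np
import pandas as pd

print(df1.shape)
print(df2.shape)
# Column-wise concat of the two single-column dataframes
result1 = pd.concat([df1, df2], axis=1)
# Stack the two columns as a NumPy array, then rebuild a dataframe
result2 = pd.DataFrame(np.column_stack([df1.user_id, df2.prob]),
                       columns=["user_id", "prob"])
print(result1.shape)
print(result2.shape)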

Output:

(221471, 1)
(221471, 1)
(221515, 2)
(221471, 2)

Can anyone please help me understand why concat returns more rows than the inputs?

Nikos H. · Accepted Answer

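The linked answer that the comments point to is not complete. With axis=1, pd.concat aligns rows on index labels and keeps the union of both indexes, so any label that appears in only one dataframe adds an extra row filled with NaN. To pair the rows by position instead, reset both indexes immediately before the concat operation:

df1.reset_index(drop=True, inplace=True)
df2.reset_index(drop=True, inplace=True)

This is explained in the question "pandas concat ignore_index doesn't work". I would have commented on that answer, but I don't have enough rep.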
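To see the effect in isolation, here is a minimal sketch. The two toy dataframes and their index values are made up for illustration; they only stand in for df1 and df2 from the question, whose real indexes are not shown.

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df2: same length, but the index labels differ.
df1 = pd.DataFrame({"user_id": [10, 11, 12]}, index=[0, 1, 2])
df2 = pd.DataFrame({"prob": [0.1, 0.2, 0.3]}, index=[0, 1, 5])

# axis=1 aligns on index labels with an outer join, so labels 2 and 5
# each get their own row and the missing side is filled with NaN.
misaligned = pd.concat([df1, df2], axis=1)
print(misaligned.shape)  # (4, 2)

# Resetting both indexes to 0..n-1 makes the rows pair up by position.
aligned = pd.concat(
    [df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1
)
print(aligned.shape)  # (3, 2)
```

This version uses the non-inplace form of reset_index, so the original dataframes keep their indexes. Comparing df1.index with df2.index (for example with df1.index.equals(df2.index)) is a quick way to check whether this is what is happening in your data.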

Tags:

python-3.x

pandas

Kunal Kishore Singh