
Pandas: using iloc to retrieve data does not match input index

I have a dataset that contains each contributor's id and contributor_message. I want to retrieve all rows with the same message, say, contributor_message == 'I support this proposal because...'.

I use data.loc[data.contributor_message == 'I support this proposal because...'].index to get the index of the rows in the DataFrame with that message. Say those indices are 1, 2, 50, 9350, 30678, ...

Then I tried data.iloc[[1,2,50]], and this gives me the correct answer, i.e. the indices match the DataFrame indices.

However, when I use data.iloc[9350] or higher indices, I do NOT get the corresponding DataFrame index. For example, this time I got the row with index 15047.

Can anyone advise how to fix this problem?

RyanKao asked Oct 15 '25 14:10


1 Answer

This occurs when your index labels do not coincide with the rows' integer positions, for example after rows have been filtered, dropped, or reordered.

Note that pd.DataFrame.loc selects by index label, whereas pd.DataFrame.iloc selects by integer position.

Below is a minimal example.

import pandas as pd

# The label 3 is deliberately missing, so labels and positions diverge.
df = pd.DataFrame({'A': [1, 2, 1, 1, 5]}, index=[0, 1, 2, 4, 5])

idx = df[df['A'] == 1].index

print(idx)  # Index([0, 2, 4], dtype='int64'); shown as Int64Index([...]) on pandas < 2.0

res1 = df.loc[idx]   # treats 0, 2, 4 as index labels
res2 = df.iloc[idx]  # treats 0, 2, 4 as integer positions

print(res1)
#    A
# 0  1
# 2  1
# 4  1

print(res2)
#    A
# 0  1
# 2  1
# 5  5
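
To see why the last row differs, it can help to list each integer position next to its index label. This is only an illustrative sketch on the same example frame; the names pairs, label_at_pos_4, value_by_label and value_by_position are made up here.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1, 1, 5]}, index=[0, 1, 2, 4, 5])

# Pair each integer position (what iloc uses) with its index label (what loc uses).
pairs = [(pos, int(label)) for pos, label in enumerate(df.index)]
print(pairs)  # [(0, 0), (1, 1), (2, 2), (3, 4), (4, 5)]

# Label 4 sits at position 3, so position 4 holds the row labelled 5.
label_at_pos_4 = df.index[4]          # 5
value_by_label = df.loc[4, 'A']       # 1 (row labelled 4)
value_by_position = df.iloc[4]['A']   # 5 (fifth row, labelled 5)
```

Labels and positions agree for the first three rows, which is why small index values appear to work with iloc and larger ones do not.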

You have 2 options to resolve this problem.

Option 1

Use pd.DataFrame.loc to slice by index, as above.
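
Applied to the question, this could look like the sketch below. The column names follow the question, but the values and the non-contiguous index are invented for illustration, and contributor_id is an assumed column name.

```python
import pandas as pd

# Hypothetical data: values and index labels are made up for illustration.
data = pd.DataFrame(
    {
        'contributor_id': [11, 12, 13, 14],
        'contributor_message': [
            'I support this proposal because...',
            'I oppose this proposal.',
            'I support this proposal because...',
            'No comment.',
        ],
    },
    index=[1, 2, 9350, 30678],
)

mask = data['contributor_message'] == 'I support this proposal because...'

# Boolean mask passed straight to .loc: no index labels need to be collected.
matches = data.loc[mask]

# Equivalent two-step form: collect the labels, then look them up by label.
idx = data[mask].index
same_matches = data.loc[idx]

print(matches.index.tolist())  # [1, 9350]
```

If the index labels are only needed to fetch the rows again, the boolean mask alone is usually enough.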

Option 2

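Reset the index so that labels coincide with integer positions, then use pd.DataFrame.iloc. Note that drop=True discards the original labels, and idx must be recomputed after the reset: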

df = df.reset_index(drop=True)
idx = df[df['A'] == 1].index

res2 = df.iloc[idx]

print(res2)
#    A
# 0  1
# 2  1
# 3  1
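
If the original labels need to be kept and iloc is still required, one possible alternative (not part of the two options above) is to translate labels into positions first. The sketch below uses Index.get_indexer and assumes the index labels are unique; the name positions is made up here.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1, 1, 5]}, index=[0, 1, 2, 4, 5])
idx = df[df['A'] == 1].index            # labels 0, 2, 4

# Index.get_indexer maps labels to integer positions (-1 for a missing label).
# It requires a unique index.
positions = df.index.get_indexer(idx)   # positions 0, 2, 3

res = df.iloc[positions]
print(res)
#    A
# 0  1
# 2  1
# 4  1
```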
jpp answered Oct 18 '25 06:10


