I have a dataset that contains each contributor's id and contributor_message. I want to retrieve all samples with the same message, say, contributor_message == 'I support this proposal because...'.
I use data.loc[data.contributor_message == 'I support this proposal because...'].index to get the DataFrame index of every row with that message; say those indices are 1, 2, 50, 9350, 30678, ...
Then I tried data.iloc[[1,2,50]], and this gives me the correct answer, i.e. the indices of the returned rows match the DataFrame indices.
However, when I use data.iloc[9350] or higher indices, I do NOT get the corresponding DataFrame index. For example, I got 15047 in the DataFrame this time.
Can anyone advise how to fix this problem?
This occurs when your index labels are not aligned with their integer locations.
Note that pd.DataFrame.loc selects rows by index label, whereas pd.DataFrame.iloc selects rows by integer location (0-based position).
Below is a minimal example.
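import pandas as pd
df = pd.DataFrame({'A': [1, 2, 1, 1, 5]}, index=[0, 1, 2, 4, 5])
idx = df[df['A'] == 1].index  # index labels of the matching rows
print(idx)  # Index([0, 2, 4], dtype='int64'); shown as Int64Index([0, 2, 4], dtype='int64') on pandas < 2.0
res1 = df.loc[idx]   # label-based: rows labelled 0, 2, 4
res2 = df.iloc[idx]  # position-based: rows at positions 0, 2, 4
print(res1)
#    A
# 0  1
# 2  1
# 4  1
print(res2)
#    A
# 0  1
# 2  1
# 5  5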
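To confirm that misaligned labels are the cause in your own data, you can compare the index with the positions 0 to n-1. This is only a minimal sketch: the helper name index_matches_positions is hypothetical and not part of pandas.

```python
import pandas as pd


def index_matches_positions(frame: pd.DataFrame) -> bool:
    """Return True when the index labels are exactly 0, 1, ..., len(frame) - 1."""
    return list(frame.index) == list(range(len(frame)))


# Same frame as in the example above: label 3 is missing from the index.
df = pd.DataFrame({'A': [1, 2, 1, 1, 5]}, index=[0, 1, 2, 4, 5])

aligned_before = index_matches_positions(df)
aligned_after = index_matches_positions(df.reset_index(drop=True))
print(aligned_before, aligned_after)  # False True
```

If the check returns False, labels taken from .index cannot safely be passed to iloc.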
You have 2 options to resolve this problem.
Option 1
Use pd.DataFrame.loc to slice by index, as above.
Option 2
Reset index and use pd.DataFrame.iloc:
df = df.reset_index(drop=True)  # index becomes 0, 1, 2, 3, 4
idx = df[df['A'] == 1].index
res2 = df.iloc[idx]
print(res2)
#    A
# 0  1
# 2  1
# 3  1
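If you would rather keep the original index, two further routes may help. The sketch below assumes the index labels are unique. It filters with the boolean mask directly, and it also translates labels to integer positions with pd.Index.get_indexer before calling iloc. The variable names are illustrative only.

```python
import pandas as pd

# Original frame with its non-contiguous index left untouched.
df = pd.DataFrame({'A': [1, 2, 1, 1, 5]}, index=[0, 1, 2, 4, 5])

# Route A: a boolean mask selects the rows without going through labels at all.
mask = df['A'] == 1
by_mask = df[mask]

# Route B: convert the labels to integer positions, then use iloc.
labels = by_mask.index                    # labels 0, 2, 4
positions = df.index.get_indexer(labels)  # positions 0, 2, 3
by_position = df.iloc[positions]

print(by_mask.equals(by_position))  # True
```

Both routes return the rows labelled 0, 2 and 4, the same result as df.loc[idx] in Option 1.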