I have:
        hi
    0    1
    1    2
    2    4
    3    8
    4    3
    5    3
    6    2
    7    8
    8    3
    9    5
    10   4
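For anyone who wants to reproduce this, the sample data can be rebuilt roughly as follows. This is a minimal sketch: the name `df` and the default integer index are assumptions chosen to match the printed frame, not necessarily how the real data is loaded.

```python
import pandas as pd

# Sample column from the question; the index defaults to 0..10.
df = pd.DataFrame({"hi": [1, 2, 4, 8, 3, 3, 2, 8, 3, 5, 4]})
print(df)
```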
I also have a list that mixes lists and single integers, like this:

    [[2,8,3], 2, [2,8]]
For each item in the main list, I want to find the index at which it first appears in the column.
For a single integer (e.g. 2), I want the index of its first appearance in the hi column (index 1; I am not interested in later appearances such as index 6).
For a list within the list, I want the index of its last element, taken where the list appears in order in that column.
So [2,8,3] appears in order at indexes 6, 7 and 8, and I want 8 to be returned. Note that it appears before this too, but is interrupted by a 4, so I am not interested in that occurrence.
I have so far used:
    for c in chunks:
        # different method if single note chunk vs. multi
        if type(c) is int:
            # give first occurrence of correct single notes
            single_notes = df1[df1['user_entry_note'] == c]
            single_notes_list.append(single_notes)
        # for multi chunks
        else:
            multi_chunk = df1['user_entry_note'].isin(c)
            multi_chunk_list.append(multi_chunk)
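One thing worth noting about the multi-chunk branch: `isin` only tests membership, so it flags every row whose value is anywhere in the list, regardless of order or adjacency. The sketch below shows this on the sample column from the top of the post (the frame is rebuilt under the assumed name `df` so the snippet runs on its own).

```python
import pandas as pd

df = pd.DataFrame({"hi": [1, 2, 4, 8, 3, 3, 2, 8, 3, 5, 4]})

# isin marks any row holding 2, 8 or 3; it knows nothing about sequence order.
in_chunk = df["hi"].isin([2, 8, 3])
flagged = df.index[in_chunk.to_numpy()].tolist()
print(flagged)  # [1, 3, 4, 5, 6, 7, 8]
```

So the mask covers far more rows than the single in-order run at indexes 6 to 8.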
You can do it with np.logical_and.reduce plus shift, but there are a lot of edge cases to deal with:
    import numpy as np

    def find_idx(seq, df, col):
        if not isinstance(seq, list):  # single value, not a list
            s = df[col].eq(seq)
            if s.any():  # if something matched
                idx = s.idxmax()  # index label of the first match
            else:
                idx = np.nan
        elif seq:  # if a list that isn't empty
            seq = seq[::-1]  # reversed, so the match lands on the row of the last element
            m = np.logical_and.reduce([df[col].shift(i).eq(seq[i]) for i in range(len(seq))])
            s = df.loc[m]
            if not s.empty:  # if something matched
                idx = s.index[0]
            else:
                idx = np.nan
        else:  # empty list
            idx = np.nan
        return idx
    l = [[2,8,3], 2, [2,8]]
    [find_idx(seq, df, col='hi') for seq in l]
    # [8, 1, 7]

    l = [[2,8,3], 2, [2,8], [], ['foo'], 'foo', [1,2,4,8,3,3]]
    [find_idx(seq, df, col='hi') for seq in l]
    # [8, 1, 7, nan, nan, nan, 5]
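To see why reversing the sequence gives the last index, it may help to build the mask by hand for [2, 8, 3]. This is only a sketch on the sample data: the frame is rebuilt under the assumed name `df` so the snippet runs on its own, and the names `rev`, `masks` and `matches` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"hi": [1, 2, 4, 8, 3, 3, 2, 8, 3, 5, 4]})

seq = [2, 8, 3]
rev = seq[::-1]  # [3, 8, 2]

# shift(i) moves the column down by i rows, so at each row we compare:
#   current value == 3, previous value == 8, value two rows back == 2
masks = [df["hi"].shift(i).eq(v) for i, v in enumerate(rev)]

# A row survives only if all three comparisons hold at once.
mask = np.logical_and.reduce(masks)
matches = df.index[mask].tolist()
print(matches)  # [8]
```

The earlier run at indexes 1 to 4 is rejected because the value two rows before index 4 is 4, not 2.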