My dataframe:
df_all_xml_mfiles_tgther
   file_names  searching_for  everything
0  a.txt       where          Dave Ran Away. Where is Dave?
1  a.txt       candy          mmmm, candy
2  b.txt       time           We are looking for the book.
3  b.txt       where          where the red fern grows
My problem:
I am trying to filter for records whose text contains the words in my search criteria. I need to go through one record at a time and return the actual record, not just the boolean True.
What I have tried:
import re
import pandas as pd

search_content_array = ['where', 'candy', 'time']
file_names_only = ['a.txt', 'b.txt']
df_regex_test = pd.DataFrame()  # accumulates the results
for cc in range(0, len(file_names_only), 1):
    for bb in range(0, len(search_content_array), 1):
        # str.contains returns a boolean Series, not the matching text
        regex_stuff = df_all_xml_mfiles_tgther[cc:cc+1].everything.str.contains(search_content_array[bb], flags=re.IGNORECASE, na=False, regex=True)
        if not regex_stuff.empty:
            regex_stuff_new = pd.DataFrame([regex_stuff.rename(None)])
            regex_stuff_new.columns = ['everything']
            regex_stuff_new['searched_for_found'] = search_content_array[bb]
            regex_stuff_new['file_names'] = file_names_only[cc]
            regex_stuff_new = regex_stuff_new[['file_names', 'searched_for_found', 'everything']]  # rearrange the columns
            df_regex_test = pd.concat([df_regex_test, regex_stuff_new], ignore_index=True, sort=False)
The results I am getting are this:
   file_names  searched_for_found  everything
0  a.txt       where               True
1  a.txt       candy               True
2  b.txt       where               True
The results I want are this:
   file_names  searched_for_found  everything
0  a.txt       where               Dave Ran Away. Where is Dave?
1  a.txt       candy               mmmm, candy
3  b.txt       where               where the red fern grows
How do I get the actual value for returned results instead of just true/false?
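Do this elementwise using a list comprehension that checks each row's searching_for value against that row's everything text (df here is your dataframe):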
df[[y.lower() in x.lower() for x, y in zip(df['everything'], df['searching_for'])]]
Or,
df[[y.lower() in x.lower()
    for x, y in df[['everything', 'searching_for']].values.tolist()]]
   file_names  searching_for  everything
0  a.txt       where          Dave Ran Away. Where is Dave?
1  a.txt       candy          mmmm, candy
3  b.txt       where          where the red fern grows
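For anyone who wants to run the list-comprehension filter end to end, here is a minimal sketch. It assumes the sample data from the question, rebuilt by hand under the short name df; the names mask and matches are only illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumed to match the real data).
df = pd.DataFrame({
    "file_names": ["a.txt", "a.txt", "b.txt", "b.txt"],
    "searching_for": ["where", "candy", "time", "where"],
    "everything": [
        "Dave Ran Away. Where is Dave?",
        "mmmm, candy",
        "We are looking for the book.",
        "where the red fern grows",
    ],
})

# One boolean per row: does this row's search term occur in this row's text?
# Lower-casing both sides makes the substring test case-insensitive.
mask = [term.lower() in text.lower()
        for text, term in zip(df["everything"], df["searching_for"])]

# Indexing with the boolean list keeps the full rows, not the True/False values.
matches = df[mask]
print(matches)
```

Rows 0, 1 and 3 are kept with their original index labels; append .reset_index(drop=True) if a fresh 0..n index is preferred.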
Using replace and str.contains (PS: I think cold's method is more succinct):
s = df.everything.replace(regex=r'(?i)' + df.searching_for, value='OkIFINDIT')
df[s.str.contains('OkIFINDIT')]
Out[405]:
   file_names  searching_for  everything
0  a.txt       where          Dave Ran Away Where is Dave
1  a.txt       candy          mmmm,candy
3  b.txt       where          where the red fern grows
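If regex matching is wanted, as in the question's str.contains call, a row-wise variant could look roughly like the sketch below. It is not the method from either answer; row_matches is a hypothetical helper name, and the sample frame is again rebuilt by hand from the question.

```python
import re
import pandas as pd

# Sample frame rebuilt from the question (assumed to match the real data).
df = pd.DataFrame({
    "file_names": ["a.txt", "a.txt", "b.txt", "b.txt"],
    "searching_for": ["where", "candy", "time", "where"],
    "everything": [
        "Dave Ran Away. Where is Dave?",
        "mmmm, candy",
        "We are looking for the book.",
        "where the red fern grows",
    ],
})

def row_matches(row):
    """Hypothetical helper: True if the row's search term occurs in its text."""
    # re.escape treats the term as literal text; remove it to allow real patterns.
    pattern = re.escape(row["searching_for"])
    return re.search(pattern, row["everything"], flags=re.IGNORECASE) is not None

# apply(axis=1) calls the helper once per row and returns a boolean Series.
mask = df.apply(row_matches, axis=1)
result = df[mask]
print(result)
```

apply with axis=1 is slower than the list comprehension on large frames, so this form is mainly useful when the per-row pattern logic gets more involved.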