
How do I use BeautifulSoup to search for elements that occur before another element?

I'm using BeautifulSoup 4 with Python 3.7. I have the following HTML ...

<tr>
    <td class="info"><div class="title">...</div></td>
</tr>
<tr class="ls">
    <td colspan="3">Less similar results</td>
</tr>
<tr>
    <td class="info"><div class="title">...</div></td>
</tr>

I would like to extract the DIVs with class="title", but only the ones that occur before the table row whose TD text is "Less similar results". Right now I have this:

elts = soup.find("td", class_="info").find_all("div", class_="title")

But this returns all DIVs with that class, even ones that occur after the element I want to screen for. How do I refine my search to only include results before that particular TD?

Dave asked Nov 28 '25 19:11


1 Answer

You can use the CSS selector tr:not(tr:has(td:contains("Less similar results")) ~ *) div.title:

data = '''<tr>
    <td class="info"><div class="title">THIS YOU WANT ...</div></td>
</tr>
<tr class="ls">
    <td colspan="3">Less similar results</td>
</tr>
<tr>
    <td class="info"><div class="title">THIS YOU DON'T WANT ...</div></td>
</tr>'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'lxml')

print(soup.select('tr:not(tr:has(td:contains("Less similar results")) ~ *) div.title'))

Prints:

[<div class="title">THIS YOU WANT ...</div>]
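A note on the selector's spelling: newer releases of Soup Sieve (the selector engine behind select()) prefer :-soup-contains() over :contains(), and the older name may emit a deprecation warning. The sketch below is the same query with the newer pseudo-class. It assumes soupsieve 2.1 or later; the <table> wrapper, the shortened placeholder text, and the use of html.parser instead of lxml are choices made for this illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Same rows as above, wrapped in a <table> and with shortened placeholder text.
data = '''<table>
<tr>
    <td class="info"><div class="title">wanted</div></td>
</tr>
<tr class="ls">
    <td colspan="3">Less similar results</td>
</tr>
<tr>
    <td class="info"><div class="title">not wanted</div></td>
</tr>
</table>'''

# html.parser ships with Python, so this sketch does not need lxml.
soup = BeautifulSoup(data, "html.parser")

# :-soup-contains() is the newer name for :contains() (assumes soupsieve >= 2.1).
selector = 'tr:not(tr:has(td:-soup-contains("Less similar results")) ~ *) div.title'

titles = [div.get_text() for div in soup.select(selector)]
print(titles)
```

With this sample markup, only the title from the first row should be selected.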

What does it mean?

tr:not(tr:has(td:contains("Less similar results")) ~ *) div.title

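Select every <div> with class title that sits inside a <tr>, except when that <tr> is a later sibling of the <tr> containing a <td> with the text "Less similar results". In other words, only rows that are not after the marker row are searched.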
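If the selector feels too dense, the same idea can be spelled out with plain tree navigation: find the marker cell, step up to its row, then walk the rows before it. This is only a sketch of an alternative, not the approach used in the answer above. The helper name titles_before_marker, the <table> wrapper, and the placeholder text are hypothetical, and it assumes the marker cell contains exactly the text "Less similar results".

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the question's rows.
html = '''<table>
<tr>
    <td class="info"><div class="title">first</div></td>
</tr>
<tr class="ls">
    <td colspan="3">Less similar results</td>
</tr>
<tr>
    <td class="info"><div class="title">second</div></td>
</tr>
</table>'''


def titles_before_marker(soup, marker_text="Less similar results"):
    """Return div.title elements from rows that precede the marker row."""
    # Find the <td> whose only content is exactly the marker text.
    marker_td = soup.find("td", string=marker_text)
    if marker_td is None:
        # No marker row at all: every title counts as "before".
        return soup.find_all("div", class_="title")

    marker_row = marker_td.find_parent("tr")
    titles = []
    # find_previous_siblings() returns the nearest row first,
    # so reverse it to keep document order.
    for row in reversed(marker_row.find_previous_siblings("tr")):
        titles.extend(row.find_all("div", class_="title"))
    return titles


soup = BeautifulSoup(html, "html.parser")
result = [div.get_text() for div in titles_before_marker(soup)]
print(result)
```

Unlike the selector, this version makes the "no marker row" case explicit, which may be convenient if some pages lack the "Less similar results" row.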

Further reading:

CSS Selector Reference

Andrej Kesely answered Dec 01 '25 11:12


