
BeautifulSoup not finding parents

I really can't manage to figure this out. I parsed the following link with BeautifulSoup and I did this:

soup.find(text='Title').find_parent('h3')

And it does not find anything. If you look at the source of the linked page, you'll see an h3 tag that contains the word Titles. The exact element is:

<h3 class="findSectionHeader"><a name="tt"></a>Titles</h3>

If I have BeautifulSoup parse only the line above, it works perfectly. I also tried:

soup.find(text='Title').find_parents('h3')
soup.find(text='Title').find_parent(class_='findSectionHeader')

Both work on that single line, but neither works on the entire HTML.

If I run soup.find(text='Titles').find_parents('div'), it works on the entire HTML.

asked Dec 14 '25 15:12 by whatyouhide


1 Answer

Before the findSectionHeader H3 tag, there is another tag with Title in the text:

>>> soup.find(text='Title').parent
<a href="/find?q=batman&amp;s=tt&amp;ref_=fn_tt">Title</a>

You need to be more specific in your search. Search for Titles instead, and loop over the matches to find the one that sits inside an h3:

>>> soup.find(text='Titles').parent
<option value="tt">Titles</option>
>>> for elem in soup.find_all(text='Titles'):
...     parent_h3 = elem.find_parent('h3')
...     if parent_h3 is None:
...         continue
...     print(parent_h3)
... 
<h3 class="findSectionHeader"><a name="tt"></a>Titles</h3>
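
To try this without the original page, here is a minimal sketch in Python 3. The HTML is a hypothetical, cut-down stand-in for the page in the question (not the real markup), and the sketch assumes Beautiful Soup 4.4.0 or later, where the text argument is also accepted under the name string:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: a link whose text is exactly
# 'Title' comes first, then two elements whose text is 'Titles'.
html = """
<div>
  <a href="/find?q=batman&amp;s=tt&amp;ref_=fn_tt">Title</a>
  <select><option value="tt">Titles</option></select>
  <div class="findSection">
    <h3 class="findSectionHeader"><a name="tt"></a>Titles</h3>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The first exact match for 'Title' is the link text, which has no <h3> ancestor.
first_title = soup.find(string="Title")
print(first_title.parent.name)        # a
print(first_title.find_parent("h3"))  # None

# Check every exact match for 'Titles' and keep the ones inside an <h3>.
headers = []
for text_node in soup.find_all(string="Titles"):
    parent_h3 = text_node.find_parent("h3")
    if parent_h3 is not None:
        headers.append(parent_h3)

print(headers[0])
# <h3 class="findSectionHeader"><a name="tt"></a>Titles</h3>
```

The option element also contains Titles, which is why the loop checks each match instead of stopping at the first one.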

find(text='...') matches only when the whole string equals the argument; it does not match substrings. Use a regular expression if you need partial matches:

>>> import re
>>> soup.find_all(text='Title')
[u'Title']
>>> soup.find_all(text=re.compile('Title'))
[u'Titles', u'Titles', u'Titles', u'Title', u'Advanced Title Search']
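
The difference between exact and partial matching can be reproduced in a self-contained way. This is only a sketch: the HTML is a hypothetical stand-in rather than the real page, and it assumes Python 3 with Beautiful Soup 4.4.0 or later (for the string argument name). A compiled pattern is applied as a search, so it matches anywhere in the string unless you anchor it:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical stand-in markup with exact and partial occurrences of 'Title'.
html = """
<a href="/find?q=batman&amp;s=tt">Title</a>
<option value="tt">Titles</option>
<h3 class="findSectionHeader"><a name="tt"></a>Titles</h3>
<a href="/search/title">Advanced Title Search</a>
"""

soup = BeautifulSoup(html, "html.parser")

# A plain string must equal the whole text node.
exact = soup.find_all(string="Title")

# A compiled regular expression matches anywhere inside the text node.
partial = soup.find_all(string=re.compile("Title"))

# Anchoring the pattern restores whole-string matching.
anchored = soup.find_all(string=re.compile(r"^Titles$"))

print([str(s) for s in exact])     # ['Title']
print([str(s) for s in partial])   # ['Title', 'Titles', 'Titles', 'Advanced Title Search']
print([str(s) for s in anchored])  # ['Titles', 'Titles']
```

Results come back in document order, so the order here follows the stand-in markup, not the real page.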

answered Dec 19 '25 05:12 by Martijn Pieters


