I really can't manage to figure this out. I parsed the following link with BeautifulSoup and I did this:
soup.find(text='Title').find_parent('h3')
It does not find anything. If you look at the source of the linked page, you'll see an h3 tag that contains the word Titles.
The exact line is:
<h3 class="findSectionHeader"><a name="tt"></a>Titles</h3>
If I make BeautifulSoup parse only the line above, it works perfectly. I also tried:
soup.find(text='Title').find_parents('h3')
soup.find(text='Title').find_parent(class_='findSectionHeader')
Both of these work on that single line, but not on the entire HTML.
If I do a soup.find(text='Titles').find_parents('div') it works with the entire html.
Before the findSectionHeader H3 tag, there is another tag with Title in the text:
>>> soup.find(text='Title').parent
<a href="/find?q=batman&s=tt&ref_=fn_tt">Title</a>
You need to be more specific in your search: search for Titles instead, and loop over the matches to find the correct one:
>>> soup.find(text='Titles').parent
<option value="tt">Titles</option>
>>> for elem in soup.find_all(text='Titles'):
...     parent_h3 = elem.find_parent('h3')
...     if parent_h3 is None:
...         continue
...     print parent_h3
...
<h3 class="findSectionHeader"><a name="tt"></a>Titles</h3>
find(text='...') matches only when the whole string equals the given text; it does not do partial matches. Use a regular expression if you need partial matches:
>>> import re
>>> soup.find_all(text='Title')
[u'Title']
>>> soup.find_all(text=re.compile('Title'))
[u'Titles', u'Titles', u'Titles', u'Title', u'Advanced Title Search']
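To see both points in one place, here is a minimal, self-contained sketch. It assumes a small hypothetical HTML snippet standing in for the real page (only the three relevant elements), and a helper named find_header that is not part of Beautiful Soup. It also uses the string= argument, which is the name newer Beautiful Soup versions (4.4.0 and later) use for what is written as text= above.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: just the three elements discussed above.
HTML = """
<select><option value="tt">Titles</option></select>
<a href="/find?q=batman&amp;s=tt&amp;ref_=fn_tt">Title</a>
<h3 class="findSectionHeader"><a name="tt"></a>Titles</h3>
"""

soup = BeautifulSoup(HTML, "html.parser")


def find_header(soup, pattern):
    """Return the first <h3> enclosing a text node that matches pattern, or None.

    pattern may be a plain string (whole-string match) or a compiled
    regular expression (partial match).
    """
    for node in soup.find_all(string=pattern):
        h3 = node.find_parent("h3")
        if h3 is not None:
            return h3
    return None


# A plain string matches only the link text "Title", which has no <h3> ancestor.
exact = find_header(soup, "Title")

# A regular expression also matches "Titles", so the loop reaches the header.
partial = find_header(soup, re.compile("Title"))

print(exact)
print(partial)
```

Looping over every match, rather than taking only the first one, is what lets the search skip the option and link elements that appear before the header.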