I'm trying to use BeautifulSoup on the following:
<h4>Hello<br /></h4>
<p><img src="http://url.goes.here" alt="hiya" class="img" />May 28, 1996</p>
For this example, let's say I have the <h4> tag saved in the variable tag. When I type print tag.text the output is Hello, as expected.
However, when I use print tag.nextSibling, nothing visible is printed. When I type print tag.nextSibling.nextSibling, the output is <p><img src="http://url.goes.here" alt="hiya" class="img" />May 28, 1996</p>. What is going on? Why do I have to double up on the use of .nextSibling to get to the <p> tag in my example? This happens consistently.
The find_next_sibling() function finds the next sibling of a tag/element. It returns only the first matching sibling that follows the tag/element.
A NavigableString object holds the text within an HTML or an XML tag. This is a Python Unicode string with methods for searching and navigation. Sometimes we may need to navigate to other tags or text within an HTML/XML document based on the current text.
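To make those two descriptions concrete, here is a minimal sketch. It assumes Beautiful Soup 4 (the bs4 package) on Python 3 with the built-in html.parser, which is not the older BeautifulSoup 3 setup used in the question; the html string is just the sample markup from the question, including the newline between the two tags.

```python
from bs4 import BeautifulSoup, NavigableString

# Sample markup from the question; note the "\n" between </h4> and <p>.
html = (
    '<h4>Hello<br /></h4>\n'
    '<p><img src="http://url.goes.here" alt="hiya" class="img" />'
    'May 28, 1996</p>'
)
soup = BeautifulSoup(html, "html.parser")  # assumption: bs4 + html.parser
tag = soup.find("h4")

# The text inside a tag is a NavigableString, which is also a plain str.
text = tag.contents[0]
print(isinstance(text, NavigableString), isinstance(text, str))

# From a text node you can navigate to its parent or to its neighbours.
print(text.parent.name)
print(text.next_sibling.name)

# find_next_sibling() only considers tags, so text nodes are skipped.
paragraph = tag.find_next_sibling("p")
print(paragraph.get_text())
```

Because find_next_sibling() looks only at tags, it reaches the <p> element in one call even when a text node sits in between.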
Apparently, .nextSibling also returns whitespace text nodes. In the actual page I'm working with, there is whitespace between the <h4> and <p> tags, which is why I have to call .nextSibling twice.
Evidence
Writing:
print tag.__class__
print tag.nextSibling.__class__
print tag.nextSibling.nextSibling.__class__
Yields:
<class 'BeautifulSoup.Tag'>
<class 'BeautifulSoup.NavigableString'>
<class 'BeautifulSoup.Tag'>
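For anyone on a newer setup, the same check can be sketched with Beautiful Soup 4 on Python 3 (an assumption; the output above comes from BeautifulSoup 3). There the attribute is spelled next_sibling, with nextSibling being the older name. The shortened html string and the helper next_tag_sibling are hypothetical and introduced only for illustration.

```python
from bs4 import BeautifulSoup, Tag

# Shortened version of the question's markup (assumption: bs4 + html.parser).
html = '<h4>Hello<br /></h4>\n<p>May 28, 1996</p>'
soup = BeautifulSoup(html, "html.parser")
tag = soup.find("h4")

# Record the class name of the tag and of each sibling that follows it.
kinds = [type(tag).__name__]
kinds += [type(node).__name__ for node in tag.next_siblings]
print(kinds)


def next_tag_sibling(node):
    """Hypothetical helper: return the next sibling that is a Tag, or None."""
    for sibling in node.next_siblings:
        if isinstance(sibling, Tag):
            return sibling
    return None


print(next_tag_sibling(tag).name)
```

Filtering on isinstance(sibling, Tag) is one way to avoid counting how many whitespace nodes sit between two elements.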