<div class="info">
<h3> Height:
<span>1.1</span>
</h3>
</div>
<div class="info">
<h3> Number:
<span>111111111</span>
</h3>
</div>
This is part of the page's HTML. Ultimately, I want to extract the value 111111111. I know I can do
soup.find_all("div", { "class" : "info" })
to get a list of both divs; however, I would prefer to not have to perform a loop to check if it contains the text "Number".
Is there a more elegant way to extract "111111111", one that still uses soup.find_all("div", { "class" : "info" }) but only matches a div that contains "Number"?
I also tried
numberSoup = soup.find('h3', text='Number')
but it returns None.
You can write your own filter function and pass it as the argument to find_all.
from bs4 import BeautifulSoup

def number_span(tag):
    # Match a <span> whose parent's first child contains 'Number:'
    return tag.name == 'span' and 'Number:' in tag.parent.contents[0]

# html is assumed to hold the page markup shown in the question
soup = BeautifulSoup(html, 'html.parser')
tags = soup.find_all(number_span)
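If you would rather keep the search anchored on the div with class "info", as in the question, a filter function can check the class and the text together. The following is only a sketch under a few assumptions: the helper name info_div_with_number is made up for illustration, the html string is the sample markup copied from the question, and each matching div is assumed to contain a span.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """
<div class="info">
<h3> Height:
<span>1.1</span>
</h3>
</div>
<div class="info">
<h3> Number:
<span>111111111</span>
</h3>
</div>
"""

def info_div_with_number(tag):
    # Hypothetical helper: match <div class="info"> whose text mentions "Number".
    return (
        tag.name == "div"
        and "info" in tag.get("class", [])
        and "Number" in tag.get_text()
    )

soup = BeautifulSoup(html, "html.parser")
divs = soup.find_all(info_div_with_number)

# Assumes every matching div contains a <span> holding the value.
numbers = [div.find("span").get_text(strip=True) for div in divs]
print(numbers)  # ['111111111']
```

The loop over the divs still happens, but inside find_all rather than in your own code, and the condition lives in one small function.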
By the way, the reason you can't fetch the tag with the text param is that text matches tags whose .string equals the given value. If a tag contains more than one thing, it is not clear what .string should refer to, so .string is defined to be None. Here the h3 contains both text and a span, so its .string is None and nothing matches.
See the Beautiful Soup documentation for details.
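To see the .string behaviour for yourself, here is a small sketch using the second div from the question. It also shows one possible workaround, which is not part of the answer above: search the text nodes directly with a regular expression and walk up to the parent. It assumes a Beautiful Soup version that accepts the string argument (older releases only know it as text).

```python
import re
from bs4 import BeautifulSoup

# Second div from the question.
html = """<div class="info">
<h3> Number:
<span>111111111</span>
</h3>
</div>"""

soup = BeautifulSoup(html, "html.parser")
h3 = soup.find("h3")

# <h3> has several children (text, <span>, more text), so .string is None.
print(h3.string)  # None

# Search the text nodes themselves; the regex matches anywhere in the node.
label = soup.find(string=re.compile("Number"))

# The text node's parent is the <h3>; the value sits in its <span>.
value = label.parent.find("span").get_text(strip=True)
print(value)  # 111111111
```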