Python BeautifulSoup find element that contains text

Question

<div class="info">
       <h3> Height:
            <span>1.1</span>
       </h3>
</div>

<div class="info">
       <h3> Number:
            <span>111111111</span>
       </h3>
</div>

This is a portion of the page. Ultimately, I want to extract the 111111111. I know I can call soup.find_all("div", { "class" : "info" }) to get a list of both divs; however, I would prefer not to loop over them to check which one contains the text "Number".

Is there a more elegant way to extract "111111111", so that it still does soup.find_all("div", { "class" : "info" }) but the div MUST also contain "Number"?

I also tried numberSoup = soup.find('h3', text='Number'), but it returns None.

dokelung · Accepted Answer

You can write your own filter function and pass it as the argument to find_all.

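from bs4 import BeautifulSoup

# html is assumed to hold the markup from the question
def number_span(tag):
    # Match a <span> whose parent's first child is text containing "Number:"
    return tag.name == 'span' and 'Number:' in tag.parent.contents[0]

soup = BeautifulSoup(html, 'html.parser')
tags = soup.find_all(number_span)  # list of matching <span> tags
number = tags[0].get_text()        # '111111111' for the markup above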
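
If you would rather anchor the match on the div itself, as the question asks, the same filter-function idea can be pointed at the div instead of the span. This is only a sketch, assuming the markup from the question; the name info_div_with_number is made up for illustration.

```python
from bs4 import BeautifulSoup

# Markup copied from the question
html = """
<div class="info">
       <h3> Height:
            <span>1.1</span>
       </h3>
</div>

<div class="info">
       <h3> Number:
            <span>111111111</span>
       </h3>
</div>
"""

def info_div_with_number(tag):
    # Hypothetical helper: a <div class="info"> whose text mentions "Number"
    return (
        tag.name == "div"
        and "info" in tag.get("class", [])
        and "Number" in tag.get_text()
    )

soup = BeautifulSoup(html, "html.parser")
divs = soup.find_all(info_div_with_number)

# Take the first <span> inside the matching div and read its text
number = divs[0].span.get_text()
print(number)
```

The filter still visits every tag internally, but the explicit loop over both divs disappears from your own code.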

By the way, the reason you can't fetch the tag with the text param is this: the text param finds tags whose .string value equals the given value. If a tag contains more than one thing, it is not clear what .string should refer to, so .string is defined to be None. Here the h3 contains both a text node and a span, so its .string is None and nothing matches.

For details, see the Beautiful Soup documentation.
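
To see the .string behaviour directly, here is a small sketch using one div from the question. It uses the string keyword, which is the name Beautiful Soup 4.4.0 and later give to the text param; the variable names are only for illustration.

```python
from bs4 import BeautifulSoup

# One of the divs from the question
html = """
<div class="info">
       <h3> Number:
            <span>111111111</span>
       </h3>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
h3 = soup.find("h3")
span = h3.find("span")

# The <h3> has several children (text nodes plus the <span>), so .string is None
h3_string = h3.string

# The <span> has exactly one child, a text node, so .string is that text
span_string = span.string

# Matching on the string therefore cannot find the <h3>
by_string = soup.find("h3", string="Number")

print(h3_string, span_string, by_string)
```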

Tags:

python

beautifulsoup

lclankyo
