Using BeautifulSoup get_text()

I can parse the field that I need from a website with this code block:

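import bs4
import requests

# index_url holds the URL of the page being scraped (defined elsewhere)
response = requests.get(index_url)
soup = bs4.BeautifulSoup(response.text, "lxml")
poem = soup.select('div.siir p[id^=siir]')
print(poem)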

But the output includes the HTML tags. I'm trying to use the get_text() function to strip them. When I call it like this:

print(poem.get_text())

I get this error:

AttributeError: 'list' object has no attribute 'get_text'

I also tried calling it directly on the result of select():

poem = soup.select('div.siir p[id^=siir]').get_text()

I get the same error again. How can I strip the HTML tags after I have selected the correct field?

JayGatsby asked Oct 19 '25 10:10



1 Answer

soup.select() always returns a list of elements, not just one element. Call get_text() on each element in turn:

for element in poem:
    print(element.get_text())

If you expected just one element, then extract it with indexing:

print(poem[0].get_text())
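
As a self-contained illustration of both approaches, here is a minimal sketch that runs without a network request. The HTML string is a hypothetical stand-in for the real page (whose markup isn't shown in the question), and it uses the built-in "html.parser" instead of "lxml" only so that no extra parser needs to be installed. It also uses select_one(), which is not mentioned above; it returns the first match or None rather than a list:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page, which is not shown.
html = """
<div class="siir">
  <p id="siir1">First <b>line</b></p>
  <p id="siir2">Second line</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# select() returns a list-like collection of Tag objects,
# so get_text() has to be called on each element.
paragraphs = soup.select("div.siir p[id^=siir]")
texts = [p.get_text() for p in paragraphs]
print(texts)

# select_one() returns only the first match, or None if nothing matches.
first = soup.select_one("div.siir p[id^=siir]")
if first is not None:
    print(first.get_text())
```

Checking for None before calling get_text() avoids an AttributeError when the selector matches nothing, which poem[0] would instead report as an IndexError.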
Martijn Pieters answered Oct 20 '25 23:10



