Using BeautifulSoup get_text()

Question

I can parse the field that I need from a website with this code block:

import bs4
import requests

response = requests.get(index_url)
soup = bs4.BeautifulSoup(response.text, "lxml")
poem = soup.select('div.siir p[id^=siir]')
print(poem)

But it prints the result with the HTML tags included. I'm trying to use the get_text() function to strip them. When I call it like this:

print(poem.get_text())

I get this error:

AttributeError: 'list' object has no attribute 'get_text'

I also tried this:

poem = soup.select('div.siir p[id^=siir]').get_text()

I get the same error again. How can I strip the HTML tags after selecting the correct field?

Martijn Pieters · Accepted Answer

soup.select() always returns a list of elements, even when only one element matches, and the list itself has no get_text() method. Call get_text() on each element in turn:

for element in poem:
    print(element.get_text())

If you expected just one element, then extract it with indexing:

print(poem[0].get_text())
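
For reference, here is a minimal self-contained sketch of both approaches. The HTML string is a made-up stand-in for the real page (the actual markup isn't shown in the question), and it uses the built-in "html.parser" instead of "lxml" only so that it runs without extra dependencies. It also shows select_one(), which returns the first match directly, or None when nothing matches:

```python
import bs4

# Hypothetical stand-in for the fetched page, so the example runs offline.
html = """
<div class="siir">
  <p id="siir1">First line</p>
  <p id="siir2">Second line</p>
</div>
"""

soup = bs4.BeautifulSoup(html, "html.parser")

# select() returns a list-like collection, even for a single match,
# so get_text() has to be called on the individual elements.
poem = soup.select('div.siir p[id^=siir]')
lines = [element.get_text() for element in poem]
print(lines)

# select_one() returns the first matching element, or None if there is none.
first = soup.select_one('div.siir p[id^=siir]')
if first is not None:
    print(first.get_text())
```

The None check matters with select_one(), because calling get_text() on None raises an AttributeError much like the one in the question.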
