AttributeError: 'str' object has no attribute 'descendants' error when using BeautifulSoup

Question

@ayivima has a great answer below, but I should add that in the end BeautifulSoup couldn't scrape the website properly, because the page relies on a ton of JavaScript.

So I'm utterly new at using Python, and I'm just trying to print the title of a webpage. I'm using this code, mostly pieced together from Google:

from bs4 import BeautifulSoup, SoupStrainer
import requests

url = "https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=3210001601"
page = requests.get(url)
data = page.text
soup = BeautifulSoup
soup.find_all('h1')

print(text)

And I keep getting the error:

AttributeError: 'str' object has no attribute 'descendants'

I honestly don't have a clue what it means. The only other answer I can find is from another question with the same title, "AttributeError: 'str' object has no attribute 'descendants'", which I don't think applies to me?

Anything I'm doing wrong in the code? (A lot, probably, but I mean mostly for this error)

ayivima · Accepted Answer

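BeautifulSoup needs the HTML text and the name of an HTML parser passed to it as arguments. In other words, you need to create an instance of BeautifulSoup. Writing soup = BeautifulSoup only makes soup another name for the class, so soup.find_all('h1') receives the string 'h1' where it expects the instance, which is why the error complains about a 'str' object. And if you don't pass the HTML text, there is nothing to search through.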

So the line soup = BeautifulSoup must become something like this:

soup = BeautifulSoup(data, 'html.parser')

where the first argument, data, is the raw HTML text, and the second argument is the parser, html.parser. I am using Python's built-in HTML parser, but BeautifulSoup also supports other parsers, such as lxml and html5lib. Find out more here: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
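
To make the difference between the class and an instance concrete, here is a minimal sketch that reproduces the error without any network request. The inline html string is a made-up stand-in for page.text, and the exact wording of the error message may differ between bs4 versions.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML (an assumption) so the sketch runs offline.
html = "<html><body><h1>Sample title</h1></body></html>"

# Calling find_all on the class itself: the string 'h1' ends up in the
# place of the instance, so the method fails when it reads an attribute
# that a str does not have.
try:
    BeautifulSoup.find_all("h1")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)

# Calling find_all on an instance built from the HTML works as intended.
soup = BeautifulSoup(html, "html.parser")
headings = [h.get_text() for h in soup.find_all("h1")]
print(headings)
```

The first print shows the AttributeError text, and the second shows the heading found once a real BeautifulSoup instance exists.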

RECOMMENDED CODE:

from bs4 import BeautifulSoup
import requests

url = "https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=3210001601"
page = requests.get(url)
data = page.text  # raw HTML as a string

# Create an instance: pass the HTML text and the parser name
soup = BeautifulSoup(data, 'html.parser')
text = soup.find_all('h1')  # list of every h1 tag found

print(text)

Output:

[]

The empty list means BeautifulSoup didn't find any h1 tag in the HTML that requests downloaded.

Let's experiment with meta tags:

meta_tags = soup.find_all('meta')
print(meta_tags)

Output:

[<meta content="no-cache" http-equiv="Pragma"/>, 
<meta content="-1" http-equiv="Expires"/>, 
<meta content="no-cache" http-equiv="CacheControl"/>]
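
Since the downloaded HTML may not contain the tag you expect, it can help to check for a missing tag instead of assuming it is there. The following is only a sketch: first_heading_or_title is a hypothetical helper, and js_page is made-up HTML standing in for a JavaScript-heavy page, not the real response from the site above.

```python
from bs4 import BeautifulSoup


def first_heading_or_title(html):
    """Return the first <h1> text, else the <title> text, else None."""
    soup = BeautifulSoup(html, "html.parser")
    h1 = soup.find("h1")  # find returns None when no tag matches
    if h1 is not None:
        return h1.get_text(strip=True)
    # Fall back to the <title> tag if it exists and holds a plain string.
    if soup.title is not None and soup.title.string:
        return soup.title.string.strip()
    return None


# Hypothetical stand-in (an assumption): the h1 would only be created
# later by a script, so it is absent from the static HTML.
js_page = """<html><head><title>Table viewer</title>
<meta content="no-cache" http-equiv="Pragma"/></head>
<body><div id="app"></div><script>/* renders the h1 later */</script></body></html>"""

print(first_heading_or_title(js_page))
print(first_heading_or_title("<p>no headings</p>"))
```

If a helper like this returns None for a real page, the content is probably added by JavaScript after the page loads, and requests plus BeautifulSoup alone won't see it.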

Tags: python, python-3.x, beautifulsoup

facsasd
