I'm starting to learn Python and I've decided to code a simple scraper. One problem I'm encountering is I cannot convert a NavigableString to a regular string.
Using BeautifulSoup4 and Python 3.5.1. Should I just bite the bullet and go to an earlier version of Python and BeautifulSoup? Or is there a way I can code my own function to cast a NavigableString to a regular unicode string?
for tag in soup.find_all("span"):
    for child in tag.children:
        if "name" in tag.string: #triggers error, can't compare string to NavigableString/bytes
            return child
    #things i've tried:
    #if "name" in str(tag.string)
    #if "name" in unicode(tag.string) #not in 3.5?
    #if "name" in strring(tag.string, "utf-8")
    #tried regex, didn't work. Again, doesn't like NavigableString type.
    #... bunch of other stuff too!
If you have whitespace in your markup between nodes, BeautifulSoup will turn it into NavigableStrings. So if you index into a tag's contents to grab nodes, you might get a NavigableString instead of the next Tag. To avoid this, query for the node you are looking for (see "Searching the Parse Tree" in the documentation).
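A minimal sketch of the whitespace issue described above. The markup and variable names are made up for illustration, and the built-in "html.parser" backend is assumed:

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical markup: the newlines between tags become text nodes.
html = "<div>\n<span>first</span>\n<span>second</span>\n</div>"
soup = BeautifulSoup(html, "html.parser")
div = soup.div

# Indexing .contents picks up the whitespace before the first <span>.
first_by_index = div.contents[0]
print(type(first_by_index).__name__, repr(str(first_by_index)))

# Querying for the tag by name skips the whitespace nodes entirely.
spans = div.find_all("span")
print([span.get_text() for span in spans])
```

Here `div.contents[0]` is the newline text node, not the first span, while `find_all("span")` returns only Tag objects.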
The .string attribute is provided by Beautiful Soup, a Python library for parsing HTML and XML that is commonly used for web scraping. If a tag has only one child, and that child is a NavigableString, the child can be accessed using .string.
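To illustrate when .string gives you something usable, here is a small sketch with invented markup. When a tag has more than one child, .string is None, so a membership test such as `"name" in tag.string` raises a TypeError. That may be what the question ran into, though the post does not show the traceback, so treat it as a guess:

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Hypothetical markup: one span with a single text child, one with mixed children.
soup = BeautifulSoup(
    "<p><span>name: Alice</span><span>name: <b>Bob</b></span></p>",
    "html.parser",
)
single, mixed = soup.find_all("span")

single_string = single.string  # NavigableString: the span's only child
mixed_string = mixed.string    # None: the span has two children (text and <b>)
mixed_text = mixed.get_text()  # plain str with all nested text joined

print(repr(single_string), repr(mixed_string), repr(mixed_text))
```

If you need the text regardless of how many children the tag has, `get_text()` is the safer call.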
... the answer is merely str(tag.string)
Other answers will fail.
unicode() is not a built-in in Python 3.
tag.string.encode('utf-8') will convert the string to a byte string, which you don't want.
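Putting the points above together, here is a hedged rewrite of the loop from the question. The function name `find_span_with` and the sample markup are invented for this example. The `None` check is an addition that the original code did not have:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second span has two children, so its .string is None.
soup = BeautifulSoup(
    "<span>name: Alice</span><span><b>x</b><i>y</i></span>",
    "html.parser",
)

def find_span_with(soup, needle):
    """Return the first <span> whose single string child contains needle, else None."""
    for tag in soup.find_all("span"):
        if tag.string is None:  # tag has zero or several children
            continue
        text = str(tag.string)  # NavigableString -> plain str
        if needle in text:
            return tag
    return None

match = find_span_with(soup, "name")
as_text = str(match.string)               # plain str
as_bytes = match.string.encode("utf-8")   # bytes, usually not what you want
print(type(as_text).__name__, type(as_bytes).__name__)
```

Because NavigableString is a subclass of str, `needle in tag.string` also works without the str() call once None has been ruled out. Calling str() just gives you a plain string that no longer holds a reference to the parse tree.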