I am a bit confused: all tags have a decompose() method, which removes the tag from the tree in place. But what if I want to remove a NavigableString? It doesn't have such a method:
>>> from bs4 import BeautifulSoup
>>> b = BeautifulSoup('<p>aaaa <span> bbbbb </span> ccccc</p>', 'html.parser')
>>> b.p.contents[0]
'aaaa '
>>> type(b.p.contents[0])
<class 'bs4.element.NavigableString'>
>>> b.p.contents[0].decompose()
Traceback (most recent call last):
...
AttributeError: 'NavigableString' object has no attribute 'decompose'
I did manage to remove the NavigableString from the tree, after a fashion, by popping it from the contents list:
>>> b.p.contents.pop(0)
'aaaa '
>>> b
<p><span> bbbbb </span> ccccc</p>
The problem is that it is still yielded by the strings generator:
>>> list(b.strings)
['aaaa ', ' bbbbb ', ' ccccc']
This shows that it was the wrong way to do it. Besides, I use strings in my code, so this hacky solution is not acceptable, alas.
So the question is: how can I remove a specific NavigableString object from the tree?
A NavigableString object holds the text within an HTML or an XML tag. This is a Python Unicode string with methods for searching and navigation. Sometimes we may need to navigate to other tags or text within an HTML/XML document based on the current text.
To convert a Tag object to a string in Beautiful Soup, simply call str() on it, for example str(tag).
Beautiful Soup (bs4) is a Python web scraping library for pulling the data from HTML and XML files.
You can replace a string with another string (for example with replace_with()), but you can't edit the existing string in place.
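As a small illustration of replacing rather than editing a string, the sketch below swaps the first text node of the question's markup for new text. It assumes Beautiful Soup 4 is installed; the replacement text 'zzzz ' is made up for the example.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p>aaaa <span> bbbbb </span> ccccc</p>', 'html.parser')

# A NavigableString cannot be edited in place, so swap it for a new one.
first = soup.p.contents[0]
first.replace_with('zzzz ')  # 'zzzz ' is an arbitrary replacement text

result = str(soup)
strings = list(soup.strings)
print(result)
print(strings)
```

Because replace_with() goes through the tree's own insertion logic, the new text shows up both in the serialized markup and in strings.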
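Use extract() instead of decompose().
extract() removes a tag or string from the tree and returns it.
decompose() removes a tag from the tree and then destroys it along with its contents.
Popping an item from contents only edits the parent's list of children; as far as I can tell it leaves the links between neighbouring elements untouched, which is why the string still turns up in strings.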
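To make the difference concrete, here is a minimal sketch (assuming Beautiful Soup 4 and the markup from the question) that extracts the leading string and then decomposes the span. The variable names are only for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p>aaaa <span> bbbbb </span> ccccc</p>', 'html.parser')

# extract() detaches the node and hands it back, so it can be reused.
text = soup.p.contents[0].extract()
text_parent = text.parent  # the detached string no longer has a parent

# decompose() detaches the tag and destroys it; nothing useful is returned.
soup.p.span.decompose()

after = str(soup)
print(repr(text), text_parent, after)
```

Use extract() when the removed node is still needed afterwards, and decompose() when a tag should simply be thrown away.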
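from bs4 import BeautifulSoup

b = BeautifulSoup('<p>aaaa <span> bbbbb </span> ccccc</p>', 'html.parser')
b.p.contents[0].extract()  # detach the NavigableString from the tree
print(b)  # <p><span> bbbbb </span> ccccc</p>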
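Since the question relies on strings, it may help to check that the extracted text is gone from that generator too. The sketch below does so with a hypothetical helper, remove_strings(), which is not part of Beautiful Soup; it assumes Beautiful Soup 4 with support for the string= argument of find_all().

```python
from bs4 import BeautifulSoup


def remove_strings(root, predicate):
    """Detach every string under root for which predicate(text) is true.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    # Collect the matches first so the tree is not modified mid-traversal.
    targets = [s for s in root.find_all(string=True) if predicate(s)]
    for s in targets:
        s.extract()
    return targets


soup = BeautifulSoup('<p>aaaa <span> bbbbb </span> ccccc</p>', 'html.parser')
removed = remove_strings(soup, lambda text: text.strip() == 'aaaa')

remaining = list(soup.strings)
html = str(soup)
print(removed, remaining, html)
```

Unlike contents.pop(0), this leaves the markup and strings in agreement, because extract() updates the element's links to its neighbours as well as the parent's child list.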
For more details, see the Beautiful Soup documentation.