I am wondering how I can delete all HTML tags along with their contents using BeautifulSoup.
Input:
... text <strong>ha</strong> ... text
Output:
... text ... text
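Use replace_with() (replaceWith() is the older camelCase alias):
from bs4 import BeautifulSoup

text = "text <strong>ha</strong> ... text"
# Name the parser explicitly; otherwise bs4 guesses one and emits a warning.
soup = BeautifulSoup(text, "html.parser")
for tag in soup.find_all('strong'):
    # Swap each <strong> element (tag plus contents) for an empty string.
    tag.replace_with('')
print(soup.get_text())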
prints (the spaces on either side of the removed tag both remain, so there is a double space):
text  ... text
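If you want the single-spaced output shown in the question, you can collapse whitespace runs after removing the tags. This is a minimal sketch, assuming Python 3 and bs4 with the built-in html.parser; remove_tags is a hypothetical helper name, not part of bs4. Note that " ".join(s.split()) also trims leading/trailing whitespace and turns newlines into spaces.

```python
from bs4 import BeautifulSoup


def remove_tags(html, name):
    """Remove every <name> tag together with its contents and return the
    remaining text with whitespace runs collapsed to single spaces.
    (remove_tags is a hypothetical helper, not part of bs4.)"""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(name):
        tag.replace_with("")
    # str.split() with no argument splits on any run of whitespace,
    # so re-joining with " " removes the double space the tag left behind.
    return " ".join(soup.get_text().split())


print(remove_tags("text <strong>ha</strong> ... text", "strong"))
# prints: text ... text
```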
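Or, as @mata suggested, you can use tag.decompose() instead of tag.replace_with(''). It produces the same text, but it is the more appropriate call: decompose() removes the tag and its contents from the tree and destroys them, rather than leaving an empty string in their place.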
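Since the question asks about deleting all tags, not just strong, the decompose() approach can be generalized by decomposing every top-level tag. This is a sketch under stated assumptions: strip_all_tags is a hypothetical name, and it relies on html.parser, which does not wrap a fragment in <html><body>. With lxml or html5lib the whole fragment would sit inside <html>, so this loop would delete everything.

```python
from bs4 import BeautifulSoup


def strip_all_tags(html):
    """Delete every tag along with everything inside it, keeping only the
    top-level text. (strip_all_tags is a hypothetical helper name.)"""
    soup = BeautifulSoup(html, "html.parser")
    # find_all(True, ...) matches any tag; recursive=False limits it to tags
    # that are direct children of the document. Decomposing those also
    # destroys any tags nested inside them.
    for tag in soup.find_all(True, recursive=False):
        tag.decompose()
    # Collapse the whitespace left where tags used to be.
    return " ".join(soup.get_text().split())


print(strip_all_tags("... text <strong>ha</strong> ... text"))
# prints: ... text ... text
```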