Using Python regex, how do I remove tags from HTML? The tags sometimes have styling, such as below:
<sup style="vertical-align:top;line-height:120%;font-size:7pt">(1)</sup>
I would like to remove everything between and including the sup tags in a larger string of html.
I would use an HTML parser instead of a regex, since regular expressions cannot reliably handle nested or irregular markup. For example, BeautifulSoup and its unwrap() method can handle your beautiful sup:
Tag.unwrap() is the opposite of wrap(). It replaces a tag with whatever’s inside that tag. It’s good for stripping out markup.
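from bs4 import BeautifulSoup

data = """
<div>
<sup style="vertical-align:top;line-height:120%;font-size:7pt">(1)</sup>
</div>
"""

# Name the parser explicitly to avoid bs4's "no parser specified" warning.
soup = BeautifulSoup(data, "html.parser")

# unwrap() removes each <sup> tag but keeps its contents in place.
for sup in soup.find_all('sup'):
    sup.unwrap()

print(soup.prettify())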
Prints (the sup tag is gone, but its text is kept):
<div>
 (1)
</div>
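Note that unwrap() keeps the text inside the tag, so (1) survives. To remove the contents along with the tags, as the question asks, Tag.decompose() is one option. A minimal sketch, assuming the html.parser backend and a made-up sample string (the words around the sup are illustrative only):

```python
from bs4 import BeautifulSoup

# Hypothetical sample input; only the <sup> element comes from the question.
html = """
<div>
Revenue<sup style="vertical-align:top;line-height:120%;font-size:7pt">(1)</sup> grew.
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# decompose() removes the tag AND everything inside it from the tree.
for sup in soup.find_all("sup"):
    sup.decompose()

cleaned = str(soup)
print(cleaned)
```

Here cleaned no longer contains the sup element or its (1) text, while the rest of the markup is left untouched. If the removed elements are needed later, Tag.extract() works the same way but returns the removed tag instead of destroying it.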