
Using Python and regex, how do you remove <sup> tags from HTML? [duplicate]

Tags: python, html, regex

Using Python regex, how do I remove all <sup> tags from HTML? The tags sometimes have styling, as in the example below:

<sup style="vertical-align:top;line-height:120%;font-size:7pt">(1)</sup>

I would like to remove everything between and including the sup tags in a larger string of html.

user2634569 asked Oct 25 '25 18:10

1 Answer

I would use an HTML parser instead of a regex, since regular expressions cope poorly with nested or irregular markup. For example, BeautifulSoup and its unwrap() method can handle your beautiful sup:

Tag.unwrap() is the opposite of wrap(). It replaces a tag with whatever’s inside that tag. It’s good for stripping out markup.

from bs4 import BeautifulSoup

data = """
<div>
    <sup style="vertical-align:top;line-height:120%;font-size:7pt">(1)</sup>
</div>
"""

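# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(data, "html.parser")
for sup in soup.find_all('sup'):
    sup.unwrap()  # replace each <sup> tag with its contents

print(soup.prettify())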

Prints:

<div>
(1)
</div>
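
Note that unwrap() keeps the text inside the tag, which is why the (1) survives. If the marker itself should go too, as the question asks, Beautiful Soup's Tag.decompose() removes a tag together with its contents. Where installing bs4 is not an option, the same idea can be roughly sketched with the standard library's html.parser. The names SupStripper and strip_sup are made up for this example, and the sketch assumes reasonably well-formed HTML; it drops comments and doctype declarations rather than acting as a full serializer.

```python
from html.parser import HTMLParser


class SupStripper(HTMLParser):
    """Re-emit HTML while skipping <sup> elements and everything inside them.

    Hypothetical helper for illustration; assumes reasonably well-formed input.
    """

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written
        super().__init__(convert_charrefs=False)
        self.parts = []  # output fragments
        self.depth = 0   # > 0 while inside at least one <sup>

    def handle_starttag(self, tag, attrs):
        if tag == "sup":
            self.depth += 1
        elif self.depth == 0:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "sup":
            self.depth = max(0, self.depth - 1)
        elif self.depth == 0:
            self.parts.append("</%s>" % tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: emit once, never a separate end tag
        if tag != "sup" and self.depth == 0:
            self.parts.append(self.get_starttag_text())

    def handle_data(self, data):
        if self.depth == 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth == 0:
            self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        if self.depth == 0:
            self.parts.append("&#%s;" % name)


def strip_sup(html):
    """Return html with every <sup>...</sup> element removed, text included."""
    parser = SupStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


data = """
<div>
    <sup style="vertical-align:top;line-height:120%;font-size:7pt">(1)</sup>
</div>
"""

print(strip_sup(data))
```

The depth counter is there so that nested tags inside a sup are skipped as well; only the whitespace that surrounded the sup remains inside the div.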
alecxe answered Oct 27 '25 08:10