Using Python and Regex, How do you remove <sup> tags from html? [duplicate]

Question

Using Python regex, how do I remove all <sup> tags in HTML? The tags sometimes have styling, as in the example below:

<sup style="vertical-align:top;line-height:120%;font-size:7pt">(1)</sup>

I would like to remove everything between and including the sup tags in a larger string of html.

alecxe · Accepted Answer

I would use an HTML parser instead of a regex. For example, BeautifulSoup and its unwrap() method can handle your beautiful sup:

Tag.unwrap() is the opposite of wrap(). It replaces a tag with whatever’s inside that tag. It’s good for stripping out markup.

from bs4 import BeautifulSoup

data = """
<div>
    <sup style="vertical-align:top;line-height:120%;font-size:7pt">(1)</sup>
</div>
"""

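# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(data, "html.parser")
# Replace each <sup> tag with its contents, dropping the tag and its attributes
for sup in soup.find_all('sup'):
    sup.unwrap()

print(soup.prettify())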

Prints:

<div>
(1)
</div>
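
Note that unwrap() keeps the text inside the tag, so "(1)" is still in the output. If the goal is to drop everything between and including the sup tags, as the question describes, BeautifulSoup's Tag.decompose() may be a closer fit, since it removes a tag together with its contents. A minimal sketch, assuming the built-in "html.parser" backend; the helper name remove_sup is made up for illustration and is not part of bs4:

```python
from bs4 import BeautifulSoup


def remove_sup(html):
    """Return html with every <sup> element and its contents removed.

    remove_sup is a hypothetical helper name, not a BeautifulSoup API.
    """
    soup = BeautifulSoup(html, "html.parser")
    for sup in soup.find_all("sup"):
        # decompose() removes the tag and everything inside it from the tree
        sup.decompose()
    return str(soup)


data = '<p>x<sup style="vertical-align:top;line-height:120%;font-size:7pt">(1)</sup>y</p>'
print(remove_sup(data))  # prints <p>xy</p>
```

Because the parser works on the tag as a whole, the style attribute on the sup element needs no special handling.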

Tags: python, html, regex

user2634569
