 

Replace headers by bold in HTML with python [duplicate]

I have some HTML text like this one:

'<H1 LANG="es-ES" CLASS="western" STYLE="text-indent: -0.5cm; line-height: 100%"><FONT FACE="Arial, sans-serif"><FONT SIZE=3>some_text_here</FONT></FONT></H1>'

within a larger HTML text. I want to automatically identify all such headers and change them to simple bold text:

'<B LANG="es-ES" CLASS="western" STYLE="text-indent: -0.5cm; line-height: 100%"><FONT FACE="Arial, sans-serif"><FONT SIZE=3>some_text_here</FONT></FONT></B>'

Regular expressions do not seem like the best fit, because sometimes the start and end tags of a header are on different lines.

Vladimir Vargas asked Jan 20 '26 22:01



1 Answer

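You can use BeautifulSoup, but an easy way is to use re.sub() as follows. The opening and closing tags are replaced independently, so it does not matter if they are on different lines: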

   import re
   # Rename opening and closing H1-H6 tags to B; attributes are left untouched
   html_content = re.sub(r"<H[1-6]\b", "<B", html_content)
   html_content = re.sub(r"</H[1-6]>", "</B>", html_content)
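
The same idea can be sketched as a single pass that handles opening and closing tags together and ignores case. This is only an illustration: the helper name `headers_to_bold` and the sample string are assumptions made for the example, not part of the original question or answer.

```python
import re

# Hypothetical helper for illustration.
# Matches "<H1" .. "<H6" and "</H1" .. "</H6" in any letter case.
# Group 1 captures the optional slash so it can be re-emitted.
_HEADING_TAG = re.compile(r"<(/?)H[1-6]\b", flags=re.IGNORECASE)


def headers_to_bold(html_content: str) -> str:
    """Rename heading tags to B, keeping attributes and inner markup as-is."""
    return _HEADING_TAG.sub(r"<\1B", html_content)


# Header whose start and end tags sit on different lines.
sample = (
    '<H1 LANG="es-ES" CLASS="western">\n'
    '<FONT SIZE=3>some_text_here</FONT>\n'
    '</H1>'
)
print(headers_to_bold(sample))
```

Because only the tag names are matched, line breaks between the opening and closing tags make no difference. The pattern does not parse HTML, though, so it would also rewrite heading-like text inside comments or scripts; if that matters, a real parser such as BeautifulSoup is the safer choice.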
Nicolas Parra Avila answered Jan 22 '26 12:01
