
How do you remove HTML tags using Universal Feed Parser?

The documentation lists the tags that are allowed/removed by default:

http://www.feedparser.org/docs/html-sanitization.html

But it doesn't say anything about how you can specify which additional tags you want removed.

Is there a way to do this using Universal Feed Parser or do you have to do further processing using your own regex and/or something like Beautiful Soup?

rick asked Nov 24 '25 06:11



1 Answer

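I took a quick look over the code and I don't think there is a supported way to configure them directly. But you can override feedparser._HTMLSanitizer.acceptable_elements, the list of tags that won't get removed, before calling feedparser.parse. Since it is a whitelist, taking a tag out of that list is what causes the sanitizer to strip it. Note that _HTMLSanitizer is a private class, so this may change between releases.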
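Here is a minimal sketch of that override, not a tested recipe for any particular feedparser release. The helper names disallow_tags, find_sanitizer and parse_without are made up for illustration. The fallback import path feedparser.sanitizer._HTMLSanitizer is my assumption about where newer releases keep the class, and the container type of acceptable_elements (list or set) may vary by version, so the helper preserves whatever type it finds.

```python
def disallow_tags(sanitizer_cls, tags):
    """Remove `tags` from the sanitizer's whitelist so they get stripped.

    Works on any class exposing an `acceptable_elements` attribute.
    """
    current = sanitizer_cls.acceptable_elements
    blocked = {t.lower() for t in tags}
    # Rebuild the whitelist with the same container type (list, set, ...).
    sanitizer_cls.acceptable_elements = type(current)(
        t for t in current if t not in blocked
    )


def find_sanitizer():
    """Locate feedparser's private sanitizer class.

    Assumption: newer releases keep it in feedparser.sanitizer, older ones
    expose it as feedparser._HTMLSanitizer (the path named in the answer).
    """
    try:
        from feedparser.sanitizer import _HTMLSanitizer
    except ImportError:
        from feedparser import _HTMLSanitizer
    return _HTMLSanitizer


def parse_without(url, tags):
    """Hypothetical convenience wrapper: patch the whitelist, then parse."""
    import feedparser  # third-party, imported lazily

    disallow_tags(find_sanitizer(), tags)
    return feedparser.parse(url)
```

You would call something like parse_without(feed_url, ["img", "table"]) once at startup. The patch is global: it changes the class attribute, so every later feedparser.parse call in the same process is affected.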

Jochen Ritzel answered Nov 26 '25 21:11



