HTML Parsing issue with BeautifulSoup Library

Question

I am working with the BeautifulSoup (BS) library for HTML parsing. My task is to remove everything between the head tags: if I have <head> A lot of Crap! </head>, the result should be <head></head>. This is the code for it:

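from bs4 import BeautifulSoup

raw_html = "entire_web_document_as_string"  # placeholder for the full HTML document
soup = BeautifulSoup(raw_html, "html.parser")
head = soup.head
# unwrap() replaces <head> with its children and leaves the detached head tag empty
head.unwrap()
print(head)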

And this works fine. But I want these changes to take place in the raw_html string that contains the entire HTML document. How do I reflect these changes in the original string, and not only in the head tag? Can you share a code snippet for doing it?
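For reference, here is a small sketch of what unwrap() does to the rest of the document, run on a made-up sample page (the HTML string below is hypothetical, not the asker's). print(head) shows an empty <head></head> only because unwrap() has already moved the children out of the detached tag and into the surrounding <html> element.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, standing in for the asker's raw_html.
raw_html = ("<html><head><title>A lot of Crap!</title></head>"
            "<body><p>Hello</p></body></html>")
soup = BeautifulSoup(raw_html, "html.parser")

head = soup.head
head.unwrap()  # moves <title> up into <html> and detaches the now-empty <head>

print(head)       # the detached tag, which is now empty: <head></head>
print(str(soup))  # the document itself: <head> is gone, but <title> is still there
```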

Jivan · Accepted Answer

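You're basically asking how to export the HTML as a string from BeautifulSoup's soup object. Tags you reach through the soup (such as soup.head) are part of its tree, so changing them changes the soup in place, and serializing the whole soup gives you the updated document.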

You can do it this way:

# Python 2.7
modified_raw_html = unicode(soup)

# Python 3
modified_raw_html = str(soup)
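A minimal sketch of the whole round trip in Python 3, assuming the goal is the empty <head></head> described in the question. It swaps the question's unwrap() for Tag.clear(), which removes all of a tag's children while leaving the tag itself in the tree (unwrap() instead removes the tag and keeps its children). The sample HTML is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document standing in for the asker's raw_html string.
raw_html = ("<html><head><title>A lot of Crap!</title>"
            "<meta charset='utf-8'></head>"
            "<body><p>Keep me</p></body></html>")

soup = BeautifulSoup(raw_html, "html.parser")

# clear() removes every child of <head> but keeps the <head> tag itself,
# so the tree now contains an empty <head></head>.
soup.head.clear()

# The soup was modified in place, so serializing it yields the edited document.
modified_raw_html = str(soup)
print(modified_raw_html)
```

Python strings are immutable, so BeautifulSoup never writes back into the original raw_html object; if you want that name to hold the edited document, reassign it with raw_html = str(soup).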


Tags: python, html, parsing, html-parsing, beautifulsoup

hnvasa
