Using BeautifulSoup to grab all the HTML between two tags

Question

I have some HTML that looks like this:

<h1>Title</h1>

// any number of <p> or <ul> elements, or bare (tagless) text

<h1> Next Title</h1>

I want to copy all of the HTML from the first h1 up to the next h1. How can I do this?

Hunting · Accepted Answer

This is the straightforward BeautifulSoup way. It works when the second h1 tag is a sibling of the first:

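from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, "html.parser")  # html_doc: placeholder for your HTML string
parts = []
for tag in soup.find("h1").next_siblings:
    # Stop as soon as the next <h1> sibling is reached
    if tag.name == "h1":
        break
    parts.append(str(tag))  # on Python 2, use unicode(tag) instead
html = "".join(parts)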
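
For anyone who wants to try the sibling walk on a concrete string, here is a minimal sketch wrapped in a function. The function name, the `heading` parameter, and the sample markup are made up for illustration, and it assumes Python 3 with the built-in `html.parser`. Like the loop above, it returns only what sits between the two headings; prepend `str(first)` if the opening h1 should be included too.

```python
from bs4 import BeautifulSoup


def html_after_first_heading(html_doc, heading="h1"):
    """Return the markup between the first <heading> and its next sibling <heading>."""
    soup = BeautifulSoup(html_doc, "html.parser")
    first = soup.find(heading)
    if first is None:
        return ""  # no heading of that kind in the document
    parts = []
    for node in first.next_siblings:
        # getattr keeps the check safe for text nodes, which are not tags
        if getattr(node, "name", None) == heading:
            break
        parts.append(str(node))
    return "".join(parts)


# Hypothetical sample modelled on the question: tags and bare text between two h1s
sample = (
    "<h1>Title</h1><p>One</p>loose text<ul><li>a</li></ul>"
    "<h1>Next Title</h1><p>Two</p>"
)
between = html_after_first_heading(sample)
print(between)
```

If there is no second h1 among the siblings, the loop simply runs to the end and everything after the first heading is returned.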

maltman · Answer

I have the same problem. I'm not sure whether there is a better solution, but what I've done is use a regular expression to find the indices of the two nodes I'm looking for. Once I have those, I slice out the HTML between the two indices and create a new BeautifulSoup object from it.

Example:

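import re
from bs4 import BeautifulSoup

# Non-greedy match from the first heading up to the next opening <h1>
m = re.search(r'<h1>Title</h1>.*?<h1>', html, re.DOTALL)
s = m.start()
e = m.end() - len('<h1>')  # exclude the trailing <h1> from the slice
target_html = html[s:e]
new_bs = BeautifulSoup(target_html, 'html.parser')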
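
A possible variation on the same idea, sketched as a reusable function. The name `slice_section` and the sample string are hypothetical. Instead of subtracting `len('<h1>')` it uses a lookahead so the next heading is never part of the match, and it also stops at the end of the string when there is no later h1. It assumes the h1 tags carry no attributes, and like any regex over HTML it will break on markup that doesn't fit that pattern.

```python
import re


def slice_section(html, title):
    """Return the markup from <h1>title</h1> up to (not including) the next <h1>."""
    pattern = (
        r"<h1>\s*" + re.escape(title) + r"\s*</h1>"  # the starting heading
        r".*?(?=<h1\b|\Z)"  # lazily consume until the next <h1 or end of input
    )
    m = re.search(pattern, html, re.DOTALL)
    return m.group(0) if m else None  # None when the heading is not found


# Hypothetical sample modelled on the question
sample = "<h1>Title</h1><p>One</p>text<h1> Next Title</h1><p>Two</p>"
section = slice_section(sample, "Title")
print(section)
```

The returned string can then be passed to `BeautifulSoup(section, 'html.parser')` as in the snippet above.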

Tags: python, html, beautifulsoup

passsky

