Suppose I have an HTML fragment whose content varies, for example:
<p>
<span class="code"> string 1 </ span>
<span class="code"> string 2 </ span>
<span class="code"> string 3 </ span>
</ p>
<p>
<span class="any"> Some text </ span>
</ p>
I need to modify the contents of every <span> tag with the class code by passing the content through some function, such as foo, which returns the modified contents of the <span> tag. Ultimately, I should get a new HTML fragment like this:
<p>
<span class="code"> modify string 1 </ span>
<span class="code"> modify string 2 </ span>
<span class="code"> modify string 3 </ span>
</ p>
<p>
<span class="any"> Some text </ span>
</ p>
I have been told that finding specific HTML nodes is easy with the Python library BeautifulSoup4. How do I modify the content of each <span class="code"> and save the result as a new file? I guess the search needs something like soup.find_all('span', class_=re.compile("code")), but as far as I can tell this function returns a list (a copy) of the matching objects, and modifying them does not change the contents of soup. How do I solve this problem?
</ span> (with a space after the slash) is invalid HTML, and not even a web browser's lenient parser will parse it properly.
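Once you fix your HTML, you can use .replace_with() (spelled .replaceWith() in older code). The objects returned by find_all() are the nodes of the tree itself, not copies, so replacing their strings changes soup: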
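from bs4 import BeautifulSoup

# The 'html5lib' parser requires the separate html5lib package to be installed.
soup = BeautifulSoup('''
<p>
<span class="code"> string 1 </span>
<span class="code"> string 2 </span>
<span class="code"> string 3 </span>
</p>
<p>
<span class="any"> Some text </span>
</p>
''', 'html5lib')

# find_all() returns the Tag objects that live in soup, so edits happen in place.
for span in soup.find_all('span', class_='code'):
    # replace_with() is the bs4 name; replaceWith() is the legacy alias.
    span.string.replace_with('modified ' + span.string)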
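The question also asks about running each string through a function such as foo and saving the result to a new file, which the snippet above does not show. A minimal sketch of that, with some assumptions: foo here is a hypothetical stand-in that only prepends "modify", the helper names rewrite_code_spans and save_modified are made up for illustration, and the built-in 'html.parser' is used instead of html5lib so that no extra package is needed and no <html>/<body> wrapper is added to the fragment.

```python
from pathlib import Path

from bs4 import BeautifulSoup

HTML = """\
<p>
<span class="code"> string 1 </span>
<span class="code"> string 2 </span>
<span class="code"> string 3 </span>
</p>
<p>
<span class="any"> Some text </span>
</p>
"""


def foo(text: str) -> str:
    """Hypothetical stand-in for the question's foo: returns the new content."""
    return " modify " + text.strip() + " "


def rewrite_code_spans(html: str) -> str:
    """Pass the text of every <span class="code"> through foo and return the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    for span in soup.find_all("span", class_="code"):
        # span.string is None when the span holds nested tags instead of
        # a single text node, so skip those spans rather than fail.
        if span.string is not None:
            span.string.replace_with(foo(span.string))
    return str(soup)


def save_modified(html: str, out_path: str) -> None:
    """Write the rewritten document to a new file, leaving the input untouched."""
    Path(out_path).write_text(rewrite_code_spans(html), encoding="utf-8")
```

Calling save_modified(HTML, "modified.html") would write the rewritten fragment to modified.html (the file name is arbitrary). The span with class any is left alone because find_all() only matches the code class.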