Let's say we have a html doc like this:
<html>
<head>
<title>title</title>
</head>
<body>
<div class="c1">division
    <p>
    passage in division
    <b>bold in passage </b>
    </p>
</div>
</body>
</html>
I need to prepend a word "cool " to each string (or NavigableString in bs4 terminology) in the html doc.
I have tried to walk through every element and check if it has any children, if not then edit the string. This is inaccurate, besides, the editing didn't take any effect.
You can find all text nodes in the document by calling find_all() with text=True argument. Use replace_with() to replace the text nodes with the modified text:
from bs4 import BeautifulSoup
html = """
<html>
<head>
<title>title</title>
</head>
<body>
<div class="c1">division
    <p>
    passage in division
    <b>bold in passage </b>
    </p>
</div>
</body>
</html>
"""
soup = BeautifulSoup(html)
for element in soup.find_all(text=True):
    text = element.string.strip()
    if text:
        element.replace_with("cool " + text)
print soup.prettify()
Prints:
<html>
 <head>
  <title>
   cool title
  </title>
 </head>
 <body>
  <div class="c1">
   cool division
   <p>
    cool passage in division
    <b>
     cool bold in passage
    </b>
   </p>
  </div>
 </body>
</html>
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With