Is there a one-liner that gets the text from the soup object, uses splitlines to turn it into a list of lines, and then removes the empty lines from that list?
I don't want to write another for loop that passes through the list a second time just to clean up the empty lines. Any other Pythonic way to do this is also appreciated.
from bs4 import BeautifulSoup
with open('myhtml.html') as f:
    soup = BeautifulSoup(f, 'html.parser')  # parse the file contents, not the file name
sections = soup.findAll('div', class_='section')
lines = []
for section in sections:
    lines = lines + section.get_text().splitlines()
Try a list comprehension:
lines = lines + [l for l in section.get_text().splitlines() if l]
Alternatively, a filter:
lines = lines + list(filter(None, section.get_text().splitlines()))
Furthermore, you may shorten this to
lines += ...
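One caveat with both filters: `if l` and `filter(None, ...)` only drop lines that are exactly empty. A line that holds only spaces or tabs is still truthy and survives. Here is a minimal sketch of the difference, using a made-up string in place of `section.get_text()` (the sample text and variable names are assumptions for illustration):

```python
# Hypothetical sample text standing in for section.get_text().
text = "first\n\n   \nsecond\n\t\nthird\n"

# Truthiness test: removes '' but keeps whitespace-only lines.
non_empty = [l for l in text.splitlines() if l]

# filter(None, ...) applies the same truthiness test.
non_empty_filtered = list(filter(None, text.splitlines()))

# Calling strip() in the condition also removes whitespace-only lines.
non_blank = [l for l in text.splitlines() if l.strip()]

print(non_empty)
print(non_blank)
```

If the HTML is indented, the `l.strip()` condition is probably the one you want.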
If you want to get rid of the loop altogether, nest the comprehension:
lines = [l for section in soup.findAll('div', class_='section')
         for l in section.get_text().splitlines() if l]
Here's a real one-liner :)
from itertools import chain
lines = list(chain.from_iterable([l for l in section.get_text().splitlines() if l]
                                 for section in soup.findAll('div', class_='section')))
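To check that the nested comprehension and the `chain` version build the same list, here is a small sketch that uses plain strings as stand-ins for each `section.get_text()` result (the sample data and the name `section_texts` are invented for illustration):

```python
from itertools import chain

# Hypothetical stand-ins for the text of each <div class="section">.
section_texts = ["Title\n\nIntro line\n", "\nBody line 1\nBody line 2\n\n"]

# Nested comprehension: outer loop over sections, inner loop over lines.
nested = [l for text in section_texts for l in text.splitlines() if l]

# chain.from_iterable flattens one filtered list per section.
chained = list(chain.from_iterable(
    [l for l in text.splitlines() if l] for text in section_texts
))

print(nested == chained)
```

Both forms make a single pass over the sections, so choosing between them is mostly a matter of readability.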