
BeautifulSoup, get_text(), splitlines(): how to remove empty lines in a Pythonic one-liner?

Is there a one-liner that gets the text from the soup object, uses splitlines() to turn it into a list of lines, and drops the empty lines (the ones that contained nothing but a newline)?

I don't want to write a second for loop just to pass through the list again and clean up the empty lines. Any other Pythonic way to do this is also appreciated.

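from bs4 import BeautifulSoup

# Parse the file's contents, not the filename string
with open('myhtml.html') as f:
    soup = BeautifulSoup(f, 'html.parser')
sections = soup.find_all('div', class_='section')  # tag name must be a string
lines = []
for section in sections:
    lines = lines + section.get_text().splitlines()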
Joseph Luce asked Nov 15 '25 17:11


2 Answers

Try a list comprehension inside your loop:

lines = lines + [l for l in section.get_text().splitlines() if l]

Alternatively, use filter:

lines = lines + list(filter(None, section.get_text().splitlines()))

You can also shorten the statement with augmented assignment:

lines += ...
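One caveat with both filters: `if l` and `filter(None, ...)` only drop strings that are exactly empty. Text extracted from indented HTML often contains lines made up of spaces or tabs, and those are truthy, so they survive. A minimal sketch, using a hard-coded string as a stand-in for `section.get_text()` (the sample text is an assumption, not output from a real page):

```python
# Stand-in for section.get_text(); the sample text is made up for illustration.
text = "Title\n\n   \nFirst paragraph\n\t\nSecond paragraph\n"

# Truthiness test: drops '' but keeps whitespace-only lines.
non_empty = [l for l in text.splitlines() if l]

# strip() test: drops empty and whitespace-only lines alike.
non_blank = [l for l in text.splitlines() if l.strip()]

print(non_empty)
print(non_blank)
```

If whitespace-only lines count as "empty" for your purposes, swapping `if l` for `if l.strip()` is probably what you want.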

If you want to get rid of the loop entirely, use a nested comprehension:

lines = [l for section in soup.find_all('div', class_='section')
           for l in section.get_text().splitlines() if l]
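The clauses in a nested comprehension read in the same order as the loops they replace: the outer `for` comes first, then the inner `for`, then the filter. A small sketch with plain strings standing in for each section's `get_text()` result (the `texts` list is hypothetical sample data):

```python
# Hypothetical sample data: one string per section, as get_text() might return.
texts = ["Intro\n\nWelcome", "\nDetails\n\n"]

# Explicit nested loop.
looped = []
for text in texts:
    for l in text.splitlines():
        if l:
            looped.append(l)

# Equivalent comprehension: same clause order, read left to right.
flat = [l for text in texts for l in text.splitlines() if l]

print(flat)
```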
cs95 answered Nov 18 '25 07:11


Here's a real one-liner :)

from itertools import chain
lines = list(chain.from_iterable([l for l in section.get_text().splitlines() if l]
                   for section in soup.find_all('div', class_='section')))
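Since `chain.from_iterable` consumes its input lazily, the inner square brackets are optional: a generator expression avoids building a temporary list per section. A sketch of that variant, with hypothetical strings in place of real `get_text()` output:

```python
from itertools import chain

# Hypothetical sample data standing in for each section's get_text() result.
texts = ["Intro\n\nWelcome", "\nDetails\n\n"]

# Each inner generator filters one section's lines; chain.from_iterable
# flattens them into a single stream, and list() materializes the result.
lines = list(chain.from_iterable(
    (l for l in text.splitlines() if l) for text in texts
))

print(lines)
```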
DYZ answered Nov 18 '25 06:11


