BeautifulSoup get_text() and splitlines(): how to remove empty lines in a Pythonic one-liner?

Question

Is there a one-liner that gets the text from the soup object, uses splitlines() to produce a list of the lines in the HTML, and then removes the empty entries that the blank lines leave behind in that list?

I don't want to write a second for loop that passes through the list again just to clean up the empty lines. Any other Pythonic way to do this is also appreciated.

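from bs4 import BeautifulSoup

# Parse the file's contents; passing the bare filename would parse that string as markup.
with open('myhtml.html') as f:
    soup = BeautifulSoup(f, 'html.parser')
sections = soup.findAll('div', class_='section')  # the tag name must be a string
lines = []
for section in sections:
    lines = lines + section.get_text().splitlines()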

cs95 · Accepted Answer

Try a list comprehension:

lines = lines + [l for l in section.get_text().splitlines() if l]

Alternatively, a filter:

lines = lines + list(filter(None, section.get_text().splitlines()))

Furthermore, you may shorten this to

lines += ...
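
One caveat about both filters, as a hedged aside: `if l` and `filter(None, ...)` drop only strings that are exactly empty. get_text() on indented HTML can also produce lines made up solely of spaces or tabs, and those are truthy, so they survive. A minimal sketch, assuming you want whitespace-only lines gone too; the `text` value is a made-up stand-in for what section.get_text() might return:

```python
# Hypothetical stand-in for the string returned by section.get_text().
text = "Title\n\n   \nBody text\n\t\nFooter"

# Truthiness test: removes '' but keeps '   ' and '\t'.
non_empty = [l for l in text.splitlines() if l]

# strip() test: removes empty and whitespace-only lines, keeps content as-is.
non_blank = [l for l in text.splitlines() if l.strip()]

print(non_empty)  # ['Title', '   ', 'Body text', '\t', 'Footer']
print(non_blank)  # ['Title', 'Body text', 'Footer']
```

Swapping `if l` for `if l.strip()` works the same way inside the loop body or in a comprehension.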

If you want to get rid of the loop altogether, use a single nested comprehension:

lines = [l for section in soup.findAll('div', class_='section')
           for l in section.get_text().splitlines() if l]
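
The for clauses in that comprehension read in the same order as the nested loops they replace: the outer loop over sections comes first, then the inner loop over lines. A small sketch to make that concrete; `section_texts` is a hypothetical list standing in for the get_text() result of each section, so it runs without any HTML:

```python
# Hypothetical stand-ins for section.get_text() of two sections.
section_texts = ["Intro\n\nFirst point\n", "\nSecond section\n\nEnd"]

# Explicit nested loops.
looped = []
for text in section_texts:
    for l in text.splitlines():
        if l:
            looped.append(l)

# Same logic as one comprehension: clauses appear in loop order.
flat = [l for text in section_texts for l in text.splitlines() if l]

print(flat)  # ['Intro', 'First point', 'Second section', 'End']
```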

DYZ · Answer

Here's a real one-liner :)

from itertools import chain
lines = list(chain.from_iterable([l for l in section.get_text().splitlines() if l]
                   for section in soup.findAll('div', class_='section')))
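
A possible variation on the same idea, offered as a sketch rather than as part of the answer above: the inner list can be replaced by filter(None, ...), so chain.from_iterable consumes lazy iterators and no second, filtered list is built per section. `section_texts` is a hypothetical stand-in for the get_text() output of each section:

```python
from itertools import chain

# Hypothetical stand-ins for section.get_text() of each section.
section_texts = ["alpha\n\nbeta", "\n\ngamma\n"]

# filter(None, ...) lazily skips empty strings; chain.from_iterable
# stitches the per-section iterators into one stream.
lazy_lines = chain.from_iterable(
    filter(None, text.splitlines()) for text in section_texts
)

lines = list(lazy_lines)  # materialize only when a list is needed
print(lines)  # ['alpha', 'beta', 'gamma']
```

With a real soup object, the generator would iterate over section.get_text() for each section returned by soup.findAll('div', class_='section') instead of over section_texts.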

Tags:

python

beautifulsoup

Joseph Luce
