 

A lot of whitespace in BeautifulSoup text

I am doing web scraping with BeautifulSoup. The web page has the following source:

<td>\n<a href="http://aaa.com">Charles</a>\r\n                         (hello)\r\n                            </td>,
<td>\n<a href="http://bbb.com">Diane</a>\r\n                           (hi)\r\n                            </td>,
<td>\n<a href="http://ccc.com">Kevin</a>\r\n                           (how are you doing)\r\n                            </td>

I use the following code to print the two values, and it works fine:

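from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")  # html holds the page source
for item in soup.find_all("td"):
    print(item.find('a').text)          # link text, e.g. Charles
    print(item.find('a').next_sibling)  # whitespace-padded text after the link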

The problem is that when I save the output to a CSV file, the second column appears to have no value. It seems to be caused by all the whitespace (the \r\n line breaks and runs of spaces) around the text. Any suggestions? Thanks in advance.
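A minimal sketch that reproduces the CSV side of the problem using only the standard library, assuming the text after each <a> tag is exactly the whitespace-padded string shown in the source above. RAW_ROWS, normalize and to_csv are hypothetical names. It shows that csv.writer quotes a field containing \r\n, so the second cell begins with a line break, which is one plausible reason the column looks empty when the file is opened; cleaning the text before writing avoids it.

```python
import csv
import io

# Hypothetical (name, trailing text) pairs mirroring the page source above;
# the trailing text keeps its "\r\n" line breaks and runs of spaces.
RAW_ROWS = [
    ("Charles", "\r\n                         (hello)\r\n                            "),
    ("Diane", "\r\n                           (hi)\r\n                            "),
    ("Kevin", "\r\n                           (how are you doing)\r\n                            "),
]


def normalize(text):
    """Trim both ends and collapse inner whitespace runs to a single space."""
    return " ".join(text.split())


def to_csv(rows):
    """Write rows with csv.writer into an in-memory buffer; return the CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


# Raw text: the second field contains line breaks, so it is quoted and
# starts with "\r\n" inside the quotes.
raw_csv = to_csv(RAW_ROWS)

# Cleaned text: one short, unquoted value per row.
clean_csv = to_csv([(name, normalize(text)) for name, text in RAW_ROWS])
# clean_csv == 'Charles,(hello)\r\nDiane,(hi)\r\nKevin,(how are you doing)\r\n'

print(repr(raw_csv))
print(clean_csv)
```

Using " ".join(text.split()) rather than text.strip() also collapses any whitespace runs inside the value; plain strip() is enough when only the ends are padded.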

asked Dec 09 '25 09:12 by kevin


1 Answer

Find all the text siblings that follow the <a> tag, join them, and strip the surrounding whitespace:

"".join(item.find('a').find_next_siblings(text=True)).strip()
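Applied to markup shaped like the question's, the one-liner might be used as in the sketch below. It assumes the bs4 package with Python's built-in "html.parser"; the sample HTML is an abridged copy of the question's cells (shorter whitespace runs), and extract_rows is a hypothetical helper name. It passes string=True, which is the name Beautiful Soup 4.4.0 and later use for the older text=True argument.

```python
from bs4 import BeautifulSoup

# Abridged copy of the question's cells; the "\r\n" and space runs are kept.
HTML = (
    '<table><tr>'
    '<td>\n<a href="http://aaa.com">Charles</a>\r\n        (hello)\r\n        </td>'
    '<td>\n<a href="http://bbb.com">Diane</a>\r\n        (hi)\r\n        </td>'
    '<td>\n<a href="http://ccc.com">Kevin</a>\r\n        (how are you doing)\r\n        </td>'
    '</tr></table>'
)


def extract_rows(html):
    """Return one (link text, trailing text) pair per <td> (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("td"):
        link = item.find("a")
        # Every text node after the <a>, joined into one string, then stripped.
        trailing = "".join(link.find_next_siblings(string=True)).strip()
        rows.append((link.text, trailing))
    return rows


for name, note in extract_rows(HTML):
    print(name, note)
```

On real pages, item.find("a") returns None for a cell without a link, so a guard would be needed before calling find_next_siblings on it.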
answered Dec 11 '25 21:12 by alecxe