I have the following HTML:
<td class="section">
    <div style="margin-top:2px; margin-bottom:-10px; ">
    <span class="username"><a href="user.php?id=xx">xxUsername</a></span>
    </div>
    <br>
<span class="comment">
A test comment
</span>
</td>
All I want is to retrieve xxUsername and the comment text inside the SPAN tags. So far I have done this:
results = soup.findAll("td", {"class" : "section"})
It does fetch ALL the HTML blocks matching the pattern above. Now I want to retrieve all the child values within a single loop. Is that possible? If not, how do I fetch the child nodes' information?
You could try something like this. It does essentially what you did above: it first iterates over every td with class section, then over every span inside each one. It also prints each span's class, in case you need to be more restrictive:
In [1]: from bs4 import BeautifulSoup
In [2]: html = # Your html here
In [3]: soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid bs4's "no parser specified" warning
In [4]: for td in soup.find_all('td', {'class': 'section'}):
   ...:     for span in td.find_all('span'):
   ...:         print span.attrs['class'], span.text
   ...:         
['username'] xxUsername
['comment'] 
A test comment
Or, with a more-convoluted-than-necessary one-liner that stores everything back in your list:
In [5]: results = [span.text for td in soup.find_all('td', {'class': 'section'}) for span in td.find_all('span')]
In [6]: results
Out[6]: [u'xxUsername', u'\nA test comment\n']
Or, on the same theme, a dictionary whose keys are tuples of the span's classes and whose values are the text itself:
In [8]: results = dict((tuple(span.attrs['class']), span.text) for td in soup.find_all('td', {'class': 'section'}) for span in td.find_all('span'))
In [9]: results
Out[9]: {('comment',): u'\nA test comment\n', ('username',): u'xxUsername'}
Assuming this one is a bit closer to what you want, I would suggest rewriting it as a plain loop:
In [10]: results = {}
In [11]: for td in soup.find_all('td', {'class': 'section'}):
   ....:     for span in td.find_all('span'):
   ....:         results[tuple(span.attrs['class'])] = span.text
   ....:         
In [12]: results
Out[12]: {('comment',): u'\nA test comment\n', ('username',): u'xxUsername'}
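One caveat with the dictionary versions above: the keys are the span classes, so if the page contains more than one td.section block, each new block overwrites the previous one. A minimal sketch of a per-block variant follows, written for Python 3 and assuming you want one record per td with surrounding whitespace stripped. The function name extract_comments and the record keys are hypothetical, chosen only for illustration, and the sample markup is wrapped in table/tr tags here (an assumption) so that stricter parsers would also keep the td.

```python
from bs4 import BeautifulSoup

# Sample markup from the question, wrapped in <table><tr> (an assumption)
# so the <td> sits in a valid position.
html = """
<table><tr>
<td class="section">
    <div style="margin-top:2px; margin-bottom:-10px; ">
    <span class="username"><a href="user.php?id=xx">xxUsername</a></span>
    </div>
    <br>
<span class="comment">
A test comment
</span>
</td>
</tr></table>
"""


def extract_comments(markup):
    """Return one dict per td.section block (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    for td in soup.find_all("td", class_="section"):
        # Look up each child span by its class inside this td only.
        username = td.find("span", class_="username")
        comment = td.find("span", class_="comment")
        rows.append({
            # get_text(strip=True) drops the leading/trailing newlines.
            "username": username.get_text(strip=True) if username else None,
            "comment": comment.get_text(strip=True) if comment else None,
        })
    return rows


rows = extract_comments(html)
print(rows)
```

Because each td gets its own dictionary, the username and comment stay paired and nothing is overwritten when several blocks match. A span that is missing from a block is reported as None rather than raising an error.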