I'm using Beautiful Soup. There is a tag like this:
<li><a href="example"> s.r.o., <small>small</small></a></li> 
I want to get only the text directly inside the anchor <a> tag, without any text from the nested <small> tag; i.e. " s.r.o., "
I tried find('li').text[0], but it does not work.
Is there a way to do that in BS4?
One option would be to get the first element from the contents of the a element:
>>> from bs4 import BeautifulSoup
>>> data = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
>>> soup = BeautifulSoup(data, 'html.parser')
>>> print(soup.find('a').contents[0])
 s.r.o., 
Another one would be to find the small tag and get the previous sibling:
>>> print(soup.find('small').previous_sibling)
 s.r.o., 
There are also some more unusual alternatives:
>>> print(next(soup.find('a').descendants))
 s.r.o., 
>>> print(next(iter(soup.find('a'))))
 s.r.o., 
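All of the options above assume the wanted text is the first child of the <a> tag. If the text could sit before or after nested tags, one possible generalization is to collect only the strings that are direct children. This is a minimal sketch, not the only way to do it; the helper name direct_text and the variable own_text are made up for illustration, and html.parser is chosen only because it ships with Python.

```python
from bs4 import BeautifulSoup

data = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
soup = BeautifulSoup(data, 'html.parser')


def direct_text(tag):
    """Hypothetical helper: join only the strings that are direct children of `tag`."""
    # string=True matches text nodes; recursive=False keeps the search
    # from descending into nested tags such as <small>.
    return ''.join(tag.find_all(string=True, recursive=False))


own_text = direct_text(soup.find('a'))
print(repr(own_text))          # ' s.r.o., '
print(repr(own_text.strip()))  # 's.r.o.,'
```

Call .strip() on the result if the surrounding whitespace is not wanted.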
Use .children:
>>> print(next(soup.find('a').children))
 s.r.o., 
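For completeness, here is a short sketch of why the attempt from the question, find('li').text[0], does not give the expected result: .text joins every descendant string into one plain string, so indexing with [0] picks a single character rather than the first text node. The variable name full_text is only illustrative.

```python
from bs4 import BeautifulSoup

data = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
soup = BeautifulSoup(data, 'html.parser')

# .text concatenates all strings below <li>, including the one inside <small>
full_text = soup.find('li').text
print(repr(full_text))     # ' s.r.o., small'

# Indexing a string returns one character, here the leading space
print(repr(full_text[0]))  # ' '
```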