
Parse HTML tags based on a class and get the href attribute using Beautiful Soup

I am trying to parse HTML with BeautifulSoup.

The content I want is like this:

<a class="yil-biz-ttl" id="yil_biz_ttl-2" href="http://some-web-url/" title="some title">Title</a> 

I tried the following and got this error:

maxx = soup.findAll("href", {"class: "yil-biz-ttl"})
------------------------------------------------------------
   File "<ipython console>", line 1
     maxx = soup.findAll("href", {"class: "yil-biz-ttl"})
                                             ^
SyntaxError: invalid syntax

What I want is the string: http://some-web-url/

asked Oct 26 '25 10:10 by whatf


2 Answers

soup.findAll('a', {'class': 'yil-biz-ttl'})[0]['href']

To find all such links:

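for link in soup.findAll('a', {'class': 'yil-biz-ttl'}):
    try:
        # print() with parentheses works on both Python 2 and Python 3
        print(link['href'])
    except KeyError:
        # this <a> tag has no href attribute, so skip it
        pass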
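
As a self-contained sketch of the loop above, assuming BeautifulSoup 4 (the bs4 package) and its find_all spelling rather than the older findAll: passing href=True restricts the match to tags that actually have an href, so the KeyError handling is not needed. The sample markup is modeled on the question; the second and third links are hypothetical and only there to show what gets filtered out.

```python
from bs4 import BeautifulSoup  # assumes the bs4 package (BeautifulSoup 4)

# Sample markup modeled on the question. The second and third links are
# hypothetical: one has the class but no href, one has an href but another class.
html = """
<a class="yil-biz-ttl" id="yil_biz_ttl-2" href="http://some-web-url/" title="some title">Title</a>
<a class="yil-biz-ttl" id="yil_biz_ttl-3" title="no link">No link</a>
<a class="other" href="http://unrelated/">Other</a>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ filters on the CSS class; href=True keeps only tags that
# carry an href attribute, so indexing with ["href"] cannot fail.
urls = [a["href"] for a in soup.find_all("a", class_="yil-biz-ttl", href=True)]

print(urls)
```

Only the first link should end up in urls, since it is the only one with both the class and an href.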
answered Oct 29 '25 01:10 by infrared


You're missing a close-quote after "class:

 maxx = soup.findAll("href", {"class: "yil-biz-ttl"})

should be

 maxx = soup.findAll("href", {"class": "yil-biz-ttl"})

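Also, I don't think you can search for an attribute like href that way. The first argument to findAll is a tag name, so "href" would look for <href> tags. I think you need to search for the a tag and read href from each result: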

 maxx = [link['href'] for link in soup.findAll("a", {"class": "yil-biz-ttl"})]
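
To illustrate the difference between searching by attribute name and by tag name, here is a small sketch run against the markup from the question. It assumes BeautifulSoup 4 (the bs4 package); the variable names by_attribute_name and by_tag_name are made up for the example.

```python
from bs4 import BeautifulSoup  # assumes the bs4 package (BeautifulSoup 4)

html = ('<a class="yil-biz-ttl" id="yil_biz_ttl-2" '
        'href="http://some-web-url/" title="some title">Title</a>')
soup = BeautifulSoup(html, "html.parser")

# The first argument is a tag name, so this looks for <href> elements.
by_attribute_name = soup.find_all("href", {"class": "yil-biz-ttl"})

# Searching for the <a> tag finds the link; href is then read from each tag.
by_tag_name = [link["href"] for link in soup.find_all("a", {"class": "yil-biz-ttl"})]

print(by_attribute_name)  # empty, because the document has no <href> elements
print(by_tag_name)
```

So even with the quote fixed, the original call would run without error but match nothing.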
answered Oct 29 '25 02:10 by agf


