Catch exception in BeautifulSoup.findAll function

I'm scraping this afghanistan page to extract the cities and area codes from its table. When I run the same code against this american-samoa page, findAll() cannot find any <td>, which is correct, since that page has none. How can I catch this exception?

Here's my code:

from bs4 import BeautifulSoup
import urllib2
import re

url = "http://www.howtocallabroad.com/american-samoa"
html_page = urllib2.urlopen(url)
soup = BeautifulSoup(html_page)

areatable = soup.find('table',{'id':'codes'})
d = {}

def chunks(l, n):
    return [l[i:i+n] for i in range(0, len(l), n)]

li = dict(chunks([i.text for i in areatable.findAll('td')], 2))
if li:  # li is a dict, so test for emptiness rather than comparing with []
    print li

    for key in li:
        print key, ":", li[key]
else:
    print "list is empty"

This is the error I got:

Traceback (most recent call last):
  File "extract_table.py", line 15, in <module>
    li = dict(chunks([i.text for i in areatable.findAll('td')], 2))
AttributeError: 'NoneType' object has no attribute 'findAll'

I also tried this, but it doesn't work either:

def gettdtag(tag):
    return "empty" if areatable.findAll(tag) is None else tag

all_td = gettdtag('td')
print all_td

chip asked Oct 15 '25 08:10


1 Answer

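The error says that areatable is None: soup.find() found no table with id 'codes' on that page, so it returned None. Check for that before calling findAll():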

areatable = soup.find('table',{'id':'codes'})
#areatable = soup.find('table', id='codes')  # Also works

if areatable is None:
    print 'No table with id "codes" on this page'
    raise SystemExit(1)  # Exit before findAll() is called on None
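
The None check above can also be wrapped in a small function, so that a page without the table gives an empty result instead of a crash. This is only a minimal sketch, assuming Python 3 and bs4 with the built-in html.parser. The inline HTML strings and the extract_codes name are made up for illustration and stand in for the downloaded pages.

```python
from bs4 import BeautifulSoup

# Hypothetical sample pages: one has the "codes" table, the other does not.
PAGE_WITH_TABLE = """
<table id="codes">
  <tr><td>Kabul</td><td>20</td></tr>
  <tr><td>Herat</td><td>40</td></tr>
</table>
"""
PAGE_WITHOUT_TABLE = "<p>No area codes are needed here.</p>"


def chunks(items, n):
    """Split items into consecutive lists of length n."""
    return [items[i:i + n] for i in range(0, len(items), n)]


def extract_codes(html):
    """Return {city: code}, or an empty dict if the table is missing."""
    soup = BeautifulSoup(html, "html.parser")
    areatable = soup.find("table", id="codes")
    if areatable is None:
        # find() returns None when nothing matches, so stop here.
        return {}
    cells = [td.get_text(strip=True) for td in areatable.find_all("td")]
    return dict(chunks(cells, 2))


with_table = extract_codes(PAGE_WITH_TABLE)
without_table = extract_codes(PAGE_WITHOUT_TABLE)
print(with_table)
print(without_table or "list is empty")
```

Wrapping the original line in try/except AttributeError would also stop the crash, but the explicit None check says more clearly what went wrong.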

Also, I'd use find_all() instead of findAll() (find_all is the current bs4 name; findAll is the older spelling kept for compatibility) and get_text() instead of .text.
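
To illustrate the two spellings side by side, here is a rough sketch, again assuming Python 3 and bs4 with html.parser. The one-row table is invented sample data. It also shows why the `areatable.findAll(tag) is None` test in the question cannot work even when the table exists: a search with no matches gives back an empty list, not None.

```python
from bs4 import BeautifulSoup

# Hypothetical one-row table with stray whitespace inside the cells.
html = '<table id="codes"><tr><td> Kabul </td><td> 20 </td></tr></table>'
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="codes")

# findAll is the older spelling of find_all; both find the same tags.
old_style = [td.text for td in table.findAll("td")]
# get_text(strip=True) additionally trims surrounding whitespace.
new_style = [td.get_text(strip=True) for td in table.find_all("td")]

# The table has no <th> cells, so this search matches nothing.
no_matches = table.find_all("th")

print(old_style)
print(new_style)
print(no_matches)
```

So the check for a missing table belongs on the result of find(), while an empty find_all() result can be tested with a plain `if not cells:`.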

Blender answered Oct 17 '25 22:10