Get around a 404 with mechanize

I'm writing a Python script that reads a file of URLs, but I know not all of them will work. I'm trying to figure out how to get around this and make it move on to the next line of the file instead of raising the error posted below. I know I need some kind of if statement, but I can't quite figure it out.

from mechanize import Browser
from BeautifulSoup import BeautifulSoup
import csv

me = open('C:\Python27\myfile.csv')
reader = csv.reader(me)
mech = Browser()

for url in me:
    response =  mech.open(url)
    html = response.read()
    soup = BeautifulSoup(html)
    table = soup.find("table", border=3)

for row in table.findAll('tr')[2:]:
    col = row.findAll('td')
    BusinessName = col[0].string
    Phone = col[1].string
    Address = col[2].string
    City = col[3].string
    State = col[4].string
    Zip = col[5].string
    Restaurantinfo = (BusinessName, Phone, Address, City, State)
    print "|".join(Restaurantinfo)

When I run that block of code it raises this error:

httperror_seek_wrapper: HTTP Error 404: Not Found

Basically what I am asking for is how to make Python ignore that and try the next URL.

Trook2007 asked Jan 17 '26 20:01



1 Answer

If your file contains only URLs, it may be simpler to put one URL per line and use code like this:

from mechanize import Browser
from BeautifulSoup import BeautifulSoup


me = open('C:\Python27\myfile.csv')
mech = Browser()

for url in me.readlines():
    ...

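If you want to keep your code, iterate over the csv reader instead of the file. Each row comes back as a list, so take the URL from its first column:

for row in reader:
    url = row[0]
    ...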
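
Neither loop above deals with the 404 itself. One possible way to skip failing URLs is to wrap the open call in try/except rather than an if statement. The sketch below is only an illustration: `fetch_pages` and `open_url` are made-up names, and it assumes that the `httperror_seek_wrapper` in the traceback is a subclass of the standard `HTTPError` (if that does not hold for your mechanize version, catching `mechanize.HTTPError` should work instead).

```python
try:
    from urllib.error import HTTPError  # Python 3
except ImportError:
    from urllib2 import HTTPError  # Python 2, as used in the question


def fetch_pages(lines, open_url):
    """Return (url, html) pairs for every URL that opens; skip HTTP errors.

    `lines` is any iterable of strings (an open file works), one URL per line.
    `open_url` is a callable such as Browser().open that returns an object
    with a .read() method. Both names are hypothetical.
    """
    pages = []
    for line in lines:
        url = line.strip()  # drop the trailing newline left by file iteration
        if not url:
            continue  # ignore blank lines
        try:
            response = open_url(url)
        except HTTPError as err:
            # 404 and other HTTP error statuses end up here: report and move on
            print("Skipping %s (%s)" % (url, err))
            continue
        pages.append((url, response.read()))
    return pages
```

With the objects from the question this could be called as `fetch_pages(me, mech.open)`, and the BeautifulSoup parsing would then run once per returned page. Catching only `HTTPError` is a deliberate choice here: other problems, such as a malformed URL, still raise and stay visible.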
NicoFromFrance answered Jan 19 '26 18:01
