Download image from webpage using python

Question

I am trying to write a Python script that downloads an image from a webpage. On the webpage (I am using NASA's picture of the day page), a new picture is posted every day, with a different file name.

So my solution was to parse the HTML using HTMLParser, looking for "jpg", and write the path and file name of the image to an attribute (named "output", see code below) of the HTML parser object.

I am new to Python and OOP (this is my first real Python script ever), so I am not sure if this is how it is generally done. Any advice and pointers are welcome.

Here is my code:

import urllib2
from HTMLParser import HTMLParser

# Download the HTML of the page that links to the image
response = urllib2.urlopen('http://apod.nasa.gov/apod/astropix.html')
html = response.read()

class MyHTMLParser(HTMLParser):
    output = None  # path + file name of the image, set when a .jpg link is found

    def handle_starttag(self, tag, attrs):
        # Only parse the 'anchor' tag.
        if tag == "a":
            # Check the list of defined attributes.
            for name, value in attrs:
                # If href is defined and ends with "jpg", store it.
                if name == "href":
                    if value[len(value)-3:len(value)] == "jpg":
                        self.output = value  # path + file name of the image

parser = MyHTMLParser()
parser.feed(html)
imgurl = 'http://apod.nasa.gov/apod/' + parser.output
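The code above is Python 2 (`urllib2`, `HTMLParser`). In Python 3 those modules became `urllib.request` and `html.parser`. Below is a minimal sketch of the same approach in Python 3, under a few assumptions: it keeps the first `<a href>` ending in `.jpg` (the original keeps the last one), it assumes the page decodes as UTF-8, and it uses `urljoin` instead of string concatenation to build the absolute URL. The names `JpgLinkParser`, `find_jpg_link` and `fetch_page` are hypothetical, chosen for illustration:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

PAGE_URL = "http://apod.nasa.gov/apod/astropix.html"  # URL from the question


class JpgLinkParser(HTMLParser):
    """Record the href of the first <a> tag whose link ends in .jpg."""

    def __init__(self):
        super().__init__()
        self.output = None  # stays None if no .jpg link is found

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.output is not None:
            return
        for name, value in attrs:
            # value is None for an attribute written without "=..."
            if name == "href" and value and value.lower().endswith(".jpg"):
                self.output = value
                return


def find_jpg_link(html_text, base_url=PAGE_URL):
    """Return the absolute URL of the first .jpg link, or None."""
    parser = JpgLinkParser()
    parser.feed(html_text)
    parser.close()
    if parser.output is None:
        return None
    # urljoin resolves a relative href against the page it came from
    return urljoin(base_url, parser.output)


def fetch_page(url=PAGE_URL):
    """Download the page and decode it (assumption: UTF-8 content)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

With network access, `print(find_jpg_link(fetch_page()))` prints the image URL. Returning `None` instead of leaving the attribute unset means a page without a `.jpg` link does not crash with an `AttributeError`.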

jfs · Accepted Answer

To check whether a string ends with "jpg", you could use .endswith() instead of len() and slicing:

if name == "href" and value.endswith("jpg"):
    self.output = value
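`str.endswith` also accepts a tuple of suffixes, which helps if some links use `.jpeg` or an upper-case extension. Including the dot avoids matching names like `notajpg`. A small sketch; the helper name `is_jpeg_link` is hypothetical:

```python
def is_jpeg_link(href):
    """Return True if href looks like a link to a JPEG file."""
    if not href:  # None (attribute without a value) or an empty string
        return False
    # lower() makes "MOON.JPG" match; the tuple checks both suffixes at once
    return href.lower().endswith((".jpg", ".jpeg"))
```

For example, `is_jpeg_link("image/2401/moon.jpg")` is `True`, while `is_jpeg_link("archivepix.html")` is `False`.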

If the search inside the web page is more complex, you could use lxml.html or BeautifulSoup instead of HTMLParser, e.g.:

from lxml import html

# download & parse web page
doc = html.parse('http://apod.nasa.gov/apod/astropix.html').getroot()

# find <a href that ends with ".jpg" and 
# that has <img child that has src attribute that also ends with ".jpg"
for elem, attribute, link, _ in doc.iterlinks():
    if (attribute == 'href' and elem.tag == 'a' and link.endswith('.jpg') and
        len(elem) > 0 and elem[0].tag == 'img' and
        elem[0].get('src', '').endswith('.jpg')):
        print(link)
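Both snippets only find the link; the question also asks how to download the image. In the question the `href` is relative (hence the string concatenation), so it has to be resolved against the page URL before fetching. A minimal standard-library sketch, assuming the file name can be taken from the last segment of the URL path; the function names are hypothetical:

```python
import os
import posixpath
import shutil
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


def image_url(page_url, href):
    """Resolve a (possibly relative) href against the page it came from."""
    return urljoin(page_url, href)


def local_filename(url):
    """Use the last path segment of the URL (query string ignored) as the name."""
    return posixpath.basename(urlparse(url).path)


def download(url, directory="."):
    """Stream the resource at url into directory and return the saved path."""
    path = os.path.join(directory, local_filename(url))
    with urlopen(url) as response, open(path, "wb") as out:
        # copy in chunks instead of reading the whole image into memory
        shutil.copyfileobj(response, out)
    return path
```

Usage, with network access and `link` from the loop above: `download(image_url('http://apod.nasa.gov/apod/astropix.html', link))`. Since the picture's file name changes every day, taking it from the URL keeps each day's image in its own file.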


Tags: python, html-parsing, web-crawler

Cici
