 

Download image from webpage using python

I am trying to write a Python script that downloads an image from a webpage. On the webpage (I am using NASA's picture of the day page), a new picture is posted every day, with a different file name each time.

So my solution was to parse the HTML using HTMLParser, look for "jpg", and write the path and file name of the image to an attribute of the HTML parser object (named "output", see the code below).

I am new to Python and OOP (this is my first real Python script ever), so I am not sure if this is how it is generally done. Any advice and pointers are welcome.

Here is my code (Python 2):

import urllib2
from HTMLParser import HTMLParser

# Grab image url
response = urllib2.urlopen('http://apod.nasa.gov/apod/astropix.html')
html = response.read()

class MyHTMLParser(HTMLParser):
    output = None  # stays None if no "jpg" link is found

    def handle_starttag(self, tag, attrs):
        # Only parse the 'anchor' tag.
        if tag == "a":
            # Check the list of defined attributes.
            for name, value in attrs:
                # If href is defined and ends with "jpg", remember it.
                if name == "href":
                    if value[len(value)-3:len(value)] == "jpg":
                        self.output = value  # path + file name of the image

parser = MyHTMLParser()
parser.feed(html)
imgurl = 'http://apod.nasa.gov/apod/' + parser.output
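For readers on Python 3, where urllib2 and the HTMLParser module became urllib.request and html.parser, the same approach might look like the following sketch. It collects every .jpg link instead of keeping only the last one, resolves relative paths with urljoin rather than string concatenation, and actually saves the image to disk. The names JpgLinkParser, find_jpg_links, download and main are hypothetical, and the page is assumed to be UTF-8 whenever the server declares no charset.

```python
import os
import posixpath
import shutil
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

PAGE_URL = "http://apod.nasa.gov/apod/astropix.html"


class JpgLinkParser(HTMLParser):
    """Collect the href of every <a> whose target ends with .jpg (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for a valueless attribute such as <a href>
            if name == "href" and value and value.lower().endswith(".jpg"):
                self.links.append(value)


def find_jpg_links(html_text, base_url):
    """Return absolute URLs of all .jpg links found in html_text."""
    parser = JpgLinkParser()
    parser.feed(html_text)
    parser.close()
    # urljoin turns a relative path like "image/2604/x.jpg" into a full URL
    return [urljoin(base_url, link) for link in parser.links]


def download(url, directory="."):
    """Save url into directory, named after the last segment of its path."""
    filename = posixpath.basename(urlparse(url).path)
    path = os.path.join(directory, filename)
    with urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)  # stream the bytes to disk
    return path


def main(page_url=PAGE_URL):
    """Fetch the page and save the first .jpg link it contains."""
    with urlopen(page_url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html_text = response.read().decode(charset, errors="replace")
        base_url = response.geturl()  # final URL after any redirect
    links = find_jpg_links(html_text, base_url)
    if not links:
        raise RuntimeError("no .jpg link found on " + page_url)
    return download(links[0])
```

Calling main() fetches the page, picks the first .jpg link and returns the local path of the saved file; which link comes first depends on the page's markup, so check the result if the page layout changes.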
asked Apr 22 '26 21:04 by Cici



1 Answer

To check whether a string ends with "jpg", you could use .endswith() instead of len() and slicing:

if name == "href" and value.endswith("jpg"):
    self.output = value
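str.endswith() also accepts a tuple of suffixes and returns True if any of them matches, so the check can be made a little more tolerant. The sketch below assumes you also want to match uppercase extensions and the .jpeg spelling; whether the APOD page ever uses those is not established here, and is_jpg is a hypothetical helper name.

```python
def is_jpg(url):
    """Return True if url looks like a JPEG link (hypothetical helper)."""
    # endswith() with a tuple matches any of the listed suffixes;
    # lower() makes the test case-insensitive, e.g. for "M31.JPG".
    return url.lower().endswith((".jpg", ".jpeg"))


print(is_jpg("image/2604/M31.JPG"))  # True
print(is_jpg("astropix.html"))       # False
```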

If the search inside the web page is more complex, you could use lxml.html or BeautifulSoup instead of HTMLParser, e.g.:

from lxml import html

# download & parse web page
doc = html.parse('http://apod.nasa.gov/apod/astropix.html').getroot()

# find <a> elements whose href ends with ".jpg" and whose
# first child is an <img> whose src attribute also ends with ".jpg"
for elem, attribute, link, _ in doc.iterlinks():
    if (attribute == 'href' and elem.tag == 'a' and link.endswith('.jpg') and
        len(elem) > 0 and elem[0].tag == 'img' and
        elem[0].get('src', '').endswith('.jpg')):
        print(link)
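If installing lxml is not an option, roughly the same filter can be written with the standard-library html.parser (Python 3) by remembering the href of the currently open <a> and recording it when an <img> with a .jpg src appears before the matching </a>. This is a sketch: LinkedImageParser and linked_images are hypothetical names, and unlike the lxml version it accepts the <img> anywhere inside the link rather than only as the first child element, and it does not handle nested <a> tags.

```python
from html.parser import HTMLParser


class LinkedImageParser(HTMLParser):
    """Collect <a href="*.jpg"> links that wrap an <img src="*.jpg">."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._pending = None  # href of the currently open .jpg <a>, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            href = attrs.get("href") or ""
            self._pending = href if href.endswith(".jpg") else None
        elif tag == "img" and self._pending is not None:
            if (attrs.get("src") or "").endswith(".jpg"):
                self.links.append(self._pending)
                self._pending = None  # record each <a> at most once

    def handle_endtag(self, tag):
        if tag == "a":
            self._pending = None


def linked_images(html_text):
    """Return the hrefs of all qualifying <a> elements in html_text."""
    parser = LinkedImageParser()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

The returned hrefs are exactly as written in the page, so relative paths still need to be joined with the page's base URL before downloading.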
answered Apr 24 '26 09:04 by jfs



