 

Image download mime type validation python requests

I use the requests library in Python to download a large number of image files over HTTP. I wrap the received content in a BytesIO object, open it with Pillow, and then write the raw content to disk as a .jpg file.

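import os
from io import BytesIO

import requests
from PIL import Image

rsp = requests.get(imageurl)  # imageurl and outfolder are defined elsewhere
content_type_received = rsp.headers.get('Content-Type', '')  # MIME type claimed by the server
binarycontent = BytesIO(rsp.content)  # file-like wrapper around the raw response bytes
if content_type_received.startswith('image'):  # image/jpeg, image/png, etc.
    i = Image.open(binarycontent)  # parses the image header
    outfilename = os.path.join(outfolder, 'myimg' + '.jpg')
    with open(outfilename, 'wb') as f:
        f.write(rsp.content)  # writes the original bytes unchanged, whatever their format
rsp.close()  # close the response in every case, not only for images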

What are the potential security risks of this code? (I'm not sure how far we can trust that the MIME type the server reports in the response header matches what it actually sends.) Is there a better way to write a secure download routine?

asked Oct 26 '25 at 07:10 by hAcKnRoCk


1 Answer

The potential security risk of your code depends on how much you trust the server you're contacting. If you're sure the server will never try to fool you with malicious content, then that piece of code is relatively safe. Otherwise, check the content type yourself. The biggest risk is probably saving an executable instead of an image without noticing. A smaller one is storing some other kind of content that crashes PIL or another component of your application.
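As a rough illustration of checking the type yourself without any third-party library, you can compare the first bytes of the body against the well-known signatures ("magic numbers") of the formats you expect. This is a minimal sketch under stated assumptions: the `sniff_image_type` helper and its signature table are hypothetical, cover only JPEG, PNG and GIF, and look only at the header, so a file with a valid signature can still be malformed.

```python
from typing import Optional

# Leading signatures of a few common image formats.
# Hypothetical, illustrative subset; not an exhaustive list.
IMAGE_SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return the MIME type implied by the leading bytes, or None if unrecognised."""
    for signature, mime in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return mime
    return None
```

For example, `sniff_image_type(rsp.content)` returns `'image/png'` for a PNG body and `None` for a Windows executable (whose header starts with `MZ`), regardless of what the `Content-Type` header claims.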

Keep in mind that the server is free to choose whatever value it wants for any response header, including Content-Type. If you have any reason to believe the server you're contacting might not be honest about it, you shouldn't trust its response headers.
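To see why the header alone proves nothing, a sketch like the one below cross-checks the declared Content-Type against the body and accepts the response only when both agree. The `declared_type_is_honest` function and its JPEG/PNG-only allow-list are hypothetical, for illustration only; they are not part of requests.

```python
# Hypothetical allow-list: declared MIME type -> expected leading bytes.
SIGNATURES = {
    "image/jpeg": b"\xff\xd8\xff",
    "image/png": b"\x89PNG\r\n\x1a\n",
}


def declared_type_is_honest(content_type_header: str, body: bytes) -> bool:
    """True only if the header names an allowed type AND the body starts with its signature."""
    # Drop parameters such as "; charset=binary" and normalise case.
    declared = content_type_header.split(";", 1)[0].strip().lower()
    signature = SIGNATURES.get(declared)
    return signature is not None and body.startswith(signature)
```

A server that sends `Content-Type: image/jpeg` with an executable body fails this check, because the header is just a string the server chose and the bytes do not back it up.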

If you want a more reliable way to determine the type of the content you received, I suggest taking a look at python-magic, a wrapper for libmagic. It inspects the bytes themselves to determine the content type, so you don't have to "trust" the server you're downloading from.

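import magic  # python-magic, a wrapper around libmagic
from io import BytesIO

# ...
content = BytesIO(rsp.content)
mime = magic.from_buffer(content.read(2048), mime=True)  # detect the type from the leading bytes
if mime.startswith('image'):
    content.seek(0)  # Reset the stream position, since read() advanced it
    # ...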

python-magic is very well documented, so I recommend having a look at its README if you're considering using it.

answered Oct 28 '25 at 00:10 by Alvae