I would like to download a file using urllib and decompress the file in memory before saving.
This is what I have right now:
    response = urllib2.urlopen(baseURL + filename)
    compressedFile = StringIO.StringIO()
    compressedFile.write(response.read())
    decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')
    outfile = open(outFilePath, 'w')
    outfile.write(decompressedFile.read())

This ends up writing empty files. How can I achieve what I'm after?
Updated Answer:
    #! /usr/bin/env python2
    import urllib2
    import StringIO
    import gzip

    baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
    # check filename: it may change over time, due to new updates
    filename = "man-pages-5.00.tar.gz"
    outFilePath = filename[:-3]  # strip the ".gz" suffix

    response = urllib2.urlopen(baseURL + filename)
    # Passing the data to the constructor leaves the position at 0,
    # so GzipFile reads from the start without an explicit seek(0).
    compressedFile = StringIO.StringIO(response.read())
    decompressedFile = gzip.GzipFile(fileobj=compressedFile)

    # The output is a binary tar archive, so open it in binary mode.
    with open(outFilePath, 'wb') as outfile:
        outfile.write(decompressedFile.read())
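urllib2 and StringIO exist only in Python 2. A minimal sketch of the same in-memory approach for Python 3, assuming the standard-library replacements urllib.request and io.BytesIO; gunzip_to_file and download_and_gunzip are hypothetical helper names, not part of any library:

```python
import gzip
import io
import urllib.request


def gunzip_to_file(compressed, out_path):
    """Decompress gzip data held in memory (bytes) and write it to out_path."""
    # A BytesIO initialised with data starts at position 0, so no seek() is needed.
    with gzip.GzipFile(fileobj=io.BytesIO(compressed), mode="rb") as gz:
        data = gz.read()
    # Binary mode: the decompressed payload (e.g. a .tar archive) is not text.
    with open(out_path, "wb") as outfile:
        outfile.write(data)
    return len(data)


def download_and_gunzip(url, out_path):
    """Fetch a .gz file fully into memory, then decompress it to out_path."""
    with urllib.request.urlopen(url) as response:
        compressed = response.read()
    return gunzip_to_file(compressed, out_path)
```

Calling download_and_gunzip(baseURL + filename, outFilePath) does what the Python 2 code above does. gzip.decompress(compressed) is an equivalent one-shot alternative to the GzipFile/BytesIO pair.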
gzip-compressed files usually have the .gz extension (in fact, I don't think I've ever seen a .gzip extension), but it's generally unsafe to rely on a file's extension to determine its type anyway. The C zlib library's gzip functions (gzopen/gzread, etc.) will transparently read uncompressed files.
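Python's gzip module, unlike zlib's gzread, does not pass uncompressed data through; it raises an error instead (gzip.BadGzipFile, a subclass of OSError, in Python 3.8+). A minimal sketch, assuming you want the transparent behaviour: check the two-byte gzip magic number (0x1f 0x8b, defined in RFC 1952) rather than trusting the extension. maybe_gunzip is a hypothetical helper name:

```python
import gzip

# Every gzip member starts with these two ID bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def maybe_gunzip(data):
    """Return data decompressed if it looks like gzip, otherwise unchanged.

    Mimics zlib's gzread(), which passes non-gzip input through as-is.
    """
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data
```

This is only a heuristic: a plain file that happens to begin with 0x1f 0x8b would still be handed to the decompressor and raise.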
You need to seek to the beginning of compressedFile after writing to it but before passing it to gzip.GzipFile(). Otherwise the gzip module starts reading from the current position, which is the end of the data, so the file appears empty to it. See below:
    #! /usr/bin/env python
    import urllib2
    import StringIO
    import gzip

    baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
    filename = "man-pages-3.34.tar.gz"
    outFilePath = "man-pages-3.34.tar"

    response = urllib2.urlopen(baseURL + filename)
    compressedFile = StringIO.StringIO()
    compressedFile.write(response.read())

    #
    # Set the file's current position to the beginning
    # of the file so that gzip.GzipFile can read
    # its contents from the top.
    #
    compressedFile.seek(0)

    decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')

    # Write in binary mode: the decompressed data is a tar archive.
    with open(outFilePath, 'wb') as outfile:
        outfile.write(decompressedFile.read())
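To see why the seek(0) matters, here is a small Python 3 sketch (io.BytesIO standing in for StringIO.StringIO, and in-memory data instead of a download) that decompresses the same buffer with and without rewinding it. read_gzip_buffer is a hypothetical helper:

```python
import gzip
import io


def read_gzip_buffer(compressed, rewind):
    """Write gzip bytes into an empty buffer, then decompress it with GzipFile."""
    buf = io.BytesIO()
    buf.write(compressed)  # the position is now at the end of the data
    if rewind:
        buf.seek(0)  # back to the first byte so GzipFile sees the gzip header
    with gzip.GzipFile(fileobj=buf, mode="rb") as gz:
        return gz.read()


original = b"man-pages tarball contents"
compressed = gzip.compress(original)
print(read_gzip_buffer(compressed, rewind=False))  # b'' (reading started at EOF)
print(read_gzip_buffer(compressed, rewind=True) == original)  # True
```

Without the rewind, GzipFile finds no header at the current position and treats the stream as empty, which is exactly the empty-output symptom in the question.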