So I'm trying to download files (both images and documents) from a website I have scraped. I have to download these to a specific folder. So far I have:
images = re.findall("/([^/]+\.(?:jpg|gif|png))", html)
output = open("output.txt","a+")
output.write("\n" + f"[+] {len(images)} Images Found:" + "\n")
for images in images:
    output.write(images + "\n")
output.write("Beginning file download with urllib2..." + "\n")
imageurl = "images"
urllib.request.urlretrieve(url, "/downloads")
How would I keep the file names the same as they are on the website, with the specific file type, etc.?
This is just a snippet of the code that handles the images only.
You can pass the output filename as the second argument to urllib.request.urlretrieve.
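import os
import re
import urllib.request
from urllib.parse import urljoin

# html is the scraped page source and url is the page address, as in the question
images = re.findall(r"/([^/]+\.(?:jpg|gif|png))", html)
with open("output.txt", "a+") as output:
    output.write(f"\n[+] {len(images)} Images Found:\n")
    output.write("Beginning file download with urllib...\n")
    for imagename in images:
        output.write(imagename + "\n")
        # assumption: each image sits next to the page, so its URL is rebuilt from url
        imageurl = urljoin(url, imagename)
        # the /downloads folder must already exist; join adds the missing separator
        urllib.request.urlretrieve(imageurl, os.path.join("/downloads", imagename))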
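You only have to set imagename to the file name of the image, for example image.png, and make sure there is a path separator between the folder and the name. The saved file then keeps the same name and file type it has on the website.
I hope this helps.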
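If the scraper collects full URLs rather than bare file names, the name can also be taken from the URL itself. The following is only a minimal sketch under that assumption: filename_from_url and download_all are hypothetical helper names, not part of urllib, and the fetch parameter is added here just so the download step can be swapped out. Only the last path segment is kept, so a query string such as ?v=2 does not end up in the saved name.

```python
import os
import urllib.request
from urllib.parse import unquote, urlsplit


def filename_from_url(file_url):
    """Return the last path segment of a URL, without query string or fragment."""
    path = urlsplit(file_url).path
    return os.path.basename(unquote(path))


def download_all(file_urls, folder, fetch=urllib.request.urlretrieve):
    """Save each URL into folder under its original file name."""
    os.makedirs(folder, exist_ok=True)  # create the target folder if needed
    saved = []
    for file_url in file_urls:
        name = filename_from_url(file_url)
        if not name:  # URL ends in "/", so there is no file name to keep
            continue
        target = os.path.join(folder, name)
        fetch(file_url, target)
        saved.append(target)
    return saved
```

A call such as download_all(list_of_urls, "downloads") would then work for images and documents alike, since the extension comes along with the name.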