I need to scrape the main image from an Amazon product page. I stored the ASINs in a list and I build every product page URL with a for loop. I'm trying to scrape the images but I can't. I tried this code:
import re
import sys
import warnings
import requests
from urllib import request
from bs4 import BeautifulSoup as bsoup
from requests_html import HTMLSession

# declare a session object
session = HTMLSession()

# ignore warnings
if not sys.warnoptions:
    warnings.simplefilter("ignore")

urls = ['https://www.amazon.it/gp/bestsellers/apparel/', 'https://www.amazon.it/gp/bestsellers/electronics/', 'https://www.amazon.it/gp/bestsellers/books/']

asins = []
for url in urls:
    content = requests.get(url).content
    decoded_content = content.decode()
    # the ASIN number will be between "dp/" and the next "/"
    asins += re.findall(r'/[^/]+/dp/([^/\"?]+)', decoded_content)

for asin in asins:
    site = 'https://www.amazon.it/'
    start = 'dp/'
    end = '/'
    url = site + start + asin + end
    resp1 = requests.get(url).content
    soup = bsoup(resp1, "html.parser")
    body = soup.find("body")
    imgtag = soup.find("img", {"id": "landingImage"})
    imageurl = dict(imgtag.attrs)["src"]
    resp2 = request.urlopen(imageurl)
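The ASIN pattern described in the code comment above (the ASIN sits between "dp/" and the next "/") can be checked in isolation. This is a minimal sketch; the href below is a made-up example of the usual Amazon link format, not taken from a real page:

```python
import re

# Hypothetical snippet of the kind of markup the regex runs against
html = '<a href="/Some-Product-Name/dp/B07XYZ1234/ref=zg_bs_1">Product</a>'

# The capture group excludes "/", so it stops at the slash after the ASIN
asins = re.findall(r'/[^/]+/dp/([^/\"?]+)', html)
print(asins)  # ['B07XYZ1234']
```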
The problem is that the images are loaded dynamically. By inspecting the page, and with the help of the BeautifulSoup documentation, I was able to scrape all the images needed for a given product.
I have a class in which I store data, so I save the page content in the instance...
import urllib.request
from bs4 import BeautifulSoup

def take_page(self, url_page):
    req = urllib.request.Request(
        url_page,
        data=None
    )
    f = urllib.request.urlopen(req)
    page = f.read().decode('utf-8')
    self.page = page
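One caveat when fetching the page this way: Amazon often rejects requests that carry urllib's default User-Agent, so sending a browser-like one is usually necessary. A sketch of building the request with such a header (the URL and User-Agent string are illustrative, not from the original code):

```python
import urllib.request

# Hypothetical product URL, for illustration only
url_page = 'https://www.amazon.it/dp/B07XYZ1234/'

# A browser-like User-Agent; without it Amazon frequently returns an error page
req = urllib.request.Request(
    url_page,
    data=None,
    headers={'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'}
)
print(req.get_header('User-agent'))
```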
The following simple method returns the first image, in the smallest size:
import json

def take_image(self):
    soup = BeautifulSoup(self.page, 'html.parser')
    img_div = soup.find(id="imgTagWrapperId")
    imgs_str = img_div.img.get('data-a-dynamic-image')  # a string in JSON format
    # convert it to a dictionary
    imgs_dict = json.loads(imgs_str)
    # each key in the dictionary is an image link, and the value is its size
    # (print the whole dictionary to inspect it)
    num_element = 0
    first_link = list(imgs_dict.keys())[num_element]
    return first_link
So, you can adapt these methods to your needs; I think this is all you need to improve your code.
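If you want the largest image rather than the first one, the same data-a-dynamic-image dictionary can be sorted by its size values. A self-contained sketch; the URLs and [width, height] pairs below are made up for illustration:

```python
import json

# Hypothetical value of the data-a-dynamic-image attribute
imgs_str = (
    '{"https://m.media-amazon.com/images/I/example._AC_SX342_.jpg": [342, 445],'
    ' "https://m.media-amazon.com/images/I/example._AC_SX522_.jpg": [522, 679]}'
)

imgs_dict = json.loads(imgs_str)

# Each key is an image URL; each value is a [width, height] pair,
# so taking the maximum by area picks the largest variant
largest = max(imgs_dict, key=lambda u: imgs_dict[u][0] * imgs_dict[u][1])
print(largest)
```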