I'm learning Python and trying to use Beautiful Soup for some web scraping. I want to pull the product name and product number from a saved page that my Python code reads, and I've included a snippet of the section the script looks at. The name is in a div with the class "name", and the number is in a span with the id "product_id".
Essentially, my Python script does collect all the product names, but once it reaches the product_id loop, it overwrites the values from the first loop. Can anyone point me in the right direction?
OUTPUT
After first loop
{'name': 'ADA Hi-Lo Power Plinth Table'}
{'name': 'Adjustable Headrest Couch - Chrome-Plated Steel Legs'}
{'name': 'Adjustable Headrest Couch - Chrome-Plated Steel Legs (X-Large)'}
After second loop
{'name': 'Weekender Folding Cot', 'product_ID': '55984'}
{'name': 'Weekender Folding Cot', 'product_ID': '31350'}
{'name': 'Weekender Folding Cot', 'product_ID': '31351'}
<div class="revealOnScroll product-item" data-addcart-callback="addcart_callback" data-ajaxcart="1" data-animation="fadeInUp" data-catalogid="1496" data-categoryid="5127" data-timeout="500">
<div class="img">
<a href="ADA-Hi-Lo-Power-Plinth-Table_p_1496.html">
<img alt="ADA Hi-Lo Power Plinth Table" class="img-responsive" src="assets/images/thumbnails/55984_thumbnail.jpg"/>
</a>
<button class="quickview" data-toggle="modal">
Quick View
</button>
</div>
<div class="name">
<a href="ADA-Hi-Lo-Power-Plinth-Table_p_1496.html">
ADA Hi-Lo Power Plinth Table
</a>
</div>
<div class="product-id">
Item Number:
<strong>
<span id="product_id">
55984
</span>
</strong>
</div>
<div class="status">
</div>
<div class="reviews">
</div>
<div class="price">
<span class="regular-price">
$2,849.00
</span>
</div>
<div class="action">
<a class="add-to-cart btn btn-default" href="add_cart.asp?quick=1&item_id=1496&cat_id=5127">
<span class="buyitlink-text">
Select Options
</span>
<span class="ajaxcart-loader icon-spin2 animate-spin">
</span>
<span class="ajaxcart-added icon-ok">
</span>
</a>
</div>
</div>
<div class="revealOnScroll product-item" data-addcart-callback="addcart_callback" data-ajaxcart="1" data-animation="fadeInUp" data-catalogid="2878" data-categoryid="5127" data-timeout="500">
<div class="img">
<a href="Adjustable-Headrest-Couch--Chrome-Plated-Steel-Legs_p_2878.html">
<img alt="Adjustable Headrest Couch - Chrome-Plated Steel Legs" class="img-responsive" src="assets/images/thumbnails/31350_thumbnail.jpg"/>
</a>
<button class="quickview" data-toggle="modal">
Quick View
</button>
</div>
<div class="name">
<a href="Adjustable-Headrest-Couch--Chrome-Plated-Steel-Legs_p_2878.html">
Adjustable Headrest Couch - Chrome-Plated Steel Legs
</a>
</div>
<div class="product-id">
Item Number:
<strong>
<span id="product_id">
31350
</span>
</strong>
</div>
<div class="status">
</div>
<div class="reviews">
</div>
<div class="price">
<span class="regular-price">
$729.00
</span>
</div>
<div class="action">
<a class="add-to-cart btn btn-default" href="add_cart.asp?quick=1&item_id=2878&cat_id=5127">
<span class="buyitlink-text">
Select Options
</span>
<span class="ajaxcart-loader icon-spin2 animate-spin">
</span>
<span class="ajaxcart-added icon-ok">
</span>
</a>
</div>
</div>
<div class="revealOnScroll product-item" data-addcart-callback="addcart_callback" data-ajaxcart="1" data-animation="fadeInUp" data-catalogid="2879" data-categoryid="5127" data-timeout="500">
<div class="img">
<a href="Adjustable-Headrest-Couch--Chrome-Plated-Steel-Legs-X-Large_p_2879.html">
<img alt="Adjustable Headrest Couch - Chrome-Plated Steel Legs (X-Large)" class="img-responsive" src="assets/images/thumbnails/31350_thumbnail.jpg"/>
</a>
<button class="quickview" data-toggle="modal">
Quick View
</button>
</div>
<div class="name">
<a href="Adjustable-Headrest-Couch--Chrome-Plated-Steel-Legs-X-Large_p_2879.html">
Adjustable Headrest Couch - Chrome-Plated Steel Legs (X-Large)
</a>
</div>
<div class="product-id">
Item Number:
<strong>
<span id="product_id">
31351
</span>
</strong>
</div>
<div class="status">
</div>
<div class="reviews">
</div>
<div class="price">
<span class="regular-price">
$769.00
</span>
</div>
<div class="action">
<a class="add-to-cart btn btn-default" href="add_cart.asp?quick=1&item_id=2879&cat_id=5127">
<span class="buyitlink-text">
Select Options
</span>
<span class="ajaxcart-loader icon-spin2 animate-spin">
</span>
<span class="ajaxcart-added icon-ok">
</span>
</a>
</div>
</div>
BEGINNING OF PYTHON SCRIPT
import requests
from bs4 import BeautifulSoup

with open('recoveryCouches', 'r') as html_file:
    content = html_file.read()
    soup = BeautifulSoup(content, 'lxml')
    allProductDivs = soup.find('div', class_='product-items product-items-4')

    # get names of products on page
    nameDiv = soup.find_all('div', class_='name')
    prodID = soup.find_all('span', id='product_id')
    records = []
    d = dict()
    for name in nameDiv:
        d['name'] = name.find('a').text
        records.append(d)
        print(d)
    for productId in prodID:
        d['product_ID'] = productId.text
        records.append(d)
        print(d)
Try this, which pairs each name with the product ID at the same index:
nameDiv = soup.find_all('div', class_='name')
prodID = soup.find_all('span', id='product_id')
records = []
for i in range(len(nameDiv)):
    records.append({
        "name": nameDiv[i].find('a').text.strip(),
        "product_ID": prodID[i].text.strip()
    })
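Index pairing assumes the two find_all lists line up exactly. A slightly more robust sketch walks each product card and reads both fields from inside it, so a name can only ever be paired with the ID from the same card. It assumes every product sits in a div with the class "product-item", as in the snippet above; the HTML string is a trimmed copy of that snippet, and 'html.parser' is used instead of 'lxml' only so the sketch needs nothing beyond bs4.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the saved page (assumption: each product is a div.product-item)
html = """
<div class="revealOnScroll product-item" data-catalogid="1496">
  <div class="name"><a href="ADA-Hi-Lo-Power-Plinth-Table_p_1496.html">
    ADA Hi-Lo Power Plinth Table
  </a></div>
  <div class="product-id">Item Number: <strong><span id="product_id">
    55984
  </span></strong></div>
</div>
<div class="revealOnScroll product-item" data-catalogid="2878">
  <div class="name"><a href="Adjustable-Headrest-Couch--Chrome-Plated-Steel-Legs_p_2878.html">
    Adjustable Headrest Couch - Chrome-Plated Steel Legs
  </a></div>
  <div class="product-id">Item Number: <strong><span id="product_id">
    31350
  </span></strong></div>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

records = []
# class_ matches any one of a tag's classes, so this finds every product card
for item in soup.find_all('div', class_='product-item'):
    # Search only inside this card so name and ID always belong together
    name_tag = item.find('div', class_='name')
    id_tag = item.find('span', id='product_id')
    if name_tag is None or id_tag is None:
        continue  # skip cards missing either field
    records.append({
        'name': name_tag.get_text(strip=True),
        'product_ID': id_tag.get_text(strip=True),
    })

for record in records:
    print(record)
```

The page reuses id="product_id" on every card, which is invalid HTML, but Beautiful Soup parses it anyway; scoping each find to the card sidesteps the duplicate ids.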
To write the data to a CSV file:
import csv

# newline='' stops the csv module from writing blank rows on Windows
with open("file.csv", 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=records[0].keys())
    writer.writeheader()
    for record in records:
        writer.writerow(record)
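If I understand the question correctly, you're trying to get all the names and product IDs and store them. The problem is that you create a single dictionary `d` before the loops and keep assigning to the same keys, so each assignment overwrites the previous value. Because `records.append(d)` appends a reference to that one dictionary, every entry in `records` ends up being the same object.
One solution is to initialize your dictionary values as lists, like so: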
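The overwrite is easy to reproduce without any scraping. This minimal sketch uses two hard-coded names (taken from the output above) in place of the parsed tags. It also shows why the printed output after the first loop looked correct: print(d) runs before the next iteration overwrites the value, but the stored list holds the same dict several times.

```python
names = [
    "ADA Hi-Lo Power Plinth Table",
    "Adjustable Headrest Couch - Chrome-Plated Steel Legs",
]

# Buggy pattern: one dict created once and reused for every product
d = dict()
shared_records = []
for name in names:
    d['name'] = name          # overwrites the value in the same dict
    shared_records.append(d)  # appends a reference, not a copy

print(shared_records)
print(shared_records[0] is shared_records[1])  # True: both entries are one object

# Fixed pattern: build a new dict on every iteration
fresh_records = []
for name in names:
    fresh_records.append({'name': name})

print(fresh_records)
```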
d = {
'name': [],
'product_ID': []
}
Then, in each loop, append the new value to the matching list. Your current code assigns to the same key every time, which overwrites the previous value.
for name in nameDiv:
    d['name'].append(name.find('a').text)
for productId in prodID:
    d['product_ID'].append(productId.text)
This will result in a list of all names and product_IDs stored in that dictionary.
If you want to put these lists together in a format like this:
[(name0, productId0), (name1, productId1), ...]
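Then you can use zip, which pairs up the elements at the same index in each list. If the lists differ in length, zip silently stops at the shorter one, so make sure every product has both a name and an ID. For example: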
zipped_results = list(zip(d['name'], d['product_ID']))
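Here is a small sketch of that step with sample values copied from the output above standing in for the scraped lists. The last line turns the tuples back into one dict per product, which is the shape the csv.DictWriter example in the other answer expects. On Python 3.10 and later, zip(..., strict=True) raises a ValueError instead of silently truncating when the lengths differ.

```python
# Sample data standing in for the scraped lists
d = {
    'name': ['ADA Hi-Lo Power Plinth Table',
             'Adjustable Headrest Couch - Chrome-Plated Steel Legs'],
    'product_ID': ['55984', '31350'],
}

# Pair each name with the ID at the same position
zipped_results = list(zip(d['name'], d['product_ID']))
print(zipped_results)

# One dict per product, ready for csv.DictWriter
records = [{'name': name, 'product_ID': pid} for name, pid in zipped_results]
print(records)
```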