
Adding multiple loop outputs to single dictionary

I'm learning Python and trying to use Beautiful Soup for some web scraping. I want to pull the product name and product number from a saved page that my script reads; a snippet of the relevant section of that page is included below. Each name sits in a div with the class name, and each number sits in a span with the id product_id.

My script collects all the product names, but once it reaches the product_id loop, it overwrites the values from the first loop. Can anyone point me in the right direction?

OUTPUT
After first loop
    {'name': 'ADA Hi-Lo Power Plinth Table'}
    {'name': 'Adjustable Headrest Couch - Chrome-Plated Steel Legs'}
    {'name': 'Adjustable Headrest Couch - Chrome-Plated Steel Legs (X-Large)'}

After second loop
    {'name': 'Weekender Folding Cot', 'product_ID': '55984'}
    {'name': 'Weekender Folding Cot', 'product_ID': '31350'}
    {'name': 'Weekender Folding Cot', 'product_ID': '31351'}

<div class="revealOnScroll product-item" data-addcart-callback="addcart_callback" data-ajaxcart="1" data-animation="fadeInUp" data-catalogid="1496" data-categoryid="5127" data-timeout="500">
         <div class="img">
          <a href="ADA-Hi-Lo-Power-Plinth-Table_p_1496.html">
           <img alt="ADA Hi-Lo Power Plinth Table" class="img-responsive" src="assets/images/thumbnails/55984_thumbnail.jpg"/>
          </a>
          <button class="quickview" data-toggle="modal">
           Quick View
          </button>
         </div>
         <div class="name">
          <a href="ADA-Hi-Lo-Power-Plinth-Table_p_1496.html">
           ADA Hi-Lo Power Plinth Table
          </a>
         </div>
         <div class="product-id">
          Item Number:
          <strong>
           <span id="product_id">
            55984
           </span>
          </strong>
         </div>
         <div class="status">
         </div>
         <div class="reviews">
         </div>
         <div class="price">
          <span class="regular-price">
           $2,849.00
          </span>
         </div>
         <div class="action">
          <a class="add-to-cart btn btn-default" href="add_cart.asp?quick=1&amp;item_id=1496&amp;cat_id=5127">
           <span class="buyitlink-text">
            Select Options
           </span>
           <span class="ajaxcart-loader icon-spin2 animate-spin">
           </span>
           <span class="ajaxcart-added icon-ok">
           </span>
          </a>
         </div>
        </div>
        <div class="revealOnScroll product-item" data-addcart-callback="addcart_callback" data-ajaxcart="1" data-animation="fadeInUp" data-catalogid="2878" data-categoryid="5127" data-timeout="500">
         <div class="img">
          <a href="Adjustable-Headrest-Couch--Chrome-Plated-Steel-Legs_p_2878.html">
           <img alt="Adjustable Headrest Couch - Chrome-Plated Steel Legs" class="img-responsive" src="assets/images/thumbnails/31350_thumbnail.jpg"/>
          </a>
          <button class="quickview" data-toggle="modal">
           Quick View
          </button>
         </div>
         <div class="name">
          <a href="Adjustable-Headrest-Couch--Chrome-Plated-Steel-Legs_p_2878.html">
           Adjustable Headrest Couch - Chrome-Plated Steel Legs
          </a>
         </div>
         <div class="product-id">
          Item Number:
          <strong>
           <span id="product_id">
            31350
           </span>
          </strong>
         </div>
         <div class="status">
         </div>
         <div class="reviews">
         </div>
         <div class="price">
          <span class="regular-price">
           $729.00
          </span>
         </div>
         <div class="action">
          <a class="add-to-cart btn btn-default" href="add_cart.asp?quick=1&amp;item_id=2878&amp;cat_id=5127">
           <span class="buyitlink-text">
            Select Options
           </span>
           <span class="ajaxcart-loader icon-spin2 animate-spin">
           </span>
           <span class="ajaxcart-added icon-ok">
           </span>
          </a>
         </div>
        </div>
        <div class="revealOnScroll product-item" data-addcart-callback="addcart_callback" data-ajaxcart="1" data-animation="fadeInUp" data-catalogid="2879" data-categoryid="5127" data-timeout="500">
         <div class="img">
          <a href="Adjustable-Headrest-Couch--Chrome-Plated-Steel-Legs-X-Large_p_2879.html">
           <img alt="Adjustable Headrest Couch - Chrome-Plated Steel Legs (X-Large)" class="img-responsive" src="assets/images/thumbnails/31350_thumbnail.jpg"/>
          </a>
          <button class="quickview" data-toggle="modal">
           Quick View
          </button>
         </div>
         <div class="name">
          <a href="Adjustable-Headrest-Couch--Chrome-Plated-Steel-Legs-X-Large_p_2879.html">
           Adjustable Headrest Couch - Chrome-Plated Steel Legs (X-Large)
          </a>
         </div>
         <div class="product-id">
          Item Number:
          <strong>
           <span id="product_id">
            31351
           </span>
          </strong>
         </div>
         <div class="status">
         </div>
         <div class="reviews">
         </div>
         <div class="price">
          <span class="regular-price">
           $769.00
          </span>
         </div>
         <div class="action">
          <a class="add-to-cart btn btn-default" href="add_cart.asp?quick=1&amp;item_id=2879&amp;cat_id=5127">
           <span class="buyitlink-text">
            Select Options
           </span>
           <span class="ajaxcart-loader icon-spin2 animate-spin">
           </span>
           <span class="ajaxcart-added icon-ok">
           </span>
          </a>
         </div>
        </div>

BEGINNING OF PYTHON SCRIPT

import requests
from bs4 import BeautifulSoup


with open('recoveryCouches','r') as html_file:
    content= html_file.read()

    soup = BeautifulSoup(content,'lxml')
    allProductDivs = soup.find('div', class_='product-items product-items-4')
   
    #get names of products on page
    nameDiv = soup.find_all('div',class_='name')
    prodID = soup.find_all('span', id='product_id')
    records=[]
    d=dict()

    for name in nameDiv:
        d['name'] = name.find('a').text
        records.append(d)
        print(d)

    for productId in prodID:
        d['product_ID'] = productId.text
        records.append(d)
        print(d)
ch11nV11n asked Mar 10 '26 19:03



2 Answers

Build one dictionary per product inside a single loop, so each record gets its own name and product ID:

nameDiv = soup.find_all('div',class_='name')
prodID = soup.find_all('span', id='product_id')
records=[]

for i in range(len(nameDiv)):
    records.append({
        "name": nameDiv[i].find('a').text.strip(), 
        "product_ID": prodID[i].text.strip()
        })
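
Pairing by index assumes the two lists line up one-to-one. As a hedged alternative, the sketch below loops over each product container and looks up the name and ID inside it, so a product with a missing field cannot shift the pairing. The inline HTML is a trimmed stand-in for the saved page, and it uses the built-in html.parser instead of lxml only to keep the example self-contained.

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for the saved page (assumption: same structure as the snippet in the question).
html = """
<div class="revealOnScroll product-item">
  <div class="name"><a href="a.html"> ADA Hi-Lo Power Plinth Table </a></div>
  <div class="product-id">Item Number: <strong><span id="product_id"> 55984 </span></strong></div>
</div>
<div class="revealOnScroll product-item">
  <div class="name"><a href="b.html"> Adjustable Headrest Couch - Chrome-Plated Steel Legs </a></div>
  <div class="product-id">Item Number: <strong><span id="product_id"> 31350 </span></strong></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

records = []
# class_ matches any div whose class list contains "product-item"
for item in soup.find_all("div", class_="product-item"):
    name_div = item.find("div", class_="name")
    name_link = name_div.find("a") if name_div is not None else None
    id_span = item.find("span", id="product_id")
    if name_link is None or id_span is None:
        continue  # skip incomplete products instead of mis-pairing them
    # a new dict is created for every product
    records.append({
        "name": name_link.get_text(strip=True),
        "product_ID": id_span.get_text(strip=True),
    })

print(records)
```

Because each dictionary is built from a single container, the name and ID in a record always belong to the same product.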

To write the data to a CSV file:

import csv

with open("file.csv", 'w', newline='') as csv_file:  # newline='' avoids blank rows on Windows
    writer = csv.DictWriter(csv_file, fieldnames=records[0].keys())

    writer.writeheader()
    for record in records:
        writer.writerow(record)
dimay answered Mar 13 '26 12:03



If I understand the question correctly, you're trying to collect all the names and product IDs and store them. The problem you're running into is that each assignment to a key in the dictionary overwrites the value that was there before.
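
A related detail: the script in the question creates a single dict outside the loops and appends that same object on every iteration, so every entry in records ends up showing the latest values. A minimal sketch of that behavior, using made-up short names in place of the scraped ones:

```python
records = []
d = dict()  # one dict, created once outside the loop

for name in ["Table", "Couch", "Cot"]:  # hypothetical stand-ins for the scraped names
    d["name"] = name      # replaces the previous value in the same dict
    records.append(d)     # appends another reference to that same dict

print(records)

# All three entries are the very same object, so they all show the last name.
same_object = all(entry is d for entry in records)
print(same_object)
```

Creating a new dictionary inside the loop, or storing lists as shown next, avoids this.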

One solution to that problem would be to initialize your Python dictionary values as lists, like so:

d = {
  'name': [],
  'product_ID': []
}

Then, in each of the loops, append the new value to the corresponding list. What you currently have assigns to the key directly, which overwrites the previous value.

for name in nameDiv:
    d['name'].append(name.find('a').text)

for productId in prodID:
    d['product_ID'].append(productId.text)

The dictionary then holds one list with all the names and one list with all the product IDs.

If you want to put these lists together in a format like this:

[(name0, productId0), (name1, productId1), ...]

Then you can make use of zip, which pairs up the items of your lists position by position. If the lists differ in length, zip stops at the shortest one, so check that they are the same length first. For example:

zipped_results = list(zip(d['name'], d['product_ID']))
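
To illustrate how zip treats lists of different lengths, and how the pairs can be turned back into one dictionary per product, here is a small sketch. The values are made-up sample data, with one product ID deliberately left out.

```python
d = {
    "name": ["Table", "Couch", "Cot"],
    "product_ID": ["55984", "31350"],  # one ID deliberately missing
}

# zip() stops at the shorter list, so "Cot" is silently dropped here.
pairs = list(zip(d["name"], d["product_ID"]))

# One dictionary per product, built from the pairs.
records = [{"name": n, "product_ID": p} for n, p in pairs]

# A simple length check makes the mismatch visible.
lengths_match = len(d["name"]) == len(d["product_ID"])
if not lengths_match:
    print("warning: names and product IDs differ in length")

print(records)
```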

shadow-kris answered Mar 13 '26 10:03
