Extract data from <script> with BeautifulSoup in Python

I have this code:

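import json
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}  # assumption: the original headers dict was not shown
product_url = 'https://www.burton.com/us/en/p/burton-elite-long-sleeve-tshirt/W21-203921.html?cgid=womens-tees'
res = requests.get(product_url, headers=headers)
soup = BeautifulSoup(res.text, 'html.parser')
product = soup.find('main', {'id': 'main-content'})
details = product.find('script')  # the first <script> inside <main id="main-content">
data = json.loads(details.string)  # this is the line that fails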

where details is this <script> tag:

<script>
        __metadata.product = {
            id: "W21-203921",
            sku: "W21-203921",
            ph1: 'SOFTGOODS',
            ph2: 'BASIC FLEECE AND TEE',
            ph3: 'LS TEES',
            ph4: '',
            upc: '190450612509',
            ean: '9009521451408',
            brand: "Burton",
            category: "womens-tees",
            primaryCategory: "womens-sale-sweaters-shirts",
            currency: "USD",
            gender: "Unisex",
            label: "Burton Elite Long Sleeve T-Shirt",
            name: "Burton Elite Long Sleeve T-Shirt"
        };
        __metadata.criteo = {
            pageType: 'ProductPage'
        };
    </script>

Now I want to extract some of this data, such as id, brand, category, and name.

I have looked at pretty much every thread on this forum with a similar question and tried their solutions, but none of them work. Most boil down to data = json.loads(details) in some form. The most common errors I get are:

json.decoder.JSONDecodeError: Expecting value: line 2 column 9 (char 9)

or

TypeError: the JSON object must be str, bytes or bytearray, not Tag
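Both errors have the same root cause. json.loads(details) raises the TypeError because details is a BeautifulSoup Tag, not a string. details.string is a string, but it holds JavaScript, not JSON: it begins with the statement __metadata.product = (hence "Expecting value: line 2 column 9", which points at the first underscore), the keys are unquoted, and some values use single quotes. One possible workaround, sketched here under explicit assumptions, is to cut the object literal out with a regular expression and rewrite it into JSON. The function name js_object_to_dict is hypothetical, and the approach only holds for flat objects like this one (each key on its own line, quoted string values with no quote characters inside them); anything more complex needs a real JavaScript parser.

```python
import json
import re

# The text of the question's <script> tag, i.e. what details.string returns.
SCRIPT_TEXT = """
        __metadata.product = {
            id: "W21-203921",
            sku: "W21-203921",
            ph1: 'SOFTGOODS',
            ph2: 'BASIC FLEECE AND TEE',
            ph3: 'LS TEES',
            ph4: '',
            upc: '190450612509',
            ean: '9009521451408',
            brand: "Burton",
            category: "womens-tees",
            primaryCategory: "womens-sale-sweaters-shirts",
            currency: "USD",
            gender: "Unisex",
            label: "Burton Elite Long Sleeve T-Shirt",
            name: "Burton Elite Long Sleeve T-Shirt"
        };
        __metadata.criteo = {
            pageType: 'ProductPage'
        };
"""


def js_object_to_dict(script_text, var_name="__metadata.product"):
    """Turn a flat JavaScript object literal assigned to var_name into a dict.

    Assumptions (hypothetical helper): each key starts its own line, keys are
    bare identifiers, values are quoted strings, and no value contains a quote.
    """
    # Capture "{ ... }" between "<var_name> =" and the first "};" after it.
    match = re.search(re.escape(var_name) + r"\s*=\s*(\{.*?\});", script_text, re.S)
    if match is None:
        raise ValueError(f"{var_name} not found in script text")
    body = match.group(1)
    # Quote the bare key at the start of each line: id: -> "id":
    body = re.sub(r"^(\s*)(\w+)\s*:", r'\1"\2":', body, flags=re.M)
    # JSON only allows double-quoted strings.
    body = body.replace("'", '"')
    return json.loads(body)


product = js_object_to_dict(SCRIPT_TEXT)
print({key: product[key] for key in ("id", "brand", "category", "name")})
```

With the live page, the call would be js_object_to_dict(details.string) on the tag found by the question's code. Anchoring the key pattern at the start of each line keeps it from touching colons that might appear inside values, such as URLs.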

asked Jun 14 '26 at 03:06 by Luka Jozić


1 Answer

It's far easier and more robust to request the page in its AJAX format, which returns JSON instead of HTML: just add format=ajax to the params argument of requests.get. Then you can pull whatever you want out of the resulting JSON dictionary. This also works for the other URL you provided in the comments.

import requests

url = 'https://www.burton.com/us/en/p/burton-elite-long-sleeve-tshirt/W21-203921.html?cgid=womens-tees'
payload = {'format': 'ajax'}  # ask for the JSON (AJAX) version of the product page

jsonData = requests.get(url, params=payload).json()

print(jsonData['data']['products'][0])

Output:
{'id': 'W21-203921', 'hideOutOfStockVariants': True, 'brand': 'Burton', 'name': 'Burton Elite Long Sleeve T-Shirt', 'subtitle': '100% Organic Cotton Long Sleeve Graphic T Shirt', 'shortDescription': "A comfortable long sleeve T-shirt that's an unsung favorite for social hour and Sunday in the park.", 'gender': 'Unisex', 'season': 'W21', 'isBoard': False, 'hasSizeChart': True, 'hasSizeFinder': False, 'selectedVariations': {'variationColor': '', 'variationSize': ''}, 'links': {'master': 'https://www.burton.com/us/en/p/burton-elite-long-sleeve-tshirt/W21-203921.html', 'variations': '/on/demandware.store/Sites-Burton_NA-Site/en_US/Product-GetVariationJSON?pid=W21-203921', 'manual': '/us/en/help/manuals.html', 'yotpoAPI': 'https://api.yotpo.com/v1/widget/AbBl1exDWS4rzXsg73rzUKlzUOo10aeMXRkIGHVG/products/W21-203921/reviews?per_page=0', 'tech': '/on/demandware.store/Sites-Burton_NA-Site/en_US/Product-GetTechFeaturesJSON?pids=W21-203921', 'recommendations': '/on/demandware.store/Sites-Burton_NA-Site/en_US/Product-GetRecommendationsJSON?pids=W21-203921', 'ultimateSetup': '/on/demandware.store/Sites-Burton_NA-Site/en_US/Product-GetRecommendationsJSON?pids=W21-203921', 'dynamicslots': '/on/demandware.store/Sites-Burton_NA-Site/en_US/Slot-GetDynamicSlots?pid=W21-203921'}, 'variationValueCount': {'variationColor': 4, 'variationSize': 7}, 'finePrint': [], 'images': {'type': 'PRODUCT_LEVEL', 'views': [{'id': '_4U', 'active': True, 'type': 'image', 'masterLevel': False, 'template': None, 'format': None, 'url': {'base': 'https://www.burton.com/static/product/W21/20392102001_4U.png'}}, {'id': '_3W', 'active': True, 'type': 'image', 'masterLevel': False, 'template': None, 'format': None, 'url': {'base': 'https://www.burton.com/static/product/W21/20392102001_3W.png'}}, {'id': '_4M', 'active': True, 'type': 'image', 'masterLevel': False, 'template': None, 'format': None, 'url': {'base': 'https://www.burton.com/static/product/W21/20392102001_4M.png'}}, {'id': '_5M', 'active': True, 
'type': 'image', 'masterLevel': False, 'template': None, 'format': None, 'url': {'base': 'https://www.burton.com/static/product/W21/20392102001_5M.png'}}, {'id': '_6W', 'active': True, 'type': 'image', 'masterLevel': False, 'template': None, 'format': None, 'url': {'base': 'https://www.burton.com/static/product/W21/20392102001_6W.png'}}, {'id': '_1', 'active': True, 'type': 'image', 'masterLevel': False, 'template': None, 'format': None, 'url': {'base': 'https://www.burton.com/static/product/W21/20392102001_1.png'}}], 'variationImageData': [{'variationColorID': '20392102001', 'display': {'category': {'primary': '_4U', 'focus': '_1'}}, 'views': [{'id': '_4U', 'active': True, 'type': 'image', 'masterLevel': False, 'template': None, 'format': None, 'url': {'base': 'https://www.burton.com/static/product/W21/20392102001_4U.png'}}, {'id': '_3W', 'active': True, 'type': 'image', 'masterLevel': False, 'template': None, 'format': None, 'url': {'base': 'https://www.burton.com/static/product/W21/20392102001_3W.png'}}, {'id': '_4M', 'active': True, 'type': 'image', 'masterLevel': False, 'template': None, 'format': None, 'url': {'base': 'https://www.burton.com/static/product/W21/20392102001_4M.png'}}, {'id': '_5M', 'active': True, 'type': 'image', 'masterLevel': False, 'template': None, 'format': None, 'url': {'base': 'https://www.burton.com/static/product/W21/20392102001_5M.png'}}, {'id': '_6W', 'active': True, 'type': 'image', 'masterLevel': False, 'template': None, 'format': None, 'url': {'base': 'https://www.burton.com/static/product/W21/20392102001_6W.png'}}, {'id': '_1', 'active': True, 'type': 'image', 'masterLevel': False, 'template': None, 'format': None, 'url': {'base': 'https://www.burton.com/static/product/W21/20392102001_1.png'}}]}, {'variationColorID': '20392102300', 'display': {'category': {'primary': '_4U', 'focus': '_1'}}, 'views': [{'id': '_4U', 'active': True, 'type': 'image', 'masterLevel': False, 'template': None, 'format': None, 'url': {'base': 
'https://www.burton.com/static/product/W21/20392102300_4U.png'}}, {'id': '_3M', 'active': True, 'type': 'image', 'masterLevel': False, 'template': None, 'format': None, 'url': {'base': 'https://www.burton.com/static/product/W21/20392102300_3M.png'}}, {'id': '_4W', 'active': True, 'type': 'image', 'masterLevel': False, 'template': None, 'format': None, 'url': {'base': 'https://www.burton.com/static/product/W21/20392102300_4W.png'}}, {'id': '_5M', 'active': True, 'type': 'image', 'masterLevel': False, 'template': None, 'format': None, 'url': {'base': 'https://www.burton.com/static/product/W21/20392102300_5M.png'}}, {'id': '_6W', 'active': True, 'type': 'image', 'masterLevel': False, 'template': None, 'format': None, 'url': {'base': 'https://www.burton.com/static/product/W21/20392102300_6W.png'}}, {'id': '_1', 'active': True, 'type': 'image', 'masterLevel': False, 'template': None, 'format': None, 'url': {'base': 'https://www.burton.com/static/product/W21/20392102300_1.png'}}]}, {'variationColorID': '20392103200', 'display': {'category': {'primary': '_4'}}, 'views': [{'id': '_4', 'active': True, 'type': 'image', 'masterLevel': False, 'template': None, 'format': None, 'url': {'base': 'https://www.burton.com/static/product/W21/20392103200_4.png'}}, {'id': '_3', 'active': True, 'type': 'image', 'masterLevel': False, 'template': None, 'format': None, 'url': {'base': 'https://www.burton.com/static/product/W21/20392103200_3.png'}}, {'id': '_5', 'active': True, 'type': 'image', 'masterLevel': False, 'template': None, 'format': None, 'url': {'base': 'https://www.burton.com/static/product/W21/20392103200_5.png'}}, {'id': '_6', 'active': True, 'type': 'image', 'masterLevel': False, 'template': None, 'format': None, 'url': {'base': 'https://www.burton.com/static/product/W21/20392103200_6.png'}}]}, {'variationColorID': '20392103400', 'display': {'category': {'primary': '_3'}}, 'views': [{'id': '_3', 'active': True, 'type': 'image', 'masterLevel': False, 'template': None, 
'format': None, 'url': {'base': 'https://www.burton.com/static/product/W21/20392103400_3.png'}}, {'id': '_4', 'active': True, 'type': 'image', 'masterLevel': False, 'template': None, 'format': None, 'url': {'base': 'https://www.burton.com/static/product/W21/20392103400_4.png'}}, {'id': '_5', 'active': True, 'type': 'image', 'masterLevel': False, 'template': None, 'format': None, 'url': {'base': 'https://www.burton.com/static/product/W21/20392103400_5.png'}}, {'id': '_6', 'active': True, 'type': 'image', 'masterLevel': False, 'template': None, 'format': None, 'url': {'base': 'https://www.burton.com/static/product/W21/20392103400_6.png'}}]}]}, 'ean': '9009521451408', 'upc': '190450612509', 'badges': '', 'category': 'womens-tees', 'primaryCategory': 'womens-sale-sweaters-shirts', 'hasUltimateSetup': False, 'ph1': 'SOFTGOODS', 'ph2': 'BASIC FLEECE AND TEE', 'ph3': 'LS TEES', 'ph4': '', 'videoID': '', 'videoPoster': '', 'videoVertical': '', 'spectrumObjects': False, 'scrollingText': False, 'cartSpecialCalloutMessage': False, 'disableEcommerce': False}
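To keep only the fields the question asks for (id, brand, category, name), a small helper can pick them out of each product in the AJAX response. This is a sketch that assumes the layout printed above, where jsonData['data']['products'] is a list of product dicts; pick_product_fields is a hypothetical name. dict.get returns None for any field a product lacks instead of raising KeyError.

```python
# Fields the question wants; adjust as needed.
WANTED_FIELDS = ("id", "brand", "category", "name")


def pick_product_fields(json_data, fields=WANTED_FIELDS):
    """Return one dict per product holding only the requested fields.

    Assumes json_data['data']['products'] is a list of product dicts, as in
    the printed response. Missing fields come back as None.
    """
    products = json_data.get("data", {}).get("products", [])
    return [{field: product.get(field) for field in fields} for product in products]


# Usage with the live response:
#     jsonData = requests.get(url, params={'format': 'ajax'}).json()
#     print(pick_product_fields(jsonData))
```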

Update:

To get price and stock, pull the product ID (masterID) out of that first response, then make a second request to the Product-GetVariationJSON endpoint:

import requests
import pandas as pd

urls = [
    'https://www.burton.com/us/en/p/burton-elite-long-sleeve-tshirt/W21-203921.html?cgid=womens-tees',
    'https://www.burton.com/us/en/p/girls-burton-chicklet-flat-top-snowboard/W21-107341.html',
]
variation_url = 'https://www.burton.com/on/demandware.store/Sites-Burton_NA-Site/en_US/Product-GetVariationJSON'

# Step 1: get each product's master ID from the AJAX version of its page
productID_list = []
for url in urls:
    jsonData = requests.get(url, params={'format': 'ajax'}).json()
    productID_list.append(jsonData['data']['masterID'])

# Step 2: request price and stock data for every variation of each product
stock = []
for productID in productID_list:
    productData = requests.get(variation_url, params={'pid': productID, 'pricing': ''}).json()

    # One row per color/size variation
    for each in productData['data']['variations']['variationValues']:
        row = {}
        row['name'] = each['name']
        row['color'] = each['variationColor']['displayName']
        row['size'] = each['variationSize']['displayName']
        row['standard_price'] = each['price']['standardPriceUnformatted']
        row['sale_price'] = each['price']['salePriceUnformatted']
        row['isOnSale'] = each['price']['isOnSale']
        row['available'] = each['status']['available']
        row['inStock'] = each['status']['meta']['type']
        stock.append(row)

df = pd.DataFrame(stock)

print(df.to_string())

Output:
                                         name          color size standard_price sale_price  isOnSale  available        inStock
0            Burton Elite Long Sleeve T-Shirt     True Black    L          39.95                False       True       IN_STOCK
1            Burton Elite Long Sleeve T-Shirt     True Black    M          39.95                False       True       IN_STOCK
2            Burton Elite Long Sleeve T-Shirt     True Black    S          39.95                False       True       IN_STOCK
3            Burton Elite Long Sleeve T-Shirt     True Black   XL          39.95                False       True       IN_STOCK
4            Burton Elite Long Sleeve T-Shirt     True Black   XS          39.95                False       True       IN_STOCK
5            Burton Elite Long Sleeve T-Shirt     True Black  XXL          39.95                False       True       IN_STOCK
6            Burton Elite Long Sleeve T-Shirt     True Black  XXS          39.95                False       True       IN_STOCK
7            Burton Elite Long Sleeve T-Shirt  Martini Olive    L          39.95                False       True       IN_STOCK
8            Burton Elite Long Sleeve T-Shirt  Martini Olive    M          39.95                False       True       IN_STOCK
9            Burton Elite Long Sleeve T-Shirt  Martini Olive    S          39.95                False       True       IN_STOCK
10           Burton Elite Long Sleeve T-Shirt  Martini Olive   XL          39.95                False       True       IN_STOCK
11           Burton Elite Long Sleeve T-Shirt  Martini Olive   XS          39.95                False       True       IN_STOCK
12           Burton Elite Long Sleeve T-Shirt  Martini Olive  XXL          39.95                False       True       IN_STOCK
13           Burton Elite Long Sleeve T-Shirt  Martini Olive  XXS          39.95                False       True       IN_STOCK
14           Burton Elite Long Sleeve T-Shirt     True Penny    L          39.95      27.96      True       True       IN_STOCK
15           Burton Elite Long Sleeve T-Shirt     True Penny    M          39.95      27.96      True       True       IN_STOCK
16           Burton Elite Long Sleeve T-Shirt     True Penny    S          39.95      27.96      True      False      BACKORDER
17           Burton Elite Long Sleeve T-Shirt     True Penny   XL          39.95      27.96      True       True       IN_STOCK
18           Burton Elite Long Sleeve T-Shirt     True Penny   XS          39.95      27.96      True      False  NOT_AVAILABLE
19           Burton Elite Long Sleeve T-Shirt     True Penny  XXL          39.95      27.96      True       True       IN_STOCK
20           Burton Elite Long Sleeve T-Shirt     True Penny  XXS          39.95      27.96      True       True       IN_STOCK
21           Burton Elite Long Sleeve T-Shirt     Lapis Blue    L          39.95      27.96      True      False  NOT_AVAILABLE
22           Burton Elite Long Sleeve T-Shirt     Lapis Blue    M          39.95      27.96      True      False      BACKORDER
23           Burton Elite Long Sleeve T-Shirt     Lapis Blue    S          39.95      27.96      True      False      BACKORDER
24           Burton Elite Long Sleeve T-Shirt     Lapis Blue   XL          39.95      27.96      True      False  NOT_AVAILABLE
25           Burton Elite Long Sleeve T-Shirt     Lapis Blue   XS          39.95      27.96      True       True       IN_STOCK
26           Burton Elite Long Sleeve T-Shirt     Lapis Blue  XXL          39.95      27.96      True      False      BACKORDER
27           Burton Elite Long Sleeve T-Shirt     Lapis Blue  XXS          39.95      27.96      True       True       IN_STOCK
28  Girls' Burton Chicklet Flat Top Snowboard             80   80         199.95                False      False      BACKORDER
29  Girls' Burton Chicklet Flat Top Snowboard             90   90         199.95                False      False      BACKORDER
30  Girls' Burton Chicklet Flat Top Snowboard            100  100         199.95                False      False      BACKORDER
31  Girls' Burton Chicklet Flat Top Snowboard            110  110         199.95                False      False      BACKORDER
32  Girls' Burton Chicklet Flat Top Snowboard            115  115         199.95                False      False      BACKORDER
33  Girls' Burton Chicklet Flat Top Snowboard            120  120         199.95                False       True       IN_STOCK
34  Girls' Burton Chicklet Flat Top Snowboard            125  125         199.95                False       True       IN_STOCK
35  Girls' Burton Chicklet Flat Top Snowboard            130  130         199.95                False       True       IN_STOCK
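If some variation came back without a price or status block (an assumption; every row in the output above has them), each['price'][...] would raise KeyError and stop the whole loop. A defensive variant of the row-building step, sketched with a hypothetical helper variation_to_row and the same key names the loop uses, fills any missing value with None instead:

```python
def variation_to_row(each):
    """Flatten one entry of productData['data']['variations']['variationValues'].

    Hypothetical helper: uses the same keys as the loop in the answer, but any
    missing (or null) key yields None instead of raising KeyError.
    """
    price = each.get("price") or {}
    status = each.get("status") or {}
    meta = status.get("meta") or {}
    return {
        "name": each.get("name"),
        "color": (each.get("variationColor") or {}).get("displayName"),
        "size": (each.get("variationSize") or {}).get("displayName"),
        "standard_price": price.get("standardPriceUnformatted"),
        "sale_price": price.get("salePriceUnformatted"),
        "isOnSale": price.get("isOnSale"),
        "available": status.get("available"),
        "inStock": meta.get("type"),
    }


# Inside the loop, the eight row[...] assignments become:
#     stock.append(variation_to_row(each))
```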
answered Jun 17 '26 at 14:06 by chitown88