Python lxml: Getting Data from a Steam Bundle Page - "list index out of range" error

I am working on a Python program that takes the ID of a Steam bundle and returns its current price.

The program uses requests and lxml.

There are two XPaths to the final price:

  1. /html/body/div[1]/div[7]/div[4]/div[1]/div[2]/div/div[2]/div[10]/div[3]/div
  2. //*[@id="game_area_purchase"]/div/div/div/div[1]/div/div/div[2]

Example bundle: https://store.steampowered.com/bundle/16140

Here's the code:

import requests
import lxml.html

# Example URL for a Steam bundle
URL = "https://store.steampowered.com/bundle/16140"

html = requests.get(URL)
doc = lxml.html.fromstring(html.content)

# XPath to the price location
price = doc.xpath('/html/body/div[1]/div[7]/div[4]/div[1]/div[2]/div/div[2]/div[10]/div[3]/div/text()')

print(price)

The program prints this:

[]

or, if I take the first element with [0], this:

Traceback (most recent call last):
  File <path-to-program>, line 9, in <module>
    price = doc.xpath('/html/body/div[1]/div[7]/div[4]/div[1]/div[2]/div/div[2]/div[10]/div[3]/div/text()')[0]
IndexError: list index out of range

Both XPaths fail the same way. What should I do to fix it?
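`doc.xpath()` always returns a list, so when the expression matches nothing you get `[]`, and indexing that with `[0]` raises the IndexError. A minimal sketch of a helper that returns `None` instead of crashing; the function name `first_text` and the sample markup are hypothetical, made up for illustration and not taken from the real Steam page:

```python
import lxml.html


def first_text(doc, xpath_expr):
    """Return the first text result of xpath_expr, stripped, or None if nothing matches."""
    results = doc.xpath(xpath_expr)
    if not results:
        # Empty list: the element is not in this HTML, so there is no [0] to take
        return None
    return str(results[0]).strip()


# Hypothetical, simplified markup (not the real Steam page)
SAMPLE = """
<html><body>
  <div class="discount_block">
    <div class="discount_final_price">$9.99</div>
  </div>
</body></html>
"""

doc = lxml.html.fromstring(SAMPLE)
# A class-based XPath finds the price regardless of where the div sits
print(first_text(doc, '//div[contains(@class, "discount_final_price")]/text()'))  # $9.99
# An absolute XPath that does not fit this markup matches nothing
print(first_text(doc, '/html/body/div[1]/div[7]/div/text()'))  # None
```

Checking for `None` makes it obvious that the page you downloaded simply does not contain the element, which is the real problem here.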

asked Nov 19 '25 at 07:11 by ernikus



1 Answer

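This bundle contains sexual/nudity content, so Steam serves an age-check page instead of the bundle page, and your XPath has nothing to match. To get the HTML you need, send the request with a birthtime cookie that tells the server your age allows you to view the page: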

import requests
import lxml.html

URL = "https://store.steampowered.com/bundle/16140"
session = requests.Session()
r1 = session.get(URL)
# birthtime is a birth date in seconds since the Unix epoch (January 1, 1970)
r1.cookies['birthtime'] = '439423201'
r2 = session.get(URL, cookies=r1.cookies)

doc = lxml.html.fromstring(r2.content)
# Select the price by class name instead of a long absolute XPath
print(doc.xpath('//div[contains(@class, "discount_final_price")]/text()')[0])
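The `birthtime` value is just a birth date expressed as seconds since the Unix epoch; for example, the `439423201` above decodes to 1983-12-04 22:00:01 UTC. A small standard-library sketch for computing such a value from a calendar date, or decoding one, instead of hard-coding the number. The helper names `birthtime_for` and `birth_datetime` are hypothetical, and choosing midnight UTC is an assumption, not something the answer says Steam requires:

```python
from datetime import date, datetime, timezone


def birthtime_for(birth_date):
    """Return midnight UTC of birth_date as a string of seconds since the Unix epoch."""
    midnight = datetime(birth_date.year, birth_date.month, birth_date.day, tzinfo=timezone.utc)
    return str(int(midnight.timestamp()))


def birth_datetime(birthtime):
    """Decode a birthtime cookie value back into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(birthtime), tz=timezone.utc)


print(birthtime_for(date(1983, 12, 4)))  # 439344000
print(birth_datetime("439423201"))  # 1983-12-04 22:00:01+00:00
```

In the answer's code you could then write `r1.cookies['birthtime'] = birthtime_for(date(1983, 12, 4))`; any date far enough in the past should presumably pass the age check.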
answered Nov 21 '25 at 19:11 by JaSON
