Python requests returns 403 even with headers

Question

I'm trying to get the content of a website, but my requests return a 403 error.

After searching, I found the Network > Headers section in the browser's developer tools and tried adding these headers to the GET request:

from bs4 import BeautifulSoup as bs
import requests

url = "https://clutch.co/us/agencies/digital-marketing"
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36"}
# Also tried "Referer", "sec-ch-ua-platform" and "Origin" headers, but nothing changed.
html = requests.get(url, headers=HEADERS)
print("RESULT:", html)  # RESULT: <Response [403]>

But the result didn't change.
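One common reason a browser-like User-Agent is not enough is that the 403 comes from a bot-protection layer in front of the site, which can look at more than request headers. Whether that applies to clutch.co is not stated in this thread, so the helper below is only a hypothetical heuristic: `describe_block` is an illustrative name, and it relies on the `Server: cloudflare` and `CF-RAY` headers that Cloudflare normally adds to its responses.

```python
from typing import Mapping


def describe_block(status_code: int, headers: Mapping[str, str]) -> str:
    """Return a rough, heuristic explanation for a blocked response.

    Hypothetical helper. `headers` can be any mapping, for example
    `response.headers` from requests; keys are compared case-insensitively.
    """
    lowered = {key.lower(): value for key, value in headers.items()}
    if status_code != 403:
        return f"status {status_code}: not a 403"
    # Cloudflare-fronted sites usually send `Server: cloudflare` and a `CF-RAY` id.
    if lowered.get("server", "").lower() == "cloudflare" or "cf-ray" in lowered:
        return "403 from Cloudflare: likely bot protection, headers alone may not help"
    return "403 without CDN markers: the block may come from the site itself"
```

With requests, call it as `describe_block(html.status_code, html.headers)` on the response from the question's code.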

Andrej Kesely · Accepted Answer

You can try to load the page from the Google cache instead of requesting it directly:

import requests
from bs4 import BeautifulSoup


url = "https://clutch.co/us/agencies/digital-marketing"
# Prefix for Google's cached copy of a page; the target URL is appended to it.
cache_URL = "https://webcache.googleusercontent.com/search?q=cache:"


def get_data(link):
    """Fetch the cached copy of `link` and return its HTML as text."""
    hdr = {
        "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Mobile Safari/537.36"
    }
    req = requests.get(cache_URL + link, headers=hdr)
    req.raise_for_status()  # fail loudly instead of parsing an error page
    return req.text


soup = BeautifulSoup(get_data(url), "html.parser")
# Each agency name sits in an <h3 class="company_info"> element.
for h3 in soup.select("h3.company_info"):
    print(h3.get_text(strip=True))

Prints:

WebFX
Ignite Visibility
SmartSites
Thrive Internet Marketing Agency
Lilo Social
NEWMEDIA.COM
Funnel Boost Media
Direct Online Marketing
SeedX Inc.
Impactable

...
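The answer appends the target URL to the cache prefix unencoded, which works for this URL but would break if the target had its own `?`, `&` or `#`. A slightly more defensive sketch percent-encodes it with `urllib.parse.quote`; `cache_url` and `CACHE_PREFIX` are illustrative names. Note that Google has since retired its public cache, so this endpoint may no longer return pages at all.

```python
from urllib.parse import quote

# Prefix used in the answer; the target URL follows "cache:".
CACHE_PREFIX = "https://webcache.googleusercontent.com/search?q=cache:"


def cache_url(target: str) -> str:
    """Build the Google-cache URL for `target`, percent-encoding it.

    `safe=":/"` keeps the scheme separator and path slashes readable, while
    characters such as `?`, `&` and `=` in `target` are encoded so they are
    not mistaken for parts of the cache URL's own query string.
    """
    return CACHE_PREFIX + quote(target, safe=":/")
```

In `get_data`, this would replace `cache_URL + link`, for example `requests.get(cache_url(link), headers=hdr, timeout=10)`.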

Tags:

python

python-requests