Python requests returns 403 even with headers

I'm trying to fetch the content of a website, but my requests return a 403 error.

After searching, I copied headers from the browser's Network > Headers panel and sent them with the GET request.

from bs4 import BeautifulSoup as bs
import requests
url = "https://clutch.co/us/agencies/digital-marketing"
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36"}
# Also tried the "Referer", "sec-ch-ua-platform" and "Origin" headers, but nothing changed.
response = requests.get(url, headers=HEADERS)
print("RESULT:", response)

But the result didn't change: I still get a 403.

320V asked Jun 27 '26 02:06



1 Answer

You can try loading the page from the Google cache instead of requesting it directly:

import requests
from bs4 import BeautifulSoup


url = "https://clutch.co/us/agencies/digital-marketing"
cache_URL = "https://webcache.googleusercontent.com/search?q=cache:"


def get_data(link):
    # Mobile Chrome User-Agent sent with the request to the cache.
    hdr = {
        "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Mobile Safari/537.36"
    }
    # Request the cached copy of the page rather than the live site.
    req = requests.get(cache_URL + link, headers=hdr)
    return req.text


soup = BeautifulSoup(get_data(url), 'html.parser')
for h3 in soup.select('h3.company_info'):
    print(h3.get_text(strip=True))

Prints:

WebFX
Ignite Visibility
SmartSites
Thrive Internet Marketing Agency
Lilo Social
NEWMEDIA.COM
Funnel Boost Media
Direct Online Marketing
SeedX Inc.
Impactable

...
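
If the cache request is refused as well, the response body is an error page and the loop above prints nothing. Here is a minimal sketch of the same fetch with an explicit status check. The names `build_cache_url` and `fetch_cached` and the 10-second timeout are my own choices for illustration, not part of any library:

```python
import requests

CACHE_PREFIX = "https://webcache.googleusercontent.com/search?q=cache:"


def build_cache_url(link):
    """Return the cache URL for a page (hypothetical helper)."""
    return CACHE_PREFIX + link


def fetch_cached(link, headers, timeout=10):
    """Fetch the cached copy and raise if the server answers with 4xx/5xx."""
    response = requests.get(build_cache_url(link), headers=headers, timeout=timeout)
    # Raises requests.HTTPError on an error status such as 403,
    # instead of silently returning the error page.
    response.raise_for_status()
    return response.text
```

Calling `fetch_cached(url, hdr)` in place of `get_data(url)` should then either return the HTML or fail with an exception that names the status code.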
Andrej Kesely answered Jun 29 '26 15:06



