
Python web scraping Zacks website error: [WinError 10054] An existing connection was forcibly closed by the remote host

I would like to get the data located on this page: https://www.zacks.com/stock/quote/MA

I've tried to do this with Beautiful Soup in Python but I get an error: "[WinError 10054] An existing connection was forcibly closed by the remote host".

Can someone guide me?

from bs4 import BeautifulSoup
import urllib
import re
import urllib.request

url = 'https://www.zacks.com/stock/quote/MA'

r = urllib.request.urlopen(url).read()
soup = BeautifulSoup(r, "lxml")
soup

Thanks!

asked Dec 07 '25 by Vasconni

2 Answers

The website is blocking your request; the host likely refuses requests that don't carry browser-like headers. One workaround is to simulate a "real" browser request with the Selenium package.

This is working:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from bs4 import BeautifulSoup

url = 'https://www.zacks.com/stock/quote/MA'

# Run Firefox without a visible window
options = Options()
options.add_argument('-headless')

browser = webdriver.Firefox(options=options)
browser.get(url)

# Grab the rendered HTML and hand it to Beautiful Soup
html_source = browser.page_source
soup = BeautifulSoup(html_source, "lxml")
print(soup)

browser.quit()
answered Dec 08 '25 by madik_atma

The page is blocking the default Python user agent. The User-Agent header basically tells the server who is making the request, so install the fake-useragent module and add a header that makes the request look like it comes from a regular browser (Chrome, Firefox, etc.). If you want a specific user agent, I recommend you look at fake-useragent.

With urllib I wasn't sure offhand how to add a header, so here is simple code using the requests module:

import requests
from fake_useragent import UserAgent

# Pick a random browser-like User-Agent string
ua = UserAgent()
header = {
    "User-Agent": ua.random
}
r = requests.get('https://www.zacks.com/stock/quote/MA', headers=header)
r.text  # the HTML source

After this you can use Beautiful Soup with r.text just like you did:

soup = BeautifulSoup(r.text, "lxml")
soup
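Once you have the soup you can pull out specific fields with find or select. As a sketch (the tag and class names below are made up for illustration; the real page's markup will differ, so inspect it in your browser's dev tools):

```python
from bs4 import BeautifulSoup

# Stand-in HTML so this example runs offline; replace the tag
# and class name with whatever the real page actually uses.
html = '<p class="last_price">$500.00 <span>USD</span></p>'
soup = BeautifulSoup(html, "lxml")

price_tag = soup.find("p", class_="last_price")
print(price_tag.contents[0].strip())  # → $500.00
```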

EDIT:

Looking into it a bit more, if you want to do it with urllib you can do this:

import urllib.request
from fake_useragent import UserAgent

ua = UserAgent()
q = urllib.request.Request('https://www.zacks.com/stock/quote/MA')
q.add_header('User-Agent', ua.random)
a = urllib.request.urlopen(q).read()
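If you'd rather not add a dependency, a fixed browser-style string works the same way with urllib (the exact string below is just an example, not a required value). You can verify the header is attached before making any network call:

```python
import urllib.request

url = 'https://www.zacks.com/stock/quote/MA'
# Hard-coded example User-Agent; any realistic browser string will do
req = urllib.request.Request(url, headers={
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
})

# urllib stores header names capitalized, e.g. 'User-agent'
print(req.get_header('User-agent'))  # → Mozilla/5.0 (Windows NT 10.0; Win64; x64)
```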
answered Dec 08 '25 by Jaume Garcia Sanchez