I would like to get the data located on this page: https://www.zacks.com/stock/quote/MA
I've tried to do this with Beautiful Soup in Python but I get an error: "[WinError 10054] An existing connection was forcibly closed by the remote host".
Can someone guide me?
from bs4 import BeautifulSoup
import urllib.request

url = 'https://www.zacks.com/stock/quote/MA'
r = urllib.request.urlopen(url).read()
soup = BeautifulSoup(r, "lxml")
soup
Thanks!
The website is blocking your request, most likely because the host rejects requests that don't send browser-like request headers. You can simulate a "real" browser request with the Selenium package.
This works:
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from bs4 import BeautifulSoup

# Run Firefox headless (no visible browser window).
options = Options()
options.add_argument("--headless")

url = 'https://www.zacks.com/stock/quote/MA'
browser = webdriver.Firefox(options=options)
browser.get(url)

html_source = browser.page_source
soup = BeautifulSoup(html_source, "lxml")
print(soup)

browser.quit()  # quit() also shuts down the driver process
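Selenium is heavier than a plain HTTP request because it launches a full browser, but in return the page's JavaScript actually runs, which urllib and requests never do. Note that the Firefox driver needs the geckodriver executable available on your PATH.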
The page is blocking the default Python user agent. The User-Agent header is basically what tells the server who is making the request. Install the Python module fake-useragent and add a header to the request that pretends it comes from an ordinary browser such as Chrome or Firefox; if you want a specific user agent, have a look at fake-useragent's documentation.
With urllib you add the header through a Request object (see the edit below); here is a simple example using the requests module:
import requests
from fake_useragent import UserAgent

ua = UserAgent()
header = {
    "User-Agent": ua.random  # a random real-browser user-agent string
}
r = requests.get('https://www.zacks.com/stock/quote/MA', headers=header)
r.text  # your HTML source
After this you can use Beautiful Soup with r.text like you did:
from bs4 import BeautifulSoup

soup = BeautifulSoup(r.text, "lxml")
soup
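If you want to pull specific fields out of the page rather than dump the whole document, the usual find/select methods work on the parsed tree. The tag and class names below are hypothetical; inspect the live page with your browser's dev tools to find the real ones:
import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

ua = UserAgent()
r = requests.get('https://www.zacks.com/stock/quote/MA',
                 headers={"User-Agent": ua.random})
soup = BeautifulSoup(r.text, "lxml")

# The <title> tag is always present.
print(soup.title.get_text(strip=True))

# Hypothetical selector -- replace "last_price" with whatever
# tag and class the site actually uses for the value you want.
price = soup.find("p", class_="last_price")
if price is not None:
    print(price.get_text(strip=True))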
EDIT:
Looking into it a bit more, if you want to do it with urllib you can do this:
import urllib.request
from fake_useragent import UserAgent

ua = UserAgent()
q = urllib.request.Request('https://www.zacks.com/stock/quote/MA')
q.add_header('User-Agent', ua.random)
a = urllib.request.urlopen(q).read()  # raw HTML bytes
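From here you can parse the raw bytes with Beautiful Soup exactly as in your original snippet:
from bs4 import BeautifulSoup

soup = BeautifulSoup(a, "lxml")
soup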