I am encountering an error when trying to get a comments' http from reddit. This has happened to various URLs (not all of them with special characters) and this is one of them. In one hour time frame, there may be 1000 or more requests to the reddit.com domain.
hdr = {"User-Agent": "My Agent"}
try:
     req = urllib2.Request("http://www.reddit.com/r/gaming/"
           "comments/1bjuee/when_pokΓ©mon_was_good", headers=hdr)
     htmlSource = urllib2.urlopen(req).read()
except Exception as inst:
     print inst
Output>>HTTP Error 504: Gateway Time-out
HTTP Error 504 Gateway timeout - A server (not necessarily a Web server) is acting as a gateway or proxy to fulfil the request by the client (e.g. your Web browser or our CheckUpDown robot) to access the requested URL. This server did not receive a timely response from an upstream server it accessed to deal with your HTTP request.
This usually means that the upstream server is down (no response to the gateway/proxy), rather than that the upstream server and the gateway/proxy do not agree on the protocol for exchanging data.
Problem can appear in different places on the network and there is no "unique" solution for it. You will have to investigative the problem by your own.
Your code works fine. Possible solution for you problem would be:
import urllib2
hdr = {"User-Agent": "My Agent"}
while True:
    try:
        req = urllib2.Request("http://www.reddit.com/", headers=hdr)
        response = urllib2.urlopen(req)
        htmlSource = response.read()
        if response.getcode() == 200:
            break
    except Exception as inst:
        print inst
This code will try to request webpage until it gets 200 response (standard response for successful HTTP requests). When 200 response will occur while loop will break and you can do next request (or whatever you have in your program)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With