
Can't get HTML code through HttpWebRequest

Tags:

c#

I am trying to parse the HTML of the page at http://odds.bestbetting.com/horse-racing/today in order to build a list of races, etc. The problem is that I am unable to retrieve the HTML of the page. Here is the C# code of the function:

    public static string Http(string url) {
        Uri myUri = new Uri(url);
        // Create an 'HttpWebRequest' object for the specified url.
        HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(myUri);
        myHttpWebRequest.AllowAutoRedirect = true;
        // Send the request and wait for the response.
        HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();
        var stream = myHttpWebResponse.GetResponseStream();
        var reader = new StreamReader(stream);
        var html = reader.ReadToEnd();
        // Release resources of the response object.
        myHttpWebResponse.Close();

        return html;
    }

When I run the program and call the function, it throws an exception on

HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();

which is:

Cannot handle redirect from HTTP/HTTPS protocols to other dissimilar ones.

I have read this question but I don't seem to have the same problem. I've also tried figuring something out by sniffing the traffic with Fiddler, but I can't see where it redirects to or anything similar. I have only extracted these two possible redirections: odds.bestbetting.com/horse-racing/2011-06-10/byCourse and odds.bestbetting.com/horse-racing/2011-06-10/byTime, but querying them produces the same result as above.

It's not the first time I've done something like this, but I'm really lost on this one. Any help?

Thanks!

Jacobo Polavieja asked Jan 18 '26 09:01


1 Answer

I finally found the solution... it was indeed a problem with the headers, specifically the User-Agent one.

After a lot of searching I found someone having the same problem with the same site. Although his code was different, the important bit was that he manually set the UserAgent property of the request to that of a browser. I think I had tried this before but I may have done it badly... sorry.

The final code, in case it is of interest to anyone, is this:
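For anyone who wants to check what the server does with and without the header, here is a rough sketch of a probe that turns off automatic redirects and reports the status code and the Location header. The class and method names (RedirectProbe, BuildRequest, Describe) are made up for illustration, and I am assuming the server answers with a 3xx response when the User-Agent is missing; I have not confirmed what this particular site sends.

```csharp
using System;
using System.Net;

public static class RedirectProbe
{
    // Hypothetical helper: builds a request that does NOT follow redirects,
    // optionally sending a User-Agent header.
    public static HttpWebRequest BuildRequest(string url, string userAgent)
    {
        var request = (HttpWebRequest)WebRequest.Create(new Uri(url));
        request.AllowAutoRedirect = false;
        if (!string.IsNullOrEmpty(userAgent))
        {
            request.UserAgent = userAgent;
        }
        return request;
    }

    // Sends the request and returns a one-line summary of the answer.
    public static string Describe(string url, string userAgent)
    {
        var request = BuildRequest(url, userAgent);
        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                string location = response.Headers["Location"] ?? "(none)";
                return (int)response.StatusCode + " Location: " + location;
            }
        }
        catch (WebException ex)
        {
            // Error status codes and network failures end up here.
            return "WebException: " + ex.Status + " " + ex.Message;
        }
    }
}
```

Calling RedirectProbe.Describe(url, null) and then RedirectProbe.Describe(url, someBrowserUserAgent) and comparing the two lines should show whether the header changes the redirect target.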

    // Requires: using System; using System.IO; using System.Net;
    public static string Http(string url) {
        if (string.IsNullOrEmpty(url)) { return "NO URL"; }

        Uri myUri = new Uri(url);
        // Create an 'HttpWebRequest' object for the specified url.
        HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(myUri);
        // Set the user agent as if we were a web browser
        myHttpWebRequest.UserAgent = @"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4";

        // 'using' releases the response, stream and reader even if reading throws.
        using (HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse())
        using (var stream = myHttpWebResponse.GetResponseStream())
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
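On newer .NET versions, where WebRequest is marked obsolete, the same idea can be sketched with HttpClient instead. This is a different API from the one I actually used, so treat it as an untested-against-this-site alternative; the names PageFetcher, CreateClient and FetchHtmlAsync are made up for illustration, and the User-Agent string is simply the one from the code above.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class PageFetcher
{
    // Same browser-like User-Agent string as in the HttpWebRequest version.
    public const string BrowserUserAgent =
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4";

    // Creates a client that sends the User-Agent header on every request.
    public static HttpClient CreateClient()
    {
        var client = new HttpClient();
        client.DefaultRequestHeaders.UserAgent.ParseAdd(BrowserUserAgent);
        return client;
    }

    // Downloads the page body as a string; throws HttpRequestException
    // if the final response has a non-success status code.
    public static async Task<string> FetchHtmlAsync(HttpClient client, string url)
    {
        if (string.IsNullOrEmpty(url))
        {
            throw new ArgumentException("A URL is required.", nameof(url));
        }
        return await client.GetStringAsync(url);
    }
}
```

Usage would look like: var client = PageFetcher.CreateClient(); string html = await PageFetcher.FetchHtmlAsync(client, url); with the client created once and reused across requests.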

Thank you very much for helping.

Jacobo Polavieja answered Jan 19 '26 22:01


