I know this question has been answered before in this thread, but I couldn't seem to find the details.
In my scenario, I am building a console application which will keep an eye on html page source for any changes. If any update/change occurs, I will perform further operations. Moreover, I'll also perform a request after every 1 second, or as soon as the previous request finishes.
I can't seem to figure out should I use HttpWebRequest or WebClient for downloading the html page source and perform comparison? What do you think would be an ideal solution in my case? Speed and reliability both :)
I'd go with HttpWebRequst because it's not as abstracted and lets you fiddle with HTTP params quite a bit. It gives you the option to not download the entire page if the server returns "file not changed", for example.
If you add some parameters to your request like IfModifiedSince (it might be HEAD or GET request) the server may return the response code 304 - NOT MODIFIED. Refer to description of caching in HTTP for further explanation.
The point is to make sure that you only download the full page when it's actually modified since the last time you fetched it. Most of the time it won't be changed (I suppose, can't know for sure without knowing your domain), so you only need to get a lightweight response from server which simply states "nothing changed here".
Update: code sample demonstrating the use of IfModifiedSince property:
bool IsResourceModified(string url, DateTime dateTime) {            
    try {
        var request = (HttpWebRequest)HttpWebRequest.Create(new Uri(url));
        request.IfModifiedSince = dateTime;
        request.Method = "HEAD";
        var response = (HttpWebResponse)request.GetResponse();
        return true;
    }
    catch(WebException ex) {
        if(ex.Status != WebExceptionStatus.ProtocolError)
            throw;
        var response = (HttpWebResponse)ex.Response;
        if(response.StatusCode != HttpStatusCode.NotModified)
            throw;
        return false;    
    }
}
This method should return true if the page was modifed since the dateTime date and false if it wasn't. GetResponse method will throw a WebException if you make a HEAD-request and the server returns 304 - NOT MODIFIED (which is kinda unfortunate). We have to make sure that it's not some other web connection problem, that's why I check the status of web exception and the HTTP status in response. If anything else caused an exception we just throw it further.
Console.WriteLine(IsResourceModified("http://example.com", new DateTime(2009)));
Console.WriteLine(IsResourceModified("http://example.com", DateTime.Now));
This sample code produces the output:
True
False
Note: make sure to read Jim Mischel's addition to this answer as he gives few good advices on this technique.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With