
Why does HttpClient not follow redirects automatically?

I am trying to crawl some domains with different user agents. My crawler works fine, except when a domain has no valid SSL certificate and is therefore insecure: in that case I get no response from HttpClient at all. To get around that, I use an HttpClientHandler and handle certificate validation myself. With this workaround I get a 301 for all of those domains. It looks as if AllowAutoRedirect were false, but it is not. I also tried setting MaxAutomaticRedirections to 5, and that did not help either.

Here is my code:

// Requires: using System.Net.Http; using System.Threading.Tasks;
public async Task<int> Crawl(string userAgent, string url)
{
    var handler = new HttpClientHandler();
    handler.ClientCertificateOptions = ClientCertificateOption.Manual;
    // Accept every server certificate so that hosts with an invalid or missing certificate still respond.
    handler.ServerCertificateCustomValidationCallback =
        (httpRequestMessage, cert, certChain, policyErrors) => true;

    var httpClient = new HttpClient(handler);
    httpClient.DefaultRequestHeaders.Add("User-Agent", userAgent);

    var response = await httpClient.SendAsync(new HttpRequestMessage(HttpMethod.Get, url));
    return (int)response.StatusCode;
}
Sajjad Mortazavi asked Dec 06 '25 03:12


1 Answer

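The domains I was trying to crawl did not have SSL certificates, so the HTTPS request was answered with a redirect to the HTTP version. HttpClient did not follow that redirect and simply returned the 301. As far as I can tell, this is by design in .NET Core and later: the handler does not automatically follow a redirect from HTTPS to HTTP, even when AllowAutoRedirect is true.

My problem was solved by crawling the HTTP version of each domain directly, for example: http://example.com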
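
If requesting the HTTP URL directly is not an option, another approach is to follow the Location header by hand. The following is only a minimal sketch, not something from the original answer: RedirectFollower, ResolveLocation and GetFinalStatusAsync are hypothetical names, and it assumes the HttpClient was built with a handler like the one in the question. Because it follows HTTPS to HTTP downgrades, it is only appropriate where that is acceptable, such as a status-code crawler.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class RedirectFollower
{
    // Resolves a Location header value against the URI that produced it.
    // Returns null when the response carried no Location header.
    public static Uri ResolveLocation(Uri current, Uri location) =>
        location == null ? null
        : location.IsAbsoluteUri ? location
        : new Uri(current, location);

    // Hypothetical helper: follows up to maxHops redirects manually,
    // including HTTPS -> HTTP redirects that the handler refuses to follow.
    public static async Task<int> GetFinalStatusAsync(HttpClient client, string url, int maxHops = 5)
    {
        var current = new Uri(url);
        for (var hop = 0; ; hop++)
        {
            using var response = await client.GetAsync(current);
            var status = (int)response.StatusCode;
            var next = ResolveLocation(current, response.Headers.Location);

            // 304 Not Modified is in the 3xx range but is not a redirect.
            var isRedirect = status >= 300 && status < 400 && status != 304;

            // Stop on a final response, a redirect without a target, or too many hops.
            if (!isRedirect || next == null || hop >= maxHops)
            {
                return status;
            }
            current = next;
        }
    }
}
```

Inside Crawl, the last two lines could then become `return await RedirectFollower.GetFinalStatusAsync(httpClient, url);`. Setting handler.AllowAutoRedirect = false as well keeps all hop counting in this one loop, at the cost of handling every redirect manually.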

Sajjad Mortazavi answered Dec 08 '25 17:12


