Using: Delphi 2010, latest version of Indy
I am trying to scrape the data off Googles Adsense web page, with an aim to get the reports. However I have been unsuccessful so far. It stops after the first request and does not proceed.
Using Fiddler to debug the traffic/requests to Google Adsense website, and a web browser to load the Adsense page, I can see that the request (from the webbrowser) generates a number of redirects until the page is loaded.
However, my Delphi application is only generating a couple of requests before it stops.
Here are the steps I have followed:
Finally I have this code:
procedure TfmMain.GetUrlToFile(AURL, AFile : String);
var
 Output : TMemoryStream;
begin
  Output := TMemoryStream.Create;
  try
    IdHTTP1.Get(FURL, Output);
    Output.SaveToFile(AFile);
  finally
    Output.Free;
  end;
end;
However, it does not get to the login page as expected. I would expect it to behave as if it was a webbrowser and proceed through the redirects until it finds the final page.
This is the output of the headers from Fiddler:
HTTP/1.1 302 Found Location: https://encrypted.google.com/ Cache-Control: private Content-Type: text/html; charset=UTF-8 Set-Cookie: PREF=ID=5166063f01b64b03:FF=0:TM=1293571783:LM=1293571783:S=a5OtsOqxu_GiV3d6; expires=Thu, 27-Dec-2012 21:29:43 GMT; path=/; domain=.google.com Set-Cookie: NID=42=XFUwZdkyF0TJKmoJjqoGgYNtGyOz-Irvz7ivao2z0--pCBKPpAvCGUeaa5GXLneP41wlpse-yU5UuC57pBfMkv434t7XB1H68ET0ZgVDNEPNmIVEQRVj7AA1Lnvv2Aez; expires=Wed, 29-Jun-2011 21:29:43 GMT; path=/; domain=.google.com; HttpOnly Date: Tue, 28 Dec 2010 21:29:43 GMT Server: gws Content-Length: 226 X-XSS-Protection: 1; mode=block
Firstly, is there anything wrong with this output?
Is there something more that I should do to get the IdHTTP component to keep pursuing the redirects until the final page?
IdHTTP component property values prior to making the call:
    Name := 'IdHTTP1';
    IOHandler := IdSSLIOHandlerSocketOpenSSL1;
    AllowCookies := True;
    HandleRedirects := True;
    RedirectMaximum := 35;
    Request.UserAgent := 
      'Mozilla/5.0 (Windows NT 5.1; rv:2.0b8) Gecko/20100101 Firefox/4.' +
      '0b8';
    HTTPOptions := [hoForceEncodeParams];
    OnRedirect := IdHTTP1Redirect;
    CookieManager := IdCookieManager1;
Redirect event handler:
procedure TfmMain.IdHTTP1Redirect(Sender: TObject; var dest: string; var
    NumRedirect: Integer; var Handled: Boolean; var VMethod: string);
begin
   Handled := True;
end;
Making the call:
  FURL := 'https://www.google.com';
  GetUrlToFile( (FURL + '/adsense/'), 'a.html');
  procedure TfmMain.GetUrlToFile(AURL, AFile : String);
  var
   Output : TMemoryStream;
  begin
    Output := TMemoryStream.Create;
    try
      try
       IdHTTP1.Get(AURL, Output);
       IdHTTP1.Disconnect;
      except
      end;
      Output.SaveToFile(AFile);
    finally
      Output.Free;
    end;
  end;
Here's the (request and response headers) output from Fiddler:

TIdHTTP.HandleRedirects := True so it starts automatically handling redirects.
TIdHTTP.RedirectMaximum  is used to set how many successive redirects should be handled.
Alternatively you may assign TIdHTTP.OnRedirect and set Handled := True from that handler. This is what I'm doing in a project that needs to read data from a WikiMedia web site (my own site).
Nothing wrong with that response, it's a very basic redirect to https://encrypted.google.com/. TIdHTTP should go to the given page in response. It also sets some cookies.
Don't forget to assign an CookieManager and make sure you use the same CookieManager for all subsequent requests. If you don't you'll probably get redirected to the login page over and over again.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With