I have an ASPX page at https://searchlight.cluen.com/E5/CandidateSearch.aspx with a form on it, that I'd like to submit and parse for information.
Using Python's urllib and urllib2 I created a post request with the proper headers and user agent. But the resulting html response does not contain the expected table of results. Am I misunderstanding or am I missing any obvious details?
    import urllib
    import urllib2
    headers = {
        'HTTP_USER_AGENT': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.13)         Gecko/2009073022 Firefox/3.0.13',
        'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml; q=0.9,*/*; q=0.8',
        'Content-Type': 'application/x-www-form-urlencoded'
    }
    # obtained these values from viewing the source of https://searchlight.cluen.com/E5/CandidateSearch.aspx
    viewstate = '/wEPDwULLTE3NTc4MzQwNDIPZBYCAg ... uJRWDs/6Ks1FECco='
    eventvalidation = '/wEWjQMC8pat6g4C77jgxg0CzoqI8wgC3uWinQQCwr/ ... oPKYVeb74='
    url = 'https://searchlight.cluen.com/E5/CandidateSearch.aspx'
    formData = (
        ('__VIEWSTATE', viewstate),
        ('__EVENTVALIDATION', eventvalidation),
        ('__EVENTTARGET',''),
        ('__EVENTARGUMENT',''),
        ('textcity',''),
        ('dropdownlistposition',''),
        ('dropdownlistdepartment',''),
        ('dropdownlistorderby',''),
        ('textsearch',''),
    )
    # change user agent
    from urllib import FancyURLopener
    class MyOpener(FancyURLopener):
        version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127         Firefox/2.0.0.11'
    myopener = MyOpener()
    # encode form data in post-request format
    encodedFields = urllib.urlencode(formData)
    f = myopener.open(url, encodedFields)
    print f.info()
    try:
      fout = open('tmp.htm', 'w')
    except:
      print('Could not open output file\n')
    fout.writelines(f.readlines())
    fout.close()
There are several questions on this topic that were helpful (such as how to submit query to .aspx page in python) but I'm stuck on this and asking for additional help, if that is possible.
The resulting html page is saying I may need to log in, but the aspx page displays in my browser without any login.
Here are the results from info():
Connection: close Date: Tue, 07 Jun 2011 17:05:26 GMT Server: Microsoft-IIS/6.0 X-Powered-By: ASP.NET X-AspNet-Version: 2.0.50727 Cache-Control: private Content-Type: text/html; charset=utf-8 Content-Length: 1944
ASP.Net uses a security feature that protects against tampering with the ViewState by embedding specific information in it.
More than likely, the server is rejecting your request because the ViewState is being treated as though it were tampered with. I can't say this with absolute certainty, but ASP.Net has several security features that are built in to the framework that may be preventing a direct post.
If session is involved at all, then you will also need to take that into account. To simulate what the browser is doing you will need to perform the following steps:
A lot of work I know, but not too awfully difficult. Again, this may not be the sole source of your problems, but it is worth reading up on in order to start troubleshooting.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With