
CFHTTP Encoding Problem

I am trying to fetch a page with cfhttp so that I can parse information out of it. The response headers of the page I am requesting are:

Content-Encoding: gzip

Connection: Keep-Alive

Content-Length: 19066

Server: IBM_HTTP_Server

Vary: Accept-Encoding, User-Agent

Content-Language: en-US

Cache-Control: no-cache="set-cookie, set-cookie2"

Content-Type: text/html;charset=ISO-8859-1

I set the charset to ISO-8859-1; however, I am getting the following in the FileContent (only a small sample is shown below, but I think it gets the point across).

EðÑq·Oã?·Ì\ZóL¯þ´Vú5ðbä£ÿæ¾_HÉÒñQãO\Çþãë85ÁÜ à±°ùÖ}&bßý?,u?2SùQyk5g?UÛ3Ѹfã×ARÃi_iûRã _ òCA¿-ß."b /¯ßíWÝÆ´}w~,°iøÜCáÇþ@ÃZ5¤ïsÁ8½°ì* ZÜéjOÝK/Ë4§ÈG5×ä*¬6ÚwÇ0]ã:àÑþé¬G"ÅÁl/t° jlá»5¶&¯lìYìºØ'yDð½|#ý<ñìTé%¾ï¬ùƪx¶}«±o9»ë¼ÂÆÒï'w8Y?÷ðxsllû 6íqüGÞsÜóÀx·ªk®XºàåZ{íÁ½åo÷mbq¥ÝÃ8M

I tried other charsets and suspect that the gzip encoding is causing the problem, but I am unsure how to test whether that is the issue. Any suggestions or help would be greatly appreciated.

Below is my code:

<cfhttp 
    METHOD="get"
    throwonerror="yes" 
    CHARSET="ISO-8859-1"
    URL="http://www.cars.com/for-sale/searchresults.action?sf1Dir=DESC&prMn=1&crSrtFlds=stkTypId-feedSegId-pseudoPrice&rd=100000&zc=44203&PMmt=0-0-0&stkTypId=28881&sf2Dir=ASC&sf1Nm=price&sf2Nm=miles&feedSegId=28705&searchSource=UTILITY&pgId=2102&rpp=10">

    <cfhttpparam type="Header" name="Accept-Encoding" value="deflate;q=0">
    <cfhttpparam type="Header" name="TE" value="deflate;q=0">
</cfhttp>

<cfset listings = #cfhttp.FileContent#>
<cfoutput>
    #listings#
</cfoutput>

I have also tried the headers:

    <cfhttpparam type="Header" name="Accept-Encoding" value="*">
    <cfhttpparam type="Header" name="TE" value="deflate;q=0">

And tried removing the 'Accept-Encoding' header and just leaving the TE.

UPDATE: I still haven't figured it out, but I found something that might help someone help me. When I ran file_get_contents against the same page from a test PHP server of mine, it worked fine. When I then pointed the same cfhttp code at that PHP page (which in turn fetches the page I need), it also worked fine. Thanks for the suggestions so far.

Patcouch22 asked Jan 22 '26 04:01


1 Answer

The issue with cars.com seems to be that they're gzipping the output twice (based on this thread).

So, we need to unzip the content... again...

First, we need to get the content as binary, so the CFHTTP call needs to include

getasbinary="yes"
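With the binary body in hand, one way to check whether it is still gzip-compressed is to look for the gzip magic number (the bytes 0x1F 0x8B) at the start of the content. The following is only a minimal sketch: looksGzipped is a hypothetical helper name, not part of the original answer, and it assumes cfhttp.FileContent came from a request made with getasbinary="yes".

```cfml
<cfscript>
    // Hypothetical helper: true when the data starts with the gzip magic number 0x1F 0x8B
    function looksGzipped(bytes) {
        if (not isBinary(bytes) or len(bytes) lt 2) return false;
        // Java bytes are signed, so mask each one to the 0..255 range before comparing
        return bitAnd(bytes[1], 255) eq 31 and bitAnd(bytes[2], 255) eq 139;
    }
</cfscript>

<!--- Assumes a preceding cfhttp call that used getasbinary="yes" --->
<cfoutput>Still gzipped: #looksGzipped(cfhttp.FileContent)#</cfoutput>
```

If this reports true for content that the server already labelled as gzip-encoded, that would be consistent with the double-compression explanation.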

Then, we need to unzip it.

We can use java.util.zip to do it. The gunzip is a modified version of this cflib.org function:

<cfhttp
    getasbinary="yes"
    METHOD="get"
    throwonerror="yes"
    CHARSET="ISO-8859-1"
    URL="http://www.cars.com/for-sale/searchresults.action?sf1Dir=DESC&prMn=1&crSrtFlds=stkTypId-feedSegId-pseudoPrice&rd=100000&zc=44203&PMmt=0-0-0&stkTypId=28881&sf2Dir=ASC&sf1Nm=price&sf2Nm=miles&feedSegId=28705&searchSource=UTILITY&pgId=2102&rpp=10" >

    <cfhttpparam type="Header" name="Accept" value="application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5">
    <cfhttpparam type="Header" name="User-Agent" value="Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.41">
    <cfhttpparam type="Header" name="Accept-Encoding" value="deflate">
    <cfhttpparam type="Header" name="TE" value="deflate, chunked, identity, trailers">

</cfhttp>

<cfset unzippedHTML = gunzip(cfhttp.FileContent)>

<cfoutput>
    #unzippedHTML#
</cfoutput>

<cfscript>

    function gunzip(inBytes) {
        var gzInStream = createObject('java','java.util.zip.GZIPInputStream');
        var outStream = createObject('java','java.io.ByteArrayOutputStream');
        var inStream = createObject('java','java.io.ByteArrayInputStream');
        var buffer = repeatString(" ",1024).getBytes();
        var length = 0;
        var rv = "";

        try {
            inStream.init(inBytes);
            gzInStream.init(inStream);
            outStream.init();
            do {
                length = gzInStream.read(buffer,0,1024);
                if (length neq -1) outStream.write(buffer,0,length);
            } while (length neq -1);
            // Decode with the page's declared charset rather than the JVM default
            rv = outStream.toString("ISO-8859-1");
            outStream.close();
            gzInStream.close();
            inStream.close();
        }
        catch (any e) {
            // On failure, return an empty string and close each stream independently
            rv = "";
            try { outStream.close(); } catch (any e2) {}
            try { gzInStream.close(); } catch (any e2) {}
            try { inStream.close(); } catch (any e2) {}
        }
        return rv;
    }
</cfscript>

Be sure to double-check the var scoping of the function. I might have missed something.
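If it is unclear how many times the content has been compressed, a variant of the same idea is to keep decompressing while the data still begins with the gzip magic number. This is a sketch under stated assumptions rather than a tested drop-in: gunzipAll is a hypothetical name, the cap of five layers is arbitrary, and the charset argument is assumed to match the page's declared encoding.

```cfml
<cfscript>
    // Hypothetical helper: strips gzip layers until the magic number 0x1F 0x8B is gone
    function gunzipAll(inBytes, charset) {
        var bytes = inBytes;
        var layers = 0;
        var inStream = "";
        var gzInStream = "";
        var outStream = "";
        var buffer = repeatString(" ", 1024).getBytes();
        var length = 0;

        // Nothing to decode if we were not handed binary data
        if (not isBinary(bytes)) return bytes;

        // Arbitrary safety cap of 5 layers so a bad input cannot loop forever
        while (layers lt 5 and len(bytes) gte 2
               and bitAnd(bytes[1], 255) eq 31 and bitAnd(bytes[2], 255) eq 139) {
            inStream = createObject("java", "java.io.ByteArrayInputStream").init(bytes);
            gzInStream = createObject("java", "java.util.zip.GZIPInputStream").init(inStream);
            outStream = createObject("java", "java.io.ByteArrayOutputStream").init();

            // Copy the decompressed stream into the output buffer
            length = gzInStream.read(buffer, 0, 1024);
            while (length neq -1) {
                outStream.write(buffer, 0, length);
                length = gzInStream.read(buffer, 0, 1024);
            }
            gzInStream.close();

            bytes = outStream.toByteArray();
            layers = layers + 1;
        }

        // Convert the final bytes to a string using the caller's charset
        return charsetEncode(bytes, charset);
    }
</cfscript>

<!--- Assumes a preceding cfhttp call that used getasbinary="yes" --->
<cfset unzippedHTML = gunzipAll(cfhttp.FileContent, "ISO-8859-1")>
```

Unlike the function above, this version lets errors propagate instead of returning an empty string, which can make a failed decompression easier to diagnose.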

Edward M Smith answered Jan 25 '26 13:01


