
ColdFusion XMLFormat() not converting all characters

I am using XMLFormat() to encode some text for an XML document. However, when I read the XML file I created, I get an invalid character error. Why does XMLFormat() not properly encode all characters?

I'm running CF8.

Jason asked Dec 01 '25 09:12


2 Answers

Are you sure you are writing the file in the right encoding? You can't just do

<cffile action="write" file="foo.xml" output="#xml#" />

as the encoding of the written file very likely differs from the character set your XML is in. Unless an encoding declaration says otherwise, XML files are treated as UTF-8, so you should do:

<cffile action="write" file="foo.xml" output="#xml#" charset="utf-8" />
<!--- and --->
<cffile action="read" file="foo.xml" variable="xml" charset="utf-8" />
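
To illustrate the point about the encoding declaration, here is a minimal sketch of a full round trip. It assumes the document declares UTF-8 explicitly; the `<note>` element and the variable names `myText`, `raw` and `doc` are made up for the example and are not from the question.

```cfml
<!--- Sample text with one non-ASCII character (Chr(233) is "é") and one XML special character --->
<cfset myText = "Caf" & Chr(233) & " & tea" />

<!--- The encoding in the XML declaration matches the charset used for writing --->
<cfset xml = '<?xml version="1.0" encoding="UTF-8"?><note>#XmlFormat(myText)#</note>' />

<!--- Write and read back with the same explicit charset --->
<cffile action="write" file="#ExpandPath('foo.xml')#" output="#xml#" charset="utf-8" />
<cffile action="read" file="#ExpandPath('foo.xml')#" variable="raw" charset="utf-8" />

<!--- Parse the text that was read back --->
<cfset doc = XmlParse(raw) />
```

If the write and the read use different charsets, or the declaration names an encoding other than the one the file was actually written in, the parse step is where an invalid character error would typically show up.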
Tomalak answered Dec 03 '25 23:12


I feel that this is a bug in XMLFormat. I am not sure who the original author of the snippet below is, but here is an approach that catches the remaining non-ASCII characters with a regex and replaces each one with a hex numeric character reference:

  <cfscript>
      // The original snippet ends in "return", so it has to live inside a function.
      // The wrapper and its name, xmlFormatAll, are assumptions added for that reason.
      function xmlFormatAll(myText)
      {
        var i = 1;
        var tmp = '';
        var result = XmlFormat(myText); // escape the standard XML characters first.
        i = ReFind('[^\x00-\x7F]',result,i,false); // find the first high chr and save its numeric string position.
        while(i GT 0)
        {
          tmp = '&##x#FormatBaseN(Asc(Mid(result,i,1)),16)#;'; // convert the high chr to a hex numeric character reference.
          result = Insert(tmp,result,i); // insert the reference right after the high chr.
          result = RemoveChars(result,i,1); // delete the now redundant high chr from the string.
          i = ReFind('[^\x00-\x7F]',result,i+Len(tmp),false); // continue scanning after the inserted reference.
        }
        return result;
      }
  </cfscript>
kevink answered Dec 03 '25 23:12


