I am using XMLFormat() to encode some text for an XML document. However, when I go to read the XML file I created I get an invalid character error. Why does XMLFormat() not properly encode all characters?
I'm running CF8.
Are you sure you are writing the file in the right encoding? You can't just do
<cffile action="write" file="foo.xml" output="#xml#" />
because the file will very likely be written in a character set other than the one your XML is supposed to be in. Unless an encoding declaration says otherwise, XML files are treated as UTF-8, so you should do:
<cffile action="write" file="foo.xml" output="#xml#" charset="utf-8" />
<!--- and --->
<cffile action="read" file="foo.xml" variable="xml" charset="utf-8" />
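To make that concrete, here is a minimal sketch, assuming a hypothetical `<note>` document and a sample `myText` value (neither comes from the question). It puts an encoding declaration in the XML that matches the `charset` used for writing and reading, then parses what was read back with `XmlParse()`.

```cfml
<!--- Sample input (assumption): non-ASCII characters are built with Chr() so the template's own encoding does not matter. --->
<cfset myText = "Caf#Chr(233)# & cr#Chr(232)#me" />

<!--- The declared encoding matches the charset passed to cffile below. --->
<cfset xml = '<?xml version="1.0" encoding="UTF-8"?><note>#XMLFormat(myText)#</note>' />

<!--- ExpandPath() turns the relative name into an absolute path next to the current page. --->
<cfset path = ExpandPath("foo.xml") />
<cffile action="write" file="#path#" output="#xml#" charset="utf-8" />
<cffile action="read" file="#path#" variable="raw" charset="utf-8" />

<!--- Parse what was read back; XmlParse() throws if the content is not well-formed XML. --->
<cfset doc = XmlParse(raw) />
<cfoutput>#XMLFormat(doc.note.XmlText)#</cfoutput>
```

If the file is written with one charset and then read or declared with another, characters outside ASCII can come back as invalid byte sequences, which is one way to end up with the error described in the question.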
I feel that this is a bug in XMLFormat. I am not sure who the original author of the snippet below is, but here is an approach that catches the remaining non-ASCII characters with a regex and replaces them with numeric character references:
<!--- Assumes this runs inside a function body (the snippet does not show the enclosing function), which is what makes the return at the end valid. --->
<cfset myText = xmlFormat(myText)>
<cfscript>
i = 1; // CFML string positions are 1-based, so start scanning at the first character.
tmp = '';
while(ReFind('[^\x00-\x7F]',myText,i,false))
{
i = ReFind('[^\x00-\x7F]',myText,i,false); // find the next non-ASCII character and save its position.
tmp = '&##x#FormatBaseN(Asc(Mid(myText,i,1)),16)#;'; // build the hexadecimal numeric character reference for that character.
myText = Insert(tmp,myText,i); // insert the reference directly after the character.
myText = RemoveChars(myText,i,1); // remove the original character, leaving only the reference.
i = i+Len(tmp); // resume scanning just past the inserted reference.
}
return myText;
</cfscript>