I am trying to run an XQuery that uses fn:matches with a regular expression, but MarkLogic's implementation of XQuery does not seem to allow hexadecimal character representations. The following gives me an "Invalid regular expression" error.
(: Find text containing non-ISO-Latin characters :)
let $regex := '[^\x00-\xFF]'
let $results := fn:collection('mydocs')//myns:myelem[fn:matches(., $regex)]
let $count := fn:count($results)
return
    <figures count="{$count}">
        { $results }
    </figures>
However, this one does not give the error.
let $regex := '[^a-zA-Z0-9]'
let $results := fn:collection('mydocs')//myns:myelem[fn:matches(., $regex)]
let $count := fn:count($results)
return
    <figures count="{$count}">
        { $results }
    </figures>
Is there a way to use the hexadecimal character representation, or an alternative that would give me the same result, in MarkLogic's implementation of XQuery?
XQuery can use numeric character references in strings, in much the same way that XML and HTML can:
decimal: "&#10;"
hex: "&#x0a;" (or just "&#xa;")
However, you can't represent some characters this way: anything below "&#x09;", for instance.
There's no regex type in XQuery (you just use a string as a regex), so you can use character references in your regular expressions:
fn:matches("a", "[^&#x09;-&#xFF;]")
(: => xs:boolean("false") :)
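To tie this back to the query in the question, here is a minimal sketch that runs a character-reference pattern against a few made-up sample strings instead of a collection (the strings are illustrative assumptions, not data from the question). The XQuery parser resolves the references before the regex engine sees the string, so no \x escape is involved:

```xquery
(: The pattern is the negated class "any character outside U+0009..U+00FF",
   written with character references rather than \x escapes. :)
let $regex := "[^&#x09;-&#xFF;]"
(: Hypothetical sample strings: plain ASCII, one Latin-1 character (U+00E9),
   and one character beyond Latin-1 (U+20AC, the euro sign). :)
for $s in ("plain ASCII", "caf&#xE9;", "5 &#x20AC;")
return fn:concat($s, " => ", fn:matches($s, $regex))
```

Only the last string, which contains U+20AC, should report true. In the query from the question, the same string should work as the value of $regex.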
Update: here's the XQuery 1.0 spec on character references: http://www.w3.org/TR/xquery/#dt-character-reference.
Based on some brief testing, I think MarkLogic enforces XML 1.1 character reference rules: http://www.w3.org/TR/xml11/#charsets
For posterity, here are the XML 1.0 rules: http://www.w3.org/TR/REC-xml/#charsets
Well, it seems MarkLogic's implementation of XQuery wants Unicode. As it turned out, even very small ranges in hex (e.g., [^\x00-\x0F]) threw the "Invalid regular expression" error, but Unicode notation did not. The following gives me results.
let $regex := '[^U0000-U00FF]'
let $results := fn:collection('mydocs')//myns:myelem[fn:matches(., $regex)]
let $count := fn:count($results)
return
    <figures count="{$count}">
        { $results }
    </figures>
I think the assignment let $regex := '[^\x00-\xFF]' did not throw the error by itself because the value was treated as a plain string when I tried return $regex; the error only appears once the string is used as a regular expression.
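One thing worth checking here: the regular-expression syntax used by fn:matches (XML Schema regular expressions plus a few XPath extensions) defines no \x, \u, or U-style numeric escape, which would explain the original error. If MarkLogic follows that syntax, '[^U0000-U00FF]' is read as an ordinary negated class made of the literal characters U, 0, and F plus the range 0-U, not as a Unicode range. A quick sketch to test that on your server, using made-up sample strings:

```xquery
(: Same pattern as above, applied to two hypothetical plain-ASCII strings. :)
let $regex := '[^U0000-U00FF]'
(: "ABC" lies entirely inside the literal range 0-U; "abc" lies outside it. :)
for $s in ("ABC", "abc")
return fn:concat($s, " => ", fn:matches($s, $regex))
```

If "abc" reports true while "ABC" reports false, the pattern is matching any character outside 0-U rather than any character outside Latin-1, and a pattern built from character references such as &#xFF; is the safer choice.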