'unraw' a string in JavaScript

How can I take a raw string in JavaScript and convert all the escape sequences to their respective characters? In other words, the reverse of String.raw. For example:

unraw("\\x61\\x62\\x63 \\u{1F4A9} \\u0041");
// => "abc 💩 A";

I tried JSON.parse, but it only supports the last format (\\u0041). Neither unescape nor decodeURI is what I am looking for.
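To illustrate the JSON.parse limitation, here is a minimal sketch. The helper name tryJsonUnescape is hypothetical and exists only for this demonstration; it assumes the raw text contains no unescaped double quotes.

```javascript
// Hypothetical helper: wraps the raw text in quotes and parses it as a JSON string.
function tryJsonUnescape(raw) {
    try {
        return JSON.parse('"' + raw + '"');
    } catch (err) {
        return err.name; // "SyntaxError" for escapes that JSON does not define
    }
}

console.log(tryJsonUnescape("\\u0041"));    // A
console.log(tryJsonUnescape("\\x61"));      // SyntaxError
console.log(tryJsonUnescape("\\u{1F4A9}")); // SyntaxError
```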

Ian asked Oct 22 '25 14:10


1 Answer

I think you basically have three choices:

  1. Write your own function to do it, handling the various types of escapes that JavaScript allows in strings; or
  2. Leverage the JavaScript parser built into the JavaScript engine where this code is running. That requires new Function (or even eval), so you have to trust the content of the string completely; otherwise you open yourself up to arbitrary code execution; or
  3. Use a parser like Esprima or similar.

#1 is a bit of a pain but really not that bad; there aren't that many escapes to handle. #2 has all the usual issues around trusting the string contents not to be nefarious code, since eval, or calling the function that new Function creates, allows arbitrary code execution. #3 is a fairly heavy solution.
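For reference, option #2 could look roughly like the sketch below. The name unrawViaParser is hypothetical, and the second half shows why this is only acceptable for strings you fully trust: a double quote in the input ends the string literal, and whatever follows runs as code.

```javascript
// UNSAFE sketch of option #2 (hypothetical helper name): only for fully trusted input.
function unrawViaParser(raw) {
    // Wrap the raw text in double quotes and let the engine parse it as a string literal.
    return new Function('return "' + raw + '";')();
}

console.log(unrawViaParser("\\x61\\x62\\x63 \\u{1F4A9} \\u0041")); // abc 💩 A

// The danger: this input closes the literal early and injects an expression.
const hostile = '" + (globalThis.injected = true) + "';
console.log(unrawViaParser(hostile)); // true
console.log(globalThis.injected);     // true (the injected assignment actually ran)
```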

Looking at #1 a bit more closely, EscapeSequence breaks down into:

  • Single character escapes, \ followed by one of '"\bfnrtv.
  • Hex escapes, \xHH where H is a hex digit
  • Unicode escapes, \uHHHH or \u{H+} where, again, H is a hex digit

That's not actually all that bad. Here's a quick-and-dirty version:

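// Note: This does not implement LegacyOctalEscapeSequence (https://tc39.es/ecma262/#prod-annexB-LegacyOctalEscapeSequence)
// Note: Malformed \x and \u escapes (a SyntaxError in a real string literal) are not rejected; they produce "\0"
function unraw(str) {
    // In the character class, \\ matches a backslash and b a literal "b" (a bare \b there would mean backspace)
    return str.replace(/\\[0-9]|\\['"\\bfnrtv]|\\x[0-9a-f]{2}|\\u[0-9a-f]{4}|\\u\{[0-9a-f]+\}|\\./ig, match => {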
        switch (match[1]) {
            case "'":
            case "\"":
            case "\\":
                return match[1];
            case "b":
                return "\b";
            case "f":
                return "\f";
            case "n":
                return "\n";
            case "r":
                return "\r";
            case "t":
                return "\t";
            case "v":
                return "\v";
            case "u":
                if (match[2] === "{") {
                    return String.fromCodePoint(parseInt(match.substring(3), 16));
                }
                return String.fromCharCode(parseInt(match.substring(2), 16));
            case "x":
                return String.fromCharCode(parseInt(match.substring(2), 16));
            case "0":
                return "\0";
            default: // E.g., "\q" === "q"
                return match.substring(1);
        }
    });
}
console.log(String.raw`${unraw("\\x61\\x62\\x63 \\u{1F4A9} \\u0041")}`);
// Double-check result
const str =           "\x61\x62\x63 \u{1F4A9} \u0041";
const raw = String.raw`\x61\x62\x63 \u{1F4A9} \u0041`;
console.log(str === unraw(raw));

I'm sure that can be cleaned up a bit.
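One possible cleanup, as a sketch rather than a definitive version: use capture groups instead of indexing into the match, a lookup table for the single-character escapes, and throw on malformed \x and \u escapes the way a string literal would. The names unrawStrict and SIMPLE_ESCAPES are hypothetical, and legacy octal escapes are still not implemented (\1 through \9 just yield the digit).

```javascript
// Hypothetical names; a stricter variant of the quick-and-dirty unraw.
const SIMPLE_ESCAPES = new Map([
    ["b", "\b"], ["f", "\f"], ["n", "\n"], ["r", "\r"],
    ["t", "\t"], ["v", "\v"], ["0", "\0"],
]);
const LINE_TERMINATORS = new Set(["\r\n", "\n", "\r", "\u2028", "\u2029"]);

function unrawStrict(str) {
    // One capture group per escape kind; the last group catches any other
    // escaped character ([^] also matches line terminators).
    const pattern = /\\(?:x([0-9a-fA-F]{2})|u([0-9a-fA-F]{4})|u\{([0-9a-fA-F]+)\}|(\r\n|[^]))/g;
    return str.replace(pattern, (match, hex, unicode, codePoint, other) => {
        if (hex !== undefined) return String.fromCharCode(parseInt(hex, 16));
        if (unicode !== undefined) return String.fromCharCode(parseInt(unicode, 16));
        if (codePoint !== undefined) {
            const value = parseInt(codePoint, 16);
            if (value > 0x10ffff) throw new SyntaxError(`Code point out of range: ${match}`);
            return String.fromCodePoint(value);
        }
        // Reaching here with "x" or "u" means the hex digits were missing or malformed.
        if (other === "x" || other === "u") throw new SyntaxError(`Malformed escape: ${match}`);
        // A backslash before a line terminator is a line continuation and yields nothing.
        if (LINE_TERMINATORS.has(other)) return "";
        // Known single-character escapes, else the character itself (e.g. "\q" === "q").
        return SIMPLE_ESCAPES.has(other) ? SIMPLE_ESCAPES.get(other) : other;
    });
}

console.log(unrawStrict("\\x61\\x62\\x63 \\u{1F4A9} \\u0041")); // abc 💩 A
```

A lone backslash at the very end of the input has nothing to escape, so it is left in place.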

T.J. Crowder answered Oct 25 '25 03:10