There are several shell-specific ways to include a ‘Unicode literal’ in a string. For instance, in Bash, the quoted string-expanding mechanism, $'', allows us to directly embed a non-ASCII character: $'\u2620'.
However, if you're trying to write universally cross-platform shell scripts (in practice, this usually reduces to “runs in Bash, Zsh, and Dash”), that's not a portable feature.
I can portably produce any character in the ASCII table (octal number-space) with a construct like the following:
WHAT_A_CHARACTER="$(printf '\036')"
… however, POSIX / Dash printf only supports octal escapes.
I can also obviously achieve the full Unicode space by farming the task out to a fuller programming environment:
OH_CAPTAIN_MY_CAPTAIN="$(ruby -e 'print "\u2388"')"
TAKE_ME_OUT_TONIGHT="$(node -e 'console.log("\u266C")')"
So: what's the best way to encode such a character into a shell script, that:
dash, bash, and zsh,
If you have GNU printf installed (it's in the Debian package coreutils, for example), then you can use it independently of which shell you are using by avoiding the shell's builtin:
env printf '\u2388\n'
Here I am using the POSIX-standard env command to avoid the printf builtin, but if you happen to know where printf is, you can invoke it directly by its complete path, such as
/usr/bin/printf '\u2388\n'
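Whether the external printf really understands \u can be probed at run time. The following is only a sketch: the variable names are made up for illustration, and as far as I know GNU printf converts the character to the current locale's encoding, so the check is expected to succeed only under a UTF-8 locale.

```shell
# Expected UTF-8 bytes of U+2388, spelled with portable octal escapes.
expected=$(printf '\342\216\210')

# Ask the external printf (not the builtin) to expand \u2388. Error
# messages from implementations that reject the escape are discarded.
actual=$(env printf '\u2388' 2>/dev/null)

# Hypothetical flag name; "yes" only if the bytes match exactly.
if [ "$actual" = "$expected" ]; then
    have_u_escape=yes
else
    have_u_escape=no
fi
printf 'external printf handles \\u escapes: %s\n' "$have_u_escape"
```

A script could branch on have_u_escape and fall back to a slower method when the answer is no.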
If both your external printf and your shell's builtin printf only implement the POSIX standard, you need to work harder. One possibility is to use iconv to translate to UTF-8, but while the POSIX standard requires that there be an iconv command, it does not in any way prescribe how standard encodings are named. I think the following will work on most POSIX-compatible platforms, but the number of subshells created might be sufficient to make it less efficient than a "heavy" script interpreter:
printf $(printf '\\%o' $(printf %08x 0x2388 | sed 's/../0x& /g')) |
iconv -f UTF-32BE -t UTF-8
The above uses the printf builtin to pad the hexadecimal codepoint value to 8 hex digits, then sed to split those digits into 4 one-byte hex constants, then printf again to change the hex constants into octal notation, and finally another printf to interpret the octal escapes as a four-byte sequence which can be fed into iconv as big-endian UTF-32. (It would be simpler with a printf which recognizes \x escape codes, but POSIX doesn't require that and dash doesn't implement it.)
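To make the intermediate values visible, the same pipeline can be unrolled one stage at a time. This is a sketch with illustrative variable names (hex, octal, char), and it assumes your iconv accepts the encoding names UTF-32BE and UTF-8:

```shell
# Walk through the pipeline for one code point, U+2388.
# Note: "set --" overwrites the positional parameters of the current shell.

# 1. Pad the code point to 8 hex digits, i.e. one 32-bit UTF-32 unit.
hex=$(printf '%08x' 0x2388)

# 2. Split the digits into four hex byte constants, one per argument.
#    The splitting happens on the unquoted command substitution.
set -- $(printf '%s\n' "$hex" | sed 's/../0x& /g')

# 3. Rewrite every byte as an octal escape sequence.
octal=$(printf '\\%o' "$@")

# 4. Let printf turn the escapes into raw bytes, then convert to UTF-8.
char=$(printf "$octal" | iconv -f UTF-32BE -t UTF-8)

printf '%s -> %s -> %s\n' "$hex" "$octal" "$char"
```

Here hex holds 00002388 and octal holds \0\0\43\210. The raw UTF-32 bytes are never stored in a variable, because shell variables cannot hold NUL bytes; they only ever travel through the pipe into iconv.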
You can use the line without modification to print more than one symbol, as long as you provide the Unicode codepoints (as integer constants) for all of them (example executed in dash):
$ printf $(printf '\\%o' $(printf %08x 0x2388 0x266c 0xA |
> sed 's/../0x& /g')) |
> iconv -f UTF-32BE -t UTF-8
⎈♬
$
Note: As Geoff Nixon mentions in a comment, the fish shell (which is nowhere close to the POSIX standard, and as far as I can see has no aspirations to conform) will complain about the unquoted %08x format argument to printf, because it expects words starting with % to be jobspecs. So if you use fish, add quotes to the format argument.
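For repeated use in a script, the one-liner can be wrapped in a function. The name unichars is invented here for illustration, and the sketch carries the same assumption as above about the encoding names iconv accepts. The format argument is quoted, which also sidesteps the jobspec issue:

```shell
# unichars: hypothetical helper. Prints the UTF-8 encoding of every code
# point passed as an integer constant (for example 0x2388).
unichars() {
    # The inner substitution is left unquoted on purpose so that the
    # sed output splits into one argument per byte.
    printf "$(printf '\\%o' $(printf '%08x' "$@" | sed 's/../0x& /g'))" |
        iconv -f UTF-32BE -t UTF-8
}

# Capture a single character in a variable, as in the question.
OH_CAPTAIN_MY_CAPTAIN=$(unichars 0x2388)
printf '%s\n' "$OH_CAPTAIN_MY_CAPTAIN"

# Several code points at once; 0xA supplies the trailing newline.
unichars 0x2388 0x266c 0xA
```

Each call still spawns several subshells plus sed and iconv, so in a loop it may be worth computing the characters once and keeping them in variables.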