
JavaScript encoding: breaking & combining multibyte characters?

I'm planning to use client-side AES encryption for my web app.

Right now, I'm looking for a way to break multibyte characters into one-byte "non-characters", encrypt them (so the encrypted text keeps the same length), decrypt them, and then convert those one-byte "non-characters" back into multibyte characters.

I've seen the wiki for UTF-8 (the supposedly-default encoding for JS?) and UTF-16, but I can't figure out how to detect "fragmented" multibyte characters and how I can combine them back.

Thanks : )

user1894397 asked Dec 05 '25 03:12


2 Answers

JavaScript strings are UTF-16, stored as a sequence of 16-bit "characters" (code units). Unicode characters ("code points") above U+FFFF don't fit in 16 bits, so UTF-16 encodes them as two 16-bit units (a surrogate pair, 32 bits in total); for those, each JavaScript "character" is actually only half of the code point.
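As for detecting the "fragmented" halves the question mentions, here is a minimal sketch. The helper names (`isHighSurrogate`, `isLowSurrogate`, `combineSurrogates`) are made up for illustration and are not part of any library; the numeric ranges are the ones UTF-16 reserves for surrogates.

```javascript
// Hypothetical helpers: the names are illustrative only.
// The first half of a pair is always in 0xD800..0xDBFF,
// the second half is always in 0xDC00..0xDFFF.
function isHighSurrogate(unit) {
  return unit >= 0xD800 && unit <= 0xDBFF;
}

function isLowSurrogate(unit) {
  return unit >= 0xDC00 && unit <= 0xDFFF;
}

// Combine the two halves back into the code point they represent.
function combineSurrogates(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

var sample = '\uD852\uDF62';       // U+24B62, written as its two halves
var first = sample.charCodeAt(0);  // 0xD852, a high surrogate
var second = sample.charCodeAt(1); // 0xDF62, a low surrogate
var codePoint = combineSurrogates(first, second); // 0x24B62
```

Any 16-bit value outside those two ranges is a complete character on its own.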

So to "break" a JavaScript character into bytes, you just take the character code and split off the high byte and the low byte:

var code = str.charCodeAt(0); // The first character, obviously you'll have a loop
var lowbyte = code & 0xFF;
var highbyte = (code & 0xFF00) >> 8;

(Even though JavaScript's numbers are floating point, the bitwise operators work in terms of 32-bit integers, and of course in our case only 16 of those bits are relevant.)

You'll never have an odd number of bytes, because again this is UTF-16.
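Put together as a loop, and with the reverse direction added, that might look like the sketch below. The function names and the byte order (high byte first) are my own assumptions; any order works as long as both directions agree.

```javascript
// Split every 16-bit code unit into two bytes, high byte first
// (the byte order is an arbitrary choice made for this sketch).
function stringToBytes(str) {
  var bytes = [];
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    bytes.push((code & 0xFF00) >> 8, code & 0xFF);
  }
  return bytes;
}

// Reverse step: glue each pair of bytes back into one code unit.
function bytesToString(bytes) {
  var out = '';
  for (var i = 0; i < bytes.length; i += 2) {
    out += String.fromCharCode((bytes[i] << 8) | bytes[i + 1]);
  }
  return out;
}

var original = 'h\u00e9llo \uD852\uDF62';
var roundTripped = bytesToString(stringToBytes(original));
```

Surrogate pairs need no special handling here: each half is just another 16-bit unit, and the pair is whole again once both halves are restored in order.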

T.J. Crowder answered Dec 06 '25 17:12


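You could simply convert to UTF-8, for example by using this trick:

// encodeURIComponent percent-encodes the UTF-8 bytes of s; unescape then turns
// each %XX into a single character, so the result has one character per byte.
// Note: escape/unescape are deprecated, although browsers still support them.
function encode_utf8(s) {
  return unescape(encodeURIComponent(s));
}

function decode_utf8(s) {
  return decodeURIComponent(escape(s));
}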
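If you'd rather avoid the deprecated functions, a possible alternative (not what the trick above does) is the standard `TextEncoder` / `TextDecoder` API, assuming your environment provides it. The wrapper names below are made up for this sketch.

```javascript
// Hypothetical wrappers around the standard TextEncoder/TextDecoder API.
// Unlike the trick above, these work with a Uint8Array of bytes
// rather than a string with one character per byte.
function toUtf8Bytes(str) {
  return new TextEncoder().encode(str); // TextEncoder always emits UTF-8
}

function fromUtf8Bytes(bytes) {
  return new TextDecoder('utf-8').decode(bytes);
}

var utf8Bytes = toUtf8Bytes('\uD852\uDF62'); // U+24B62 takes four bytes: F0 A4 AD A2
var decoded = fromUtf8Bytes(utf8Bytes);
```

Keep in mind that the UTF-8 byte count depends on the characters, so it won't match the string's length for non-ASCII text.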

Since you are using crypto-js, you can use its methods to convert to UTF-8 and back to a string:

var words = CryptoJS.enc.Utf8.parse('𤭢');
var utf8  = CryptoJS.enc.Utf8.stringify(words);

The 𤭢 there is just a sample character that takes several bytes in UTF-8 (it may show up garbled in the crypto-js documentation).

Judging from the other examples (see the Latin1 one), parse converts a string to UTF-8 (technically, it converts it to UTF-8 and stores the bytes in a WordArray, the array type crypto-js uses internally), and the result can be passed to the AES encryption algorithm. stringify goes the other way: it converts a WordArray (for example, one obtained from decryption) back to a string, interpreting its bytes as UTF-8.

JsFiddle example: http://jsfiddle.net/UpJRm/

xanatos answered Dec 06 '25 17:12


