
UTF-8 latin-1 conversion issues, python django

My issue is that I have the string '\222\222\223\225', which is stored as latin-1 in the database. What I get from Django (by printing it) is the string 'ââââ¢', which I assume is the UTF-8 conversion of it. Now I need to pass the string into a function that does this operation:

strdecryptedPassword + chr(ord(c) - 3 - intCounter - 30)

I get this error:

chr() arg not in range(256)

If I try to encode the string as latin-1 first I get this error:

'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256)

I have read a bunch on how character encoding works, and there is something I am missing because I just don't get it!

jacob asked Sep 04 '25 16:09


2 Answers

Your first error, 'chr() arg not in range(256)', probably means the value has underflowed, because chr() cannot take negative numbers. I don't know what the encryption algorithm is supposed to do when intCounter + 33 is more than the character's ordinal value; you'll have to check what to do in that case.

About the second error: you must decode(), not encode(), a regular string object to get a proper representation of your data. encode() takes a unicode object (those starting with u') and generates a regular string to be output or written to a file. decode() takes a string object and generates a unicode object with the corresponding code points. The unicode() call does this when it is given a string object; you could also call a.decode('latin-1') instead.

>>> a = '\222\222\223\225'
>>> u = unicode(a,'latin-1')
>>> u
u'\x92\x92\x93\x95'
>>> print u.encode('utf-8')
ÂÂÂÂ
>>> print u.encode('utf-16')
ÿþ
>>> print u.encode('latin-1')

>>> for c in u:
...   print chr(ord(c) - 3 - 0 -30)
...
q
q
r
t
>>> for c in u:
...   print chr(ord(c) - 3 -200 -30)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ValueError: chr() arg not in range(256)
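
For readers on Python 3, here is a rough equivalent of the session above. It is only a sketch: it assumes the database hands back the four raw bytes from the question, and the variable names (raw, text, shifted) are made up for illustration. In Python 3, bytes takes the role of the old str and str takes the role of unicode.

```python
# Python 3 sketch: bytes plays the role of Python 2's str, str that of unicode.
raw = b"\222\222\223\225"            # the four bytes quoted in the question (assumed input)

text = raw.decode("latin-1")         # bytes -> text, one code point per byte
print(repr(text))                    # '\x92\x92\x93\x95'

# Encoding goes the other way: text -> bytes.
print(text.encode("utf-8"))          # b'\xc2\x92\xc2\x92\xc2\x93\xc2\x95'
print(text.encode("latin-1") == raw) # True

# The shift from the question with intCounter fixed at 0.
shifted = "".join(chr(ord(c) - 3 - 0 - 30) for c in text)
print(shifted)                       # qqrt

# With a large counter the argument goes negative and chr() raises ValueError.
try:
    chr(ord(text[0]) - 3 - 200 - 30)
except ValueError as exc:
    print("ValueError:", exc)
```

The direction is the thing to remember: decode() turns bytes into text, encode() turns text into bytes.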

Vinko Vrsalovic answered Sep 07 '25 17:09


As Vinko notes, Latin-1 or ISO 8859-1 doesn't have printable characters for the octal string you quote. According to my notes for 8859-1, "C1 Controls (0x80 - 0x9F) are from ISO/IEC 6429:1992. It does not define names for 80, 81, or 99". The code point names are as Vinko lists them:

\222 = 0x92 => PRIVATE USE TWO
\223 = 0x93 => SET TRANSMIT STATE
\225 = 0x95 => MESSAGE WAITING

The correct UTF-8 encoding of those is (Unicode, binary, hex):

U+0092 = %11000010 %10010010 = 0xC2 0x92
U+0093 = %11000010 %10010011 = 0xC2 0x93
U+0095 = %11000010 %10010101 = 0xC2 0x95

The LATIN SMALL LETTER A WITH CIRCUMFLEX is ISO 8859-1 code 0xE2 and hence Unicode U+00E2; in UTF-8, that is %11000011 %10100010 or 0xC3 0xA2.

The CENT SIGN is ISO 8859-1 code 0xA2 and hence Unicode U+00A2; in UTF-8, that is %11000010 %10100010 or 0xC2 0xA2.

So, whatever else you are seeing, you do not seem to be seeing a UTF-8 encoding of ISO 8859-1. All else apart, you are seeing but 5 bytes where you would have to see 8.
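
The byte sequences are easy to check yourself. A minimal Python 3 sketch, assuming the stored data really is the four Latin-1 bytes from the question; the names encodings, latin1_bytes and utf8_bytes are only illustrative:

```python
import unicodedata

# UTF-8 bytes for each code point discussed above.
encodings = {}
for cp in (0x92, 0x93, 0x95, 0xE2, 0xA2):
    ch = chr(cp)
    encoded = ch.encode("utf-8")
    encodings[cp] = encoded
    # C1 control characters have no Unicode name, so supply a fallback label.
    name = unicodedata.name(ch, "<control>")
    hex_bytes = " ".join(f"{b:02X}" for b in encoded)
    print(f"U+{cp:04X} = {hex_bytes}  {name}")

# Four Latin-1 bytes in the 0x80-0xFF range become eight UTF-8 bytes.
latin1_bytes = b"\222\222\223\225"
utf8_bytes = latin1_bytes.decode("latin-1").encode("utf-8")
print(len(utf8_bytes))   # 8
```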

Added: The previous part of the answer addresses the 'UTF-8 encoding' claim, but ignores the rest of the question, which says:

Now I need to pass the string into a function that does this operation:

    strdecryptedPassword + chr(ord(c) - 3 - intCounter - 30)

I get this error: chr() arg not in range(256).  If I try to encode the
string as Latin-1 first I get this error: 'latin-1' codec can't encode
characters in position 0-3: ordinal not in range(256).

You don't actually show us how intCounter is defined, but if it increments gently per character, sooner or later 'ord(c) - 3 - intCounter - 30' is going to be negative (and, by the way, why not combine the constants and use 'ord(c) - intCounter - 33'?), at which point, chr() is likely to complain. You would need to add 256 if the value is negative, or use a modulus operation to ensure you have a positive value between 0 and 255 to pass to chr(). Since we can't see how intCounter is incremented, we can't tell if it cycles from 0 to 255 or whether it increases monotonically. If the latter, then you need an expression such as:

chr(mod(ord(c) - mod(intCounter, 256) + 479, 256))

where 256 - 33 = 223, of course, and 479 = 256 + 223. This guarantees that the value passed to chr() is positive and in the range 0..255 for any input character c and any value of intCounter (and, because the mod() function never gets a negative argument, it also works regardless of how mod() behaves when its arguments are negative).
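
In Python the wrap-around could be sketched as follows. The helper name unshift_char is hypothetical, and whether wrapping modulo 256 is what the original encryption scheme intends is an assumption; the question does not say.

```python
def unshift_char(c, int_counter):
    """Compute ord(c) - int_counter - 33, wrapped into 0..255 (hypothetical helper)."""
    # 479 = 512 - 33, so the left operand of the final % is always positive
    # for any character and any non-negative or negative counter.
    value = (ord(c) - int_counter % 256 + 479) % 256
    return chr(value)


print(unshift_char("\x92", 0))          # q
print(repr(unshift_char("\x92", 200)))  # '\xa9'
```

Python's % already returns a non-negative result for a positive modulus, so chr((ord(c) - int_counter - 33) % 256) gives the same answer there; the longer form only matters in languages where the remainder of a negative number can be negative.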

Jonathan Leffler answered Sep 07 '25 17:09