Decoding a bytes sequence - what's the train of thought when doing it

I have this byte sequence and I need to decode it. I'm a complete beginner in Python and in encodings.

enc = b'\x80\x03}q\x00(K\x01K\x01K\x02K\x03K\x03K\x06K\x04G?\xc5UUUUUUK\x05G?\xe0\x00\x00\x00\x00\x00\x00K\x06G?\x9cq\xc7\x1cq\xc7\x1cK\x07G?\xc5UUUUUUK\x08K$K\tG?\xb5UUUUUUK\nK\x07K\x0bG?\xe5UUUUUUK\x0cG?\xb5UUUUUUK\rG?\xedUUUUUUK\x0eK4K\x0fG?\xb3\xb1;\x13\xb1;\x14K\x10K\x00K\x11G?\xcd\x89\xd8\x9d\x89\xd8\x9eK\x12G?\xcb\x9b\x9b\x9b\x9b\x9b\x9cK\x13G?\xa4\x14\x14\x14\x14\x14\x14K\x14X\x08\x00\x00\x00discretaq\x01K\x15K\x02K\x16X\x02\x00\x00\x00daq\x02K\x17G?\xe4z\xe1G\xae\x14{K\x18G@\x15\x00\x00\x00\x00\x00\x00K\x19G?\xe4z\xe1G\xae\x14|K\x1aK2K\x1bK\x01K\x1cK\x03K\x1dG?\xd5UUUUUUK\x1eG?\xc5UUUUUUK\x1fK\x01K K\x04K!G?\xaf\xf2\xe4\x8e\x8aq\xdeK"K\x04K#X\x04\x00\x00\x00mareq\x03u.'

I tried doing it this way

strputere = enc.decode()

print(strputere)

and I get an error

File "encode.py", line 4, in <module>
    strputere = enc.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

I started doing a bit of research, and I found that b stands for bytes.

So my enc variable is a bytes string literal. I've looked into .decode() and it seemed like a good choice, but it might not be.

I'm a bit confused because it is a bytes string literal, but it contains some characters (such as \x80) that I think are UTF-8 characters.

So, how can I decode this, and what would be the algorithm for that? I would love to understand what happens. I did my research, but I'm a bit lost and I need some help.


1 Answer

So, generally when you have a byte sequence you have two different ways to approach it, depending on the contents:

  1. Is it a pure string sequence?

If dealing with a pure string sequence, you need to decode using the following:

enc.decode("utf-8") 

Keep in mind that in this case you must know which encoding was used (here, UTF-8). Judging by the error message you got, though, UTF-8 does not appear to be the right choice for your data.

If you don't know the encoding but you know it's definitely an encoded string, you can take a look at the options mentioned in this question here.
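As a minimal sketch of this first case, the helper below tries a few candidate encodings in order and returns the first one that decodes without error. The name try_decode and the candidate list are assumptions made up for illustration, not part of any library. Note that a clean decode does not prove the guess is right: latin-1, for example, accepts every possible byte.

```python
def try_decode(data, encodings=("utf-8", "utf-16", "latin-1")):
    """Return (encoding, text) for the first candidate that decodes cleanly.

    Hypothetical helper: the default candidate list is only an example.
    Returns (None, None) if every candidate raises UnicodeDecodeError.
    """
    for name in encodings:
        try:
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue
    return None, None


# Real UTF-8 text decodes with the first candidate.
print(try_decode("café".encode("utf-8")))

# 0x80 can only be a continuation byte in UTF-8, so it can never start
# a character; neither strict codec below accepts it.
print(try_decode(b"\x80\x03abc", encodings=("utf-8", "ascii")))
```

If every strict codec fails, as in the second call, that is a hint the bytes may not be encoded text at all.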

  2. Sensor/Other input

If you are reading from an embedded device, or any bytes input that packs several fields of data rather than a single string, you can use struct.unpack(). This is a bit more complicated, and you will need to go through the docs to work out the exact format string that describes your data.

The way it works is that you tell Python what each group of bytes represents (string, int, etc.) and how long each one is, and it converts the bytes into a tuple of objects as follows:

import struct
values = list(struct.unpack('>BBHBBhBHhHL', enc))  # the format must describe exactly len(enc) bytes
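To make the format string idea concrete, here is a small self-contained sketch. The record layout (sensor id, temperature, timestamp) and the name RECORD_FORMAT are invented for illustration; a real device would define its own layout in its documentation.

```python
import struct

# Hypothetical record layout, assumed purely for this example:
#   B  unsigned 8-bit integer  (sensor id)
#   h  signed 16-bit integer   (temperature in tenths of a degree)
#   L  unsigned 32-bit integer (timestamp)
# The leading '>' selects big-endian byte order with no padding.
RECORD_FORMAT = ">BhL"

# Build a sample packet so the example does not need real hardware.
packet = struct.pack(RECORD_FORMAT, 7, -125, 1700000000)

# struct.unpack requires the buffer length to equal calcsize(format).
record_size = struct.calcsize(RECORD_FORMAT)
assert len(packet) == record_size

sensor_id, temp_tenths, timestamp = struct.unpack(RECORD_FORMAT, packet)
print(sensor_id, temp_tenths / 10, timestamp)
```

If the buffer is longer or shorter than the format describes, struct.unpack raises struct.error, so checking struct.calcsize() against len() of your data is a quick way to test a guessed format.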
Zaid Al Shattle answered Jan 25 '26 20:01
