
Convert values in column from hex to binary in pandas data frame

I have one column in pandas data frame with hex values, for example:

Data
1A
2B
BB
FF
A7
78
CB

I want to convert the hex values to binary, take the last 3 bits (the three least significant bits) of each binary value, and finally convert that 3-bit value to decimal.

The Data column in binary will be:

Data
00011010
00101011
10111011
11111111
10100111
01111000
11001011

The last 3 bits:

Data
010
011
011
111
111
000
011

And finally, the desired values in decimal:

Data
2
3
3
7
7
0
3

How can I do this? I tried the bin() function, but it doesn't work on pandas data frames.

slobokv83 asked Nov 18 '25 02:11


2 Answers

We can do this with a chain of operations:

  1. first we convert the hexadecimal string to an int with .apply(int, base=16);
  2. next we convert this to a binary string with .apply(bin);
  3. next we strip the first two characters (the '0b' prefix that bin adds) with .str[2:];
  4. then we take the last three characters with .str[-3:]; and
  5. finally we interpret these as an int again, with .apply(int, base=2).

So:

>>> df.Data.apply(int, base=16).apply(bin).str[2:].str[-3:].apply(int, base=2)
0    2
1    3
2    3
3    7
4    7
5    0
6    3
Name: Data, dtype: int64
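
To see what each of the five steps produces, the chain can be split into intermediate Series. This is only a sketch: the way the DataFrame is built and the variable names (as_int, as_bin, and so on) are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data from the question (the DataFrame construction is assumed).
df = pd.DataFrame({"Data": ["1A", "2B", "BB", "FF", "A7", "78", "CB"]})

as_int = df["Data"].apply(int, base=16)   # hex string -> int, e.g. '1A' -> 26
as_bin = as_int.apply(bin)                # int -> binary string, e.g. '0b11010'
no_prefix = as_bin.str[2:]                # drop the '0b' prefix -> '11010'
last_three = no_prefix.str[-3:]           # keep the last three characters -> '010'
result = last_three.apply(int, base=2)    # binary string -> int, e.g. '010' -> 2

# Put the intermediate values side by side for inspection.
steps = pd.DataFrame({
    "hex": df["Data"],
    "int": as_int,
    "bin": as_bin,
    "last_three": last_three,
    "result": result,
})
print(steps)
```

Note that bin does not zero-pad, so the binary strings have different lengths; this does not matter here because only the trailing characters are used.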

We can, however, use another strategy here:

  1. we first convert the hexadecimal string to an int; and
  2. then we apply a bitwise AND with the mask 0b111, which keeps only the three least significant bits.

For example:

>>> df.Data.apply(int, base=16) & 0b111
0    2
1    3
2    3
3    7
4    7
5    0
6    3
Name: Data, dtype: int64
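
The mask idea generalizes to any number of trailing bits, since (1 << n) - 1 is a number whose lowest n bits are all set. The helper below is a hypothetical illustration (low_bits and n_bits are names made up for this sketch), assuming the column holds valid hex strings:

```python
import pandas as pd

# Sample data from the question (the DataFrame construction is assumed).
df = pd.DataFrame({"Data": ["1A", "2B", "BB", "FF", "A7", "78", "CB"]})


def low_bits(hex_series, n_bits=3):
    """Return the n_bits least significant bits of each hex string as an int."""
    mask = (1 << n_bits) - 1                 # n_bits=3 gives 0b111
    return hex_series.apply(int, base=16) & mask


print(low_bits(df["Data"]))       # three lowest bits, as in the answer
print(low_bits(df["Data"], 4))    # four lowest bits, i.e. the last hex digit
```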

The second strategy is not only simpler but also faster: it takes roughly a third of the time, i.e. about 66% less:

>>> timeit(first_strategy, number=10000)
6.962630775000434
>>> timeit(second_strategy, number=10000)
2.330652763019316

For a dataframe that repeats the sample data 100 times, we get:

>>> timeit(first_strategy, number=10000)
17.603060900000855
>>> timeit(second_strategy, number=10000)
5.901462858979357

Again, the second strategy takes about 66% less time.
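
The answer does not show how first_strategy and second_strategy were defined, so the following is only a plausible reconstruction of the benchmark: two zero-argument functions wrapping the expressions above, run against a dataframe that repeats the sample data 100 times. The repeat count passed to timeit is lowered to 100 here to keep the run short, and the absolute timings will differ by machine and pandas version.

```python
from timeit import timeit

import pandas as pd

# Sample data repeated 100 times, as in the second measurement (assumed setup).
sample = ["1A", "2B", "BB", "FF", "A7", "78", "CB"]
df = pd.DataFrame({"Data": sample * 100})


def first_strategy():
    # hex -> int -> binary string -> last three characters -> int
    return (
        df["Data"]
        .apply(int, base=16)
        .apply(bin)
        .str[2:]
        .str[-3:]
        .apply(int, base=2)
    )


def second_strategy():
    # hex -> int, then mask off everything but the three lowest bits
    return df["Data"].apply(int, base=16) & 0b111


# Both strategies should agree before their timings are compared.
assert first_strategy().tolist() == second_strategy().tolist()

t_first = timeit(first_strategy, number=100)
t_second = timeit(second_strategy, number=100)
print(f"first:  {t_first:.4f} s")
print(f"second: {t_second:.4f} s")
```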

Willem Van Onsem answered Nov 20 '25 16:11


You can use:

df.Data.apply(lambda v: int(format(int(v, 16), '08b')[-3:], 2))

Which gives you:

0    2
1    3
2    3
3    7
4    7
5    0
6    3
Name: Data, dtype: int64

Those steps are:

  • Take your original value and convert it to an integer using int(number, 16), where base 16 is hex (int('1A', 16) == 26)
  • Take that number and format it as a binary string: format(number, '08b') gives you a string of 0s and 1s, zero-filled on the left to 8 characters (format(26, '08b') == '00011010')
  • Take the last 3 characters of that string with [-3:] ('010') and convert them to an integer in base 2: int(binary_string[-3:], 2) gives you 2
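
The three steps can also be written out for a single value, which makes each intermediate result easy to check. This is a sketch using only built-ins; the function name last_three_bits is made up for illustration.

```python
def last_three_bits(hex_value):
    """Return the three least significant bits of a hex string as an int."""
    number = int(hex_value, 16)      # '1A' -> 26
    bits = format(number, "08b")     # 26 -> '00011010' (zero-padded to 8 digits)
    return int(bits[-3:], 2)         # '010' -> 2


for value in ["1A", "2B", "BB", "FF", "A7", "78", "CB"]:
    print(value, format(int(value, 16), "08b"), last_three_bits(value))
```

The same function can be passed directly to apply, for example df.Data.apply(last_three_bits).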
Jon Clements answered Nov 20 '25 17:11


