I have a dataframe of zeros and ones. I want to treat each column as if its values were the binary representation of an integer. What is the easiest way to make this conversion?
I want this:
df = pd.DataFrame([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]])
print(df)
   0  1  2
0  1  0  1
1  1  1  0
2  0  1  1
3  0  0  1
converted to:
0 12
1 6
2 11
dtype: int64
I would like to do this as efficiently as possible.
A similar solution, but faster:
print (df.T.dot(1 << np.arange(df.shape[0] - 1, -1, -1)))
0 12
1 6
2 11
dtype: int64
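To unpack the one-liner above, here is a minimal, self-contained sketch of the same idea. The names `weights` and `result` are introduced here for illustration only. Each row position gets a power-of-two weight (most significant bit first), and the dot product adds up the weights wherever a column holds a 1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]])

# Bit weights, most significant first. With 4 rows this is [8, 4, 2, 1].
weights = 1 << np.arange(df.shape[0] - 1, -1, -1)

# After transposing, each row holds one original column.
# The dot product computes sum(bit * weight) for every column.
result = df.T.dot(weights)

# Column 0 holds the bits 1, 1, 0, 0, so its value is 8 + 4 + 0 + 0 = 12.
print(result)
```

Because the weights are fixed-width integers, this sketch presumably only works while the number of rows stays within the integer width (fewer than 64 rows for int64); taller frames would need a different approach.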
Timings:
In [81]: %timeit df.apply(lambda col: int(''.join(str(v) for v in col), 2))
The slowest run took 5.66 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 264 µs per loop
In [82]: %timeit (df.T*(1 << np.arange(df.shape[0]-1, -1, -1))).sum(axis=1)
1000 loops, best of 3: 492 µs per loop
In [83]: %timeit (df.T.dot(1 << np.arange(df.shape[0] - 1, -1, -1)))
The slowest run took 6.14 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 204 µs per loop
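The timings above were taken on a 4x3 frame, where fixed per-call overhead may dominate, so the ranking could differ on larger data. Here is a rough sketch for repeating the comparison on a bigger frame. The frame size, the seed, and the function names are assumptions made for illustration, and no timing results are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 1000 columns of 32 random bits each.
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.integers(0, 2, size=(32, 1000)))
weights = 1 << np.arange(big.shape[0] - 1, -1, -1)


def via_strings():
    # Join each column's bits into a string and parse it as base 2.
    return big.apply(lambda col: int(''.join(str(v) for v in col), 2))


def via_multiply_sum():
    # Scale each bit by its weight, then sum across the bit positions.
    return (big.T * weights).sum(axis=1)


def via_dot():
    # Same computation expressed as a single dot product.
    return big.T.dot(weights)


# Sanity check: all three approaches should produce the same integers.
expected = via_dot().tolist()
agree = (via_strings().tolist() == expected
         and via_multiply_sum().tolist() == expected)
print("all methods agree:", agree)

for fn in (via_strings, via_multiply_sum, via_dot):
    seconds = timeit.timeit(fn, number=5)
    print(f"{fn.__name__}: {seconds / 5:.6f} s per call")
```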