I have the following code:
import numpy as np

def func(value, start=None, end=None):
    # Return bits start..end (inclusive, bit 0 = least significant bit)
    # by slicing the 32-bit binary string representation of value
    if start is not None and start != 0:
        start = -start
    elif start == 0:
        start = None
    if end is not None:
        end = -end - 1
    return int('{:032b}'.format(value)[end:start], 2)

data = np.random.randint(1, 429496729, 10000)
starts = [10, 50, 100, 200]
stops = [30, 90, 170, 250]
data_dict = [{} for _ in range(len(starts))]
for ii, (start, stop) in enumerate(zip(starts, stops)):
    range_array = np.arange(start, stop, 2)
    data_dict[ii]['one'] = [func(value, 0, 8) for value in data[range_array]]
    data_dict[ii]['two'] = [func(value, 9, 17) for value in data[range_array]]
    data_dict[ii]['three'] = [func(value, 27, 27) for value in data[range_array]]
    data_dict[ii]['four'] = [func(value, 28, 28) for value in data[range_array]]
The problem is that this code runs relatively slowly, and every other approach I have tried so far has been even slower. Does anyone have an idea how to rewrite this code so that it runs faster?
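The string formatting and slicing in func amounts to extracting bits start through end (inclusive, counting bit 0 as the least significant bit), which plain integer shift-and-mask arithmetic can do without building a string. A minimal sketch checking that equivalence on scalars, assuming non-negative values below 2**32 and start <= end; func_bits is a hypothetical name introduced here for illustration:

```python
def func(value, start=None, end=None):
    # original string-based version from the question
    if start is not None and start != 0:
        start = -start
    elif start == 0:
        start = None
    if end is not None:
        end = -end - 1
    return int('{:032b}'.format(value)[end:start], 2)


def func_bits(value, start, end):
    # hypothetical integer equivalent: shift the field down to bit 0,
    # then keep only its (end - start + 1) lowest bits
    width = end - start + 1
    return (value >> start) & ((1 << width) - 1)


# compare both versions for the four (start, end) pairs used in the question
for value in [1, 12345, 2**31 + 7, 429496728]:
    for start, end in [(0, 8), (9, 17), (27, 27), (28, 28)]:
        assert func(value, start, end) == func_bits(value, start, end)

print(func_bits(12345, 0, 8))   # 57
print(func_bits(12345, 9, 17))  # 24
```

This shift-and-mask form is exactly what the vectorized answer applies to whole arrays at once.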
You can use numpy broadcasting to vectorize the bit extraction with bitwise AND (&) and right shift (>>).
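To see how broadcasting lines up the shifts and masks, here is a small sketch with made-up values (not the question's data): a column of shifts with shape (2, 1) combined with a row of values with shape (3,) produces a (2, 3) result, one row per bit field.

```python
import numpy as np

values = np.array([0b1011, 0b0110, 0b1111])  # shape (3,): 11, 6, 15
shift = np.array([0, 2])[:, None]             # shape (2, 1): one row per field
width = np.array([2, 2])[:, None]             # each field is 2 bits wide

# >> binds tighter than &, so this is (values >> shift) & mask
fields = (values >> shift) & ((1 << width) - 1)
print(fields.shape)  # (2, 3)
print(fields)
# [[3 2 3]
#  [2 1 3]]
```

Row 0 holds bits 0..1 of each value and row 1 holds bits 2..3, which is the same layout the answer's d arrays use.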
import numpy as np
np.random.seed(100)
data = np.random.randint(1, 429496729, 10000)
starts = [10, 50, 100, 200]
stops = [30, 90, 170, 250]
# equal to 'start' from calling func(value, start, end)
shift = np.array([0,9,27,28])[:, None]
# equal to 'end - start + 1' from calling func(value, start, end)
bitmask = np.array([9,9,1,1])[:, None]
# one (4, n) array per block: row i holds field i for every selected value
d = [(data[start:stop:2] >> shift) & (2**bitmask - 1) for start, stop in zip(starts, stops)]
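Rather than hard-coding shift and bitmask, both can be derived from the same (start, end) pairs the question passes to func. A sketch, assuming those pairs are kept in a dictionary named field_bounds (a hypothetical name), which also builds the list of dictionaries directly:

```python
import numpy as np

np.random.seed(100)
data = np.random.randint(1, 429496729, 10000)
starts = [10, 50, 100, 200]
stops = [30, 90, 170, 250]

# hypothetical mapping of field name -> (start, end) as passed to func
field_bounds = {'one': (0, 8), 'two': (9, 17), 'three': (27, 27), 'four': (28, 28)}

bounds = np.array(list(field_bounds.values()))  # shape (4, 2)
shift = bounds[:, 0][:, None]                   # lowest bit of each field
width = bounds[:, 1] - bounds[:, 0] + 1         # number of bits in each field
mask = ((1 << width) - 1)[:, None]              # e.g. 9 bits -> 511

result = []
for start, stop in zip(starts, stops):
    block = (data[start:stop:2] >> shift) & mask  # shape (4, n)
    result.append({name: row.tolist() for name, row in zip(field_bounds, block)})
```

Adding or changing a field then only means editing field_bounds, while the masks and shifts stay consistent with func's (start, end) convention.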
To inspect the first block of the result list d:
d[0]
Output
array([[ 54, 227, 291, 281, 229, 59, 508, 87, 365, 416],
[ 40, 207, 353, 168, 214, 271, 338, 268, 419, 52],
[ 1, 0, 0, 0, 0, 0, 1, 1, 0, 0],
[ 0, 1, 1, 1, 0, 0, 0, 1, 1, 0]])
You can also access the rows by name, similar to your dictionaries:
one, two, three, four = np.arange(4)
d[1][two]
Output
array([ 68, 479, 230, 295, 278, 455, 276, 45, 360, 488, 241, 336, 447,
316, 181, 94, 138, 404, 223, 310])
To get the result in exactly the same format as the original code (a list of dictionaries of Python lists):
actual = [
{
name: x[index].tolist()
for index, name
in enumerate(["one","two","three","four"])
}
for x in d
]
This produces exactly the same result as the original code while still running roughly an order of magnitude faster.
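To check the equivalence claim on your own machine, you can run both versions on the same data and compare. A sketch, where original and vectorized are hypothetical wrapper names around the question's loop and the answer's code; the timings it prints will vary by machine and are not asserted here:

```python
import time
import numpy as np


def func(value, start=None, end=None):
    # original string-based bit extraction from the question
    if start is not None and start != 0:
        start = -start
    elif start == 0:
        start = None
    if end is not None:
        end = -end - 1
    return int('{:032b}'.format(value)[end:start], 2)


def original(data, starts, stops):
    # the question's loop
    data_dict = [{} for _ in range(len(starts))]
    for ii, (start, stop) in enumerate(zip(starts, stops)):
        range_array = np.arange(start, stop, 2)
        data_dict[ii]['one'] = [func(value, 0, 8) for value in data[range_array]]
        data_dict[ii]['two'] = [func(value, 9, 17) for value in data[range_array]]
        data_dict[ii]['three'] = [func(value, 27, 27) for value in data[range_array]]
        data_dict[ii]['four'] = [func(value, 28, 28) for value in data[range_array]]
    return data_dict


def vectorized(data, starts, stops):
    # the broadcasting answer, converted back to a list of dicts
    shift = np.array([0, 9, 27, 28])[:, None]
    bitmask = np.array([9, 9, 1, 1])[:, None]
    d = [(data[start:stop:2] >> shift) & (2**bitmask - 1)
         for start, stop in zip(starts, stops)]
    return [{name: x[index].tolist()
             for index, name in enumerate(["one", "two", "three", "four"])}
            for x in d]


np.random.seed(100)
data = np.random.randint(1, 429496729, 10000)
starts = [10, 50, 100, 200]
stops = [30, 90, 170, 250]

t0 = time.perf_counter()
expected = original(data, starts, stops)
t1 = time.perf_counter()
actual = vectorized(data, starts, stops)
t2 = time.perf_counter()

assert actual == expected  # same keys and the same Python ints
print(f"loop: {t1 - t0:.6f} s, vectorized: {t2 - t1:.6f} s")
```

With blocks of only a few dozen elements, a single timing run is noisy; timeit with many repetitions gives a more reliable comparison.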