Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there any algorithm better than O(N) to find the first bit set in a bitarray that consists mostly of continous blocks?

Suppose that we have a bitarray and we know that array has very low complexity: it consists mostly of continuous chunks of "1"s or "0"s. That is, if we read two consecutive bits, the chance of them being identical is much higher than 50%. For example:

00001111100011100000111111100000

This is a typical array on this format, since it consists mostly of continous chunks of the same bit. So is this one:

00000000000000000000011111111000

But this is not:

10100110001010011100110001000111

That concept could be formalized in more precise ways, such as "iterating through the array we see at most log(N) bit changes", or "the chance of 2 consecutive bits being identical is >95%", but I'm not sure this matters.

My question is: what is a fast algorithm to find the index of the first bit set on the bit array (that is, 1)?

like image 864
MaiaVictor Avatar asked Dec 03 '25 16:12

MaiaVictor


2 Answers

Depending on your system, the Count Leading Zeros operation clz could make this go quite a bit faster. This is common in ARM architectures. With this operation, you can count many unset bits at once. That with some shifting (basically free in ARM) could speed things up quite a bit. Count 0s, not bits, shift away known bits, repeat.

Still linear-esque, but a much faster linear for sure - especially if the data has a lot of runs in it as you suggest.

like image 51
Michael Dorgan Avatar answered Dec 06 '25 09:12

Michael Dorgan


You haven't defined your input model unambiguously but let's say that it follows a Markov chain. That means that the first bit is whatever it is, and then when you move from one bit to the next, the probability of bits switching values is p and the probability of remaining the same is (1-p). If you know p, and p is small, you can make some progress on this problem. First check the first bit. If it is 1, then you are done. Otherwise you are looking for the first transition from 0 to 1. Note that if you consider k consecutive bits at a time, then the probability of getting exactly one transition in these k bits is kp(1-p)^(k-1). On the other hand, the probability of getting 2 or more transitions is essentially the same as getting exactly 2 transitions if p is small relative to k, and this probability is k*(k-1)p^2(1-p)^(k-2)/2. You want the first quantity to be not too small and the second quantity to be much smaller. This will happen for example if you choose k = round(1/sqrt(p)). Then if p is small the probability you get one transition over k consecutive bits is essentially sqrt(p), while the probability you get two transitions is essentially p/2, which is much smaller. So just check every kth bit until you see a 1, and then binary search between the previous and current positions to find the first occurrence of a 1. With high probability this will be the first occurrence of a 1. The expected running time is O(sqrt(p)*n + log(k)) assuming the Markov chain model, and the probability of error is O(sqrt(p)). So if p is small and you know p, this is a good compromise (of course, if you accept the model).

like image 25
user2566092 Avatar answered Dec 06 '25 10:12

user2566092



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!