Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

numpy vectorization of algorithm to return indices of contiguous sections in binary sequence

Given a binary sequence, say: 000110111100011, I wish to mark all sections of contiguous 1s.

000110111100011
0123456789abcdef

I'll use hex notation, as 16 is a good length for an example sequence.

This should return (3 5) (6 a) (d f) The first region occupies positions 3 and 4, therefore (using Python's convention), (3,5) denotes the region.

However occasionally a bit is corrupted, so I wish to add a tolerance so that it ignores up to k consecutive 0s.

So, setting min_gap=2 in the above example, I would like to get: (3 a) (d f).

i.e. the 5) (6 is removed, as that gap comprises just one zero, and is 'patched over'.

Seems simple, but I struggled to fully vectorize a solution in numpy.

def get_segments(segment_of_sums, mingap=1):
    nonempty_padded = np.concatenate(( [0], segment_of_sums, [0] ))
    b = nonempty_padded > 0
    edges = b[:-1] ^ b[1:]
    indices = np.argwhere(edges)[:,0]
    # indices[0] will always be the first on-ramp, and indices[-1] will be the last off-ramp
    # if segment ends with a 1, indices[-1] will be len(segment)
    # len(indices) will ALWAYS be an even number
 
    if mingap>1 and len(indices) > 2:
        gap_lengths = indices[2::2] - indices[1::2][:-1]
        gap_keeps = gap_lengths >= mingap

        index_keeps = np.zeros_like(indices, dtype=np.bool8)
        index_keeps[[0, -1]] = True
        for i, flag in enumerate(gap_keeps):
            if flag:
                index_keeps[[2*i+1, 2*i+2]] = True
        
        indices = indices[np.argwhere(index_keeps)[:,0]]

    return indices.reshape((-1,2))

TEST = True
if TEST:
    for L in [
        [0,1,0, 1,1,1, 0,0,1, 1,1,0],
        [0,1,0, 1,1,1, 0,0,1, 1,1,1],
        [1,1,0, 1,1,1, 0,0,1, 1,1,1]
    ]:
        A = np.array(L, dtype=np.bool8)
        print(
            get_segments(A, mingap=2)
        )

This code correctly prints:

[[ 1  6]
 [ 8 11]]
[[ 1  6]
 [ 8 12]]
[[ 0  6]
 [ 8 12]]

However the for loop feels clumsy.

Can anyone see a cleaner technique?

like image 649
P i Avatar asked Jan 21 '26 06:01

P i


1 Answers

Not sure you can easily vectorize this, but here is a solution using itertools.groupby that should be quite fast:

s = '000110111100011'

from itertools import groupby

[('%x' % (x:=list(g))[0][0], '%x' % (x[-1][0]+1))
 for k,g in groupby(enumerate(s), lambda x: x[1])
 if k == '1']

Output:

[('3', '5'), ('6', 'a'), ('d', 'f')]
like image 102
mozway Avatar answered Jan 22 '26 19:01

mozway



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!