Python: Combine array rows based on difference between prior row's last element and posterior row's first element

Question

As the title says, suppose I am given an (n, 2) NumPy array recording the start and end indices of a series of segments, for example with n=6:

import numpy as np
# x records the (start, end) index pairs corresponding to six segments
x = np.array(([0,4],    # the 1st seg ranges from index 0 ~ 4
              [5,9],    # the 2nd seg ranges from index 5 ~ 9, etc.
              [10,13],
              [15,20],
              [23,30],
              [31,40]))

Now I want to combine segments that are separated by only a small gap. For example, merge consecutive segments if the gap between them is no larger than 1, so the desired output would be:

y = np.array([[0,13],    # The 1st seg's end is close to the 2nd's start,
                         # and the 2nd seg's end is close to the 3rd's start, so they are combined.
              [15,20],   # The 4th seg is far from the prior and posterior segs,
                         # so it remains untouched.
              [23,40]])  # The 5th and 6th segs are close, so they are combined.

so that the output has just three segments instead of six. Any suggestions would be appreciated!

EFT · Accepted Answer

If we can assume the segments are ordered and none is wholly contained within a neighbor, then you can do this by identifying where the gap between the end of one range and the start of the next exceeds your threshold:

start = x[1:, 0]  # select columns, ignoring the beginning of the first range
end = x[:-1, 1]  # and the end of the final range
mask = start>end+1  # identify where consecutive rows have too great a gap

Then stitching these pieces back together:

np.array([np.insert(start[mask], 0, x[0, 0]), np.append(end[mask], x[-1, -1])]).T
Out[96]: 
array([[ 0, 13],
       [15, 20],
       [23, 40]])
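The same idea can be wrapped in a function with an adjustable threshold. This is only a minimal sketch: the name merge_close_segments and the max_gap parameter are made up for illustration, and it keeps the assumption above that the rows are sorted and that no segment is nested inside a neighbor.

```python
import numpy as np

def merge_close_segments(segments, max_gap=1):
    """Merge consecutive (start, end) rows whose gap is at most max_gap.

    Hypothetical helper; assumes rows are sorted and not nested.
    """
    segments = np.asarray(segments)
    if len(segments) == 0:
        return segments.reshape(0, 2)
    next_start = segments[1:, 0]   # starts of rows 1..n-1
    prev_end = segments[:-1, 1]    # ends of rows 0..n-2
    # True where the gap is too large, i.e. where a new merged segment begins
    split = next_start - prev_end > max_gap
    starts = np.concatenate(([segments[0, 0]], next_start[split]))
    ends = np.concatenate((prev_end[split], [segments[-1, 1]]))
    return np.column_stack((starts, ends))

x = np.array([[0, 4], [5, 9], [10, 13], [15, 20], [23, 30], [31, 40]])
merged = merge_close_segments(x)                  # gaps of at most 1 are merged
merged_wide = merge_close_segments(x, max_gap=2)  # also bridges the 13 -> 15 gap
print(merged)
print(merged_wide)
```

With max_gap=2 the fourth segment is absorbed into the first group as well, so only two segments remain.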

Divakar · Answer

Here's a NumPy vectorized solution -

def merge_boundaries(x):
    mask = (x[1:,0] - x[:-1,1]) > 1  # True where the gap to the next segment is too large to merge
    idx = np.flatnonzero(mask)
    start = np.r_[0,idx+1]
    stop = np.r_[idx, x.shape[0]-1]
    return np.c_[x[start,0], x[stop,1]]

Sample run -

In [230]: x
Out[230]: 
array([[ 0,  4],
       [ 5,  9],
       [10, 13],
       [15, 20],
       [23, 30],
       [31, 40]])

In [231]: merge_boundaries(x)
Out[231]: 
array([[ 0, 13],
       [15, 20],
       [23, 40]])
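Both answers rely on the rows being sorted with no segment nested inside another. If that cannot be guaranteed, one possible variant (not taken from either answer) is to sort by start and compare each start against the running maximum of the previous ends. The sketch below uses the hypothetical name merge_segments_general and assumes every row satisfies start <= end.

```python
import numpy as np

def merge_segments_general(segments, max_gap=1):
    """Merge (start, end) rows that may be unsorted, overlapping, or nested.

    Hypothetical helper; assumes start <= end in every row.
    """
    segments = np.asarray(segments)
    if len(segments) == 0:
        return segments.reshape(0, 2)
    order = np.argsort(segments[:, 0], kind="stable")  # sort rows by start
    starts = segments[order, 0]
    # Largest end seen so far, so a nested row cannot shrink the current group
    running_end = np.maximum.accumulate(segments[order, 1])
    # A new group begins where the next start is too far past every earlier end
    split = starts[1:] - running_end[:-1] > max_gap
    first = np.concatenate(([0], np.flatnonzero(split) + 1))   # first row of each group
    last = np.concatenate((first[1:] - 1, [len(starts) - 1]))  # last row of each group
    return np.column_stack((starts[first], running_end[last]))

# Shuffled input that also contains a nested segment, [2, 4] inside [0, 10]
z = np.array([[23, 30], [0, 10], [2, 4], [11, 13], [31, 40], [15, 20]])
result = merge_segments_general(z)
print(result)
```

Because each row's end is at least its own start, the running maximum never carries a stale value across a split, so reading it at the last row of each group gives that group's true end.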

Tags: python, arrays, numpy

Francis
