Merging records in a Numpy structured array

Question

I have a NumPy structured array that is sorted by the first column:

x = array([(2, 3), (2, 8), (4, 1)], dtype=[('recod', '<u8'), ('count', '<u4')])

I need to merge records (sum the values of the second column) where

x[n][0] == x[n + 1][0]

In this case, the desired output would be:

x = array([(2, 11), (4, 1)], dtype=[('recod', '<u8'), ('count', '<u4')])

What's the best way to achieve this?

Divakar · Accepted Answer

You can use np.unique to get an ID for each element in the first column, and then use np.bincount to sum the second-column elements that share an ID. Because np.bincount is called with weights, the result comes back as floats:

In [140]: A
Out[140]: 
array([[25,  1],
       [37,  3],
       [37,  2],
       [47,  1],
       [59,  2]])

In [141]: unqA,idx = np.unique(A[:,0],return_inverse=True)

In [142]: np.column_stack((unqA,np.bincount(idx,A[:,1])))
Out[142]: 
array([[ 25.,   1.],
       [ 37.,   5.],
       [ 47.,   1.],
       [ 59.,   2.]])
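
If you want integer output rather than the float output above, one option is to cast the bincount result back before stacking. This is a minimal sketch on the same sample data; the names sums and merged are just illustrative, and it assumes the sums are small enough to be represented exactly as float64:

```python
import numpy as np

# Same sample data as above: column 0 is the key, column 1 the value.
A = np.array([[25, 1], [37, 3], [37, 2], [47, 1], [59, 2]])

# idx maps every row to the position of its key in unqA.
unqA, idx = np.unique(A[:, 0], return_inverse=True)

# With weights, np.bincount returns float64, so cast back to A's dtype.
sums = np.bincount(idx, weights=A[:, 1]).astype(A.dtype)

merged = np.column_stack((unqA, sums))
# merged rows: [25, 1], [37, 5], [47, 1], [59, 2]
```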

Since the input is already sorted, you can avoid np.unique, which sorts internally, and use a combination of np.diff and np.cumsum instead. Skipping the unnecessary sort might help performance. The implementation would look something like this:

In [201]: A
Out[201]: 
array([[25,  1],
       [37,  3],
       [37,  2],
       [47,  1],
       [59,  2]])

In [202]: unq1 = np.append(True,np.diff(A[:,0])!=0)

In [203]: np.column_stack((A[:,0][unq1],np.bincount(unq1.cumsum()-1,A[:,1])))
Out[203]: 
array([[ 25.,   1.],
       [ 37.,   5.],
       [ 47.,   1.],
       [ 59.,   2.]])
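
The same boundary-detection idea can be applied directly to the structured array from the question. The sketch below is a variation, not the code above: it swaps np.bincount for np.add.reduceat so the sums stay in the integer dtype of the field, and it compares neighbouring keys with != instead of np.diff. It assumes the array is non-empty and already sorted by 'recod'; the names first, starts and merged are illustrative:

```python
import numpy as np

x = np.array([(25, 1), (37, 3), (37, 2), (47, 1), (59, 2)],
             dtype=[('recod', '<u8'), ('count', '<u4')])

keys = x['recod']

# True at the first record of each run of equal keys (requires sorted input).
first = np.append(True, keys[1:] != keys[:-1])
starts = np.flatnonzero(first)

merged = np.empty(starts.size, dtype=x.dtype)
merged['recod'] = keys[starts]
# reduceat sums x['count'] from each start index up to the next one,
# so every run of equal keys collapses to one integer total.
merged['count'] = np.add.reduceat(x['count'], starts)
# merged records: (25, 1), (37, 5), (47, 1), (59, 2)
```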

hpaulj · Answer

Divakar's answer, cast in structured array form:

In [500]: x=np.array([(25, 1), (37, 3), (37, 2), (47, 1), (59, 2)], dtype=[('recod', '<u8'), ('count', '<u4')])

Find the unique values and sum the counts of the duplicates:

In [501]: unqA, idx = np.unique(x['recod'], return_inverse=True)
In [502]: cnt = np.bincount(idx, x['count'])

Make a new structured array and fill the fields:

In [503]: x1 = np.empty(unqA.shape, dtype=x.dtype)
In [504]: x1['recod'] = unqA
In [505]: x1['count'] = cnt

In [506]: x1
Out[506]: 
array([(25, 1), (37, 5), (47, 1), (59, 2)], 
      dtype=[('recod', '<u8'), ('count', '<u4')])

There is a recarray function that builds an array from a list of arrays:

In [507]: np.rec.fromarrays([unqA,cnt],dtype=x.dtype)
Out[507]: 
rec.array([(25, 1), (37, 5), (47, 1), (59, 2)], 
      dtype=[('recod', '<u8'), ('count', '<u4')])

Internally it does the same thing: it builds an empty array of the right size and dtype, and then loops over the dtype fields. A recarray is just a structured array wrapped in a specialized array subclass.
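
To illustrate the relationship between the two types, here is a small sketch that views a structured array as a recarray. Nothing is copied; the view only adds attribute-style access to the fields. The variable names are illustrative:

```python
import numpy as np

x1 = np.array([(25, 1), (37, 5), (47, 1), (59, 2)],
              dtype=[('recod', '<u8'), ('count', '<u4')])

# View the same memory as a recarray; no data is copied.
r = x1.view(np.recarray)

# Fields are reachable by attribute as well as by name.
same_field = (r.recod == x1['recod']).all()

# A recarray is still an ndarray underneath, with the same fields.
is_ndarray = isinstance(r, np.ndarray)
```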

There are two ways of populating a structured array (especially one with a diverse dtype): with a list of tuples, as you did with x, or field by field.
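
A short side-by-side sketch of the two approaches, using the dtype from the question. The values and the names a and b are made up for illustration:

```python
import numpy as np

dt = np.dtype([('recod', '<u8'), ('count', '<u4')])

# 1. From a list of tuples, one tuple per record.
a = np.array([(25, 1), (37, 5)], dtype=dt)

# 2. Field by field, into a preallocated array.
b = np.zeros(2, dtype=dt)
b['recod'] = [25, 37]
b['count'] = [1, 5]

# Both routes produce the same records.
identical = bool((a == b).all())
```

Note that a list of lists such as [[25, 1], [37, 5]] is not interpreted as a list of records; the inner items need to be tuples.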
