I have an array of shape (n,t) which I'd like to treat as a timeseries of n-vectors.
I'd like to know the unique n-vector values that exist along the t-dimension, as well as the associated t-indices for each unique vector. I'm happy to use any reasonable definition of equality (e.g. numpy.unique will take floats).
This is easy with a Python loop over t, but I'm hoping for a vectorized approach.
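For reference, the loop-based approach mentioned above might look like the following minimal sketch. `unique_columns_loop` is a hypothetical helper name, and exact equality of the column values (compared as tuples) is assumed as the definition of equality.

```python
import numpy as np


def unique_columns_loop(a):
    """Group the t columns of an (n, t) array by exact equality using a plain loop."""
    groups = {}  # maps each column (as a tuple) to the list of t-indices where it occurs
    for t in range(a.shape[1]):
        key = tuple(a[:, t].tolist())
        groups.setdefault(key, []).append(t)
    return groups


a = np.array([[1, 2, 1, 3],
              [2, 0, 2, 2],
              [3, 0, 3, 2]])
groups = unique_columns_loop(a)
for vec, ts in groups.items():
    print(vec, ts)
```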
In some special cases it can be done by collapsing each n-vector into a scalar and using numpy.unique on the 1-D result. For example, with boolean vectors you could take a vectorized dot product with the vector of powers of two (2**k) to convert each boolean vector to an integer. However, I'm looking for a fairly general solution.
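The boolean special case can be sketched as follows. It assumes n is small enough (here, under 63) that every code fits in an int64; each column is weighted by powers of two and then np.unique runs on the resulting 1-D array of integer codes.

```python
import numpy as np

# Boolean (n, t) array: each column is one n-vector.
b = np.array([[True, False, True, True],
              [False, False, False, True],
              [True, True, True, False]])
n = b.shape[0]

# Weight row k by 2**k so each distinct column maps to a distinct integer in [0, 2**n).
weights = 2 ** np.arange(n, dtype=np.int64)   # [1, 2, 4]
codes = weights @ b.astype(np.int64)          # shape (t,)

uniq_codes, idx, inv = np.unique(codes, return_index=True, return_inverse=True)
unique_vectors = b[:, idx]  # columns of b at the first occurrence of each code
print(codes, uniq_codes, idx, inv)
```

The trick only works because the encoding is injective for booleans; for general integer or float data there is no such cheap, collision-free scalar encoding.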
If your array had shape (t, n), so that the data for each n-vector were contiguous in memory, you could view the 2-D array as a 1-D structured array and then apply numpy.unique to that view.
If you can change the storage convention of your array, or if you don't mind making a copy of the transposed array, this could work for you.
Here's an example:
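A minimal sketch of how this might be generalized to any n, assuming the data starts in the question's (n, t) layout: make a contiguous copy of the transpose, build one structured field per component, and apply np.unique to the resulting 1-D view. `unique_columns` and the generated field names `f0`, `f1`, ... are illustrative choices, not part of the original answer.

```python
import numpy as np


def unique_columns(a):
    """Group the columns (t-axis) of an (n, t) array by value.

    Returns (unique_cols, idx, inv): unique_cols has shape (n, k), idx[j] is the
    first t-index where unique column j occurs, and inv[t] is the number of the
    unique column that column t equals.
    """
    # Contiguous (t, n) copy: each n-vector now occupies one contiguous row.
    rows = np.ascontiguousarray(a.T)
    # One field per component, so each whole row becomes a single structured item.
    dt = np.dtype([(f"f{k}", rows.dtype) for k in range(rows.shape[1])])
    view = rows.view(dt).reshape(-1)  # shape (t,)
    _, idx, inv = np.unique(view, return_index=True, return_inverse=True)
    return a[:, idx], idx, inv.reshape(-1)


a = np.array([[1, 2, 1, 3, 2],
              [2, 0, 2, 2, 0],
              [3, 0, 3, 2, 0]])
unique_cols, idx, inv = unique_columns(a)
print(unique_cols)
print(idx, inv)
```

Here `idx` gives the first t-index of each unique vector and `inv` maps every t-index to its unique vector. In NumPy 1.13 and later, `np.unique(a, axis=1, return_index=True, return_inverse=True)` also groups the columns of an (n, t) array directly, without building the view by hand.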
```python
import numpy as np

# Demo data: shape (t, n) = (8, 3), one n-vector per row.
x = np.array([[1, 2, 3],
              [2, 0, 0],
              [1, 2, 3],
              [3, 2, 2],
              [2, 0, 0],
              [2, 1, 2],
              [3, 2, 1],
              [2, 0, 0]])

# View each row as a structure, with field names 'a', 'b' and 'c'.
# This requires each row to be contiguous in memory (x is C-contiguous here).
dt = np.dtype([('a', x.dtype), ('b', x.dtype), ('c', x.dtype)])
y = x.view(dtype=dt).squeeze()

# Now np.unique can be used. See the `unique` docstring for
# a description of the options. You might not need `idx` or `inv`.
u, idx, inv = np.unique(y, return_index=True, return_inverse=True)

print("Unique vectors")
print(u)
```
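The question also asks for the t-indices associated with each unique vector. The `inv` array returned by np.unique already encodes this (`inv[t]` is the position in `u` of the vector at index t), so one possible way to turn it into per-vector index lists, sketched here on the same demo data, is a stable argsort of `inv` followed by np.split at group boundaries computed with np.bincount. This grouping step is an illustrative addition, not part of the original answer.

```python
import numpy as np

x = np.array([[1, 2, 3], [2, 0, 0], [1, 2, 3], [3, 2, 2],
              [2, 0, 0], [2, 1, 2], [3, 2, 1], [2, 0, 0]])
dt = np.dtype([('a', x.dtype), ('b', x.dtype), ('c', x.dtype)])
y = x.view(dtype=dt).reshape(-1)
u, idx, inv = np.unique(y, return_index=True, return_inverse=True)
inv = inv.reshape(-1)

# Sort the t-indices by the unique vector they map to; a stable sort keeps
# the indices inside each group in increasing t order.
order = np.argsort(inv, kind="stable")
# Count how many t-indices belong to each unique vector, then cut `order`
# into one block per unique vector at the cumulative-count boundaries.
counts = np.bincount(inv, minlength=len(u))
groups = np.split(order, np.cumsum(counts)[:-1])

for vec, ts in zip(u, groups):
    print(vec, ts)
```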