So I have a rather large (200k+ rows) structured array:
recordtype = np.dtype([('x', np.float32), ('y', np.float32), ('z', np.float32),
                       ('u', np.float32), ('v', np.float32), ('w', np.float32),
                       ('d', np.float32), ('T', np.float32), ('mdot', np.float32),
                       ('f', np.float32), ('t', np.float32), ('name', np.str_, 14)])
data = np.loadtxt('tmp2.out',dtype=recordtype,skiprows=2)
In the 'name' column there are non-unique elements: len(data['name']) is larger than len(set(data['name'])). I would like to create a new array containing only the rows with unique names; keeping the first occurrence of each is fine. How would I do this efficiently?
To get the unique indices you can use np.unique:
unique_elements, indices = np.unique(data['name'], return_index=True)
With return_index=True you get the index of the first occurrence of each unique name. You can then select just those rows:
data = data[indices]
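One thing to keep in mind: np.unique returns its values in sorted order, so data[indices] will be ordered by name rather than by the original row order. If you want the first occurrences in their original order, sort the indices before indexing. A minimal self-contained sketch, using a made-up two-field dtype and toy records in place of the real tmp2.out data:

import numpy as np

# Toy structured array standing in for the real data (illustrative only)
recordtype = np.dtype([('x', np.float32), ('name', np.str_, 14)])
data = np.array([(1.0, 'alpha'), (2.0, 'beta'), (3.0, 'alpha'), (4.0, 'gamma')],
                dtype=recordtype)

# return_index gives the index of the first occurrence of each unique name;
# sorting those indices preserves the original row order.
_, indices = np.unique(data['name'], return_index=True)
deduped = data[np.sort(indices)]

print(deduped['name'])  # ['alpha' 'beta' 'gamma'] -- first occurrences, original order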