Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Remove elements that appear more often than once from numpy array

The question is, how can I remove elements that appear more often than once in an array completely. Below you see an approach that is very slow when it comes to bigger arrays. Any idea of doing this the numpy-way? Thanks in advance.

import numpy as np

count = 0
result = []
input = np.array([[1,1], [1,1], [2,3], [4,5], [1,1]]) # array with points [x, y]

# count appearance of elements with same x and y coordinate
# append to result if element appears just once

for i in input:
    for j in input:
        if (j[0] == i [0]) and (j[1] == i[1]):
            count += 1
    if count == 1:
        result.append(i)
    count = 0

print np.array(result)

UPDATE: BECAUSE OF FORMER OVERSIMPLIFICATION

Again to be clear: How can I remove elements appearing more than once concerning a certain attribute from an array/list ?? Here: list with elements of length 6, if first and second entry of every elements both appears more than once in the list, remove all concerning elements from list. Hope I'm not to confusing. Eumiro helped me a lot on this, but I don't manage to flatten the output list as it should be :(

import numpy as np 
import collections

input = [[1,1,3,5,6,6],[1,1,4,4,5,6],[1,3,4,5,6,7],[3,4,6,7,7,6],[1,1,4,6,88,7],[3,3,3,3,3,3],[456,6,5,343,435,5]]

# here, from input there should be removed input[0], input[1] and input[4] because
# first and second entry appears more than once in the list, got it? :)

d = {}

for a in input:
    d.setdefault(tuple(a[:2]), []).append(a[2:])

outputDict = [list(k)+list(v) for k,v in d.iteritems() if len(v) == 1 ]

result = []

def flatten(x):
    if isinstance(x, collections.Iterable):
        return [a for i in x for a in flatten(i)]
    else:
        return [x]

# I took flatten(x) from http://stackoverflow.com/a/2158522/1132378
# And I need it, because output is a nested list :(

for i in outputDict:
    result.append(flatten(i))

print np.array(result)

So, this works, but it's impracticable with big lists. First I got RuntimeError: maximum recursion depth exceeded in cmp and after applying sys.setrecursionlimit(10000) I got Segmentation fault how could I implement Eumiros solution for big lists > 100000 elements?

like image 526
feinmann Avatar asked Dec 17 '25 15:12

feinmann


1 Answers

np.array(list(set(map(tuple, input))))

returns

array([[4, 5],
       [2, 3],
       [1, 1]])

UPDATE 1: If you want to remove the [1, 1] too (because it appears more than once), you can do:

from collections import Counter

np.array([k for k, v in Counter(map(tuple, input)).iteritems() if v == 1])

returns

array([[4, 5],
       [2, 3]])

UPDATE 2: with input=[[1,1,2], [1,1,3], [2,3,4], [4,5,5], [1,1,7]]:

input=[[1,1,2], [1,1,3], [2,3,4], [4,5,5], [1,1,7]]

d = {}
for a in input:
    d.setdefault(tuple(a[:2]), []).append(a[2])

d is now:

{(1, 1): [2, 3, 7],
 (2, 3): [4],
 (4, 5): [5]}

so we want to take all key-value pairs, that have single values and re-create the arrays:

np.array([k+tuple(v) for k,v in d.iteritems() if len(v) == 1])

returns:

array([[4, 5, 5],
       [2, 3, 4]])

UPDATE 3: For larger arrays, you can adapt my previous solution to:

import numpy as np
input = [[1,1,3,5,6,6],[1,1,4,4,5,6],[1,3,4,5,6,7],[3,4,6,7,7,6],[1,1,4,6,88,7],[3,3,3,3,3,3],[456,6,5,343,435,5]]
d = {}
for a in input:
    d.setdefault(tuple(a[:2]), []).append(a)
np.array([v for v in d.itervalues() if len(v) == 1])

returns:

array([[[456,   6,   5, 343, 435,   5]],
       [[  1,   3,   4,   5,   6,   7]],
       [[  3,   4,   6,   7,   7,   6]],
       [[  3,   3,   3,   3,   3,   3]]])
like image 82
eumiro Avatar answered Dec 19 '25 05:12

eumiro



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!