Remove elements that appear more than once from a NumPy array

Question

How can I completely remove elements that appear more than once in an array? The approach below is very slow for larger arrays. Is there a NumPy way of doing this? Thanks in advance.

import numpy as np

count = 0
result = []
input = np.array([[1,1], [1,1], [2,3], [4,5], [1,1]]) # array with points [x, y]

# count how often each point (same x and y coordinate) appears
# and append it to result if it appears exactly once

for i in input:
    for j in input:
        if (j[0] == i[0]) and (j[1] == i[1]):
            count += 1
    if count == 1:
        result.append(i)
    count = 0

print(np.array(result))

UPDATE: BECAUSE OF FORMER OVERSIMPLIFICATION

To be clear: how can I remove elements that appear more than once with respect to a certain attribute from an array/list? Here I have a list whose elements have length 6. If the first and second entries of an element together appear more than once in the list, all elements with those entries should be removed. I hope this is not too confusing. Eumiro helped me a lot with this, but I can't manage to flatten the output list the way it should be :(

import numpy as np
from collections.abc import Iterable

input = [[1,1,3,5,6,6],[1,1,4,4,5,6],[1,3,4,5,6,7],[3,4,6,7,7,6],[1,1,4,6,88,7],[3,3,3,3,3,3],[456,6,5,343,435,5]]

# input[0], input[1] and input[4] should be removed from input, because their
# first and second entries (1, 1) appear more than once in the list

# group the remaining entries of each element by its first two entries
d = {}

for a in input:
    d.setdefault(tuple(a[:2]), []).append(a[2:])

# keep only keys that occur once; each item is nested, e.g. [1, 3, [4, 5, 6, 7]]
outputDict = [list(k)+list(v) for k,v in d.items() if len(v) == 1 ]

result = []

def flatten(x):
    # recursively flatten arbitrarily nested iterables into one flat list
    if isinstance(x, Iterable):
        return [a for i in x for a in flatten(i)]
    else:
        return [x]

# I took flatten(x) from http://stackoverflow.com/a/2158522/1132378
# And I need it, because output is a nested list :(

for i in outputDict:
    result.append(flatten(i))

print(np.array(result))

So, this works, but it's impractical for big lists. First I got RuntimeError: maximum recursion depth exceeded in cmp, and after applying sys.setrecursionlimit(10000) I got a segmentation fault. How could I implement Eumiro's solution for big lists (> 100000 elements)?

eumiro · Accepted Answer

np.array(list(set(map(tuple, input))))

returns (the row order is arbitrary, because sets are unordered):

array([[4, 5],
       [2, 3],
       [1, 1]])
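If a predictable row order matters, one alternative is np.unique with axis=0 (available in NumPy 1.13 and later), which returns the distinct rows sorted lexicographically. This is a sketch that is not part of the original answer; the names points and distinct are only illustrative.

```python
import numpy as np

points = np.array([[1, 1], [1, 1], [2, 3], [4, 5], [1, 1]])

# axis=0 treats every row as one element; the distinct rows come back sorted
distinct = np.unique(points, axis=0)
print(distinct)
```

Like the set-based one-liner, this keeps one copy of [1, 1] rather than dropping it.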

UPDATE 1: If you want to remove the [1, 1] too (because it appears more than once), you can do:

from collections import Counter

np.array([k for k, v in Counter(map(tuple, input)).items() if v == 1])

returns (row order may vary with the Python version):

array([[4, 5],
       [2, 3]])
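The same filtering can probably be done without a Python-level loop, which is closer to the "NumPy way" the question asks for. A minimal sketch, assuming NumPy 1.13 or later so that np.unique accepts axis; points and once are illustrative names, not from the original answer.

```python
import numpy as np

points = np.array([[1, 1], [1, 1], [2, 3], [4, 5], [1, 1]])

# distinct rows together with the number of times each one occurs
rows, counts = np.unique(points, axis=0, return_counts=True)

# boolean mask: keep only the rows that occur exactly once
once = rows[counts == 1]
print(once)
```

The result is sorted lexicographically by row instead of following the input order.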

UPDATE 2: with input=[[1,1,2], [1,1,3], [2,3,4], [4,5,5], [1,1,7]]:

input=[[1,1,2], [1,1,3], [2,3,4], [4,5,5], [1,1,7]]

d = {}
for a in input:
    d.setdefault(tuple(a[:2]), []).append(a[2])

d is now:

{(1, 1): [2, 3, 7],
 (2, 3): [4],
 (4, 5): [5]}

so we want to take all key-value pairs that have a single value and re-create the arrays:

np.array([k+tuple(v) for k,v in d.items() if len(v) == 1])

returns (row order may vary with the Python version):

array([[4, 5, 5],
       [2, 3, 4]])
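For array input, the grouping by the first two columns can also be sketched with np.unique instead of a dictionary. This is an assumption-laden alternative, not the original answer: it needs NumPy 1.13 or later, and data, mask and kept are illustrative names.

```python
import numpy as np

data = np.array([[1, 1, 2], [1, 1, 3], [2, 3, 4], [4, 5, 5], [1, 1, 7]])

# find the distinct (first, second) keys, the key index of every row,
# and how many rows share each key
_, inverse, counts = np.unique(
    data[:, :2], axis=0, return_inverse=True, return_counts=True
)

# counts[inverse] is the group size for every row;
# ravel() guards against NumPy versions that return a 2-D inverse
mask = counts[inverse.ravel()] == 1
kept = data[mask]
print(kept)
```

Because the rows are selected with a mask, they keep their original order and width.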

UPDATE 3: For larger arrays, you can adapt my previous solution to:

import numpy as np
input = [[1,1,3,5,6,6],[1,1,4,4,5,6],[1,3,4,5,6,7],[3,4,6,7,7,6],[1,1,4,6,88,7],[3,3,3,3,3,3],[456,6,5,343,435,5]]
d = {}
for a in input:
    d.setdefault(tuple(a[:2]), []).append(a)
np.array([v for v in d.values() if len(v) == 1])

returns (row order may vary with the Python version):

array([[[456,   6,   5, 343, 435,   5]],
       [[  1,   3,   4,   5,   6,   7]],
       [[  3,   4,   6,   7,   7,   6]],
       [[  3,   3,   3,   3,   3,   3]]])
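The output above wraps every row in an extra list, so the array is three-dimensional. If a flat two-dimensional result is wanted without any recursive flatten step, a two-pass approach with Counter might work: count the keys first, then filter the rows. This is a sketch under that assumption, not the original answer; rows and key_counts are illustrative names.

```python
from collections import Counter

import numpy as np

rows = [[1, 1, 3, 5, 6, 6], [1, 1, 4, 4, 5, 6], [1, 3, 4, 5, 6, 7],
        [3, 4, 6, 7, 7, 6], [1, 1, 4, 6, 88, 7], [3, 3, 3, 3, 3, 3],
        [456, 6, 5, 343, 435, 5]]

# first pass: count how often each (first, second) pair occurs
key_counts = Counter(tuple(r[:2]) for r in rows)

# second pass: keep the rows whose pair occurs exactly once, in input order
result = np.array([r for r in rows if key_counts[tuple(r[:2])] == 1])
print(result)
```

Both passes are linear in the number of rows and involve no recursion, so the recursion limit should not come into play for lists with more than 100000 elements.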

Tags: python, arrays, duplicates, numpy

Asked by feinmann
