 

Delete duplicates from list but consider the type of elements and preserve order

Tags:

python

types

list

Task:

Develop a clean_list(list_to_clean) function that takes one argument, a list of any length containing any values (strings, integers, and floats), and returns a list with the same values but without duplicate items. That is, if a value occurs several times in the original list, its first occurrence stays in place, and the second, third, and so on are removed.

Example:

Function call: clean_list([32, 32.1, 32.0, -32, 32, '32'])
Returns: [32, 32.1, 32.0, -32, '32']

My code:

def clean_list(list_to_clean):
   no_dubl_lst = [value for _, value in set((type(x), x) for x in list_to_clean)]
   return no_dubl_lst

print(clean_list([32, 32.1, 32.0, -32, 32, '32']))

Result:

[32.1, 32, -32, 32.0, '32']

But how can I restore the original order?

Kostya asked Dec 18 '25 21:12



1 Answer

There are two concerns here, so for the purpose of an answer, I'll list both.

Respecting type (you already figured this out)

Removing duplicates from a list suggests building an intermediate set as the fastest method. A set treats an element as already present if it is equal to (and hashes the same as) an element it already holds, which is why 32 and 32.0 would collapse into a single entry.

In your case, you need not just the value, but also the type to be equal. So why not construct an intermediate set of tuples (value, type)?

unique_list = [v for v,t in {(v,type(v)) for v in orig_list}]
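To illustrate why the type has to be part of the key, here is a small sketch using the list from the question. The names `values`, `plain` and `typed` are mine, chosen just for this example:

```python
# 32 and 32.0 compare equal and have the same hash,
# so a plain set keeps only one of them.
values = [32, 32.1, 32.0, -32, 32, '32']

plain = set(values)
# Pairing each value with its type makes (32, int) and (32.0, float) distinct keys.
typed = {(v, type(v)) for v in values}

print(len(plain))  # 4
print(len(typed))  # 5
```

Note that the set still does not remember the order of insertion, which is the second concern.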

Preserving order

Use an "ordered set" container, as described in "Does Python have an ordered set?". For example:

  • since 3.7 (and in CPython 3.6, where this was an implementation detail), regular dicts preserve insertion order:

    unique_list = [v for v,t in dict.fromkeys((v,type(v)) for v in orig_list)]
    
  • for all versions, use collections.OrderedDict (it is still present in 3.6+ because it offers additional methods):

    import collections
    unique_list = [v for v,t in collections.OrderedDict.fromkeys((v,type(v)) for v in orig_list)]
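Put together as the function from the task, the dict-based variant could look roughly like this. It is a minimal sketch that assumes Python 3.7+ (or CPython 3.6), so that a regular dict keeps insertion order; the intermediate name `keyed` is mine:

```python
def clean_list(list_to_clean):
    """Drop duplicates, treating equal values of different types as distinct."""
    # dict.fromkeys keeps one key per distinct (value, type) pair,
    # in the order in which each pair was first seen.
    keyed = dict.fromkeys((v, type(v)) for v in list_to_clean)
    # Iterating over a dict yields its keys, i.e. the (value, type) tuples.
    return [v for v, _ in keyed]


print(clean_list([32, 32.1, 32.0, -32, 32, '32']))
# [32, 32.1, 32.0, -32, '32']
```

On older interpreters, swapping `dict` for `collections.OrderedDict` should give the same result.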
    

For reference, timeit results on my machine (3.7.4 win64), compared with the other answers as of this writing:

In [24]: l=[random.choice((int,float,lambda v:str(int(v))))(random.random()*1000) for _ in range(100000)]

In [26]: timeit dict_fromkeys(l)        #mine
38.6 ms ± 179 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [34]: timeit ordereddict_fromkeys(l)  #mine with OrderedDict
53.3 ms ± 233 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [25]: timeit build_with_filter(l)    #Ch3steR's O(n)
48.7 ms ± 214 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [28]: timeit dict_with_none(l)       #Patrick Artner's
46.8 ms ± 377 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [30]: timeit listcompr_side_effect(l)  #CDJB's
55.5 ms ± 801 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
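
If you want to repeat the measurement for the two variants from this answer, something along these lines should work outside IPython. The bodies of dict_fromkeys and ordereddict_fromkeys are not shown above, so here I assume they simply wrap the two one-liners; the fixed seed and the names `converters` and `data` are also my additions. The functions from the other answers are left out because their code is not reproduced here.

```python
import collections
import random
import timeit


def dict_fromkeys(orig_list):
    # Assumed wrapper around the plain-dict one-liner.
    return [v for v, t in dict.fromkeys((v, type(v)) for v in orig_list)]


def ordereddict_fromkeys(orig_list):
    # Assumed wrapper around the OrderedDict one-liner.
    return [v for v, t in collections.OrderedDict.fromkeys(
        (v, type(v)) for v in orig_list)]


random.seed(0)  # assumption: fixed seed so that runs are comparable
converters = (int, float, lambda v: str(int(v)))
data = [random.choice(converters)(random.random() * 1000)
        for _ in range(100000)]

for func in (dict_fromkeys, ordereddict_fromkeys):
    # Average over 10 calls, mirroring the "10 loops each" above.
    seconds = timeit.timeit(lambda: func(data), number=10) / 10
    print(f"{func.__name__}: {seconds * 1000:.1f} ms per loop")
```

The absolute numbers will differ from machine to machine, so only the relative ordering is worth comparing.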
ivan_pozdeev answered Dec 21 '25 11:12
