Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding the first duplicate of an array

I've decided to learn python and I use CodeFight to train. The first Interview Practice exercice is about finding the first duplicate of an array and returning it or retruning -1 if there isn't any. This is the code I wrote:

def firstDuplicate(a):
b=[]
print(len(a))
for i in range(len(a)):
    if a[i] in b:
        return(a[i])
    elif a[i] not in b and i == (len(a)-1):
        return(-1)
    else:
        b.append(a[i])

I pass all the tests except the last 3. I says that my program takes longer than 4000ms to execute. I guess the array is very long and the duplicate is at the end. How can I reduce this execution time ? Thanks in advance.

like image 656
Pierre Trott Avatar asked Nov 20 '25 09:11

Pierre Trott


2 Answers

Your current solution uses a list to do book-keeping, the in membership test is always going to be linear in time. You should instead consider using a set, or keeping counts of values in a Counter.

The idea in both cases is not to iterate through the complete list if not necessary. With a little extra space, you improve performance.


def firstDuplicateSet(a):
    seen = set()
    for i in a:
        if i in seen:
            return i
        seen.add(i)

    return -1

from collections import Counter

def firstDuplicateCounter(a):
    c = Counter()
    for i in a:
        c[i] += 1
        if c[i] > 1:
            return i

    return -1
like image 101
cs95 Avatar answered Nov 22 '25 21:11

cs95


You could rewrite it this way:

def opt_first_duplicate(arr):
    unique_nb = set()
    for nb in arr:
        if nb in unique_nb:
            return nb
        else: 
            unique_nb.add(nb)
    return -1

I compared our solutions using %%timeit magic command in a jupyter notebook with a test array generated this way:

import random
test_array = [random.randint(0, 10000000) for i in range(5000)]

Example of one of the run:

firstDuplicate : 401 ms ± 1.61 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

opt_first_duplicate: 600 µs ± 20.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

2 tricks that optimized your code:

  • Using a set instead of a list to look for already seen integers
  • The second test (elif) is useless since you already tested the presence of the integer

Hope it solves your problem

On sets faster than lists: What makes sets faster than lists in python?

like image 31
Mederic Fourmy Avatar answered Nov 22 '25 22:11

Mederic Fourmy



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!