Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find First missing number in Set

I need to find the first missing number from a HashSet, for example:

Set<Integer> h = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 6, 8, 9, 10));

In this example if we are iterating for the first time we shold get int freeNumber = 5;

Obviously I can sort and iterate with while loop until I find a missing number. But it seems like not an optimized or elegant way of doing this operation.

int i = 1;
for (Integer number : allFileNumbers) {
    if(number != i) {
        missing = number;
        break;
    }
    i++;
}
like image 523
Suule Avatar asked Dec 28 '25 23:12

Suule


2 Answers

When you have a TreeSet, or any NavigableSet in general, you can use a variant of Binary Search to find the first missing value:

static Integer firstMissing(NavigableSet<Integer> set) {
    if(set.size() <= 1) return null;
    Integer first = set.first(), last = set.last();
    if(set.size() == last - first + 1) return null; // no gaps at all
    while(true) {
        int middle = (first+last)>>>1;
        NavigableSet<Integer> sub = set.headSet(middle, false);
        if(sub.size() < middle - first) {// gap before middle
            set = sub;
            last = sub.last();
        }
        else {
            set = set.tailSet(middle, true);
            first = set.first();
            if(first != middle) return middle;
        }
    }
}

to be called like

NavigableSet<Integer> set = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 6, 7, 8, 9, 10));
System.out.println(firstMissing(set));

First, since a Set doesn’t contain duplicates, we can use the minimum and maximum number to calculate which size a set of consecutive numbers should have. If the set has that size, we know that there are no gaps and can return immediately. Otherwise, we calculate the middle number and split the set into halves. For the first half set, we can do the same test to determine whether it has a gap, to proceed only with that half set to find the first gap. Otherwise, we take the second half, already knowing that there must be a gap. The search is over when that set doesn’t contain our middle number.

If you have an arbitrary Set, without a guaranteed order, there is no best way to do it, as every approach works good for certain input, but worse for other.

  • You may simply copy the set to a TreeSet using new TreeSet<>(set) and use the method above

  • you may loop over the number range, to test for the presence, then the absence of numbers

        static Integer firstMissing(Set<Integer> set) {
            if(set.size() <= 1) return null;
            Integer firstPresent = null, firstAbsent = null;
            for(int i = Integer.MIN_VALUE; firstPresent == null; i++)
                if(set.contains(i)) firstPresent = i;
            for(int i = firstPresent+1; firstAbsent == null; i++)
                if(!set.contains(i)) firstAbsent = i;
            return firstAbsent-firstPresent == set.size()? null: firstAbsent;
        }
    

    The loop conditions take advantage of the pre-test which ensures that there are at least two numbers in the set.

    The obvious problem is the large number range, we have to probe. If we know that all numbers are positive, we may replace Integer.MIN_VALUE with zero.

  • you may loop over the set’s content, to record all encountered values in a searchable data structure. This is similar to the copying approach above, but e.g., if all numbers are positive, you may use the following test:

        static Integer firstMissing(Set<Integer> set) {
            if(set.size() <= 1) return null;
            BitSet bs = new BitSet();
            set.forEach(bs::set);
            int firstPresent = bs.nextSetBit(0), firstAbsent = bs.nextClearBit(firstPresent);
            return firstAbsent-firstPresent == set.size()? null: firstAbsent;
        }
    

    It works much better than TreeSet if there are only a few numbers missing or none at all, but much worse, if the values are really sparse.

like image 122
Holger Avatar answered Dec 30 '25 13:12

Holger


The question title indicates that the solution should not depend on the implementation of Set used. In that case, iterating on the values of the Set is not your best option: HashSet for instance does not guarantee an iteration following insertion order or natural order.

Your best option is to iterate on integers and check their existence in the set. It is a straightforward approach, and will run in O(k*p) where k is the smallest value not in the set and p is the cost of calling Set.contains(). If your set has O(1) read access, then you get a O(k) complexity algorithm, which is linear.

Example:

public int findFirstNotInSet(Set<Integer> values) {
    for (int i = 1; i < Integer.MAX_VALUE; i++) {
        if (!values.contains(i)) {
            return i;
        }
    }

    // handle edge case for Integer.MAX_VALUE here
}

If you can make more assumptions on the values in the set (range, number of missing values,...) then you can probably speed this algorithm up.

like image 35
Alex M Avatar answered Dec 30 '25 11:12

Alex M



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!