Let's say that two lists of elements are given, A and B. I'm interested in checking if A contains all the elements of B. Specifically, the elements must appear in the same order and they do not need to be consecutive. If this is the case, we say that B is a subsequence of A.
Here are some examples:
A = [4, 2, 8, 2, 7, 0, 1, 5, 3]
B = [2, 2, 1, 3]
is_subsequence(A, B) # True
A = [4, 2, 8, 2, 7, 0, 1, 5, 3]
B = [2, 8, 2]
is_subsequence(A, B) # True
A = [4, 2, 8, 2, 7, 0, 1, 5, 3]
B = [2, 1, 6]
is_subsequence(A, B) # False
A = [4, 2, 8, 2, 7, 0, 1, 5, 3]
B = [2, 7, 2]
is_subsequence(A, B) # False
I found a very elegant way to solve this problem (see this answer):
def is_subsequence(A, B):
it = iter(A)
return all(x in it for x in B)
I am now wondering how this solution behaves with possibly very very large inputs. Let's say that my lists contain billions of numbers.
The code you found creates an iterator for A; you can see this as a simple pointer to the next position in A to look at, and in moves the pointer forward across A until a match is found. It can be used multiple times, but only ever moves forward; when using in containment tests against a single iterator multiple times, the iterator can't go backwards and so can only test if still to visit values are equal to the left-hand operand.
Given your last example, with B = [2, 7, 2], what happens is this:
it = iter(A) creates an iterator object for the A list, and stores 0 as the next position to look at.all() function tests each element in an iterable and returns False early, if such a result was found. Otherwise it keeps testing every element. Here the tests are repeated x in it calls, where x is set to each value in B in turn.x is first set to 2, and so 2 in it is tested.
it is set to next look at A[0]. That's 4, not equal to 2, so the internal position counter is incremented to 1.A[1] is 2, and that's equal, so 2 in it returns True at this point, but not before incrementing the 'next position to look at' counter to 2.2 in it was true, so all() continues on.B is 7, so 7 in it is tested.
it is set to next look at A[2]. That's 8, not 7. The position counter is incremented to 3.it is set to next look at A[3]. That's 2, not 7. The position counter is incremented to 4.it is set to next look at A[4]. That's 7, equal to 7. The position counter is incremented to 5 and True is returned.7 in it was true, so all() continues on.B is 2, so 2 in it is tested.
it is set to next look at A[5]. That's 0, not 2. The position counter is incremented to 6.it is set to next look at A[6]. That's 1, not 2. The position counter is incremented to 7.it is set to next look at A[7]. That's 5, not 2. The position counter is incremented to 8.it is set to next look at A[8]. That's 3, not 2. The position counter is incremented to 9.A[9] because there are not that many elements in A, and so False is returned.2 in it was False, so all() ends by returning False.You could verify this with an iterator with a side effect you can observe; here I used print() to write out what the next value is for a given input:
>>> A = [4, 2, 8, 2, 7, 0, 1, 5, 3]
>>> B = [2, 7, 2]
>>> with_sideeffect = lambda name, iterable: (
print(f"{name}[{idx}] = {value}") or value
for idx, value in enumerate(iterable)
)
>>> is_sublist(with_sideeffect(" > A", A), with_sideeffect("< B", B))
< B[0] = 2
> A[0] = 4
> A[1] = 2
< B[1] = 7
> A[2] = 8
> A[3] = 2
> A[4] = 7
< B[2] = 2
> A[5] = 0
> A[6] = 1
> A[7] = 5
> A[8] = 3
False
Your problem requires that you test every element of B consecutively, there are no shortcuts here. You also must scan through A to test for the elements of B being present, in the right order. You can only declare victory when all elements of B have been found (partial scan), and defeat when all elements in A have been scanned and the current value in B you are testing for is not found.
So, assuming the size of B is always smaller than A, the best case scenario then is where all K elements in B are equal to the first K elements of A. The worst case, is any case where not all of the elements of B are present in A, and require a full scan through A. It doesn't matter what number of elements are present in B; if you are testing element K out of K you already have been scanning part-way through A and must complete your scan through A to find that the last element is missing.
So the best case with N elements in A and K elements in B, takes O(K) time. The worst case, using the same definitions of N and K, takes O(N) time.
There is no faster algorithm to test for this condition, so all you can hope for is lowering your constant times (the time taken to complete each of the N steps). Here that'd be a faster way to scan through A as you search for the elements of B. I am not aware of a better way to do this than by using the method you already found.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With