I have two lists, both contain integers represented as strings. large_list is large, up to low 10s of thousands of elements. small_list is smaller, up to hundreds of elements. I am filtering the larger one by the smaller one. I'm curious to know which is more efficient:
(a)
for element in small_list:
if element in large_list:
(do something)
(b)
for element in large_list:
if element in small_list:
(do something)
for option (a), the iteration count is limited by the length of small_list, but for each one, a search has to be executed on the full large_list.
for option (b), the iteration goes through the entire large_list but each search is executed on small_list.
I understand that sets are hashed, so the best option here ought to be to take large_set = set(large_list), iterate through small_list, and do each search on large_set. But if we take sets out of the equation, is option (a) more efficient, or is option (b)?
You can speculate, you can calculate
import timeit
small_list = list(range(10,20))
large_list = list(range(15,115))
def small_in_large():
c = 0
for element in small_list:
if element in large_list:
c += 1
return c
def large_in_small():
c = 0
for element in large_list:
if element in small_list:
c += 1
return c
t1 = timeit.timeit(small_in_large, globals=globals(), number=10000)
t2 = timeit.timeit(large_in_small, globals=globals(), number=10000)
print("small_in_large() {:.3f} seconds".format(t1))
print("large_in_small() {:.3f} seconds".format(t2))
On my machine:
small_in_large() 0.164 seconds
large_in_small() 0.730 seconds
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With