Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best data structure for searching through a collection

Currently I have a collection of hundreds of thousands of IDs as integers, and I am performing the following task (lets say this collection is stored in a list cache for now):

cache = list()
# lets say this cache is populated
for x in range(0,1000000):
    if x not in cache:
        #do_something

How expensive is it for me to use a list to search for not in something? Would I benefit from using another data structure, and if so which one would be best?

like image 687
JC1 Avatar asked Oct 28 '25 01:10

JC1


2 Answers

You should consider using a set. While it's worst case time complexity for x in cache would still be O(n), the average case is O(1) (source).

like image 196
Mureinik Avatar answered Oct 29 '25 18:10

Mureinik


If you could create the data in cache via a generator, you could forgo creating a list on the order of 1e5 and reap the memory savings. Then, you could simply test x not in with the following:

desired_id = 123456
cache = some_function_to_generate_integer_ids()  # cache is a generator
print desired_id in cache

in which False will be printed if desired_id is not in cache.

like image 21
call-in-co Avatar answered Oct 29 '25 17:10

call-in-co