Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

data structure for storing array of strings in a memory

I'm considering of data structure for storing a large array of strings in a memory. Strings will be inserted at the beginning of the programm and will not be added or deleted while programm is running. The crucial point is that search procedure should be as fast as it can be. Saving of memory is not important. I incline to standard structure hash_set from standard library, that allows to search elements in the structure with about constant time. But it's not guaranteed that this time will be short. Will anyone suggest a better standard desicion?

Many thanks!

like image 966
George Avatar asked Dec 21 '25 14:12

George


2 Answers

Try a Prefix Tree

A Trie is better than a Binary Search Tree for searching elements. Compared against a hash table, you could see this question

like image 104
Marco Avatar answered Dec 23 '25 09:12

Marco


If lookup time really is the only important thing, then at startup time, once you have all the strings, you could compute a perfect hash over them, and use this as the hashing function for a hashtable.

The problem is how you'd execute the hash - any kind of byte-code-based computation is probably going to be slower than using a fixed hash and dealing with collisions. But if all you care about is lookup speed, then you can require that your process has the necessary privileges to load and execute code. Write the code for the perfect hash, run it through a compiler, load it. Test at runtime whether it's actually faster for these strings than your best known data-agnostic structure (which might be a Trie, a hashtable, a Judy array or a splay tree, depending on implementation details and your typical access patterns), and if not fall back to that. Slow setup, fast lookup.

It's almost never truly the case that speed is the only crucial point.

like image 24
Steve Jessop Avatar answered Dec 23 '25 10:12

Steve Jessop