I have to maintain a in-memory data-structure of Key-Value pair. I have following constraints:
void add_kv(void *ds, char *key, char *value);void del_kv(void *ds, char *key);char *lookup(void *ds, char *key);void *iterate(void *ds, char *prefix_key);My question is what is the best suited data-structure for above constraints?
I have considered hash. Add/delete/lookup could be done in o(1) as I have sufficient memory but it is not optimum for iteration. Hash-of-hash (hash on k1 then on k2 then on k3...) or array of hash could be done but it then violates Bullet 2. What other options do I have?
I would probably use something like a B+tree for this: https://en.wikipedia.org/wiki/B%2B_tree
Since memory-efficiency is important to you, when a leaf block gets full you should redistribute keys among several blocks if possible to ensure that blocks are always >= 85% full. Block size should be large enough that the overhead from internal nodes is only a few %.
You can also optimize storage in the leaf blocks, since most of the keys in a block will have a long common prefix that you can figure out from the blocks in the higher levels. You can therefore remove all the copies of the common prefix from the keys in the leaf blocks, and your 400MB of key-value pairs will take substantially less than 400MB of RAM. This will complicate the insert process somewhat.
There are other things you can do to compress this structure further, but that gets complicated fast and it doesn't sound like you need it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With