Given a pre-existing Python dictionary:
my_dict = {
"Robert" : 37,
"Kevin" : 25,
"Mark" : 81,
}
What would be the most efficient way of inserting a new value for an existing key in this structure, if you were only provided a sub-string or a different-case version of the key? E.g. "Rob" or "kev"
What I have presently works, but I'm doing something like the following, which is O(N^2) for multiple inputs:
for key in list(my_dict):
if my_input.name.lower() in key.lower():
my_dict[key] = my_input.value
break
First, as I explained in the comments under the question, there are a few things you can do to optimize your existing code:
my_dict = {
"Robert" : 37,
"Kevin" : 25,
"Mark" : 81,
"Andre" : 55
}
# optimized version of existing code
def update_key(my_dict, my_input, my_value):
my_input = my_input.lower()
for key in my_dict:
if my_input in key.lower():
my_dict[key] = my_value
break
As @Stephen Rauch did, I think building an index is the way to go. However, I think that building every possible substring of each key is costly. Furthermore, it's not robust to substrings that occur at the middle or the end of the string.
Here is an alternative index-based approach. The idea is that we concatenate all the keys together and build one string. We also keep track of the length of each string in a list.
Then for a given test string, we use str.find() (which should be O(N)) to find the index of the substring. Next we count off in the length array to find the corresponding key.
def create_index(my_dict):
key_str = ""
key_lengths = []
for key in my_dict:
key_str += key
key_lengths.append(len(key))
# return the concatenated keys, a lower case version, and the lengths list
return (key_str, key_str.lower(), key_lengths)
def key_lookup(key_str, key_str_lower, key_lengths, my_input):
idx = key_str_lower.find(my_input.lower())
if idx == -1:
return None
len_sum = 0
for kl in key_lengths:
if idx < len_sum+kl:
return key_str[len_sum:len_sum+kl]
else:
len_sum += kl
return None
key_str, key_str_lower, key_lengths = create_index(my_dict)
print key_str_lower
#andrerobertkevinmark
print key_lengths
#[5, 6, 5, 4]
Example output:
print(key_lookup(key_str, key_str_lower, key_lengths, my_input='rob'))
#Robert
print(key_lookup(key_str, key_str_lower, key_lengths, my_input='dre'))
#Andre
{offset: key}. This increases the index creation time slightly and uses more space, but the lookups would be faster.import random
import string
random.seed(12345)
N = 1000
my_dict = {
''.join(random.choice(string.ascii_letters) for _ in range(random.randint(3,30))): j for j in range(N)
}
def key_lookup_orig(my_dict, my_input):
my_input = my_input.lower()
for key in my_dict:
if my_input in key.lower():
return key
return None
subs = [k[random.randint(0,len(k)-1):max(random.randint(0,len(k)), len(k))] for k in my_dict]
%%timeit
sum([1 if not key_lookup_orig(my_dict, s) else 0 for s in subs])
#1000 loops, best of 3: 1.5 ms per loop
key_str, key_str_lower, key_lengths = create_index(my_dict)
%%timeit
sum([1 if not key_lookup(key_str, key_str_lower, key_lengths, my_input=s) else 0 for s in subs])
#1000 loops, best of 3: 795 µs per loop
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With