Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python dictionary assignment by substring

Given a pre-existing Python dictionary:

my_dict = {
    "Robert" : 37,
    "Kevin"  : 25,
    "Mark"   : 81,
}

What would be the most efficient way of inserting a new value for an existing key in this structure, if you were only provided a sub-string or a different-case version of the key? E.g. "Rob" or "kev"

What I have presently works, but I'm doing something like the following, which is O(N^2) for multiple inputs:

for key in list(my_dict):
    if my_input.name.lower() in key.lower():
        my_dict[key] = my_input.value
        break
like image 615
Lovethenakedgun Avatar asked Feb 28 '26 01:02

Lovethenakedgun


1 Answers

First, as I explained in the comments under the question, there are a few things you can do to optimize your existing code:

my_dict = {
    "Robert" : 37,
    "Kevin"  : 25,
    "Mark"   : 81,
    "Andre"  : 55
}

# optimized version of existing code
def update_key(my_dict, my_input, my_value):
    my_input = my_input.lower()
    for key in my_dict:
        if my_input in key.lower():
            my_dict[key] = my_value
            break

As @Stephen Rauch did, I think building an index is the way to go. However, I think that building every possible substring of each key is costly. Furthermore, it's not robust to substrings that occur at the middle or the end of the string.

Here is an alternative index-based approach. The idea is that we concatenate all the keys together and build one string. We also keep track of the length of each string in a list.

Then for a given test string, we use str.find() (which should be O(N)) to find the index of the substring. Next we count off in the length array to find the corresponding key.

def create_index(my_dict):
    key_str = ""
    key_lengths = []
    for key in my_dict:
        key_str += key
        key_lengths.append(len(key))

    # return the concatenated keys, a lower case version, and the lengths list
    return (key_str, key_str.lower(), key_lengths)

def key_lookup(key_str, key_str_lower, key_lengths, my_input):
    idx = key_str_lower.find(my_input.lower())
    if idx == -1:
        return None
    len_sum = 0
    for kl in key_lengths:
        if idx < len_sum+kl:
            return key_str[len_sum:len_sum+kl]
        else:
            len_sum += kl
    return None

key_str, key_str_lower, key_lengths = create_index(my_dict)

print key_str_lower
#andrerobertkevinmark
print key_lengths
#[5, 6, 5, 4]

Example output:

print(key_lookup(key_str, key_str_lower, key_lengths, my_input='rob'))
#Robert
print(key_lookup(key_str, key_str_lower, key_lengths, my_input='dre'))
#Andre

Notes

  • I think your existing code might be better than this solution. It is definitely more readable, and easier to understand.
  • There are ways to modify this as well, depending on your actual purpose. For instance, instead of looping through and calculating the string offsets you could store each possible offset in a dictionary in the form of {offset: key}. This increases the index creation time slightly and uses more space, but the lookups would be faster.
  • As in the original example, this will return the first match in the case where the substring is contained in multiple strings. Since dictionaries are unordered, one may choose to sort the keys when building the index so that the output is deterministic.

Timing Results

import random
import string

random.seed(12345)

N = 1000
my_dict = {
    ''.join(random.choice(string.ascii_letters) for _ in range(random.randint(3,30))): j for j in range(N)
}

def key_lookup_orig(my_dict, my_input):
    my_input = my_input.lower()
    for key in my_dict:
        if my_input in key.lower():
            return key
    return None

subs = [k[random.randint(0,len(k)-1):max(random.randint(0,len(k)), len(k))] for k in my_dict]

%%timeit
sum([1 if not key_lookup_orig(my_dict, s) else 0 for s in subs])
#1000 loops, best of 3: 1.5 ms per loop

key_str, key_str_lower, key_lengths = create_index(my_dict)

%%timeit
sum([1 if not key_lookup(key_str, key_str_lower, key_lengths, my_input=s) else 0 for s in subs])
#1000 loops, best of 3: 795 µs per loop
like image 80
pault Avatar answered Mar 01 '26 15:03

pault



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!