 

Check if a string contains all words of another string in the same order in Python?

I want to check whether a string contains all the words of a substring, in the same order. At the moment I am using the following code. However, it is very basic, seems inefficient, and there is likely a much better way of doing it. I'd really appreciate it if you could tell me what a more efficient solution would be. Sorry for the noob question; I am new to programming and wasn't able to find a good solution.

def check(main, sub_split):
    n=0
    while n < len(sub_split):
        result = True
        if sub_split[n] in main:
            the_start =  main.find(sub_split[n])
            main = main[the_start:]

        else:
            result=False
        n += 1
    return result

a = "I believe that the biggest castle in the world is Prague Castle "
b= "the biggest castle".split(' ')

print check(a, b)

Update: Interesting. First of all, thank you all for your answers, and thank you for pointing out some of the cases that my code missed. I have been trying the different solutions posted here and in the links; I will add an update on how they compare and then accept an answer.

Update: Again, thank you all for the great solutions; every one of them was a major improvement over my code. I ran the suggestions against my requirements for 100000 checks and got the following results, listed by author: Padraic Cunningham - consistently under 0.4 secs (though it gives some false positives when searching for full words only); galaxyan - 0.65 secs; 0.75 secs friendly dog - 0.70 secs; John1024 - 1.3 secs (highly accurate, but seems to take extra time).

temo asked Dec 07 '25 03:12



1 Answer

You can simplify your search by passing the index of the previous match + 1 to find as its start argument, so you don't need to slice anything:

def check(main, sub_split):
    ind = -1
    for word in sub_split:
        ind = main.find(word, ind+1)
        if ind == -1:
            return False
    return True

a = "I believe that the biggest castle in the world is Prague Castle "
b = "the biggest castle".split(' ')

print(check(a, b))

If ind is ever -1, the word does not occur after the previous match, so you return False. If you get through all the words, then every word is in the string, in order.
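One caveat is worth seeing concretely: find matches substrings, not whole words, so this version can report a match when a word only appears inside a longer word. A minimal sketch, repeating the check function above so it runs on its own (the sample strings are made up for illustration):

```python
def check(main, sub_split):
    # Same substring-based search as above.
    ind = -1
    for word in sub_split:
        ind = main.find(word, ind + 1)
        if ind == -1:
            return False
    return True

# "the" is found inside "another" and "cat" inside "concatenate",
# so the substring search reports a match even though neither
# appears as a separate word.
partial = check("another way to concatenate", ["the", "cat"])
print(partial)  # True

# Order is still respected: "castle" does not occur after "world".
reordered = check("the biggest castle in the world", ["world", "castle"])
print(reordered)  # False
```

Whether that behaviour is acceptable depends on whether you care about whole words or plain substrings.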

For exact words you could do something similar with lists, passing a start position to index:

def check(main, sub_split):
    lst, ind = main.split(), -1
    for word in sub_split:
        try:
            ind = lst.index(word, ind + 1)
        except ValueError:
            return False
    return True
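As an alternative sketch of the same whole-word idea (this is a different technique from the index-based loop above, and check_words is a hypothetical name used only for illustration), you can let a single iterator over the words keep track of the position. Testing membership with in consumes the iterator up to the match, so each later word is searched for only after the previous one:

```python
def check_words(main, sub_split):
    # Hypothetical helper name; an iterator-based variant of the list version.
    words = iter(main.split())
    # `word in words` advances the iterator until it finds `word`,
    # so the next search resumes right after the previous match.
    return all(word in words for word in sub_split)

a = "I believe that the biggest castle in the world is Prague Castle "
print(check_words(a, "the biggest castle".split()))  # True
print(check_words(a, "castle biggest".split()))      # False: wrong order
print(check_words(a, "the big".split()))             # False: "big" is not a whole word
```

Like the list version, this compares whole whitespace-separated tokens, so punctuation attached to a word still prevents a match.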

And to handle punctuation, you could first strip it off:

from string import punctuation

def check(main, sub_split):
    ind = -1
    # Strip leading/trailing punctuation from every word in the main string.
    lst = [w.strip(punctuation) for w in main.split()]
    for word in (w.strip(punctuation) for w in sub_split):
        try:
            ind = lst.index(word, ind + 1)
        except ValueError:
            return False
    return True

Of course, some words are valid with punctuation, but that is more a job for nltk; alternatively, you may actually want to find matches that include the punctuation.
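To make that trade-off concrete, here is a small sketch of what strip(punctuation) does to a few made-up sample tokens. It only removes characters from the two ends of each token, so inner punctuation survives while a trailing period that belongs to an abbreviation is lost:

```python
from string import punctuation

# Made-up sample tokens, chosen only to illustrate the behaviour.
samples = ["castle,", "(Prague)", "don't", "U.S.", "e.g."]
stripped = [w.strip(punctuation) for w in samples]
print(stripped)  # ['castle', 'Prague', "don't", 'U.S', 'e.g']
```

So "U.S." in the search words would still match "U.S." in the main string, because both sides are stripped the same way, but the stripped form is no longer the original token.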

Padraic Cunningham answered Dec 08 '25 16:12



