I'm trying to calculate the Levenshtein distance between two Pandas columns, but I'm getting stuck. Here is the library I'm using: textdistance. Here is a minimal, reproducible example:
```python
import pandas as pd
from textdistance import levenshtein

attempts = [['passw0rd', 'pasw0rd'],
            ['passwrd', 'psword'],
            ['psw0rd', 'passwor']]
df = pd.DataFrame(attempts, columns=['password', 'attempt'])
```

```
   password  attempt
0  passw0rd  pasw0rd
1   passwrd   psword
2    psw0rd  passwor
```
My poor attempt:

```python
df.apply(lambda x: levenshtein.distance(*zip(x['password'] + x['attempt'])), axis=1)
```
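A likely reason the attempt fails (an editorial sketch that reproduces only the argument construction, not textdistance's exact error): `x['password'] + x['attempt']` concatenates the two strings into one, and `zip` over a single string yields one-character tuples. The `*` then unpacks all of those tuples as separate arguments, so `levenshtein.distance` receives many one-element tuples instead of two strings. Row 0 written out with plain strings, no pandas or textdistance needed:

```python
# Row 0 of the example DataFrame, written out as plain strings
password, attempt = 'passw0rd', 'pasw0rd'

# What the failing lambda builds: '+' concatenates the strings, and zip()
# over one iterable produces a 1-tuple per character
args = list(zip(password + attempt))
print(len(args))   # 15 arguments, not 2
print(args[:3])    # [('p',), ('a',), ('s',)]

# What the function expects instead: the two strings as separate arguments
correct_args = (password, attempt)
print(len(correct_args))  # 2
```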
This is how the function works. It takes two strings as arguments:

```python
levenshtein.distance('helloworld', 'heloworl')
```

```
Out[1]: 2
```
The Levenshtein distance is usually calculated with dynamic programming: prepare a matrix of size (M+1) x (N+1), where M and N are the lengths of the two words, and fill it using two nested for loops, where each cell is computed from its already-filled neighbours.
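For reference, the calculation done in each iteration is the standard Levenshtein recurrence. Writing $a$ and $b$ for the two words (lengths $M$ and $N$) and $D[i][j]$ for the distance between the first $i$ characters of $a$ and the first $j$ characters of $b$:

$$
D[i][0] = i, \qquad D[0][j] = j,
$$

$$
D[i][j] = \min\begin{cases}
D[i-1][j] + 1 & \text{(deletion)} \\
D[i][j-1] + 1 & \text{(insertion)} \\
D[i-1][j-1] + [a_i \neq b_j] & \text{(substitution or match)}
\end{cases}
$$

where $[a_i \neq b_j]$ is 1 when the characters differ and 0 when they are equal. The distance between the full words is $D[M][N]$.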
The Levenshtein distance between two strings string1 and string2 is defined as the minimum number of single-character insertions, deletions, or substitutions needed to transform string1 into string2. For example, 'pasw0rd' can be converted into 'passw0rd' by inserting an 's', so their distance is 1.
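A minimal pure-Python sketch of that matrix approach, handy for cross-checking results without textdistance. The function name `levenshtein_distance` is hypothetical and not part of textdistance; it fills the full (M+1) x (N+1) table exactly as described rather than using any memory-saving variant.

```python
def levenshtein_distance(s1: str, s2: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn s1 into s2 (hypothetical helper)."""
    m, n = len(s1), len(s2)
    # d[i][j] = distance between the first i chars of s1 and the first j chars of s2
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # turn an i-char prefix into "" with i deletions
    for j in range(n + 1):
        d[0][j] = j  # turn "" into a j-char prefix with j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[m][n]


# Same pairs as in the question, plus the helloworld example
pairs = [('passw0rd', 'pasw0rd'), ('passwrd', 'psword'),
         ('psw0rd', 'passwor'), ('helloworld', 'heloworl')]
print([levenshtein_distance(a, b) for a, b in pairs])  # [1, 3, 4, 2]
```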
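Maybe I'm missing something, but is there a reason you don't want to use a plain lambda expression? The key is to pass the two column values to `levenshtein.distance` as two separate arguments. This works for me: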
```python
import pandas as pd
from textdistance import levenshtein

attempts = [['passw0rd', 'pasw0rd'],
            ['passwrd', 'psword'],
            ['psw0rd', 'passwor'],
            ['helloworld', 'heloworl']]
df = pd.DataFrame(attempts, columns=['password', 'attempt'])
df.apply(lambda x: levenshtein.distance(x['password'], x['attempt']), axis=1)
```

Out:

```
0    1
1    3
2    4
3    2
dtype: int64
```
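An alternative sketch (not benchmarked here): zip the two columns in a list comprehension instead of calling `DataFrame.apply(..., axis=1)`, which builds a Series for every row and is therefore typically slower on large frames. It relies only on `levenshtein.distance` as used above plus standard pandas; the new column name `distance` is an arbitrary choice.

```python
import pandas as pd
from textdistance import levenshtein

attempts = [['passw0rd', 'pasw0rd'],
            ['passwrd', 'psword'],
            ['psw0rd', 'passwor'],
            ['helloworld', 'heloworl']]
df = pd.DataFrame(attempts, columns=['password', 'attempt'])

# Pair the two columns element-wise and pass each pair as two separate strings
df['distance'] = [levenshtein.distance(p, a)
                  for p, a in zip(df['password'], df['attempt'])]
print(df)
```

The `distance` column holds the same values as the `apply` output (1, 3, 4, 2), stored directly on the DataFrame.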