Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Damerau–Levenshtein distance algorithm in MySQL as a function

Does anyone know of a MySQL implementation of the Damerau–Levenshtein distance algorithm as a stored procedure/function that takes a single specified string as a parameter and looks for fuzzy matches of the string in a particular field within a particular table?

I have found various procedure/function code examples that compares two specified strings and works out the distance, but firstly this is only the Levenshtein distance algorithm, and not the Damerau–Levenshtein one, and secondly, I'm not looking to compare two strings but find fuzzy matches in a field of my choosing that are similar to my specified string.

I'm basically trying to put together a fuzzy keyword searcher in MySQL.

like image 564
user1236443 Avatar asked Oct 25 '25 13:10

user1236443


1 Answers

This seems to be an old topic, however should anyone look for a MYSQL implementation of Damerau-Levenshtein distance, here is my own implementation (based upon a simple Levenshtein found elsewhere on this site), which works fine for strings less than 255 characters long. The third parameter can be set to FALSE to retrieve the basic Levenshtein distance:

CREATE FUNCTION levenshtein( s1 VARCHAR(255), s2 VARCHAR(255), dam BOOL)
RETURNS INT
DETERMINISTIC
BEGIN
    DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
    DECLARE s1_char, s2_char CHAR;
    -- max strlen=255
    DECLARE cv0, cv1, cv2 VARBINARY(256);
    SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0;
    IF s1 = s2 THEN
        RETURN 0;
    ELSEIF s1_len = 0 THEN
        RETURN s2_len;
    ELSEIF s2_len = 0 THEN
        RETURN s1_len;
    ELSE
        WHILE j <= s2_len DO
            SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;
        END WHILE;
        WHILE i <= s1_len DO
            SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;
            WHILE j <= s2_len DO
                SET c = c + 1;
                SET s2_char = SUBSTRING(s2, j, 1);
                IF s1_char = s2_char THEN
                    SET cost = 0; ELSE SET cost = 1;
                END IF;
                SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;
                IF c > c_temp THEN SET c = c_temp; END IF;
                SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1;
                IF c > c_temp THEN SET c = c_temp; END IF;
                IF dam THEN
                    IF i>1 AND j>1 AND s1_char = SUBSTRING(s2, j-1, 1) AND s2_char = SUBSTRING(s1, i-1, 1) THEN
                        SET c_temp = CONV(HEX(SUBSTRING(cv2, j-1, 1)), 16, 10) + 1;
                        IF c > c_temp THEN SET c = c_temp; END IF;
                    END IF;
                END IF;
                SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;
            END WHILE;
            IF dam THEN SET CV2 = CV1; END IF;
            SET cv1 = cv0, i = i + 1;
        END WHILE;
    END IF;
    RETURN c;
END
like image 145
docciveo Avatar answered Oct 28 '25 02:10

docciveo



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!