Compare two strings by ignoring certain characters

Tags: string, c#

Is there an easy way to check whether two strings match while ignoring certain characters in them? See the example below.

I can easily write such a method myself: use a regular expression to find the "wildcard" characters, replace them with a common character, and then compare the two strings str1 and str2. I am not looking for such implementations, but would like to know whether any .NET Framework class already takes care of this. It seems like a common need, but I couldn't find any such method.

For example:

string str1 = "ABC-EFG";    
string str2 = "ABC*EFG";

The two strings must be declared equal.

Thanks!

Mystic asked Sep 18 '25 09:09


2 Answers

I had the same requirement. The solution I used is based on the String.Compare method, which returns 0 when the two strings compare as equal under the given options:

String.Compare(str1, str2, CultureInfo.InvariantCulture, CompareOptions.IgnoreSymbols)
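
For readers who want to see the idea outside .NET, here is a rough Python sketch of a symbol-ignoring comparison. It is only an analogue, not how String.Compare works internally: the helper names strip_symbols and equal_ignoring_symbols are hypothetical, and it assumes that "symbol" means any character that is not a letter or digit, which may not match the set of characters .NET ignores.

```python
def strip_symbols(text):
    """Return text without non-alphanumeric characters.

    Assumption: a 'symbol' is anything str.isalnum() rejects.
    """
    return "".join(ch for ch in text if ch.isalnum())


def equal_ignoring_symbols(a, b):
    """Compare two strings after dropping symbols from both."""
    return strip_symbols(a) == strip_symbols(b)


print(equal_ignoring_symbols("ABC-EFG", "ABC*EFG"))  # True
print(equal_ignoring_symbols("ABC-EFG", "ABCEFG"))   # True: symbols are dropped, not matched
print(equal_ignoring_symbols("ABC-EFG", "ABD-EFG"))  # False
```

In this sketch the symbols are removed rather than matched against each other, so a string with no separator at all also compares as equal. Whether that is acceptable depends on the use case.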

Johann Blais answered Sep 20 '25 01:09


Not sure if this helps:

The Damerau-Levenshtein distance is one of several algorithms dealing with fuzzy string searching.

The DLD between "ABC-EFG" and "ABC*EFG" is 1. The distance is defined as "the minimum number of operations needed to transform one string into the other, where an operation is defined as an insertion, deletion, or substitution of a single character, or a transposition of two characters."

Of course, this algorithm would also return 1 for the two strings "ZBC-EFG" and "ABC-EFG", which is possibly not what you are looking for.
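
If counting any single substitution as distance 1 is too permissive, a stricter alternative is to compare the strings position by position and let only a chosen set of characters stand in for one another. The sketch below shows one way to do that. The function name equal_ignoring and the default set "-*" are assumptions made for illustration, not part of any library.

```python
def equal_ignoring(a, b, ignored="-*"):
    """Return True if a and b match position by position.

    Two characters match when they are equal, or when both belong
    to `ignored` (a hypothetical set of interchangeable characters).
    """
    if len(a) != len(b):
        return False
    return all(
        x == y or (x in ignored and y in ignored)
        for x, y in zip(a, b)
    )


print(equal_ignoring("ABC-EFG", "ABC*EFG"))  # True
print(equal_ignoring("ZBC-EFG", "ABC-EFG"))  # False
```

Unlike the edit distance, this treats "ZBC-EFG" and "ABC-EFG" as different, because "Z" and "A" are not in the ignored set.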

An implementation of the DLD in Python, adapted for Python 3 from http://paxe.googlecode.com/svn/trunk/paxe/Lib/Installer.py :

def dist(s1, s2):
    # d[(i, j)] is the distance between s1[:i+1] and s2[:j+1];
    # index -1 stands for the empty prefix.
    d = {}
    lenstr1 = len(s1)
    lenstr2 = len(s2)
    for i in range(-1, lenstr1 + 1):
        d[(i, -1)] = i + 1
    for j in range(-1, lenstr2 + 1):
        d[(-1, j)] = j + 1

    for i in range(lenstr1):
        for j in range(lenstr2):
            cost = 0 if s1[i] == s2[j] else 1
            d[(i, j)] = min(
                d[(i - 1, j)] + 1,         # deletion
                d[(i, j - 1)] + 1,         # insertion
                d[(i - 1, j - 1)] + cost,  # substitution
            )
            # Check i > 0 and j > 0 (not > 1) so that a swap of the
            # first two characters is also counted as a transposition.
            if i > 0 and j > 0 and s1[i] == s2[j - 1] and s1[i - 1] == s2[j]:
                d[(i, j)] = min(d[(i, j)], d[(i - 2, j - 2)] + cost)  # transposition

    return d[(lenstr1 - 1, lenstr2 - 1)]

mechanical_meat answered Sep 20 '25 01:09