Fuzzy string match in PowerShell

How can I do fuzzy string matching within PowerShell scripts?

I have several sets of people's names scraped from different sources and stored in an array. When I add a new name, I would like to compare it with the existing names and, if they fuzzily match, treat them as the same person. For example, with a data set of:

@("George Herbert Walker Bush",
  "Barbara Pierce Bush",
  "George Walker Bush",
  "John Ellis (Jeb) Bush"  )

I would like to see the following outputs for the given inputs:

"Barbara Bush" -> @("Barbara Pierce Bush")
"George Takei" -> @("")
"George Bush"  -> @("George Herbert Walker Bush","George Walker Bush")

At minimum, I would like the matching to be case-insensitive, and ideally flexible enough to handle some level of misspelling.

As far as I can tell, the standard libraries do not provide such functionality. Is there an easy-to-install module that can accomplish this?

asked Oct 21 '25 15:10 by hshib


1 Answer

Searching the PowerShell Gallery for the term "fuzzy", I found this package: Communary.PASM.

It can be installed with:

PS> Install-Package Communary.PASM                                                                                                     

The project is hosted on GitHub. I used its examples file for reference.

Here are my examples:

$colors = @("Red", "Orange", "Yellow", "Green", "Blue", "Violet", "Sky Blue" )

PS> $colors | Select-FuzzyString Red

Score Result
----- ------   
  300 Red

This is a perfect match, with a maximum score of 100 for each character.

PS> $colors | Select-FuzzyString gren

Score Result
----- ------
  295 Green 

It tolerates a few missing characters.

PS> $colors | Select-FuzzyString blue

Score Result  
----- ------     
  400 Blue       
  376 Sky Blue

Multiple values can be returned with different scores.

PS> $colors | Select-FuzzyString vioret

# No output

But it does not tolerate even a small misspelling, such as one substituted character. So I also tried Select-ApproximateString:

PS> $colors | Select-ApproximateString vioret
Violet

This has a different API: it returns only a single match or nothing. It may also return nothing in cases where Select-FuzzyString does find a match.

This was tested with Communary.PASM 1.0.43 on PowerShell Core v6.0.0-beta.9 on macOS.
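
For the name-matching case in the question, where "George Bush" should match both "George Herbert Walker Bush" and "George Walker Bush", one possible approach is to compare word by word instead of scoring the whole string. The following is only a minimal sketch in plain PowerShell (5.0 or later) and does not use Communary.PASM. The function names Get-LevenshteinDistance and Select-SimilarName and the MaxDistance parameter are hypothetical, introduced here for illustration. The rule it assumes is that a candidate matches when every word of the query is within MaxDistance edits (Levenshtein distance, case-insensitive) of some word in the candidate.

```powershell
# Hypothetical helper: case-insensitive Levenshtein (edit) distance between two strings.
function Get-LevenshteinDistance {
    param([string]$Source, [string]$Target)

    $s = $Source.ToLowerInvariant()
    $t = $Target.ToLowerInvariant()

    # $prev[$j] = distance between the prefix of $s processed so far and the first $j characters of $t.
    [int[]]$prev = 0..$t.Length
    for ($i = 1; $i -le $s.Length; $i++) {
        $curr = [int[]]::new($t.Length + 1)
        $curr[0] = $i
        for ($j = 1; $j -le $t.Length; $j++) {
            $cost = if ($s[$i - 1] -ceq $t[$j - 1]) { 0 } else { 1 }
            $deleteOrInsert = [Math]::Min($prev[$j] + 1, $curr[$j - 1] + 1)
            $curr[$j] = [Math]::Min($deleteOrInsert, $prev[$j - 1] + $cost)
        }
        $prev = $curr
    }
    return $prev[$t.Length]
}

# Hypothetical helper: returns every candidate in which each query word
# is within $MaxDistance edits of at least one of the candidate's words.
function Select-SimilarName {
    param(
        [string[]]$Candidates,
        [string]$Query,
        [int]$MaxDistance = 1   # assumed per-word tolerance; tune for your data
    )

    $queryWords = -split $Query
    foreach ($candidate in $Candidates) {
        # Drop punctuation such as the parentheses in "(Jeb)" before splitting into words.
        $candidateWords = -split ($candidate -replace '[^\w\s]', '')

        # Collect the query words that have no close-enough word in this candidate.
        $unmatched = $queryWords | Where-Object {
            $word = $_
            -not ($candidateWords | Where-Object {
                (Get-LevenshteinDistance $word $_) -le $MaxDistance
            })
        }
        if (-not $unmatched) { $candidate }
    }
}

$names = @("George Herbert Walker Bush",
           "Barbara Pierce Bush",
           "George Walker Bush",
           "John Ellis (Jeb) Bush")

Select-SimilarName -Candidates $names -Query "Barbara Bush"   # Barbara Pierce Bush
Select-SimilarName -Candidates $names -Query "George Takei"   # no output
Select-SimilarName -Candidates $names -Query "gorge bsh"      # George Herbert Walker Bush, George Walker Bush
```

A fixed tolerance of one edit per word is a crude choice: it can produce false positives on very short words, so scaling the tolerance with word length may work better on real data.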

answered Oct 23 '25 07:10 by hshib


