How to search for a word and similar words in a text?

I have to look up the word "age" and similar words in a text file.

I have the following sentences:

  • 18 years of age
  • man aged 51
  • man ages between 25 to 50
  • between 5 to 75 years of age. (with dot)
  • between 5 to 75 years of age, (with comma)
  • agent name is xyz (agent contains age).

String.contains returns true in every case. My requirement is to match the first five sentences and return false for the last one.

I could solve this by writing code that checks a bunch of strings such as " age ", " age.", "ages", "aged", " age,", etc.

Is there a better way to solve this problem?

asked Dec 05 '25 21:12 by Shashi


2 Answers

If you use a regex, you have to account for all the possible forms:

string.matches("(?i).*\\bage[ds]?\\b.*");
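A self-contained sketch of the same pattern applied to the six sample sentences. It compiles the regex once and uses Matcher.find(), which searches anywhere in the line, so the leading and trailing .* that String.matches needs can be dropped. The class and method names (AgeRegexDemo, mentionsAge) are only illustrative, and List.of assumes Java 9 or later.

```java
import java.util.List;
import java.util.regex.Pattern;

public class AgeRegexDemo {
    // (?i): case-insensitive; \b: word boundary; [ds]?: optional "d" or "s" suffix.
    // find() looks for the pattern anywhere in the input, so no ".*" is needed.
    private static final Pattern AGE = Pattern.compile("(?i)\\bage[ds]?\\b");

    static boolean mentionsAge(String line) {
        return AGE.matcher(line).find();
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "18 years of age",
                "man aged 51",
                "man ages between 25 to 50",
                "between 5 to 75 years of age.",
                "between 5 to 75 years of age,",
                "agent name is xyz");
        for (String line : lines) {
            // Prints "true" for the first five lines and "false" for the last one.
            System.out.println(mentionsAge(line) + "  " + line);
        }
    }
}
```

In "agent" there is no word boundary between "age" and "n", so the last line is rejected. Other forms such as "ageing" would still have to be added to the pattern explicitly.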
answered Dec 08 '25 10:12 by Avinash Raj


A naive (and expensive) solution would be the following:

  1. tokenize each line (e.g., split by " ", or even by non-alphanumeric characters, which also removes punctuation).
  2. compute the edit distance between each word and the word "age".
  3. if a word has a small edit distance (e.g., below 2), return the line.
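The steps above can be sketched in plain Java as follows. This is a minimal sketch, assuming a hand-written Levenshtein distance (not the simmetrics API) and the threshold "distance < 2" from step 3; the class and method names (EditDistanceFilter, containsSimilarWord) are hypothetical.

```java
import java.util.List;
import java.util.Locale;

public class EditDistanceFilter {
    // Levenshtein distance: minimum number of insertions, deletions and
    // substitutions turning a into b, computed with two rolling DP rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from "" to the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance from the first i chars of a to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the old row as the next working row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Step 1: split on non-alphanumeric characters (drops punctuation).
    // Steps 2-3: accept the line if any token is closer than maxDistance.
    static boolean containsSimilarWord(String line, String target, int maxDistance) {
        for (String token : line.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!token.isEmpty() && levenshtein(token, target) < maxDistance) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "18 years of age",
                "man aged 51",
                "man ages between 25 to 50",
                "between 5 to 75 years of age.",
                "between 5 to 75 years of age,",
                "agent name is xyz");
        for (String line : lines) {
            System.out.println(containsSimilarWord(line, "age", 2) + "  " + line);
        }
    }
}
```

For these sentences, "age", "aged" and "ages" are at distance 0 or 1, while "agent" is at distance 2, so only the last line is rejected. Unrelated words such as "page" or "are" are also at distance 1 from "age", which is part of why this approach is naive.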

The edit distance between two strings is the number of edits (insertions, deletions and substitutions) required to make one string equal to the other. You can find an implementation of edit distance in the simmetrics library, or elsewhere.
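For reference, this edit distance is usually computed with the standard Levenshtein dynamic-programming recurrence, where D(i, j) is the distance between the first i characters of a and the first j characters of b, and [a_i ≠ b_j] is 1 when the characters differ and 0 otherwise:

$$
\begin{aligned}
D(i, 0) &= i, \qquad D(0, j) = j,\\
D(i, j) &= \min\bigl\{\, D(i-1, j) + 1,\; D(i, j-1) + 1,\; D(i-1, j-1) + [a_i \neq b_j] \,\bigr\}
\end{aligned}
$$

For the sample words: D(age, aged) = 1 (insert "d"), D(age, ages) = 1 (insert "s") and D(age, agent) = 2 (insert "n" and "t"), so a threshold of "below 2" keeps the first two forms and rejects "agent".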

Another option is to stem the words in step 2 and compare them with the stem of the word "age" (also expensive).
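A rough sketch of the stemming idea. In practice you would use a real stemmer (for example a Porter stemmer from an NLP library); here crudeStem is a deliberately simplistic, hypothetical stand-in that only strips a trailing "s" or "d", just enough to show the compare-the-stems step on the sample sentences. StemFilter and containsStemOf are likewise illustrative names.

```java
import java.util.List;
import java.util.Locale;

public class StemFilter {
    // Hypothetical, deliberately crude stemmer (NOT the Porter algorithm):
    // strips one trailing "s" or "d" if at least three characters remain.
    static String crudeStem(String word) {
        if (word.length() > 3 && (word.endsWith("s") || word.endsWith("d"))) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Tokenize on non-alphanumeric characters, stem each token and
    // compare it with the stem of the target word.
    static boolean containsStemOf(String line, String target) {
        String targetStem = crudeStem(target.toLowerCase(Locale.ROOT));
        for (String token : line.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!token.isEmpty() && crudeStem(token).equals(targetStem)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "18 years of age",
                "man aged 51",
                "man ages between 25 to 50",
                "between 5 to 75 years of age.",
                "between 5 to 75 years of age,",
                "agent name is xyz");
        for (String line : lines) {
            System.out.println(containsStemOf(line, "age") + "  " + line);
        }
    }
}
```

A real stemmer handles far more suffixes (and may produce stems that are not dictionary words), but the matching step stays the same: compare stems, not raw strings.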

If you already know all the acceptable answers (or at least their pattern), go for Avinash Raj's answer.

answered Dec 08 '25 10:12 by vefthym


