
Java pattern matcher assignment

Tags: java, regex

I need to create a program in which I use a pattern matcher to do the following. I know this needs a regex. Please help me create a pattern for this. Given a paragraph, I need to do the following:


a) If the number of letters in a word is greater than 3, it should be changed to its first three characters + $. For example, the word "Maradona" should be changed to "Mar$". If the number of letters is less than or equal to 3, leave the word as it is. Numbers are not counted as letters, so "$16.9m." should not be changed.

b) Punctuation should stay intact, except when it is inside the word (i.e. "tournament's" should become "tou$"). E.g.: history, -> his$,

c) "/n/n" marks line termination. It should not be changed.

For this input:

Maradona is the only footballer to set world-record contract fees twice, firstly when transferring to Barcelona for a then world record £5m, and secondly, when transferred to Napoli for another record fee $16.9m. During his professional club career Maradona played for Argentinos Juniors, Boca Juniors, Barcelona, Napoli, Sevilla and Newell's Old Boys. At club level, he is most famous for his career in Napoli where he won numerous accolades. In his international career, playing for Argentina, he earned 91 caps and scored 34 goals./n/n He played in four FIFA World Cup tournaments, including the 1986 tournament, where he captained Argentina and led them to their victory over West Germany in the final, winning the Golden Ball award as the tournament's best player. In that same tournament's quarterfinal round, he scored both goals in a 2–1 victory over England that entered football history, though for two different reasons. The first goal was via an unpenalized handball known as the /"Hand of God/", while the second goal followed a 60 m (66 yd) dribble past five England players./n/n/"The Goal of the Century/" awarded to Maradona by FIFA.com voters in 2002.

The output should be:

Mar$ is the onl$ foo$ to set wor$ con$ fee$ twi$, fir$ whe$ tra$ to Bar$ for a the$ wor$ rec$ £5m, and sec$, whe$ tra$ to Nap$ for ano$ rec$ fee $16.9m. Dur$ his pro$ clu$ car$ Mar$ pla$ for Arg$ Jun$, Boc$ Jun$, Bar$, Nap$, Sev$ and New$ Old Boy$. At clu$ lev$, he is mos$ fam$ for his car$ in Nap$ whe$ he won num$ acc$. In his int$ car$, pla$ for Arg$, he ear$ 91 cap$ and sco$ 34 goa$./n/nHe pla$ in fou$ FIF$ Wor$ Cup tou$, inc$ the 1986 tou$, whe$ he cap$ Arg$ and led the$ to the$ vic$ ove$ Wes$ Ger$ in the fin$, win$ the Gol$ Bal$ awa$ as the tou$ bes$ pla$. In tha$ sam$ tou$ qua$ rou$, he sco$ bot$ goa$ in a 2–1 vic$ ove$ Eng$ tha$ ent$ foo$ his$, tho$ for two dif$ rea$. The fir$ goa$ was via an unp$ han$ kno$ as the /"Han$ of God/", whi$ the sec$ goa$ fol$ a 60 m (66 yd) dri$ pas$ fiv$ Eng$ pla$./n/n/"The Goa$ of the Cen$/" awa$ to Mar$ by FIF$ vot$ in 2002.

Edit: This is what I have tried so far.

I was trying to do this without compiling a Pattern, just using conditions like:

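String[] split = sentence.split("\\s+");
for (int i = 0; i < split.length; i++)
{
    // Only purely alphabetic tokens longer than 3 characters are shortened.
    // Tokens with attached punctuation (e.g. "history,") fail matches() and are skipped.
    if (split[i].length() > 3 && split[i].matches("[a-zA-Z]+"))
    {
        split[i] = split[i].substring(0, 3) + "$";
    }
}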

But this does not seem to be a valid approach.

user3259926 asked Nov 25 '25 14:11


2 Answers

This replaceAll should work:

String repl = data.replaceAll("(?<=\\b[a-zA-Z']{3})[\\w']+", "\\$");

Explanation: This regex matches one or more word characters or apostrophes that are preceded by a word boundary followed by exactly three letters or apostrophes. The matched tail is then replaced by a literal $.

Search:

(?<=\b[a-zA-Z']{3}) Positive lookbehind - asserts that the enclosed pattern matches immediately before the current position
\b asserts position at a word boundary
[a-zA-Z']{3} matches a character from the set below, exactly 3 times
a-z a single character in the range between a and z
A-Z a single character in the range between A and Z
' a literal single quote
[\w']+ matches any word character [a-zA-Z0-9_] or a single quote, one or more times

Replacement:

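\\$ - A literal $. The Java string "\\$" is the replacement text \$; the backslash is needed because an unescaped $ introduces a group reference in a replacement string.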
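
Since the question asks for a pattern matcher, here is a minimal sketch of the same expression used through Pattern and Matcher rather than String.replaceAll. The class name WordShortener and the method shorten are hypothetical names chosen for illustration; the regex itself is the one from this answer. The sample inputs are short fragments, not the full paragraph from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class WordShortener {
    // Same expression as above: the lookbehind requires a word boundary plus
    // three letters/apostrophes; the rest of the word is what gets replaced.
    private static final Pattern TAIL =
            Pattern.compile("(?<=\\b[a-zA-Z']{3})[\\w']+");

    static String shorten(String text) {
        // quoteReplacement escapes "$" so it is inserted literally
        // instead of being parsed as a group reference.
        return TAIL.matcher(text).replaceAll(Matcher.quoteReplacement("$"));
    }

    public static void main(String[] args) {
        System.out.println(shorten("Maradona is the only footballer,")); // Mar$ is the onl$ foo$,
        System.out.println(shorten("the tournament's best player."));    // the tou$ bes$ pla$.
        System.out.println(shorten("record fee $16.9m."));               // rec$ fee $16.9m.
    }
}
```

Compiling the pattern once in a static field avoids recompiling it on every call, which String.replaceAll would otherwise do.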
anubhava answered Nov 28 '25 02:11


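Try this (the lookbehind requires whitespace before each word, so a word at the very start of the input is left unchanged):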

str = str.replaceAll("(?i)(?<=\\s[a-z']{3})[a-z']+", "\\$");
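
The lookbehind in this expression anchors on whitespace (\s), while the other answer anchors on a word boundary (\b). The following is a small comparison sketch of the two on a made-up sentence; the class name LookbehindComparison and the method apply are hypothetical. It suggests that the whitespace version skips a word at the start of the input or one directly after an opening parenthesis.

```java
import java.util.regex.Pattern;

// Hypothetical comparison harness; the names are illustrative only.
final class LookbehindComparison {
    // Word-boundary anchor, as in the other answer.
    static final Pattern BOUNDARY =
            Pattern.compile("(?<=\\b[a-zA-Z']{3})[\\w']+");
    // Whitespace anchor, as in this answer.
    static final Pattern WHITESPACE =
            Pattern.compile("(?i)(?<=\\s[a-z']{3})[a-z']+");

    static String apply(Pattern pattern, String text) {
        // "\\$" is the replacement text \$, i.e. a literal dollar sign.
        return pattern.matcher(text).replaceAll("\\$");
    }

    public static void main(String[] args) {
        String s = "Maradona played for Napoli (formerly Barcelona).";
        System.out.println(apply(BOUNDARY, s));   // Mar$ pla$ for Nap$ (for$ Bar$).
        System.out.println(apply(WHITESPACE, s)); // Maradona pla$ for Nap$ (formerly Bar$).
    }
}
```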
Bohemian answered Nov 28 '25 04:11


