
Matching two or three words after Different Arabic Regex Patterns in Java

Tags: java, regex, arabic

Greetings all,

I am a beginner with regex. I want to extract the two or three Arabic words that follow a certain pattern.

For example, if I have an Arabic string:

inputtext = "تكريم الدكتور احمد زويل والدكتورة سميرة موسي عن ابحاثهم العلمية "

I need to extract the names after

الدكتور

and

والدكتورة

so the output should be:

احمد زويل
سميرة موسي

What I have done so far is the following:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String inputtext = "تكريم الدكتور احمد زويل والدكتورة سميرة موسي عن ابحاثهم العلمية ";
// Lookbehind for the title, then everything up to the end of the string
Pattern pattern = Pattern.compile("(?<=الدكتور).*");
Matcher matcher = pattern.matcher(inputtext);
boolean found = false;
while (matcher.find()) {
    // Get the matching string
    String match = matcher.group();
    System.out.println("the match is: " + match);
    found = true;
}
if (!found) {
    System.out.println("I didn't find the text");
}

But it returns:

احمد زويل والدكتورة سميرة موسي عن ابحاثهم العلمية

I don't know how to add another pattern, or how to stop after two words.

Would you please help me with any ideas?

Daisy asked Dec 19 '25 21:12


1 Answer

To match only the two words that follow, try this:

(?<=الدكتور)\s[^\s]+\s[^\s]+

.* matches everything up to the end of the string, which is not what you want here.

\s matches a whitespace character.

[^\s] is a negated character class that matches any character except whitespace.

So this pattern matches a whitespace character, then one or more non-whitespace characters (the first word), then another whitespace character, and finally one or more non-whitespace characters (the second word).

To match your second pattern, I would simply use a second regex (just swap the part inside the lookbehind) and apply it in a second step. The regular expressions are easier to read that way.
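As a rough sketch of that two-step approach, the snippet below runs the same pattern once per title. The class name TitleNameExtractor and the method twoWordsAfter are made up for illustration, and the code assumes the title contains no regex metacharacters and that the source file is compiled as UTF-8. Because the pattern starts with \s, each match begins with a space, so the sketch trims it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class TitleNameExtractor {

    // Returns the two whitespace-separated words that follow each occurrence of `title`.
    // Assumption: `title` contains no regex metacharacters, so it is inserted as-is.
    static List<String> twoWordsAfter(String title, String text) {
        Pattern pattern = Pattern.compile("(?<=" + title + ")\\s[^\\s]+\\s[^\\s]+");
        Matcher matcher = pattern.matcher(text);
        List<String> names = new ArrayList<>();
        while (matcher.find()) {
            // The match starts with the whitespace after the title, so trim it.
            names.add(matcher.group().trim());
        }
        return names;
    }

    public static void main(String[] args) {
        String inputText = "تكريم الدكتور احمد زويل والدكتورة سميرة موسي عن ابحاثهم العلمية ";
        // Step 1: names after the first title.
        System.out.println(twoWordsAfter("الدكتور", inputText));
        // Step 2: names after the second title.
        System.out.println(twoWordsAfter("والدكتورة", inputText));
    }
}
```

The first call does not pick up anything after والدكتورة, even though that word contains الدكتور, because the character right after the contained title is ة rather than whitespace.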

Alternatively, you can combine both patterns into one regex with an alternation:

(?<=الدكتور)\s[^\s]+\s[^\s]+|(?<=والدكتورة)\s[^\s]+\s[^\s]+
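
For reference, here is a minimal sketch of how the combined pattern might be used on the string from the question. The class name CombinedTitleMatcher and the method extractNames are invented for this example, and the source file is assumed to be compiled as UTF-8. Matches are trimmed because each one begins with the whitespace that follows the title.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class for illustration only.
class CombinedTitleMatcher {

    // Either title in a lookbehind, followed by two whitespace-separated words.
    static final Pattern NAMES = Pattern.compile(
            "(?<=الدكتور)\\s[^\\s]+\\s[^\\s]+|(?<=والدكتورة)\\s[^\\s]+\\s[^\\s]+");

    // Collects every two-word match in the order it appears in the text.
    static List<String> extractNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher matcher = NAMES.matcher(text);
        while (matcher.find()) {
            // Drop the leading whitespace that is part of the match.
            names.add(matcher.group().trim());
        }
        return names;
    }

    public static void main(String[] args) {
        String inputText = "تكريم الدكتور احمد زويل والدكتورة سميرة موسي عن ابحاثهم العلمية ";
        for (String name : extractNames(inputText)) {
            System.out.println("the match is: " + name);
        }
    }
}
```

With the question's input, extractNames returns the two names in order: احمد زويل, then سميرة موسي.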

stema answered Dec 22 '25 10:12


