The aim is to find and remove a starting string/chars/word from an Arabic string that may or may not contain diacritics, while preserving all diacritics (if any) in the remainder of the string.
There are many answers on StackOverflow for removing the first/starting string/chars from an English string, but I found no existing solution there that leaves the rest of the Arabic string in its original form.
If the original string is normalized (removing the diacritics, tanween, etc.) before processing, the returned remainder is the remainder of the normalized string, not of the original string.
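To make the normalization problem concrete, here is a minimal sketch. The mark range U+064B to U+0652 is my assumption for "diacritics" in this illustration only (real text may use more marks), and the sample strings are written with Unicode escapes so every code unit is visible.

```javascript
// Assumed set of marks for this sketch: tanween, fatha, damma, kasra, shadda, sukun.
const TASHKEEL = /[\u064B-\u0652]/g;

// Two words with full tashkeel, joined by a space.
const original =
  "\u0627\u0644\u0633\u0651\u064E\u0644\u064E\u0627\u0645\u064F" + // first word, with tashkeel
  " " +
  "\u0639\u064E\u0644\u064E\u064A\u0652\u0643\u064F\u0645\u064F";   // second word, with tashkeel
const word = "\u0627\u0644\u0633\u0644\u0627\u0645"; // first word, without tashkeel

// Normalize first, then slice by the length of the plain word.
const normalized = original.replace(TASHKEEL, "");
const remainder = normalized.startsWith(word) ? normalized.slice(word.length) : normalized;

console.log(remainder);                         // second word, but stripped of its tashkeel
console.log(/[\u064B-\u0652]/.test(remainder)); // false: the original marks are lost
```

The prefix is removed, but the second word comes back without its marks, which is exactly what must be avoided.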
Example. Assume the following original string which can be in any of the following forms (i.e. the same string but different diacritics):
Now let us say we want to remove the first/starting characters "السلام" only if the string starts with those characters (which it does), and return the remainder of the "original" string with its original diacritics.
Of course, we are looking for the characters "السلام" without diacritics, because we don't know which diacritics the original string carries.
So, in this case, the returned remainder of each string must be:
The following code works for an English string (there are many other solutions) but not for an Arabic string as explained above.
function removeStartWord(string, word) {
  if (string.startsWith(word)) string = string.slice(word.length);
  return string;
}
The above code slices the matched starting characters off the original string based on their length, which works fine for English text.
For an Arabic string, we don't know which diacritics the original string carries, so the length of the matching prefix in the original string varies and is not known in advance.
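A small sketch of why length-based slicing breaks: the same word with and without diacritics has a different length, and startsWith() compares code units literally. The strings use Unicode escapes so the count is unambiguous.

```javascript
// The same word without and with diacritics.
const plain = "\u0627\u0644\u0633\u0644\u0627\u0645"; // 6 letters, no marks
const vocalized =
  "\u0627\u0644\u0633\u0651\u064E\u0644\u064E\u0627\u0645\u064F"; // same 6 letters plus 4 marks

console.log(plain.length, vocalized.length); // 6 10
console.log(vocalized.startsWith(plain));    // false: the 4th code unit is a shadda, not a letter
```

So neither the startsWith() check nor slice(word.length) can be applied directly to the original string.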
Edit: Added example image for better clarifications.
The following image table provides further examples:

To keep track of the discussion, I'm adding a new answer. Please try this:
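function removeStartWord(string, word) {
  // Copy of the string with everything removed except Latin letters, Arabic letters (ء-ي), digits and "/"
  const alphabeticString = string.replace(/[^a-zA-Zء-ي0-9/]+/g, '');
  // No match at the start: return the original string untouched
  if (!alphabeticString.startsWith(word)) return string;
  const letters = [...word];
  let cleanString = '';
  // Drop the first occurrence of each letter of `word`; keep every other character
  string.split('').forEach((_letter) => {
    if (letters.indexOf(_letter) > -1) {
      delete letters[letters.indexOf(_letter)];
    } else {
      cleanString += _letter;
    }
  });
  // Remove the diacritics left at the start by the deleted letters
  return cleanString.replace(/[^a-zA-Zء-ي0-9/\s]*/i, '');
}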
const sampleData = `السَّلَامُ عَلَيْكُمُ وَرَحْمَةُ الله`;
console.log('sampleData ...', sampleData);
console.log(
"removeStartWord(sampleData, 'السلام') ...",
removeStartWord(sampleData, 'السلام')
);
console.log(
"removeStartWord(sampleData, 'الس') ...",
removeStartWord(sampleData, 'الس')
);
console.log(
"removeStartWord(sampleData, 'السلام ') ...",
removeStartWord(sampleData, 'السلام ')
);
console.log(
"removeStartWord(sampleData, ' السلام') ...",
removeStartWord(sampleData, ' السلام')
);
I have come up with the following as a possible solution.
The solution is broken into 2 parts. First, the function startsWithAr() "partially" mimics the JavaScript startsWith() method, but for an Arabic string.
However, instead of returning 'true' or 'false', it returns the index just after the characters we are looking for at the start of the Source String (i.e. the length of the matched text in the Source String, including its Tashkeel (diacritics) if any). It returns -1 if the characters of the specified string are not found at the start of the string.
Second, using startsWithAr(), we create the removeStartString() function, which removes the characters of the specified string with the slice() method if they are found at the start of the Source String.
This approach not only maintains the Tashkeel (diacritics) of the remainder of the Source String but also allows strings with Tahmeez to be searched and removed.
The function ignores Tashkeel (diacritics) and Tahmeez in both the Source String and the Look-For String, and returns the remaining part of the Source String with its original Tashkeel (diacritics) intact after removing the specified starting characters from the beginning of the Source String.
Because only Tashkeel and Tahmeez are treated specially and every other character is compared as-is, the function is not limited to a defined range of Arabic letters; characters from any other script or language pass through unchanged.
We can also improve it easily by matching "ه" with "ة", so we can remove the string "السيدة" even if it is written as "السيده", by adding .replace(/[ة]/g,'ه') at the 2 .replace() lines.
I have included below separate test cases on the use of the startsWithAr() function and the removeStartString() functions.
The two functions can be combined into one function if needed.
Please improve as necessary; any suggestions are appreciated.
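As a rough sketch of the "ة"/"ه" idea, the letter folding could live in one helper that is applied to both strings before comparing. The name foldLetters() is hypothetical (it is not part of the code below), and the code points are written as escapes: U+0622, U+0623, U+0625, U+0671, U+0672, U+0673 and U+0675 are the alef forms, U+0624 is waw with hamza, U+0629 is ta marbuta and U+0647 is ha.

```javascript
// Hypothetical helper: folds letter variants onto one base letter before comparison.
function foldLetters(text) {
  return text
    .replace(/[\u0622\u0623\u0625\u0671\u0672\u0673\u0675]/g, "\u0627") // alef variants -> bare alef
    .replace(/\u0624/g, "\u0648")  // waw with hamza -> waw
    .replace(/\u0629/g, "\u0647"); // ta marbuta -> ha (the suggested extra rule)
}

const withTaMarbuta = "\u0627\u0644\u0633\u064A\u062F\u0629"; // word ending in ta marbuta
const withHa = "\u0627\u0644\u0633\u064A\u062F\u0647";        // same word ending in ha

console.log(foldLetters(withTaMarbuta) === foldLetters(withHa)); // true
```

Folding is only used for the comparison; the text that is returned should still be sliced from the original, unfolded string.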
//=====================================================================
// startsWithAr() function
// Purpose:
// Determines whether an Arabic string (the "Source String") begins with the characters
// of a specified string (the "Look-For String").
// Return the position (index) after the Look-For String if found, else return -1 if not found.
// Ignores Tashkeel (diacritics) and Tahmeez in both the Source and Look-For Strings.
// The returned position index is zero based.
// By knowing the position (index) after the Look-For String, one can remove the
// starting string using the slice() method while maintaining the remainder of the Source String with
// its original tashkeel (diacritics) unchanged.
//
// Parameters:
// str : The Source String to search in.
// lookFor : The characters to be searched for at the start of this string.
//=====================================================================
function startsWithAr(str,lookFor) {
let indexLookFor=0, tshk=/[ؐ-ًؕ-ٖٓ-ٟۖ-ٰٰۭـ]/, w=/[ؤ]/g,hamz=/[آأإٱٲٳٵ]/g;
lookFor=lookFor.replace(hamz,'ا').replace(w,'و').replace(/[ؐ-ًؕ-ٖٓ-ٟۖ-ٰٰۭـ]/g,''); // normalize the lookFor string
for (let indexStr=0; indexStr<str.length;indexStr++) {
while(tshk.test(str[indexStr])&&indexStr<str.length)++indexStr; // skip tashkeel & increase index
if (indexStr>=str.length||lookFor[indexLookFor]!==str[indexStr].replace(hamz,'ا').replace(w,'و')) return-1; // end of Source String reached or no match, so exit -1
indexLookFor++; // match found so next char in lookFor String
if (indexLookFor>=lookFor.length) { // end of Look-For String reached, so WE FOUND IT
indexStr+=1; // point after source char
while(tshk.test(str[indexStr])&&indexStr<str.length)++indexStr; // skip tashkeel after Source String if any
return indexStr; // return index in Source String after lookFor string and after any tashkeel
}
}
return-1; // not found end of string reached
}
//=========================================
// test cases for startsWithAr() function
//=========================================
var r =0; // test tracking flag
r |= test("السلام عَلَيَكُمُ ورحمة الله","السلام",6); // find the start letters 'السلام'
r |= test("الْسًّلامُ عَلَيَكُمُ ورحمة الله","السلام",10); // find the start letters 'السلام'
r |= test("الْسًّلامُ عَلَيَكُمُ وَرَحَمَةَ الله","السَّلام",10); // find the start letters 'السَّلام'
r |= test("ألْسًّلامُ عَلَيَكُمُ وَرَحَمَةَ الله","السَّلام",10); // find the start letters 'السَّلام'
r |= test("السؤال هو التالي","السوال",6); // find the start letters 'السوال'
r |= test("السيد/علي","السيد",5); // find the start letters 'السيد'
r |= test("السيد/علي","ف",-1); // 'ف' is not at the start, so expect -1
r |= test(" السيد"," ",1); // find the start letter ' ' (space)
r |= test("المجد لنا","ال",2); // find the start letters 'ال'
r |= test("المجد لنا","ا",1); // find the start letter 'ا'
r |= test("ألمجد لنا","ال",2); // find the start letters 'ال'
r |= test("إلمجد لنا","ال",2); // find the start letters 'ال'
r |= test("إلمجد لنا","ألْ",2); // find the start letters 'ألْ'
r |= test("إلْمَجد لَنا","ألْ",3); // find the start letters 'ألْ'
r |= test("","ا",-1); // empty Source String
r |= test("","",-1); // empty Source String and Look-For String
if (r==0) console.log("✅ All startsWithAr() test cases passed");
//-----------------------------------
function test(str,lookfor,should) {
let result= startsWithAr(str,lookfor);
if (result !== should) {console.log(`
${str} Output :${result}
${str} Should be:${should}
`);return 1;}
}
//=====================================================================
// removeStartString() function
// Purpose:
// Determines whether an Arabic string (the "Source String") begins with the characters
// of a specified string (the "Look-For String").
// If found, the Look-For String is removed and the remainder of the Source String is returned
// with its original Tashkeel (diacritics);
// If no match then return original Source String.
//
// Ignores Tashkeel (diacritics) and Tahmeez in both the Source and Look-For Strings.
// The function uses the startsWithAr() function to determine the index after the matched
// starting string/characters.
//
// Parameters:
// str : The Source String to search in.
// toRemove: The characters to be searched for and removed if at the start of this string.
//=====================================================================
function removeStartString(str,toRemove) {
let index=startsWithAr(str,toRemove);
if (index>-1) str=str.slice(index);
return str;
}
//=========================================
// test cases for removeStartString() function
//=========================================
var r =0; // test tracking flag
r |= test2("السلام عَلَيَكُمُ ورحمة الله","السلام"," عَلَيَكُمُ ورحمة الله"); // remove the start letters 'السلام'
r |= test2("ألْسًّلامُ عَلَيَكُمُ ورحمة الله","السلام"," عَلَيَكُمُ ورحمة الله"); // remove the start letters 'ألْسًّلامُ'
r |= test2("السلام عَلَيَكُمُ ورحمة الله","ألْسًّلامُ"," عَلَيَكُمُ ورحمة الله"); // remove the start letters 'ألْسًّلامُ'
r |= test2(" السلام عَلَيَكُمُ ورحمة الله"," ألْسًّلامُ"," عَلَيَكُمُ ورحمة الله");// remove the start letters 'ألْسًّلامُ '
r |= test2("السلام عَلَيَكُمُ ورحمة الله","ال","سلام عَلَيَكُمُ ورحمة الله"); // remove the start letters 'ال'
r |= test2("أَهْلًا وَسَهلًا","ا","هْلًا وَسَهلًا"); // remove the start letter 'ا'
r |= test2("أَهْلًا وَسَهلًا"," ","أَهْلًا وَسَهلًا"); // remove the start letter ' '
r |= test2("أَهْلًا وَسَهلًا","","أَهْلًا وَسَهلًا"); // remove the start letter ''
r |= test2("أَهْلًا وَسَهلًا","إلى","أَهْلًا وَسَهلًا"); // remove the start letters 'إلى'
if (r==0) console.log("✅ All removeStartString() test cases passed");
//-----------------------------------
function test2(str,toRemove,should) {
let result= removeStartString(str,toRemove);
if (result !== should) {console.log(`
${str} Output :${result}
${str} Should be:${should}
`);return 1;}
}
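Since the two functions can be combined, here is one possible single-function sketch. The name removeStartAr() is hypothetical, and the mark set (U+064B to U+0652 plus tatweel U+0640) is a deliberately narrower assumption than the ranges used in startsWithAr() above, so widen it if your text uses other marks.

```javascript
// Hypothetical combined version: match the prefix and slice in one pass.
function removeStartAr(str, toRemove) {
  const marks = /[\u064B-\u0652\u0640]/; // assumed tashkeel set plus tatweel
  const alefForms = /[\u0622\u0623\u0625\u0671\u0672\u0673\u0675]/;
  // Fold alef variants to bare alef and waw-with-hamza to waw.
  const fold = (ch) => (alefForms.test(ch) ? "\u0627" : ch === "\u0624" ? "\u0648" : ch);

  // Letters to look for, with marks dropped and variants folded.
  const target = toRemove.split("").filter((ch) => !marks.test(ch)).map(fold);
  if (target.length === 0) return str; // nothing to look for

  let i = 0; // current index in str
  for (const wanted of target) {
    while (i < str.length && marks.test(str[i])) i++;        // skip marks before the next letter
    if (i >= str.length || fold(str[i]) !== wanted) return str; // no match: keep original
    i++;
  }
  while (i < str.length && marks.test(str[i])) i++; // skip marks on the last matched letter
  return str.slice(i); // remainder keeps its original marks
}

const greeting =
  "\u0627\u0644\u0633\u0651\u064E\u0644\u064E\u0627\u0645\u064F" +
  " \u0639\u064E\u0644\u064E\u064A\u0652\u0643\u064F\u0645\u064F";
console.log(removeStartAr(greeting, "\u0627\u0644\u0633\u0644\u0627\u0645"));
```

The bounds check before every comparison means a Source String that ends in the middle of the match returns unchanged instead of throwing.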