
How to avoid capitalizing letters following unicode accents with regex [duplicate]

I've been looking for a solution to a regular expression problem for several days.

Here is an example string in which I am trying to capitalize the first letter of each word:

test-de'Maëly dUIJSENS

Using /\b[a-zA-Z]/g, I can isolate the first letter of each word, but accented letters cause problems, and the letter following an accented letter always gets capitalized too:

Test-De'MaëLy Duijsens

My expected result is as follows:

Test-De'Maëly Duijsens

Here's my attempt:

function testcapital (){
  var xxx = capitalizePhrase("test-de'Maëly dUIJSENS")
}

function capitalizePhrase(phrase) {
  var accentedCharacters = "àèìòùÀÈÌÒÙáéíóúýÁÉÍÓÚÝâêîôûÂÊÎÔÛãñõÃÑÕäëïöüÿÄËÏÖÜŸçÇߨøÅ寿œ";

  phrase  = phrase.toLowerCase()
  var reg = /\b[a-zA-Z]/g;
  function replace(firstLetters) {
    return firstLetters.toUpperCase();
  }
  var capitalized = phrase.replace(reg, replace);
  return capitalized;
}

How can I prevent capitalization of a letter that follows one of the accented characters in this list?

djconcept asked Dec 07 '25 05:12


1 Answer
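
The extra capital appears because JavaScript's \b is defined in terms of \w, which covers only ASCII letters, digits and the underscore. An accented letter such as "ë" therefore counts as a non-word character, and a word boundary is reported right after it. A small check, as a sketch (the variable names are only illustrative):

```javascript
// "ma\u00EBly" is "maëly" written with an escape so the check does not
// depend on how the source file encodes the accented letter.
const word = "ma\u00EBly";

// \w does not match the accented letter...
const accentIsWordChar = /\w/.test("\u00EB");

// ...so \b sees a boundary before "m" and again before "l".
const boundaryLetters = word.match(/\b[a-z]/g);

console.log(accentIsWordChar); // false
console.log(boundaryLetters);  // [ 'm', 'l' ]
```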

You can put the accented characters into a character class and use it in a negative lookbehind, so that a letter is matched only when it is not preceded by one of them:

const capitalizePhrase = phrase => {
  const accentedChars = "àèìòùÀÈÌÒÙáéíóúýÁÉÍÓÚÝâêîôûÂÊÎÔÛãñõÃÑÕäëïöüÿÄËÏÖÜŸçÇߨøÅ寿œ";
  const reg = new RegExp(`\\b(?<![${accentedChars}])([a-z])`, "g");
  return phrase.toLowerCase().replace(reg, m => m.toUpperCase());
};

console.log(capitalizePhrase("test-de'Maëly dUIJSENS"));
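
If maintaining the list of accented characters by hand becomes a burden, one possible alternative (not part of the code above, and assuming an engine that supports lookbehind and Unicode property escapes) is to match any letter that is not preceded by another letter. The function name capitalizeWords is hypothetical, and the normalize("NFC") call is an added assumption so that letters written as base letter plus combining accent are treated as single letters. This variant also capitalizes words that start with an accented letter:

```javascript
// Sketch of an alternative: \p{L} matches any Unicode letter (requires the "u" flag),
// and the negative lookbehind rejects letters that follow another letter.
const capitalizeWords = phrase =>
  phrase
    .normalize("NFC") // compose "e" + combining diaeresis into a single "ë"
    .toLowerCase()
    .replace(/(?<!\p{L})\p{L}/gu, letter => letter.toUpperCase());

console.log(capitalizeWords("test-de'Ma\u00EBly dUIJSENS")); // Test-De'Maëly Duijsens
console.log(capitalizeWords("\u00E9lodie o'brien"));         // Élodie O'Brien
```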

ggorlen answered Dec 08 '25 20:12