
Is there a way to compare two strings in Spanish regardless of the accents in PHP?

Given the following PHP code, which compares two strings:

   $cadena1='JUAN LÓPEZ YÁÑEZ';
   $cadena2='JUAN LOPEZ YÁÑEZ';

   if($cadena1===$cadena2){
     echo '<p style="color: green;">The strings match!</p>';
   }else{
     echo '<p style="color: red;">The strings do not match. Accent sensitive?</p>';
   }

I notice, for example, that comparing LOPEZ with LÓPEZ evaluates to false.

Is there a built-in function, or some other way, to compare these strings regardless of the Spanish accents?

Pathros asked Oct 23 '25 16:10


2 Answers

The comparison returns false because the two strings are different sequences of bytes. To compare them regardless of accents, you first need to normalize both to a common form.
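To make the byte-level difference concrete, here is a small sketch that dumps the bytes of both spellings. It assumes UTF-8 strings and uses the `"\u{...}"` escape syntax (PHP 7.0+) so the result does not depend on how the file itself is saved; the variable names are only illustrative.

```php
<?php
// "\u{00D3}" is Ó encoded as UTF-8 (two bytes: c3 93).
$accented = "L\u{00D3}PEZ"; // LÓPEZ
$plain    = "LOPEZ";

echo bin2hex($accented), "\n"; // 4cc39350455a
echo bin2hex($plain), "\n";    // 4c4f50455a

// === compares byte by byte, so the two strings can never match as they are.
var_dump($accented === $plain); // bool(false)
```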

The best way to do that is to use the Transliterator class, part of the intl extension on PHP 5.4+.

Test code:

<?php
$transliterator = Transliterator::createFromRules(':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;', Transliterator::FORWARD);
$test = ['abcd', 'èe', '€', 'àòùìéëü', 'àòùìéëü', 'tiësto'];
foreach($test as $e) {
    $normalized = $transliterator->transliterate($e);
    echo $e. ' --> '.$normalized."\n";
}
?>

Result:

abcd --> abcd
èe --> ee
€ --> €
àòùìéëü --> aouieeu
àòùìéëü --> aouieeu
tiësto --> tiesto

(Taken from my answer to "mySQL - matching latin (english) form input to utf8 (non-English) data".)

This replaces characters according to the tables of the ICU library, which are extensive and well tested. Because the rules start with NFD normalization, the result is the same however a character was encoded: ñ, for example, can be stored as a single precomposed code point (U+00F1) or as n followed by the combining tilde (U+0303).
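The two representations of ñ can be checked with the `Normalizer` class, which also ships with the intl extension. This is only a sketch of the normalization step on its own, not of the full transliteration; it assumes intl is loaded and PHP 7.0+ for the `"\u{...}"` escapes.

```php
<?php
// Requires the intl extension (Normalizer class).
$precomposed = "\u{00F1}";  // ñ as a single code point (2 bytes in UTF-8)
$decomposed  = "n\u{0303}"; // n followed by COMBINING TILDE (3 bytes in UTF-8)

// Visually identical, but different byte sequences.
var_dump($precomposed === $decomposed); // bool(false)

// After NFD normalization both are "n" + U+0303, so they compare equal.
$a = Normalizer::normalize($precomposed, Normalizer::FORM_D);
$b = Normalizer::normalize($decomposed, Normalizer::FORM_D);
var_dump($a === $b); // bool(true)
```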

Unlike soundex(), which is also resource-intensive, this approach does not compare how the strings sound, so it is more accurate.
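Applied to the strings from the question, the transliterator approach could look like the sketch below. `stripAccents()` is a hypothetical helper name, not part of PHP or intl; the code assumes the intl extension is loaded and that the file is saved as UTF-8.

```php
<?php
// Hypothetical helper: removes combining marks using the same ICU rules
// as the test code above. Requires the intl extension.
function stripAccents(string $text): string
{
    static $transliterator = null;
    if ($transliterator === null) {
        $transliterator = Transliterator::createFromRules(
            ':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;',
            Transliterator::FORWARD
        );
    }
    $result = $transliterator->transliterate($text);
    // transliterate() returns false on failure; fall back to the input.
    return $result === false ? $text : $result;
}

$cadena1 = 'JUAN LÓPEZ YÁÑEZ';
$cadena2 = 'JUAN LOPEZ YÁÑEZ';

// Normalize both sides, then compare as usual.
if (stripAccents($cadena1) === stripAccents($cadena2)) {
    echo "The strings match!\n";
} else {
    echo "The strings do not match.\n";
}
```

Note that these rules also turn Ñ into N, so words that differ only by ñ/n (such as año and ano) are treated as equal. Whether that is acceptable depends on the application.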

ItalyPaleAle answered Oct 25 '25 06:10


I would replace all accented characters in your strings before comparing them. You can do that using the following code:

$replacements = array('Ó'=>'O', 'Á'=>'A', 'Ñ' => 'N'); //Add the remaining Spanish accents. 
$output = strtr("JUAN LÓPEZ YÁÑEZ",$replacements);

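$output is now 'JUAN LOPEZ YANEZ'. Run $cadena2 through the same replacement before comparing, because it still contains Á and Ñ and would not match otherwise.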
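A fuller version of this idea, as a sketch: the map below is my assumption of what "the remaining Spanish accents" covers (acute vowels, ü and ñ in both cases), and `removeSpanishAccents()` is a hypothetical helper name. It assumes the file is saved as UTF-8.

```php
<?php
// Hypothetical helper: replaces accented Spanish letters with ASCII ones.
// The list is an assumption; extend it if your data needs more characters.
function removeSpanishAccents(string $text): string
{
    static $map = [
        'Á' => 'A', 'É' => 'E', 'Í' => 'I', 'Ó' => 'O', 'Ú' => 'U',
        'Ü' => 'U', 'Ñ' => 'N',
        'á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o', 'ú' => 'u',
        'ü' => 'u', 'ñ' => 'n',
    ];
    // With an array argument, strtr() replaces whole substrings,
    // so multibyte UTF-8 keys are handled correctly.
    return strtr($text, $map);
}

$cadena1 = 'JUAN LÓPEZ YÁÑEZ';
$cadena2 = 'JUAN LOPEZ YÁÑEZ';

// Apply the same replacement to both strings before comparing.
var_dump(removeSpanishAccents($cadena1) === removeSpanishAccents($cadena2)); // bool(true)
```

This only matches characters written exactly as they appear in the map. Input that stores accents as separate combining marks keeps those marks unless it is normalized first.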

ltalhouarne answered Oct 25 '25 05:10