I have found various of solutions for this problem. One solutions is to install a MYSQL plugin called mysqlcft. However, this solution will not work since the web hosting of my current tiny project do not support MYSQL plugin. Any alternative solutions?
Tried to solve the problem by RandomSeed's suggestion "FULLTEXT and Asian Languages with MySQL 5.0". But the problem is that you cannot perform a 2 characters search unless you set "ft_min_word_len" to 2. Again, $1/month web hosting service do not allow you to do that.
Alright, spent 1 night to work out a work-around solution. The concept is to convert all non-alphabet characters of a UTF-8 string into some unique codes.
Here is the magic function. Borrowed from CSDN forum and made some changes.
function UTF2UCS($str, $s) {
$str = strtolower($str);
$char = 'UTF-8';
$arr = array();
$out = "";
$c = mb_strlen($str,$char);
$t = false;
for($i =0;$i<$c;$i++){
$arr[]=mb_substr($str,$i,1,$char);
}
foreach($arr as $i=>$v){
if(preg_match('/\w/i',$v,$match)){
$out .= $v;
$t = true;
}else{
if($t) $out .= " ";
if(isset($s) && $s) $out .= "+";
$out .= bin2hex(iconv("UTF-8","UCS-2",$v))." ";
$t = false;
}
}
return $out;
}
The result of
echo UTF2UCS("測試haha")
will be "6e2c 8a66 haha"
Say you have a field called "song_name". You just need to convert all song names by UTF2UCS function, then save those encrypted strings in a fulltext index field eg."song_name_ucs".
The next time you need to search something, all you need to do is:
$temp_string = UTF2UCS('測試', true);
SELECT * FROM song WHERE MATCH (song_name_ucs) AGAINST ('$temp_string')
Remember to put a true in UTF2UCS's second parameter when you need both '測' and '試' appears in the search result.
This might not be the best solution, but it does not requires any plugin or changes to the system. Pure PHP code.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With