Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySql Fulltext Search not correct with short words

Tags:

sql

search

mysql

After many tries and many searches i came to the following query:

SELECT id,
       title,
       description,
       MATCH(title,description,tags) AGAINST ('$search' IN NATURAL LANGUAGE MODE) AS score
FROM pages

I use this query to search inside a large amounts of pages which contain music lessons. It works quite wel. Pages get a score based on how good they match to the query.

Except when the users search for something like "C Chord" or "Am Chord" the first part is ignored. If i search for "A chord" or "E chord" the page called "C chord" is always on top.

So.. Part 1 of my question is: how can i fix that?

Then part 2. Is it possible to give the column "Title" a bigger importance for the score then "Description"?

like image 849
PowerGuitarist Avatar asked Sep 16 '25 12:09

PowerGuitarist


2 Answers

MySQL has two important parameters for full text search, stop words and minimum word length. The first is the minimum word size (documented here):

  • Innodb: innodb_ft_min_token_size (default 3)
  • MyIsam: ft_min_word_len (default 4)

Words shorter than the minimum are not indexed, so you cannot search on them. Remember to rebuild the index after changing the parameter. Conveniently (hah!) they have different default values.

In addition, there are stop word lists to remove common stop words. Whether or not this is an issue depends on what words you are searching for. You can customize the stop words.

like image 162
Gordon Linoff Avatar answered Sep 18 '25 08:09

Gordon Linoff


  1. Question

This has been discussed on SO quite a few times: MySQL's built-in fulltext parser is designed for searching for words, not for single characters and comes with default minimum word length setting of 3 (innodb) or 4 (myisam) These settings mean that no words shorter than 3 or 4 characters get indexed and therefore will not be found by a fulltext search. You may lower the minimum character length limit to 1 and rebuild the index, but it will slow the searching down, since the indexes will be bigger.

  1. Question

It is possible, but you need to search on the title field separately and bump up the relevancy score results from the title field.

You can use union to get a combined list with sum() to sum the score up for any record:

SELECT p.id, any_value(title), any_value(description), any_value(tags), sum(t.score) as sum_score
FROM
    (SELECT id, (MATCH(title) AGAINST ('$search' IN NATURAL LANGUAGE MODE)) *2 AS score
     FROM pages
     UNION ALL
     SELECT id, MATCH(description,tags) AGAINST ('$search' IN NATURAL LANGUAGE MODE) AS score
     FROM pages) t
INNER JOIN pages p on t.id=p.id
GROUP BY p.id
ORDER BY sum(t.score) DESC

You need to adjust the fulltext indexes to be able to do the separate searches.

like image 35
Shadow Avatar answered Sep 18 '25 09:09

Shadow