
What is the best synonym approach for Elasticsearch?

I'm working on implementing a synonym query for colors in a product catalog using Elasticsearch, and I've been asking some consultants to implement it using the ES synonyms feature.

They tell me that a color might have hundreds of synonyms (white: ivory, creme, putty, etc.) and that we should do the mapping in our operational database. I am not convinced. Would there really be a huge performance hit if we had a list of, say, one hundred synonyms for white at query time? If that were slow, would doing the synonym mapping when indexing the documents obviate the problem?

The consultants want us to do the mapping in reverse: assigning a standard color to our items in our primary database and then passing that on to ES. I'd prefer not to have them learn anything about our architecture/infrastructure and just have them twiddle the knobs in ES, which they already know how to do.

Am I naive in thinking we can proceed in this way? Is decorating our operational database with standard colors really the way to go?

Robert Moskal asked Dec 05 '25 11:12


1 Answer

The way I'd do it is to define a file of synonyms, as described in the Elasticsearch documentation for the synonym token filter, and maintain that file.

With that file I'd create a custom token filter and use it at indexing time. Doing this at query time would probably not be a huge performance hit, but indexing time is the better choice because the response time at query time will be better.

Regarding your database, I don't know your architecture and I don't know why they say you need to put the synonyms there. As the documentation mentioned above shows, you can define a simple text file where you put something like:

ivory, creme, putty => white
...

This means that whenever ivory, creme, or putty is found at indexing time, ES will actually index white instead, and that's it.
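
To make the effect of that rule concrete, here is a minimal Python sketch that imitates the contraction on whitespace-separated tokens. It is not Elasticsearch's implementation: `parse_rules` and `analyze` are hypothetical helpers written for this illustration, and only explicit `=>` rules with single-word terms are handled.

```python
def parse_rules(lines):
    """Parse 'a, b => c' rules into a {synonym: replacement} dict."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        left, sep, right = line.partition("=>")
        if not sep:
            continue  # this sketch ignores rules without an explicit '=>'
        target = right.strip()
        for term in left.split(","):
            mapping[term.strip()] = target
    return mapping


def analyze(text, mapping):
    """Split on whitespace and replace each token that has a mapping."""
    return [mapping.get(token, token) for token in text.split()]


rules = parse_rules(["ivory, creme, putty => white"])
print(analyze("ivory cotton shirt", rules))  # ['white', 'cotton', 'shirt']
```

Like the whitespace tokenizer, this sketch does not lowercase anything, so "Ivory" would pass through unchanged. In Elasticsearch that would usually call for a lowercase filter ahead of the synonym filter.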

And the analyzer would look like this:

    "analyzer" : {
        "synonym" : {
            "tokenizer" : "whitespace",
            "filter" : ["synonym"]
        }
    },
    "filter" : {
        "synonym" : {
            "type" : "synonym",
            "synonyms_path" : "analysis/synonym.txt"
        }
    }

But depending on what queries you want to run and what you need to match at query time, you can define an index_analyzer and a search_analyzer, and use contraction or expansion. So, for the "right" solution, more variables need to be looked at than only the ones you mentioned. In my approach above, I basically made all the synonyms of "white" equal at indexing time. But maybe you don't need this, given the queries you want to run.
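
As one possible illustration of splitting the work between index time and query time, the sketch below builds an index definition as a Python dict and prints it as JSON. It shows the query-time expansion alternative, not the index-time contraction used above. It assumes a recent Elasticsearch version, where the index-time analyzer is set with the `analyzer` mapping parameter (rather than `index_analyzer`) next to `search_analyzer`. The field name `color`, the analyzer names, and the filter name are placeholders invented for this example.

```python
import json

# Hypothetical names: "color", "color_index", "color_search" and
# "color_synonyms" are placeholders chosen for this sketch.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "color_synonyms": {
                    "type": "synonym",
                    # Expansion rule: a query for "white" also searches
                    # for the listed shades; other terms are untouched.
                    "synonyms": ["white => white, ivory, creme, putty"],
                }
            },
            "analyzer": {
                # Index time: no synonyms, documents keep their own color.
                "color_index": {
                    "tokenizer": "whitespace",
                    "filter": ["lowercase"],
                },
                # Query time: expand the search terms with the synonyms.
                "color_search": {
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "color_synonyms"],
                },
            },
        }
    },
    "mappings": {
        "properties": {
            "color": {
                "type": "text",
                "analyzer": "color_index",
                "search_analyzer": "color_search",
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```

With this layout the synonym list can change without reindexing the documents, at the cost of expanding every query. Whether that trade-off is worthwhile is exactly what should be measured on real data.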

In conclusion:

  • I don't see why the colors need to be held in a database; they can very well be specified in a text file, as you saw above. Maybe I don't have all the details of your use case.
  • The final solution might involve analyzing the input text from the query itself or analyzing the text at indexing time, or both. This all depends on your specific use case and your queries.
  • Test the various solutions on real data and real volume and compare the performance you get.
  • Usually, synonyms are applied at indexing time, but not necessarily; it depends on the use case.
Andrei Stefan answered Dec 07 '25 05:12


