
Basic ElasticSearch + WordNet synonyms set-up with pre-existing index

I'm trying to learn how to properly add "synonym functionality" to my existing ElasticSearch setup. Here's what I understand so far about the process. I'd appreciate it if you could point out any misunderstandings I have, as I'm very new to ElasticSearch.

From this page I've learned that I need to add a synonym analyser and a synonym filter with a path to my synonyms file to my index config so that it looks like this:

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "synonym" : {
                    "tokenizer" : "whitespace",
                    "filter" : ["synonym"]
                }
            },
            "filter" : {
                "synonym" : {
                    "type" : "synonym",
                    "format" : "wordnet",
                    "synonyms_path" : "analysis/wordnet_synonyms.txt"
                }
            }
        }
    }
}

From this page I've learned how to add an analyser:

curl -XPOST 'localhost:9200/myindex/_close'

curl -XPUT 'localhost:9200/myindex/_settings' -d '{
  "analysis" : {
    "analyzer":{
      "synonym":{
        "tokenizer":"whitespace",
        "filter" : ["synonym"]
      }
    }
  }
}'

curl -XPOST 'localhost:9200/myindex/_open'

But I don't know how to add the filter. Would it be as simple as this?

curl -XPOST 'localhost:9200/myindex/_close'

curl -XPUT 'localhost:9200/myindex/_settings' -d '{
  "analysis" : {
    "filter":{
      "synonym":{
         "type" : "synonym",
         "format" : "wordnet",
         "synonyms_path" : "analysis/wordnet_synonyms.txt",
         "ignore_case" : true
      }
    }
  }
}'

curl -XPOST 'localhost:9200/myindex/_open'

I also don't know what the analysis/wordnet_synonyms.txt path is relative to. On this page it says "relative to the config location". Where is the config location? Somewhere in /etc/elasticsearch (on Ubuntu)? Thanks!

Edit: This answer gives this as a solution:

curl -XPOST 'localhost:9200/myindex/_close'
curl -XPUT 'localhost:9200/myindex/_settings' -d '{
"analysis" : {
    "analyzer" : {
        "synonym" : {
            "tokenizer" : "whitespace",
            "filter" : ["synonym"]
        }
    },
    "filter" : {
        "synonym" : {
            "type" : "synonym",
            "format" : "wordnet",
            "synonyms_path" : "analysis/wordnet_synonyms.txt",
            "ignore_case" : true
        }
    }
}
}'
curl -XPOST 'localhost:9200/myindex/_open'

Is this possible? A commenter said that the index would need to be recreated when changing analyser settings - is this true? And I'm still not sure where to put "wordnet_synonyms.txt".


1 Answer

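The easiest way is to first delete your index and then recreate it with the analyzer and synonym token filter, like this (I've also added a mapping type and a dummy field to show you how to use your analyzer). Keep in mind that deleting the index also deletes its documents, so you will need to reindex them afterwards: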

curl -XDELETE localhost:9200/myindex

curl -XPUT localhost:9200/myindex -d '{
  "settings": {
     "index" : {
        "analysis" : {
            "analyzer" : {
                "synonym" : {
                    "tokenizer" : "whitespace",
                    "filter" : ["synonym"]
                }
            },
            "filter" : {
                "synonym" : {
                    "type" : "synonym",
                    "format" : "wordnet",
                    "synonyms_path" : "analysis/wordnet_synonyms.txt"
                }
            }
        }
     }
  },
  "mappings": {
     "typename": {
        "properties": {
           "fieldname": {
              "type": "string",
              "analyzer": "synonym"
           }
        }
     }
  }
}'
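Once the index has been created, one way to check that the synonym filter is actually applied is to run a word through the analyzer with the _analyze API. The following is only a sketch: the helper names analyze_url and show_tokens are made up for illustration, the host and port are assumed to match the commands above, and the query-string form of _analyze shown here is the one older releases accept (newer releases expect a JSON body instead), so adjust it to your version.

```shell
# Assumed host and port, matching the curl commands above.
ES_HOST="${ES_HOST:-localhost:9200}"

# Hypothetical helper: build the _analyze URL for an index, an analyzer
# and a single URL-safe word (no spaces or special characters).
#   $1 = index name, $2 = analyzer name, $3 = word to analyze
analyze_url() {
  printf '%s/%s/_analyze?analyzer=%s&text=%s&pretty\n' "$ES_HOST" "$1" "$2" "$3"
}

# Hypothetical helper: ask a running node which tokens the analyzer emits.
show_tokens() {
  curl -s -XGET "$(analyze_url "$1" "$2" "$3")"
}
```

For example, `show_tokens myindex synonym abstain` should list the original word together with whatever synonyms your WordNet file defines for it, if the filter was loaded correctly.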

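The synonyms_path is resolved relative to the Elasticsearch config directory, i.e. the folder that contains your elasticsearch.yml configuration file, so wordnet_synonyms.txt needs to go in an analysis subfolder there. On Ubuntu, with the packaged install, it would be in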

/etc/elasticsearch/analysis/wordnet_synonyms.txt
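As a rough sketch of putting the file in place: the helper name install_synonyms is hypothetical, the default config directory is assumed to be /etc/elasticsearch as above, and the source file is whatever WordNet prolog synonyms file you downloaded (wn_s.pl is used below purely as an example name).

```shell
# Hypothetical helper: copy a WordNet synonyms file into the "analysis"
# subfolder of the Elasticsearch config directory.
#   $1 = source file (for example a WordNet prolog file such as wn_s.pl)
#   $2 = config directory (assumed default: /etc/elasticsearch)
install_synonyms() {
  src="$1"
  config_dir="${2:-/etc/elasticsearch}"
  # Create the analysis subfolder if it does not exist yet.
  mkdir -p "$config_dir/analysis" || return 1
  # Copy the file under the name referenced by synonyms_path.
  cp "$src" "$config_dir/analysis/wordnet_synonyms.txt" || return 1
  # Make the file world-readable so the user running Elasticsearch can load it.
  chmod 644 "$config_dir/analysis/wordnet_synonyms.txt"
}
```

Writing under /etc normally requires root, so you would call it from a root shell, e.g. `install_synonyms ./wn_s.pl`. If you run more than one node, the file has to be present on each of them.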
Val answered Nov 04 '25 18:11