Admittedly, I'm not that well versed on the analysis part of ES. Here's the index layout:
{
"mappings": {
"event": {
"properties": {
"ipaddress": {
"type": "string"
},
"hostname": {
"type": "string",
"analyzer": "my_analyzer",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
},
"settings": {
"analysis": {
"filter": {
"my_filter": {
"type": "word_delimiter",
"preserve_original": true
}
},
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": ["lowercase", "my_filter"]
}
}
}
}
}
You can see that I've attempted to use a custom analyzer for the hostname field. This kind of works when I use this query to find the host named "WIN_1":
{
"query": {
"match": {
"hostname": "WIN_1"
}
}
}
The issue is that it also returns any hostname that has a 1 in it. Using the _analyze
endpoint, I can see that the numbers are tokenized as well.
{
"tokens": [
{
"token": "win_1",
"start_offset": 0,
"end_offset": 5,
"type": "word",
"position": 1
},
{
"token": "win",
"start_offset": 0,
"end_offset": 3,
"type": "word",
"position": 1
},
{
"token": "1",
"start_offset": 4,
"end_offset": 5,
"type": "word",
"position": 2
}
]
}
What I'd like to be able to do is search for WIN and get back any host that has WIN in it's name. But I also need to be able to search for WIN_1 and get back that exact host or any host with WIN_1 in it's name. Below is some test data.
{
"ipaddress": "192.168.1.253",
"hostname": "WIN_8_ENT_1"
}
{
"ipaddress": "10.0.0.1",
"hostname": "server1"
}
{
"ipaddress": "172.20.10.36",
"hostname": "ServA-1"
}
Hopefully someone can point me in the right direction. It could be that my simple query isn't the right approach either. I've poured over the ES docs, but they aren't real good with examples.
You could change your analysis to use a pattern analyzer that discards the digits and under scores:
{
"analysis": {
"analyzer": {
"word_only": {
"type": "pattern",
"pattern": "([^\p{L}]+)"
}
}
}
}
Using the analyze API:
curl -XGET 'localhost:9200/{yourIndex}/_analyze?analyzer=word_only&pretty=true' -d 'WIN_8_ENT_1'
returns:
"tokens" : [ {
"token" : "win",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 1
}, {
"token" : "ent",
"start_offset" : 6,
"end_offset" : 9,
"type" : "word",
"position" : 2
} ]
Your mapping would become:
{
"event": {
"properties": {
"ipaddress": {
"type": "string"
},
"hostname": {
"type": "string",
"analyzer": "word_only",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
You can use a multi_match query to get the results you want:
{
"query": {
"multi_match": {
"fields": [
"hostname",
"hostname.raw"
],
"query": "WIN_1"
}
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With