How to filter by external data not indexed in ElasticSearch

I can't find a way to do the following with ElasticSearch:

  • I have 2,000,000 items indexed in ElasticSearch
  • I have 30,000 players saved in MySQL

Every item has the name of a player as an attribute. The online status of these players changes every 15 minutes, and can be true or false (obviously).

I would like to be able to show only items for online players.

I don't think I can index the online status with the item, since it changes so often. I can't really get all the ids of the online players and use that as a filter since there are so many.

Would it help to index players in ElasticSearch as well? Is it possible to do some kind of JOIN with another index?

Edit: After looking more into how to do joins with ES, I found out that it's actually possible with has_child if I index players in ES. Tire does not have a method for has_child, but is it possible to do it with the existing DSL?

Asked by Robin on Jan 25 '26 03:01



1 Answer

This seems a good fit for a parent-child relation between players and items, even if you don't need full-text search on the parent documents, because:

  1. each item belongs to a player
  2. they have independent update lifecycles: when a player changes, you don't want to reindex all his items
  3. you only want to return the children, applying a filter to their parents.

You could index your players too, in the same index as the items but under a separate type. You need to declare in your mapping that the player type is the parent of the item type:

{
  "item":{
    "_parent":{
      "type" : "player"
    }
  }
}
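As a sketch, here is the full create-index request body that carries that declaration, built as a Python dict. It assumes the legacy mapping-type era that the `_parent` mapping belongs to (later releases replaced it with a `join` field), a hypothetical index name `game`, and a hypothetical boolean `status` field on players for the online flag.

```python
import json

# Hypothetical index name; use whatever your application already has.
INDEX_NAME = "game"


def create_index_body():
    """Body for PUT /game (legacy, pre-join-field mappings).

    Declares `player` as the parent type of `item`, exactly like the
    mapping above, plus an assumed boolean `status` field on players."""
    return {
        "mappings": {
            "player": {
                "properties": {
                    # Assumed field: the online flag refreshed every 15 minutes.
                    "status": {"type": "boolean"},
                }
            },
            "item": {
                # Same parent declaration as the answer's mapping.
                "_parent": {"type": "player"},
            },
        }
    }


if __name__ == "__main__":
    print(f"PUT /{INDEX_NAME}")
    print(json.dumps(create_index_body(), indent=2))
```

Putting the parent declaration in the create-index request means it is in place before any item is indexed.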

After that, index the players, then index your items, specifying the parent player id for each of them.
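A minimal sketch of those two kinds of index requests, assuming the legacy URL form `PUT /index/type/id`, the hypothetical index name `game`, and (since every item already stores its player's name) the player name reused as the player document id. The `parent` URL parameter is what links an item to its player; it also routes the item to the same shard as its parent.

```python
from urllib.parse import quote

INDEX_NAME = "game"  # hypothetical index name


def player_request(player_id, online):
    """Return (method, path, body) for indexing one player (a parent doc)."""
    path = f"/{INDEX_NAME}/player/{quote(str(player_id), safe='')}"
    return "PUT", path, {"status": online}


def item_request(item_id, player_id, fields):
    """Return (method, path, body) for indexing one item as a child.

    The `parent` parameter names the parent player id; ids are URL-escaped."""
    path = (
        f"/{INDEX_NAME}/item/{quote(str(item_id), safe='')}"
        f"?parent={quote(str(player_id), safe='')}"
    )
    return "PUT", path, dict(fields)


if __name__ == "__main__":
    # Players first, then their items.
    print(player_request("robin", True))
    print(item_request(42, "robin", {"name": "sword"}))
```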

You can then execute a full-text search on the items, filtering them with the following has_parent filter:

{
    "has_parent" : {
        "parent_type" : "player",
        "query" : {
            "term" : {
                "status" : true
            }
        }
    }
}
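To show where that filter goes in a complete search request, here is a sketch that wraps it around a full-text `match` query. It assumes a hypothetical searchable item field `name`. The first variant uses the 1.x-era `filtered` query; the second expresses the same intent as a `bool` query with a `filter` clause, which is the usual form on releases where `filtered` no longer exists.

```python
# Hypothetical item field holding the searchable text.
TEXT_FIELD = "name"


def online_player_filter():
    """The has_parent filter from above: items whose parent player is online."""
    return {
        "has_parent": {
            "parent_type": "player",
            "query": {"term": {"status": True}},
        }
    }


def search_body_legacy(text):
    """1.x-style body: a `filtered` query wrapping the full-text match."""
    return {
        "query": {
            "filtered": {
                "query": {"match": {TEXT_FIELD: text}},
                "filter": online_player_filter(),
            }
        }
    }


def search_body_bool(text):
    """Same intent without `filtered`: bool query, match in must, join in filter."""
    return {
        "query": {
            "bool": {
                "must": {"match": {TEXT_FIELD: text}},
                "filter": online_player_filter(),
            }
        }
    }
```

Keeping the parent condition in filter context means it only restricts which items match and does not affect their relevance scores.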

This way you would only query and eventually return the items that belong to an active player.
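The result of that filter can be illustrated with a small in-memory model. This only mimics what the has_parent filter returns. It is not how Elasticsearch implements the join. It also shows why the parent-child split fits here: flipping one player's status changes which items match without rewriting any item.

```python
def items_of_online_players(players, items):
    """Mimic the has_parent filter in memory.

    players: dict mapping player id -> online flag (the parent documents)
    items:   list of dicts, each with a "parent" key holding a player id
    Returns the items whose parent player exists and is online."""
    online = {pid for pid, status in players.items() if status}
    return [item for item in items if item["parent"] in online]


players = {"robin": True, "javanna": False}
items = [
    {"id": 1, "parent": "robin"},
    {"id": 2, "parent": "javanna"},
    {"id": 3, "parent": "robin"},
]
print([i["id"] for i in items_of_online_players(players, items)])  # [1, 3]

# A status change touches one parent document; the items are left untouched.
players["javanna"] = True
print([i["id"] for i in items_of_online_players(players, items)])  # [1, 2, 3]
```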

To update players, you can use the update API, perhaps with a script to avoid resending the whole document. Beware that the document is still deleted and reindexed under the hood; that's how Lucene works.
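Because all 30,000 statuses are refreshed every 15 minutes, one option is to send them as a single bulk request. The sketch below swaps the script for a partial-document update (`{"doc": {...}}`), which also avoids resending the whole player document. It assumes the legacy bulk metadata with `_type`, the hypothetical index `game`, and the boolean `status` field. For a single player, the equivalent is `POST /game/player/<id>/_update` with the same `doc` body. Only player documents are touched; the items stay as they are.

```python
import json

INDEX_NAME = "game"  # hypothetical index name


def status_bulk_body(statuses):
    """Build an NDJSON body for POST /_bulk.

    statuses: dict mapping player id -> online flag. Each player gets an
    action line followed by a partial-document line. Returns "" if empty."""
    lines = []
    for player_id, online in statuses.items():
        action = {"update": {"_index": INDEX_NAME, "_type": "player", "_id": player_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps({"doc": {"status": online}}))
    if not lines:
        return ""
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(status_bulk_body({"robin": False, "javanna": True}), end="")
```

Sending only the players whose status actually changed since the last run would keep the bulk request, and the reindexing it causes, smaller.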

If you want to see more examples of relations between documents in Elasticsearch, have a look at the following articles:

  • Fun With Elasticsearch's Children and Nested Documents
  • Managing Relations in ElasticSearch

Depending on the type of queries you are going to need, you might encounter limitations, but given what you've written, this is what I would do. Just make sure your nodes have enough memory, since Elasticsearch keeps a join table containing all the ids involved in memory when using parent-child.

Answered by javanna on Jan 26 '26 20:01



