I have data which naturally fit into documents like
{
  "name": "Multi G. Enre",
  "books": [
    {
      "name": "Guns and lasers",
      "genre": "scifi",
      "publisher": "orbit"
    },
    {
      "name": "Dead in the night",
      "genre": "thriller",
      "publisher": "penguin"
    }
  ]
}
(the example is taken from a good review of nested and has_child documents)
In order to analyze them in Kibana and other software (a mix of legacy and laziness), they are flattened:
{
  "name": "Multi G. Enre",
  "book_name": "Guns and lasers",
  "book_genre": "scifi",
  "book_publisher": "orbit"
}
{
  "name": "Multi G. Enre",
  "book_name": "Dead in the night",
  "book_genre": "thriller",
  "book_publisher": "penguin"
}
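For illustration, the flattening step described above could look like the following minimal sketch. The function name `flatten_author` and the `book_` prefix logic are assumptions made for this example; the question does not show how the flat records are actually produced.

```python
def flatten_author(doc):
    """Return one flat record per book, repeating the author's name.

    Hypothetical helper: each key of a book is copied with a "book_" prefix.
    """
    return [
        {"name": doc["name"], **{f"book_{key}": value for key, value in book.items()}}
        for book in doc.get("books", [])
    ]


author = {
    "name": "Multi G. Enre",
    "books": [
        {"name": "Guns and lasers", "genre": "scifi", "publisher": "orbit"},
        {"name": "Dead in the night", "genre": "thriller", "publisher": "penguin"},
    ],
}

# One record per book, ready to be indexed as separate documents.
flat_records = flatten_author(author)
```

The author's name is duplicated in every record, which is where the index growth mentioned in the question comes from.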
Besides the obvious growth in index size, is there generally a performance impact when querying such flat records (the queries are of the type "writer with scifi books from penguin") compared with nested documents or parent/child documents?
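To make the comparison concrete, here is a rough sketch of how the "writer with scifi books from penguin" query body might be written for each of the three layouts. It only builds the request bodies as Python dicts. The helper names are hypothetical, and the sketch assumes that the genre and publisher fields are mapped as keyword fields, that `books` is mapped as a `nested` field in the second case, and that the child type is called `book` in the third.

```python
def _genre_and_publisher(genre_field, publisher_field, genre, publisher):
    # Both conditions must hold on the same book.
    return {
        "bool": {
            "filter": [
                {"term": {genre_field: genre}},
                {"term": {publisher_field: publisher}},
            ]
        }
    }


def flat_query(genre, publisher):
    # Flat layout: every document is one book, so a plain filter is enough.
    return {"query": _genre_and_publisher("book_genre", "book_publisher", genre, publisher)}


def nested_query(genre, publisher):
    # Nested layout: assumes "books" is mapped with type "nested".
    return {
        "query": {
            "nested": {
                "path": "books",
                "query": _genre_and_publisher("books.genre", "books.publisher", genre, publisher),
            }
        }
    }


def has_child_query(genre, publisher, child_type="book"):
    # Parent/child layout: "book" as the child type name is an assumption.
    return {
        "query": {
            "has_child": {
                "type": child_type,
                "query": _genre_and_publisher("genre", "publisher", genre, publisher),
            }
        }
    }
```

Note that the flat query returns one hit per matching book rather than one per writer, so the writer names may still need to be de-duplicated, for example with an aggregation on `name`.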
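Querying the flat index will generally be much faster. Denormalizing your data is the usual approach in NoSQL stores: with flat records, a query is a plain filter over self-contained documents, with no nested or parent/child join work at query time.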
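In your first example, notice that you would need to update the author's record each time you add a book. That is best avoided in ES and in most NoSQL stores, where documents should be treated as immutable. Behind the scenes, an update is really a delete plus an insert: the old document is marked as deleted and a new one is indexed in its place, which is expensive.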