I need to design a query that supports user-specific document edits. The document below shows one way to store this data. It includes a root-level Description property, which should be searched for all users except Eric and Alex. For Eric and Alex, the Description property has been customized, and a search query executed by either of them should search their custom Description value within the nested UserData array, not the root document Description field.
For my use case, users may customize 0 or more root document properties. For any root document property which a user has customized, only the custom value for that property should be searched for that user.
The brute-force solution would be to index a separate copy of each customized document. I'm trying to avoid that, because I fear that creating multiple copies of each document a user has customized will unfairly weight the index by duplicating document content that is not legitimately duplicated.
{
"Name": "doc1",
"Description": "Base description1",
"Spec": "Base document spec",
"UserData":[
{
"EnteredBy": "Eric",
"Description": "Desc entered by Eric, abc"
},
{
"EnteredBy": "Alex",
"Description": "Desc entered by Alex, def",
"Spec": "Spec entered by Alex"
}]
}
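To make the override rule concrete, here is a minimal Python sketch of which values should be visible to a search for a given user. It only models the intended semantics on the sample document above; it is not an Elasticsearch feature, and the function name `effective_fields` is made up for illustration.

```python
from typing import Any, Dict

def effective_fields(doc: Dict[str, Any], user: str) -> Dict[str, Any]:
    """Return the property values that a search by `user` should see."""
    # Start from the root (base) properties, leaving out the override array.
    fields = {k: v for k, v in doc.items() if k != "UserData"}
    for entry in doc.get("UserData", []):
        if entry.get("EnteredBy") == user:
            # A customized property replaces the base value for this user only.
            fields.update({k: v for k, v in entry.items() if k != "EnteredBy"})
    return fields

# The sample document from the question.
doc = {
    "Name": "doc1",
    "Description": "Base description1",
    "Spec": "Base document spec",
    "UserData": [
        {"EnteredBy": "Eric", "Description": "Desc entered by Eric, abc"},
        {"EnteredBy": "Alex", "Description": "Desc entered by Alex, def",
         "Spec": "Spec entered by Alex"},
    ],
}

eric_view = effective_fields(doc, "Eric")
alex_view = effective_fields(doc, "Alex")
other_view = effective_fields(doc, "Sam")  # "Sam" is a hypothetical user with no overrides
```

Eric sees his own Description but the base Spec, Alex sees his own values for both, and any other user sees only the base values.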
Edit 1
Below are the options I've considered.
Option 1: I could create a separate index for each user. In that index, I would add all of the base documents that the user has not customized, plus the user's version of each document that the user has customized. This would result in 1000+ indexes.
Option 2: I could use the script_score feature and manually compute the score for each document, using the override logic described above. From what I've seen, the scoring logic would need to be primitive and might end up negating the power of Elasticsearch.
Edit 2
The solution will need to support a maximum of 40 fields and cases where any one field has been customized by up to 200 users. The index will contain 750,000 documents.
What about creating a slightly different document structure, with nested fields, and adding the users as keys inside those fields? For example:
POST /st_t2/_doc
{
"Name":"doc1",
"Description": [
{"base": "wtf"},
{"Alex": "Desc entered by Alex, aaa"}
],
"Spec": [
{"base": "Base document spec"}
]
}
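One thing to keep in mind with this structure is that every user name becomes a field name in the mapping. A rough Python estimate, using the figures from the question and assuming (my assumption) that the same 200 users customize every one of the 40 fields:

```python
# Figures from the question; how they combine here is an assumption.
FIELDS = 40            # customizable root properties
USERS_PER_FIELD = 200  # maximum users customizing any one field

# Elasticsearch's default value for index.mapping.total_fields.limit.
DEFAULT_TOTAL_FIELDS_LIMIT = 1000

def worst_case_subfields(fields: int, users_per_field: int) -> int:
    """One 'base' sub-field plus one sub-field per user, for every property."""
    return fields * (users_per_field + 1)

total = worst_case_subfields(FIELDS, USERS_PER_FIELD)
exceeds_default = total > DEFAULT_TOTAL_FIELDS_LIMIT
print(total, exceeds_default)
```

If the customizing users differ from document to document, the number of distinct sub-fields would be higher still, so the index.mapping.total_fields.limit setting would probably need to be raised for this approach.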
And then you can create boolean queries like this:
GET st_t2/_search
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"exists": {
"field": "Description.Eric"
}
},
{
"match": {
"Description.Eric": "wtf"
}
}
]
}
},
{
"bool": {
"must_not": [
{
"exists": {
"field": "Description.Eric"
}
}
],
"must": [
{
"match": {
"Description.base": "wtf"
}
}
]
}
}
]
}
}
}
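Since the query only differs by field name, user, and search text, it can be generated. Below is a minimal Python sketch that builds the same request body as a plain dict; the function name `build_override_query` is hypothetical, and it assumes user names are valid as field-name segments (for example, no dots).

```python
import json
from typing import Any, Dict

def build_override_query(field: str, user: str, text: str) -> Dict[str, Any]:
    """Build the 'user override, else base' bool query for a single field."""
    user_field = f"{field}.{user}"
    base_field = f"{field}.base"
    has_override = {"exists": {"field": user_field}}
    return {
        "query": {
            "bool": {
                "should": [
                    # Branch 1: the user has a custom value, so search only that.
                    {"bool": {"must": [has_override,
                                       {"match": {user_field: text}}]}},
                    # Branch 2: no custom value, so fall back to the base value.
                    {"bool": {"must_not": [has_override],
                              "must": [{"match": {base_field: text}}]}},
                ]
            }
        }
    }

body = build_override_query("Description", "Eric", "wtf")
print(json.dumps(body, indent=2))
```

The resulting dict can be sent as the body of the search request with whichever client you use.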
UPDATED:
While implementing this solution, @Eric Bowden decided to use a nested mapping and to wrap the provided exists and match clauses in nested queries. Working example:
Using input from @Oleksii Baidan, the following query worked for me. For the sample query below, a document is returned because user Eric has provided a custom value for the Description field, and that value contains "jkl". If I were to modify the query to search for "abc" instead of "jkl", it would not return a result, as expected, because user Eric has overridden the Description field, hiding the base value of the description.
GET index1/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": [
              {
                "nested": {
                  "path": "Description",
                  "query": {
                    "exists": { "field": "Description.Eric" }
                  }
                }
              },
              {
                "nested": {
                  "path": "Description",
                  "query": {
                    "match": { "Description.Eric": "jkl" }
                  }
                }
              }
            ]
          }
        },
        {
          "bool": {
            "must_not": [
              {
                "nested": {
                  "path": "Description",
                  "query": {
                    "exists": { "field": "Description.Eric" }
                  }
                }
              }
            ],
            "must": [
              {
                "nested": {
                  "path": "Description",
                  "query": {
                    "match": { "Description.base": "jkl" }
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}
Index definition.
PUT index1
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "Name": {
        "type": "nested"
      },
      "Description": {
        "type": "nested"
      },
      "Spec": {
        "type": "nested"
      }
    }
  }
}
Sample document
POST index1/_doc
{
"Name": [
{"base":"NameBase2"}
],
"Description": [
{"base": "DescriptionBase2 abc"},
{"Alex": "DescriptionAlex2 def"},
{"Eric": "DescriptionEric2 jkl"}
],
"Spec": [
{"base": "SpecBase2"}
]
}
Update: Working with this further, I realized that it is not necessary to configure the user properties as nested. I'm leaving this SF post in place as an example but, from my understanding, configuring the user fields as nested is not necessary and provides no additional value.