I have two separate indexes holding different fields that, taken together, contain all the searchable fields for my documents. For example, the first index holds the indexed text for every document, and the second holds the tags for every document.
Note the example below is a bit wonky as I've changed the names of the entities.
Index1: text document-id
Index2: tag-name: "very important" user: "Fred's id"
I would like to keep the indexes separate as it seems wasteful to continually update a single index whenever a user adds/removes a tag.
So far I think I might need to process the two search results and merge them manually (in code). Any other suggestions?
I do not want to merge separate/sharded indexes.
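For reference, the manual merge mentioned above could be sketched roughly as follows. This is plain Java with no Lucene calls, and it assumes each search has already been reduced to a list of document-id strings; the class name `ManualMerge`, the method `intersect`, and the sample ids are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ManualMerge {
    // Keeps the text hits, in their original ranking order, whose
    // document-id also appears among the tag hits.
    static List<String> intersect(List<String> textHits, List<String> tagHits) {
        Set<String> tagged = new HashSet<>(tagHits);
        List<String> merged = new ArrayList<>();
        for (String id : textHits) {
            if (tagged.contains(id)) {
                merged.add(id);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical results from the text index and the tag index.
        List<String> textHits = Arrays.asList("doc-7", "doc-2", "doc-9");
        List<String> tagHits = Arrays.asList("doc-9", "doc-7", "doc-4");
        System.out.println(intersect(textHits, tagHits)); // [doc-7, doc-9]
    }
}
```

The ordering of the text search is kept here on the assumption that text relevance matters more than tag order; a different ranking rule would need a different merge.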
Lucene has a type of IndexReader to support this arrangement: ParallelReader.
It can be a little tricky to use, as the Lucene document identifier for a record must be the same in both indexes. In practice, this means adding documents in the same order to both indexes. I have read that in some cases, document deletion and index optimization can cause Lucene to reassign these document identifiers, but I haven't experimented to find out if this is true. Extra care may be needed if existing records are modified. If only new records are appended, there should be no trouble.
This approach is generally called "vertical partitioning," as opposed to "horizontal partitioning," or sharding.
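To make the alignment requirement concrete, here is a toy model in plain Java. It does not use Lucene at all: two lists stand in for the two indexes, and the list position plays the role of the internal document identifier. The class `ParallelModel` and its methods are invented for illustration, and the "renumbering" step is only an assumption about what a one-sided change could look like, not a reproduction of Lucene's actual behavior.

```java
import java.util.ArrayList;
import java.util.List;

class ParallelModel {
    // Each list stands in for one index; position = document identifier.
    final List<String> textIndex = new ArrayList<>();
    final List<String> tagIndex = new ArrayList<>();

    // Appending to both lists in the same call keeps positions aligned.
    void add(String text, String tag) {
        textIndex.add(text);
        tagIndex.add(tag);
    }

    // Joins the fields stored at the same position in both lists.
    String combined(int docNum) {
        return textIndex.get(docNum) + " | " + tagIndex.get(docNum);
    }

    public static void main(String[] args) {
        ParallelModel m = new ParallelModel();
        m.add("quarterly report", "very important");
        m.add("lunch menu", "trivial");
        System.out.println(m.combined(1)); // lunch menu | trivial

        // Hypothetical: only one side drops position 0 and renumbers.
        m.textIndex.remove(0);
        System.out.println(m.combined(0)); // lunch menu | very important
    }
}
```

The second line of output pairs a document with the wrong tag, which is the kind of mismatch to watch for if the two indexes are ever modified independently.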