
How to merge matches from two distinct (not sharded) Lucene Indexes

Tags:

lucene

I have two separate indexes holding different fields; together they contain all the searchable fields for a document. For example, the first index holds the indexed text for all documents, and the second holds the tags for every document.

Note the example below is a bit wonky as I've changed the names of the entities.

Index1: text, document-id

Index2: tag-name: "very important", user: "Fred's id"

I would like to keep the indexes separate as it seems wasteful to continually update a single index whenever a user adds/removes a tag.

So far I think I might need to process the two search results and merge them manually (in code). Any other suggestions?

I do not want to merge separate/sharded indexes.
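
For illustration, merging the two result sets manually might look roughly like the sketch below. It is plain Java with no Lucene calls, and it assumes (this is not stated in the post) that each search has already been reduced to a list of shared document-id strings. `ManualMerge` and `intersect` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: joins two hit lists on a shared document-id.
class ManualMerge {

    // Returns the ids found in both lists, keeping the order of textHits.
    static List<String> intersect(List<String> textHits, List<String> tagHits) {
        Set<String> tagged = new HashSet<>(tagHits);
        List<String> merged = new ArrayList<>();
        for (String id : textHits) {
            if (tagged.contains(id)) {
                merged.add(id);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Assumed inputs: ids returned by the text index and by the tag index.
        List<String> textHits = Arrays.asList("doc1", "doc2", "doc3");
        List<String> tagHits = Arrays.asList("doc3", "doc2", "doc9");
        System.out.println(intersect(textHits, tagHits)); // prints [doc2, doc3]
    }
}
```

This keeps the ranking of the text search and uses the tag search only as a filter; combining scores from both indexes would need more work.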

mP. asked Oct 14 '22 14:10


1 Answer

Lucene has a type of IndexReader to support this arrangement: ParallelReader.

It can be a little tricky to use, as the Lucene document identifier for a record must be the same in both indexes. In practice, this means adding documents in the same order to both indexes. I have read that in some cases, document deletion and index optimization can cause Lucene to reassign these document identifiers, but I haven't experimented to find out if this is true. Extra care may be needed if existing records are modified. If only new records are appended, there should be no trouble.
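
To make the alignment requirement concrete, here is a toy model in plain Java. It does not use Lucene at all: two lists stand in for the two indexes, and the list position plays the role of the internal document number. `ParallelView`, `field`, and `document` are hypothetical names used only for this sketch. With the real library, as far as I know, a Lucene 3.x `ParallelReader` is built by calling `add(IndexReader)` once per index, and later versions offer `ParallelCompositeReader` and `ParallelLeafReader` instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: list position stands in for Lucene's internal document number.
class ParallelView {

    // Builds a one-field "document".
    static Map<String, String> field(String name, String value) {
        Map<String, String> doc = new TreeMap<>();
        doc.put(name, value);
        return doc;
    }

    // A parallel view of document docNum is the union of the fields
    // stored at that same position in both indexes.
    static Map<String, String> document(List<Map<String, String>> first,
                                        List<Map<String, String>> second,
                                        int docNum) {
        Map<String, String> doc = new TreeMap<>(first.get(docNum));
        doc.putAll(second.get(docNum));
        return doc;
    }

    public static void main(String[] args) {
        List<Map<String, String>> textIndex = new ArrayList<>(Arrays.asList(
                field("text", "alpha"), field("text", "beta")));
        List<Map<String, String>> tagIndex = new ArrayList<>(Arrays.asList(
                field("tag", "very important"), field("tag", "todo")));

        // Same insertion order in both lists, so the fields line up.
        System.out.println(document(textIndex, tagIndex, 1)); // prints {tag=todo, text=beta}

        // Removing an entry from only one list shifts its numbering,
        // so the "todo" tag is now paired with the wrong text.
        tagIndex.remove(0);
        System.out.println(document(textIndex, tagIndex, 0)); // prints {tag=todo, text=alpha}
    }
}
```

The second print shows the failure mode to watch for: once the numbering in one index shifts, fields from different records are silently combined.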

This approach is generally called "vertical partitioning," as opposed to "horizontal partitioning," or sharding.

erickson answered Oct 20 '22 15:10