I want the Lucene scoring function to have no bias based on the length of the document. This is a follow-up to the question "Calculate the score only based on the documents have more occurance of term in lucene".
How does Field.setOmitNorms(true) work? I see that there are two factors that make short documents get a higher score:
Here is the documentation
I was wondering - if I wanted no bias towards shorter documents, is Field.setOmitNorms(true) enough?
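Using BM25Similarity, you can set one of these constructor parameters to 0f:
@param b Controls to what degree document length normalizes tf values
or
@param k1 Controls non-linear term frequency normalization (saturation).
Both params affect the SimWeight. Setting b to 0f removes the document-length normalization while keeping term frequency in the score. Setting k1 to 0f goes further and makes the score ignore term frequency altogether, so b is the one to change if you only want to drop the length bias: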
indexSearcher.setSimilarity(new BM25Similarity(1.2f,0f));
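To see why b = 0f removes the length bias, here is a minimal sketch of the term-frequency part of the textbook BM25 formula in plain Java. It is not Lucene's code: the class name, the method name tfFactor, and the sample lengths and frequency are made up for illustration, and Lucene's own implementation differs in details (for example, it stores field lengths in a compressed form). The IDF factor is left out because it does not depend on document length.

```java
public class Bm25LengthDemo {

    // Term-frequency part of the classic BM25 formula (illustrative, not Lucene's code).
    // With b = 0 the length term collapses to 1, so docLen no longer matters.
    public static double tfFactor(double freq, double docLen, double avgDocLen,
                                  double k1, double b) {
        double lengthNorm = 1.0 - b + b * (docLen / avgDocLen);
        return freq * (k1 + 1.0) / (freq + k1 * lengthNorm);
    }

    public static void main(String[] args) {
        double avgDocLen = 100.0; // hypothetical average field length
        double freq = 3.0;        // hypothetical term frequency in both documents

        // Compare a short and a long document with the default b and with b = 0.
        for (double b : new double[] {0.75, 0.0}) {
            double shortDoc = tfFactor(freq, 20.0, avgDocLen, 1.2, b);
            double longDoc = tfFactor(freq, 500.0, avgDocLen, 1.2, b);
            System.out.printf("b=%.2f short=%.4f long=%.4f%n", b, shortDoc, longDoc);
        }
    }
}
```

With b = 0.75 the short document gets the larger factor for the same term frequency; with b = 0 both documents get the same value, which is the behavior asked for in the question.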
More explanation can be found here: http://opensourceconnections.com/blog/2015/10/16/bm25-the-next-generation-of-lucene-relevation/