Using Lucene Analyzer Without Indexing - Is My Approach Reasonable?

Question

My objective is to use some of Lucene's many tokenizers and filters to transform input text, without creating any indexes.

For example, given this (contrived) input string...

" Someone’s - [texté] goes here, foo . "

...and a Lucene analyzer like this...

Analyzer analyzer = CustomAnalyzer.builder()
        .withTokenizer("icu")
        .addTokenFilter("lowercase")
        .addTokenFilter("icuFolding")
        .build();

I want to get the following output:

someone's texte goes here foo

The Java method below does what I want.

But is there a better (i.e. more typical and/or concise) way to do this?

I am specifically thinking about the way I have used TokenStream and CharTermAttribute, since I have never used them like this before. It feels clunky.

Here is the code:

Lucene 8.3.0 imports:

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.custom.CustomAnalyzer;

My method:

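private String transform(String input) throws IOException {

    // Built on every call to keep the example self-contained;
    // it could instead be created once and reused.
    Analyzer analyzer = CustomAnalyzer.builder()
            .withTokenizer("icu")
            .addTokenFilter("lowercase")
            .addTokenFilter("icuFolding")
            .build();

    // No index is involved, so the field name is just a label.
    TokenStream ts = analyzer.tokenStream("myField", new StringReader(input));
    CharTermAttribute charTermAtt = ts.addAttribute(CharTermAttribute.class);

    StringBuilder sb = new StringBuilder();
    try {
        ts.reset(); // must be called before the first incrementToken()
        while (ts.incrementToken()) {
            // charTermAtt now holds the text of the current token
            sb.append(charTermAtt.toString()).append(" ");
        }
        ts.end(); // performs end-of-stream operations
    } finally {
        ts.close(); // releases the stream's resources
    }
    return sb.toString().trim();
}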

andrewJames · Accepted Answer

I have been using this set-up for a few weeks without issue. I have not found a more concise approach. I think the code in the question is OK.
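One small tidy-up is possible in the joining step, independent of Lucene: the loop in the question appends a trailing space and trims it afterwards, which java.util.StringJoiner avoids. The following is only a minimal sketch under stated assumptions: the class TokenJoinSketch and the helper joinTerms are hypothetical names, and a plain list of strings stands in for the terms that the token stream would yield.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical class name, used only for this illustration.
class TokenJoinSketch {

    // Hypothetical helper: joins terms with single spaces,
    // so no trailing space has to be trimmed afterwards.
    static String joinTerms(Iterable<String> terms) {
        StringJoiner joiner = new StringJoiner(" ");
        for (String term : terms) {
            joiner.add(term);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the terms produced by the analyzer in the question.
        List<String> terms = Arrays.asList("someone's", "texte", "goes", "here", "foo");
        System.out.println(joinTerms(terms));
    }
}
```

In the question's method, the same idea would mean calling joiner.add(charTermAtt.toString()) inside the while loop and returning joiner.toString() at the end. The reset/incrementToken/end/close sequence itself stays as it is.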
