This tokenizer sets a number of additional attributes:
This tokenizer uses a rolling Viterbi search to find the least-cost segmentation (path) of the incoming characters. For tokens that appear to be compound (longer than 2 characters when composed entirely of Kanji, or longer than 7 characters otherwise), we check whether there is a second-best segmentation of that token after applying penalties to the long tokens. If there is, and the Mode is {@link Mode#SEARCH}, we output the alternate segmentation as well.
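For illustration only, the following is a minimal usage sketch (not part of this class) showing how SEARCH mode surfaces both a compound token and its decomposed parts through the standard TokenStream API. The sample input, the attribute choices, and the use of position length to spot compounds are assumptions made for the example; constructor signatures may differ slightly across Lucene versions.

<pre>{@code
import java.io.StringReader;

import org.apache.lucene.analysis.ja.JapaneseTokenizer;
import org.apache.lucene.analysis.ja.JapaneseTokenizer.Mode;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute;

public class SearchModeExample {
  public static void main(String[] args) throws Exception {
    // No user dictionary, discard punctuation, SEARCH mode so that compound
    // candidates may also be emitted as their alternate (decomposed) segmentation.
    JapaneseTokenizer tokenizer = new JapaneseTokenizer(null, true, Mode.SEARCH);
    CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
    PositionIncrementAttribute posInc = tokenizer.addAttribute(PositionIncrementAttribute.class);
    PositionLengthAttribute posLen = tokenizer.addAttribute(PositionLengthAttribute.class);

    // Hypothetical example input: a long all-Kanji compound (Kansai International Airport).
    tokenizer.setReader(new StringReader("関西国際空港"));
    tokenizer.reset();
    while (tokenizer.incrementToken()) {
      // A token whose position length is greater than 1 spans several of the
      // decomposed tokens, i.e. it is the compound kept alongside its parts.
      System.out.println(term.toString()
          + " posInc=" + posInc.getPositionIncrement()
          + " posLen=" + posLen.getPositionLength());
    }
    tokenizer.end();
    tokenizer.close();
  }
}
}</pre>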