A {@link TokenFilter} that normalizes common katakana spelling variations ending in a long sound character by removing this character (U+30FC). Only katakana words of at least a minimum length are stemmed (the default minimum is four characters).
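The stemming rule described above can be sketched in plain Java without any Lucene dependency. This is a minimal illustration, not the filter's actual implementation: the class name `KatakanaStemSketch` and its methods are hypothetical, the length threshold is assumed to be inclusive, and "katakana word" is assumed to mean that every character lies in the Unicode Katakana block (U+30A0 to U+30FF).

```java
final class KatakanaStemSketch {
    // U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK, the "long sound character".
    static final char PROLONGED_SOUND_MARK = '\u30FC';

    // Default minimum length taken from the description above.
    static final int DEFAULT_MINIMUM_LENGTH = 4;

    static String stem(String term) {
        return stem(term, DEFAULT_MINIMUM_LENGTH);
    }

    // Removes one trailing prolonged sound mark from a sufficiently long,
    // all-katakana term; every other term is returned unchanged.
    static String stem(String term, int minimumLength) {
        if (term.isEmpty() || term.length() < minimumLength) {
            return term; // too short to stem (assumption: threshold is inclusive)
        }
        if (!isFullWidthKatakana(term)) {
            return term; // mixed scripts and half-width forms are left alone
        }
        if (term.charAt(term.length() - 1) == PROLONGED_SOUND_MARK) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    // True only if every character is in the Unicode Katakana block.
    // Half-width katakana live in a different block and therefore fail this check.
    static boolean isFullWidthKatakana(String term) {
        for (int i = 0; i < term.length(); i++) {
            if (Character.UnicodeBlock.of(term.charAt(i)) != Character.UnicodeBlock.KATAKANA) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(stem("コンピューター")); // seven characters, trailing mark removed
        System.out.println(stem("コピー"));         // three characters, left unchanged
    }
}
```

Under these assumptions, only the final long sound character is dropped; a long sound character in the middle of a word is kept.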
Note that only full-width katakana characters are supported. Please use a {@link org.apache.lucene.analysis.cjk.CJKWidthFilter} to convert half-width katakana to full-width before using this filter.
To prevent terms from being stemmed, use an instance of {@link org.apache.lucene.analysis.miscellaneous.KeywordMarkerFilter} or a custom {@link TokenFilter} that sets the {@link KeywordAttribute} before this {@link TokenStream}.
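The ordering of these steps (width conversion first, then keyword protection, then stemming) can be modeled with a small plain-Java sketch. It does not use the Lucene classes named above: the class `KatakanaPipelineSketch` and its `process` method are hypothetical, `java.text.Normalizer` with NFKC is only a stand-in for the width conversion, and a plain `Set<String>` of protected terms is only a stand-in for the keyword attribute.

```java
import java.text.Normalizer;
import java.util.Set;

final class KatakanaPipelineSketch {
    // U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK.
    static final char PROLONGED_SOUND_MARK = '\u30FC';

    // Models one term passing through the three stages in order.
    static String process(String term, Set<String> protectedTerms, int minimumLength) {
        // Stage 1 (stand-in for width conversion): NFKC maps half-width
        // katakana to their full-width equivalents.
        String widened = Normalizer.normalize(term, Normalizer.Form.NFKC);

        // Stage 2 (stand-in for keyword marking): protected terms skip stemming.
        if (protectedTerms.contains(widened)) {
            return widened;
        }

        // Stage 3: drop a trailing long sound character from long katakana terms.
        if (widened.length() >= minimumLength
                && isFullWidthKatakana(widened)
                && widened.charAt(widened.length() - 1) == PROLONGED_SOUND_MARK) {
            return widened.substring(0, widened.length() - 1);
        }
        return widened;
    }

    // True only if every character is in the Unicode Katakana block.
    static boolean isFullWidthKatakana(String term) {
        for (int i = 0; i < term.length(); i++) {
            if (Character.UnicodeBlock.of(term.charAt(i)) != Character.UnicodeBlock.KATAKANA) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Half-width input is widened before the stemming stage sees it.
        System.out.println(process("ｺｰﾋｰ", Set.of(), 4));
        // A protected term is widened but keeps its trailing long sound character.
        System.out.println(process("コーヒー", Set.of("コーヒー"), 4));
    }
}
```

In this sketch the protected set is matched against the widened form, so protected terms would need to be listed in full-width katakana; whether that matches a given analysis chain depends on where the keyword-marking filter is placed.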