Examples of ICUNormalizer2Filter

Normalizes token text with ICU's {@link com.ibm.icu.text.Normalizer2}.

With this filter, you can normalize text in the following ways:

If you use the defaults, this filter is a simple way to standardize Unicode text in a language-independent way for search:

The case folding it performs can be seen as a replacement for LowerCaseFilter. For example, it handles cases such as the Greek sigma, so that "Μάϊος" and "ΜΆΪΟΣ" match correctly.
The normalization standardizes different forms of the same character in Unicode. For example, CJK full-width numbers are standardized to their ASCII forms.
Ignorables such as Zero-Width Joiner and Variation Selectors are removed. These are typically modifier characters that affect display.
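
The full-width number case can be illustrated with a minimal sketch. It is a stand-in, not this filter's implementation: it uses the JDK's java.text.Normalizer in NFKC form rather than ICU's Normalizer2, so it shows only the compatibility normalization step and does not perform case folding or remove ignorables. The class name NfkcSketch and the helper normalize are hypothetical names chosen for illustration.

```java
import java.text.Normalizer;

class NfkcSketch {
    // Hypothetical helper: applies NFKC using the JDK's built-in normalizer.
    // Assumption: this approximates only the normalization part of the
    // filter's default behavior (no case folding, no ignorable removal).
    static String normalize(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // U+FF11, U+FF12, U+FF13 are the full-width digits one, two, three.
        String fullWidth = "\uFF11\uFF12\uFF13";
        System.out.println(normalize(fullWidth)); // prints "123"
    }
}
```

With ICU4J on the classpath, the closer analogue would presumably be Normalizer2.getNFKCCasefoldInstance().normalize(text), which also folds case and drops default ignorables.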

@see com.ibm.icu.text.Normalizer2
@see com.ibm.icu.text.FilteredNormalizer2