Examples of WordDelimiterFilter

org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter

---there, 'dude'" → "hello", "there", "dude"

trailing "'s" are removed for each subword: "O'Neil's" → "O", "Neil"

Note: this step isn't performed in a separate filter because of possible subword combinations.

The combinations parameter affects how subwords are combined:

combinations="0" causes no subword combinations: "PowerShot" → 0:"Power", 1:"Shot" (0 and 1 are the token positions)

combinations="1" means that, in addition to the subwords, maximal runs of non-numeric subwords are catenated and produced at the same position as the last subword in the run:

"PowerShot" → 0:"Power", 1:"Shot", 1:"PowerShot"
"A's+B's&C's" → 0:"A", 1:"B", 2:"C", 2:"ABC"
"Super-Duper-XL500-42-AutoCoder!" → 0:"Super", 1:"Duper", 2:"XL", 2:"SuperDuperXL", 3:"500", 4:"42", 5:"Auto", 6:"Coder", 6:"AutoCoder"

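The splitting and catenation rules above can be approximated in a few lines of plain Java. This is only a sketch of the documented behaviour, not the Lucene implementation: the class name WordDelimiterSketch and its methods are hypothetical, every character that is not a cased letter or a digit is assumed to be a delimiter, and a catenated token is assumed to be emitted only when a run holds more than one subword.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of the documented behaviour; NOT the Lucene class. */
class WordDelimiterSketch {

    private static final int DELIM = 0, UPPER = 1, LOWER = 2, DIGIT = 3;

    private static int type(char c) {
        if (Character.isUpperCase(c)) return UPPER;
        if (Character.isLowerCase(c)) return LOWER;
        if (Character.isDigit(c)) return DIGIT;
        return DELIM; // assumption: anything else acts as a delimiter
    }

    /** Splits one whitespace-free token into subwords. */
    static List<String> subwords(String token) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        int prev = DELIM;
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            int t = type(c);
            // A trailing "'s" is dropped: skip the 's' that follows the apostrophe.
            if (c == '\'' && i + 1 < token.length() && token.charAt(i + 1) == 's'
                    && (i + 2 == token.length() || type(token.charAt(i + 2)) == DELIM)) {
                i++;
            }
            if (t == DELIM) {
                if (cur.length() > 0) { out.add(cur.toString()); cur.setLength(0); }
            } else {
                // Boundaries: lower-to-upper case change, or letter/digit transition.
                boolean boundary = cur.length() > 0
                        && ((prev == LOWER && t == UPPER) || ((prev == DIGIT) != (t == DIGIT)));
                if (boundary) { out.add(cur.toString()); cur.setLength(0); }
                cur.append(c);
            }
            prev = t;
        }
        if (cur.length() > 0) out.add(cur.toString());
        return out;
    }

    /** Returns "position:term" strings; combine=true mimics combinations="1". */
    static List<String> analyze(String token, boolean combine) {
        List<String> parts = subwords(token);
        List<String> out = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        int runLength = 0;
        for (int pos = 0; pos < parts.size(); pos++) {
            String part = parts.get(pos);
            out.add(pos + ":" + part);
            if (!combine) continue;
            boolean numeric = Character.isDigit(part.charAt(0));
            if (!numeric) { run.append(part); runLength++; }
            // A run ends at a numeric subword, before one, or at the last subword.
            boolean runEnds = numeric || pos == parts.size() - 1
                    || Character.isDigit(parts.get(pos + 1).charAt(0));
            if (runEnds) {
                if (runLength > 1) out.add(pos + ":" + run); // same position as last subword
                run.setLength(0);
                runLength = 0;
            }
        }
        return out;
    }
}
```

Under these assumptions, WordDelimiterSketch.analyze("PowerShot", false) yields 0:Power, 1:Shot, and passing true adds 1:PowerShot at the position of the last subword.
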
One use for WordDelimiterFilter is to help match words with different subword delimiters. For example, if the source text contained "wi-fi" one may want "wifi" "WiFi" "wi-fi" "wi+fi" queries to all match. One way of doing so is to specify combinations="1" in the analyzer used for indexing, and combinations="0" (the default) in the analyzer used for querying.

Given that the current StandardTokenizer immediately removes many intra-word delimiters, it is recommended that this filter be used after a tokenizer that does not do this (such as WhitespaceTokenizer).

@version $Id: WordDelimiterFilter.java 1166766 2011-09-08 15:52:10Z rmuir $
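
The "wi-fi" use case, with combinations="1" at index time and combinations="0" at query time, can be illustrated with a small simulation. This is a sketch under stated assumptions rather than Lucene code: WifiMatchSketch and its methods are hypothetical names, the splitter handles only delimiters and lower-to-upper case changes, all subwords are catenated regardless of numeric runs, terms are lowercased (as a lowercase filter later in the chain would do), and a query is treated as a phrase whose terms must appear at consecutive positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Illustrative simulation only; NOT the Lucene API. */
class WifiMatchSketch {

    /** Returns the set of lowercased terms at each position for one token. */
    static List<Set<String>> analyze(String token, boolean combine) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            boolean delimiter = !Character.isLetterOrDigit(c);
            boolean caseChange = cur.length() > 0 && Character.isUpperCase(c)
                    && Character.isLowerCase(cur.charAt(cur.length() - 1));
            if ((delimiter || caseChange) && cur.length() > 0) {
                parts.add(cur.toString().toLowerCase(Locale.ROOT));
                cur.setLength(0);
            }
            if (!delimiter) cur.append(c);
        }
        if (cur.length() > 0) parts.add(cur.toString().toLowerCase(Locale.ROOT));

        List<Set<String>> positions = new ArrayList<>();
        for (String p : parts) positions.add(new HashSet<>(Arrays.asList(p)));
        // combine=true mimics combinations="1": the catenated form shares
        // the position of the last subword.
        if (combine && parts.size() > 1) {
            positions.get(parts.size() - 1).add(String.join("", parts));
        }
        return positions;
    }

    /** Phrase match: query terms must occur at consecutive index positions. */
    static boolean matches(List<Set<String>> index, List<Set<String>> query) {
        if (query.isEmpty()) return false;
        for (int start = 0; start + query.size() <= index.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < query.size() && ok; i++) {
                // With combine=false each query position holds exactly one term.
                String term = query.get(i).iterator().next();
                ok = index.get(start + i).contains(term);
            }
            if (ok) return true;
        }
        return false;
    }
}
```

Indexing "wi-fi" with combine=true gives the terms wi at position 0 and fi, wifi at position 1, so the queries "wifi", "WiFi", "wi-fi" and "wi+fi", each analyzed with combine=false, all find a match in this model.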

