Examples of WordDelimiterFilter

One use for WordDelimiterFilter is to help match words with different subword delimiters. For example, if the source text contains "wi-fi", one may want the queries "wifi", "WiFi", "wi-fi", and "wi+fi" to all match. One way of doing so is to specify combinations="1" in the analyzer used for indexing and combinations="0" (the default) in the analyzer used for querying. Because the current StandardTokenizer immediately removes many intra-word delimiters, it is recommended that this filter be used after a tokenizer that does not do this (such as WhitespaceTokenizer).
  • org.apache.solr.analysis.WordDelimiterFilter
    ---there, 'dude'" -> "hello", "there", "dude"
    - a trailing "'s" is removed from each subword
      - "O'Neil's" -> "O", "Neil"
      - Note: this step isn't performed in a separate filter because of possible subword combinations.

    The combinations parameter affects how subwords are combined:
    - combinations="0" causes no subword combinations.
      - "PowerShot" -> 0:"Power", 1:"Shot" (0 and 1 are the token positions)
    - combinations="1" means that, in addition to the subwords, maximum runs of non-numeric subwords are catenated and produced at the same position as the last subword in the run.
      - "PowerShot" -> 0:"Power", 1:"Shot", 1:"PowerShot"
      - "A's+B's&C's" -> 0:"A", 1:"B", 2:"C", 2:"ABC"
      - "Super-Duper-XL500-42-AutoCoder!" -> 0:"Super", 1:"Duper", 2:"XL", 2:"SuperDuperXL", 3:"500", 4:"42", 5:"Auto", 6:"Coder", 6:"AutoCoder"

    @version $Id: WordDelimiterFilter.java 1166766 2011-09-08 15:52:10Z rmuir $
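
The positions listed in the examples above can be reproduced with a small stand-alone model. The sketch below is not the Solr implementation and does not use any Lucene or Solr classes. The class SubwordSketch and its methods split and analyze are hypothetical names introduced only for illustration. It assumes subwords break at non-alphanumeric characters, at lower-to-upper case changes, and at letter/digit boundaries, which is enough to cover the documented examples.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: a simplified model of the documented
// behaviour, not the org.apache.solr.analysis.WordDelimiterFilter code.
class SubwordSketch {

    // Moves the pending subword (if any) into the result list.
    private static void flush(List<String> parts, StringBuilder cur) {
        if (cur.length() > 0) {
            parts.add(cur.toString());
            cur.setLength(0);
        }
    }

    // Assumed split rules: delimiters, lower->upper case change, letter/digit change.
    static List<String> split(String text) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!Character.isLetterOrDigit(c)) {
                // Trailing "'s": apostrophe, then s, then a delimiter or end of input.
                if (c == '\'' && i + 1 < text.length()
                        && Character.toLowerCase(text.charAt(i + 1)) == 's'
                        && (i + 2 == text.length()
                            || !Character.isLetterOrDigit(text.charAt(i + 2)))) {
                    i++; // skip the 's' too, so it never becomes a subword
                }
                flush(parts, cur);
                continue;
            }
            if (cur.length() > 0) {
                char prev = cur.charAt(cur.length() - 1);
                boolean caseChange = Character.isLowerCase(prev) && Character.isUpperCase(c);
                boolean typeChange = Character.isDigit(prev) != Character.isDigit(c);
                if (caseChange || typeChange) {
                    flush(parts, cur);
                }
            }
            cur.append(c);
        }
        flush(parts, cur);
        return parts;
    }

    // Returns "position:token" strings; combinations == 1 also emits each
    // run of two or more non-numeric subwords at the run's last position.
    static List<String> analyze(String text, int combinations) {
        List<String> parts = split(text);
        List<String> out = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        int runLength = 0;
        for (int pos = 0; pos < parts.size(); pos++) {
            String part = parts.get(pos);
            out.add(pos + ":" + part);
            if (combinations != 1 || Character.isDigit(part.charAt(0))) {
                continue; // numeric subwords never join a run
            }
            run.append(part);
            runLength++;
            boolean runEnds = pos == parts.size() - 1
                    || Character.isDigit(parts.get(pos + 1).charAt(0));
            if (runEnds) {
                if (runLength > 1) {
                    out.add(pos + ":" + run);
                }
                run.setLength(0);
                runLength = 0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("PowerShot", 0));
        System.out.println(analyze("PowerShot", 1));
        System.out.println(analyze("Super-Duper-XL500-42-AutoCoder!", 1));
        // Index side of the "wi-fi" use case: the catenated token is what a
        // query for "wifi" (analyzed with combinations="0") can match.
        System.out.println(analyze("wi-fi", 1));
    }
}
```

In this model, indexing "wi-fi" with combinations="1" yields the subwords "wi" and "fi" plus the catenated "wifi" at the second position, which is why a query analyzed with combinations="0" can match it in either form. Case handling (for example matching "WiFi") is outside the sketch and would depend on the rest of the analyzer chain.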

  • Examples of org.apache.solr.analysis.WordDelimiterFilter

                    code = WordDelimiterFilter.SUBWORD_DELIM;
                }
                tab[i] = code;
            }

            return new WordDelimiterFilter(tokenStream, tab,
                    generateWordParts, generateNumberParts,
                    catenateWords, catenateNumbers, catenateAll,
                    splitOnCaseChange, preserveOriginal,
                    splitOnNumerics, stemEnglishPossessive, protectedWords);
        }