Examples of org.apache.lucene.analysis.ja.JapaneseTokenizer$Position

org.apache.lucene.util.Vint8.Position
Tokenizer for Japanese that uses morphological analysis.
This tokenizer sets a number of additional attributes:
- {@link BaseFormAttribute} containing base form for inflected adjectives and verbs.
- {@link PartOfSpeechAttribute} containing part-of-speech.
- {@link ReadingAttribute} containing reading and pronunciation.
- {@link InflectionAttribute} containing additional part-of-speech information for inflected forms.
This tokenizer uses a rolling Viterbi search to find the least-cost segmentation (path) of the incoming characters. For tokens that appear to be compounds (longer than 2 characters if all Kanji, or longer than 7 characters otherwise), we check whether a second-best segmentation of that token exists after applying penalties to the long tokens. If so, and the Mode is {@link Mode#SEARCH}, we output the alternate segmentation as well.
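
The least-cost search described above can be illustrated with a small, self-contained sketch. It is not the JapaneseTokenizer implementation: it works on a whole string rather than a rolling window, it ignores connection costs between adjacent words, and it simply reruns the search with a penalty instead of emitting both segmentations. The class name LeastCostSegmenter, the cost map, UNKNOWN_CHAR_COST, LONG_TOKEN_THRESHOLD and the longTokenPenalty parameter are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Toy least-cost segmenter (illustrative only, not Lucene code).
final class LeastCostSegmenter {
    // Assumed cost for a single character that is not in the dictionary.
    static final int UNKNOWN_CHAR_COST = 1000;
    // Assumed threshold: dictionary words longer than this are treated as "long".
    static final int LONG_TOKEN_THRESHOLD = 2;

    private final Map<String, Integer> wordCost;
    private final int maxWordLength;

    LeastCostSegmenter(Map<String, Integer> wordCost) {
        this.wordCost = wordCost;
        int max = 1;
        for (String word : wordCost.keySet()) {
            max = Math.max(max, word.length());
        }
        this.maxWordLength = max;
    }

    // Returns the cheapest segmentation of text. longTokenPenalty is added
    // once per character beyond LONG_TOKEN_THRESHOLD for dictionary words.
    List<String> segment(String text, int longTokenPenalty) {
        int n = text.length();
        int[] best = new int[n + 1]; // best[i]: least cost to segment text[0, i)
        int[] back = new int[n + 1]; // back[i]: start index of the last token on that path
        Arrays.fill(best, Integer.MAX_VALUE);
        best[0] = 0;

        for (int end = 1; end <= n; end++) {
            for (int start = Math.max(0, end - maxWordLength); start < end; start++) {
                if (best[start] == Integer.MAX_VALUE) {
                    continue; // no path reaches this start position
                }
                int length = end - start;
                Integer cost = wordCost.get(text.substring(start, end));
                if (cost == null) {
                    if (length != 1) {
                        continue; // unknown multi-character spans are not candidates
                    }
                    cost = UNKNOWN_CHAR_COST;
                } else {
                    cost += Math.max(0, length - LONG_TOKEN_THRESHOLD) * longTokenPenalty;
                }
                int total = best[start] + cost;
                if (total < best[end]) {
                    best[end] = total;
                    back[end] = start;
                }
            }
        }

        // Walk the back pointers from the end of the text to recover the path.
        List<String> tokens = new ArrayList<>();
        for (int end = n; end > 0; end = back[end]) {
            tokens.add(text.substring(back[end], end));
        }
        Collections.reverse(tokens);
        return tokens;
    }
}
```

With made-up costs of 100 each for 関西, 国際 and 空港 and 250 for the compound 関西国際空港, calling segment with a penalty of 0 keeps the compound as one token, while a penalty of 100 makes the three-word path cheaper. That mirrors, in a very reduced form, how penalizing long tokens can surface an alternate segmentation.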

    discardPunctuation = getBoolean(DISCARD_PUNCTUATION, true);
  }
  
  @Override
  public Tokenizer create(Reader input) {
    return new JapaneseTokenizer(input, userDictionary, discardPunctuation, mode);
  }



  public Object extractCategoryTokenData(byte[] buffer, int offset, int length) {
    if (length == 0) {
      return null;
    }
    Integer i = Integer.valueOf(Vint8.decode(buffer, new Position(offset)));
    return i;
  }


    if (!super.setdoc(docId)) {
      return false;
    }


    // read header - number of enhancements and their lengths
    Position position = new Position();
    nEnhancements = Vint8.decode(buffer, position);
    for (int i = 0; i < nEnhancements; i++) {
      enhancementLength[i] = Vint8.decode(buffer, position);
    }
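
The snippets above pass a single Position to successive Vint8.decode calls, so each call resumes where the previous one stopped. The following self-contained sketch shows that pattern with a hand-written variable-length integer codec. It is not the Lucene Vint8 source: the class name VarIntSketch is hypothetical, and the byte layout (7 data bits per byte, most significant group first, high bit set on every byte except the last) is an assumption made for illustration.

```java
import java.io.ByteArrayOutputStream;

// Illustrative variable-length int codec with a mutable read cursor (not Lucene code).
final class VarIntSketch {
    // Mutable cursor into a byte array, advanced by decode().
    static final class Position {
        int pos;

        Position() {
            this(0);
        }

        Position(int pos) {
            this.pos = pos;
        }
    }

    // Writes value as 1 to 5 bytes, most significant 7-bit group first.
    static void encode(int value, ByteArrayOutputStream out) {
        int shift = 28;
        while (shift > 0 && (value >>> shift) == 0) {
            shift -= 7; // skip leading all-zero groups
        }
        for (; shift > 0; shift -= 7) {
            out.write(0x80 | ((value >>> shift) & 0x7F)); // continuation bit set
        }
        out.write(value & 0x7F); // last byte: continuation bit clear
    }

    // Reads one value starting at position.pos and leaves the cursor after it.
    static int decode(byte[] buffer, Position position) {
        int value = 0;
        while (true) {
            byte b = buffer[position.pos++];
            value = (value << 7) | (b & 0x7F);
            if ((b & 0x80) == 0) {
                return value;
            }
        }
    }
}
```

Because decode advances the cursor, a header such as a count followed by that many lengths can be read in a loop with one shared Position, as in the fragment above. Constructing the Position with an offset, as in new Position(offset), starts decoding in the middle of a larger buffer.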





Related Classes of org.apache.lucene.analysis.ja.JapaneseTokenizer$Position

ca.uhn.fhir.model.primitive.DecimalDt

com.google.gdt.eclipse.designer.uibinder.model.widgets.WidgetInfo

com.sun.tools.classfile.TypeAnnotation.Position.TypePathEntry

org.apache.lucene.analysis.ja.dict.Dictionary

org.apache.lucene.analysis.ja.JapaneseTokenizerFactory

org.apache.lucene.facet.enhancements.association.AssociationEnhancement

org.apache.lucene.facet.enhancements.EnhancementsPayloadIterator

org.eclipse.wb.internal.core.model.presentation.DefaultObjectPresentation

org.eclipse.wb.internal.core.utils.xml.DocumentElement

org.eclipse.wb.internal.core.xml.model.association.DirectAssociation
