Examples of Tokenizer

  • org.apache.jena.riot.tokens.Tokenizer
  • org.apache.lucene.analysis.Tokenizer
    A Tokenizer is a TokenStream whose input is a Reader.

    This is an abstract class.

    NOTE: subclasses must override {@link #incrementToken()} if the new TokenStream API is used, and {@link #next(Token)} or {@link #next()} if the old TokenStream API is used.

    NOTE: Subclasses overriding {@link #incrementToken()} must call {@link AttributeSource#clearAttributes()} before setting attributes. Subclasses overriding {@link #next(Token)} must call {@link Token#clear()} before setting Token attributes.

  • org.apache.myfaces.trinidadinternal.el.Tokenizer
    Converts an EL expression into tokens. @author The Oracle ADF Faces Team
  • org.apache.uima.lucas.indexer.Tokenizer
  • org.crsh.cli.impl.tokenizer.Tokenizer
  • org.eclipse.orion.server.cf.manifest.v2.Tokenizer
  • org.eclipse.osgi.framework.internal.core.Tokenizer
    Simple tokenizer class. Used to parse data.
  • org.exist.storage.analysis.Tokenizer
  • org.geoserver.ows.util.KvpUtils.Tokenizer
  • org.hsqldb.Tokenizer
    Provides the ability to tokenize SQL character sequences. Extensively rewritten and extended in successive versions of HSQLDB. @author Thomas Mueller (Hypersonic SQL Group) @version 1.8.0 @since Hypersonic SQL
  • org.jboss.dna.common.text.TokenStream.Tokenizer
  • org.jboss.forge.shell.command.parser.Tokenizer
    @author Lincoln Baxter, III
  • org.jstripe.tokenizer.Tokenizer
  • org.languagetool.tokenizers.Tokenizer
    Interface for classes that tokenize text into smaller units. @author Daniel Naber
  • org.modeshape.common.text.TokenStream.Tokenizer
  • org.openjena.riot.tokens.Tokenizer
  • org.radargun.utils.Tokenizer
    Tokenizer that allows string delimiters instead of single-character delimiters. @author Radim Vansa <rvansa@redhat.com>
  • org.sonatype.maven.polyglot.atom.parsing.Tokenizer
    Taken from the Loop programming language compiler pipeline. @author dhanji@gmail.com (Dhanji R. Prasanna)
  • org.spoofax.jsglr.client.imploder.Tokenizer
  • org.supercsv_voltpatches.tokenizer.Tokenizer
    Reads the CSV file, line by line. If you want the line-reading functionality of this class, but want to define your own implementation of {@link #readColumns(List)}, then consider writing your own Tokenizer by extending AbstractTokenizer. @author Kasper B. Graversen @author James Bassett
  • org.zkoss.selector.lang.Tokenizer
    @author simonpai
  • weka.core.tokenizers.Tokenizer
    A superclass for all tokenizer algorithms. @author FracPete (fracpete at waikato dot ac dot nz) @version $Revision: 1.3 $
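The Lucene contract described above (a token stream that reads from a Reader and clears per-token state before producing each token) can be illustrated with a minimal, framework-free sketch. The class and method names below are hypothetical and mimic the shape of Lucene's `incrementToken()` loop; this is not Lucene code:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical minimal tokenizer: splits the contents of a Reader on whitespace.
// Mirrors the Lucene pattern: each incrementToken() call clears the previous
// token's state, then either produces the next token or signals end-of-stream.
class SimpleWhitespaceTokenizer {
    private final Reader input;
    private final StringBuilder term = new StringBuilder();

    SimpleWhitespaceTokenizer(Reader input) {
        this.input = input;
    }

    /** Advances to the next token; returns false when the stream is exhausted. */
    boolean incrementToken() throws IOException {
        term.setLength(0);  // clear previous token state first, as the NOTE above requires
        int c;
        // skip leading whitespace
        while ((c = input.read()) != -1 && Character.isWhitespace(c)) { }
        if (c == -1) return false;
        do {
            term.append((char) c);
        } while ((c = input.read()) != -1 && !Character.isWhitespace(c));
        return true;
    }

    String term() { return term.toString(); }

    public static void main(String[] args) throws IOException {
        SimpleWhitespaceTokenizer t =
            new SimpleWhitespaceTokenizer(new StringReader("hello tokenizer world"));
        while (t.incrementToken()) {
            System.out.println(t.term());  // prints hello, tokenizer, world
        }
    }
}
```

A real Lucene Tokenizer would additionally extend `org.apache.lucene.analysis.Tokenizer` and expose the term through attributes rather than a getter, but the consume-clear-advance loop is the same.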

  • Examples of org.apache.lucene.analysis.Tokenizer

          final int min = _TestUtil.nextInt(random(), 2, 10);
          final int max = _TestUtil.nextInt(random(), min, 20);
          Analyzer a = new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
              Tokenizer tokenizer = new MockTokenizer(reader, MockTokenizer.WHITESPACE, false);
              return new TokenStreamComponents(tokenizer,
                  new NGramTokenFilter(TEST_VERSION_CURRENT, tokenizer, min, max));
            }   
          };
          checkRandomData(random(), a, 200*RANDOM_MULTIPLIER, 20);

    Examples of org.apache.lucene.analysis.Tokenizer

      public void testEmptyTerm() throws Exception {
        Random random = random();
        Analyzer a = new Analyzer() {
          @Override
          protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
            Tokenizer tokenizer = new KeywordTokenizer(reader);
            return new TokenStreamComponents(tokenizer,
                new NGramTokenFilter(TEST_VERSION_CURRENT, tokenizer, 2, 15));
          }   
        };
        checkAnalysisConsistency(random, a, random.nextBoolean(), "");

    Examples of org.apache.lucene.analysis.Tokenizer

     
      public void testEmptyTerm() throws IOException {
        Analyzer a = new Analyzer() {
          @Override
          protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
            Tokenizer tokenizer = new KeywordTokenizer(reader);
            return new TokenStreamComponents(tokenizer, new ThaiWordFilter(TEST_VERSION_CURRENT, tokenizer));
          }
        };
        checkOneTerm(a, "", "");
      }

    Examples of org.apache.lucene.analysis.Tokenizer

        EdgeNGramTokenizer tokenizer = new EdgeNGramTokenizer(TEST_VERSION_CURRENT, input, 1, 1);
        assertTokenStreamContents(tokenizer, new String[]{"a"}, new int[]{0}, new int[]{1}, 5 /* abcde */);
      }

      public void testBackUnigram() throws Exception {
        Tokenizer tokenizer = new Lucene43EdgeNGramTokenizer(Version.LUCENE_43, input, Lucene43EdgeNGramTokenizer.Side.BACK, 1, 1);
        assertTokenStreamContents(tokenizer, new String[]{"e"}, new int[]{4}, new int[]{5}, 5 /* abcde */);
      }

    Examples of org.apache.lucene.analysis.Tokenizer

        EdgeNGramTokenizer tokenizer = new EdgeNGramTokenizer(TEST_VERSION_CURRENT, input, 1, 3);
        assertTokenStreamContents(tokenizer, new String[]{"a","ab","abc"}, new int[]{0,0,0}, new int[]{1,2,3}, 5 /* abcde */);
      }

      public void testBackRangeOfNgrams() throws Exception {
        Tokenizer tokenizer = new Lucene43EdgeNGramTokenizer(Version.LUCENE_43, input, Lucene43EdgeNGramTokenizer.Side.BACK, 1, 3);
        assertTokenStreamContents(tokenizer, new String[]{"e","de","cde"}, new int[]{4,3,2}, new int[]{5,5,5}, null, null, null, 5 /* abcde */, false);
      }
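The edge n-gram outputs asserted in the two tests above (front grams "a", "ab", "abc" and back grams "e", "de", "cde" for the input "abcde") can be reproduced with a small framework-free helper. The function below is a hypothetical sketch of the gram-generation rule, not Lucene's implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: emits edge n-grams of length minGram..maxGram taken
// from either the front or the back edge of the input, matching the token
// strings asserted for EdgeNGramTokenizer / Lucene43EdgeNGramTokenizer above.
class EdgeNGrams {
    static List<String> edgeNGrams(String s, int minGram, int maxGram, boolean fromFront) {
        List<String> grams = new ArrayList<>();
        int limit = Math.min(maxGram, s.length());
        for (int n = minGram; n <= limit; n++) {
            grams.add(fromFront ? s.substring(0, n) : s.substring(s.length() - n));
        }
        return grams;
    }
}
```

Note how the start/end offsets in the assertions follow directly from this rule: front grams all start at offset 0, while back grams all end at the input length.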

    Examples of org.apache.lucene.analysis.Tokenizer

          final int max = _TestUtil.nextInt(random(), min, 20);
         
          Analyzer a = new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
              Tokenizer tokenizer = new EdgeNGramTokenizer(TEST_VERSION_CURRENT, reader, min, max);
              return new TokenStreamComponents(tokenizer, tokenizer);
            }   
          };
          checkRandomData(random(), a, 100*RANDOM_MULTIPLIER, 20);
          checkRandomData(random(), a, 10*RANDOM_MULTIPLIER, 8192);
        }
       
        Analyzer b = new Analyzer() {
          @Override
          protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
            Tokenizer tokenizer = new Lucene43EdgeNGramTokenizer(Version.LUCENE_43, reader, Lucene43EdgeNGramTokenizer.Side.BACK, 2, 4);
            return new TokenStreamComponents(tokenizer, tokenizer);
          }   
        };
        checkRandomData(random(), b, 1000*RANDOM_MULTIPLIER, 20, false, false);
        checkRandomData(random(), b, 100*RANDOM_MULTIPLIER, 8192, false, false);

    Examples of org.apache.lucene.analysis.Tokenizer

        checkRandomData(random(), b, 1000*RANDOM_MULTIPLIER, 20, false, false);
        checkRandomData(random(), b, 100*RANDOM_MULTIPLIER, 8192, false, false);
      }

      public void testTokenizerPositions() throws Exception {
        Tokenizer tokenizer = new Lucene43EdgeNGramTokenizer(Version.LUCENE_43, input, Lucene43EdgeNGramTokenizer.Side.FRONT, 1, 3);
        assertTokenStreamContents(tokenizer,
                                  new String[]{"a","ab","abc"},
                                  new int[]{0,0,0},
                                  new int[]{1,2,3},
                                  null,

    Examples of org.apache.lucene.analysis.Tokenizer

        final SynonymMap map = b.build();

        final Analyzer analyzer = new Analyzer() {
          @Override
          protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
            Tokenizer tokenizer = new MockTokenizer(reader, MockTokenizer.SIMPLE, true);
            return new TokenStreamComponents(tokenizer, new SynonymFilter(tokenizer, map, false));
          }
        };

        assertAnalyzesTo(analyzer, "a b c",

    Examples of org.apache.lucene.analysis.Tokenizer

        final SynonymMap map = b.build();

        final Analyzer analyzer = new Analyzer() {
          @Override
          protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
            Tokenizer tokenizer = new MockTokenizer(reader, MockTokenizer.SIMPLE, true);
            return new TokenStreamComponents(tokenizer, new SynonymFilter(tokenizer, map, false));
          }
        };

        assertAnalyzesTo(analyzer, "a b c",

    Examples of org.apache.lucene.analysis.Tokenizer

          final boolean ignoreCase = random().nextBoolean();
         
          final Analyzer analyzer = new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
              Tokenizer tokenizer = new MockTokenizer(reader, MockTokenizer.SIMPLE, true);
              return new TokenStreamComponents(tokenizer, new SynonymFilter(tokenizer, map, ignoreCase));
            }
          };

          checkRandomData(random(), analyzer, 100);
    Copyright © 2018 www.massapi.com. All rights reserved.
    All source code is the property of its respective owners. Java is a trademark of Sun Microsystems, Inc., owned by Oracle Inc. Contact coftware#gmail.com.