Examples of FSTLookup

org.apache.lucene.search.suggest.fst.FSTLookup
A finite-state-automaton-based implementation of the {@link Lookup} query suggestion/autocomplete interface.
Implementation details

The construction step in {@link #build(TermFreqIterator)} works as follows:
- A set of input terms (String) and weights (float) is given.
- The range of weights is determined, and all weights are then discretized into a fixed set of values ({@link #buckets}). This means that minor differences between weights may be lost during automaton construction. In general this is not a big problem, because the "priorities" of completions can usually be split into a fixed set of classes (even as rough as: very frequent, frequent, baseline, marginal). If you need exact, fine-grained weights, use {@link TSTLookup} instead.
- All terms in the input are prepended with a synthetic pseudo-character that encodes the weight of that term. For example, a term abc with a discretized weight equal to '1' would become 1abc.
- The terms are sorted by the raw UTF-16 values of their characters (including the synthetic weight character in front).
- A finite state automaton ({@link FST}) is constructed from the input. The root node has arcs labeled with all possible weights. We cache all these arcs, highest-weight first.
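The construction steps above can be sketched in plain Java. This is a minimal illustration, not the Lucene implementation: the class and method names (BucketingSketch, bucketOf, encode) are hypothetical, linear scaling between the minimum and maximum weight is an assumed bucketing scheme, and a sorted list stands in for the automaton.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; none of these names come from Lucene. */
class BucketingSketch {

  /** Maps a weight to a bucket index in [0, buckets - 1] by linear scaling (assumed scheme). */
  static int bucketOf(float weight, float min, float max, int buckets) {
    if (max == min) {
      return 0; // all weights are equal, so a single bucket suffices
    }
    int b = (int) (buckets * ((weight - min) / (max - min)));
    return Math.min(b, buckets - 1); // the maximum weight lands in the top bucket
  }

  /** Prepends the bucket as a synthetic leading char, then sorts by UTF-16 char values. */
  static List<String> encode(String[] terms, float[] weights, int buckets) {
    float min = Float.POSITIVE_INFINITY;
    float max = Float.NEGATIVE_INFINITY;
    for (float w : weights) {
      min = Math.min(min, w);
      max = Math.max(max, w);
    }
    List<String> encoded = new ArrayList<>();
    for (int i = 0; i < terms.length; i++) {
      char bucket = (char) bucketOf(weights[i], min, max, buckets);
      encoded.add(bucket + terms[i]);
    }
    Collections.sort(encoded); // String.compareTo compares UTF-16 char values
    return encoded;
  }

  public static void main(String[] args) {
    String[] terms = {"abc", "abd", "xyz"};
    float[] weights = {1f, 10f, 1f};
    for (String e : encode(terms, weights, 4)) {
      // Print the bucket as a number, since the leading char is not printable.
      System.out.println((int) e.charAt(0) + " " + e.substring(1));
    }
  }
}
```

With four buckets over a weight range of 1 to 10, the weights 5.0 and 5.2 both map to bucket 1 in this sketch, which is the kind of lost fine-grained difference the list above warns about.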
At runtime, in {@link #lookup(String,boolean,int)}, the automaton is used as follows:
- For each possible term weight encoded in the automaton (the cached arcs from the root, described above), starting with the highest one, we descend along the path of the input key. If the key is not a prefix of any sequence in the automaton (the path ends prematurely), we exit immediately: no completions.
- Otherwise, we have found an internal automaton node that ends the key. The entire subautomaton (all paths) starting from this node forms the key's completions. We start a traversal of this subautomaton. Every time we reach a final state (arc), we add a single suggestion to the list of results (the weight of this suggestion is constant and equal to that of the root arc we started from). The tricky part is that, because automaton edges are sorted and we scan depth-first, we can terminate the entire procedure as soon as we have collected as many suggestions as the user requested.
- If the number of suggestions collected in the step above is still insufficient, we proceed to the next (smaller) weight arc leaving the root node and repeat the same algorithm.
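The lookup procedure above can be approximated without an automaton. The following is a minimal sketch under stated assumptions: a TreeSet of weight-prefixed strings stands in for the FST, the name LookupSketch and its lookup signature are hypothetical, a bucket that contains no match for the key is simply skipped, and exact-match promotion is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch; a sorted set stands in for the automaton. */
class LookupSketch {

  /** Collects up to num completions of key, visiting the highest bucket first. */
  static List<String> lookup(TreeSet<String> encoded, String key, int buckets, int num) {
    List<String> results = new ArrayList<>();
    for (int b = buckets - 1; b >= 0 && results.size() < num; b--) {
      String prefix = (char) b + key; // synthetic weight char followed by the key
      // tailSet returns entries >= prefix in sorted order, so all entries that
      // start with the prefix come first and are contiguous.
      for (String entry : encoded.tailSet(prefix)) {
        if (!entry.startsWith(prefix) || results.size() == num) {
          break; // left the key's range, or collected enough suggestions
        }
        results.add(entry.substring(1)); // strip the synthetic weight char
      }
    }
    return results;
  }

  public static void main(String[] args) {
    TreeSet<String> encoded = new TreeSet<>();
    encoded.add((char) 1 + "four");
    encoded.add((char) 1 + "fourteen");
    encoded.add((char) 0 + "fourier");
    encoded.add((char) 1 + "xo");
    System.out.println(lookup(encoded, "four", 2, 3));
  }
}
```

Because entries are sorted within each bucket and buckets are visited from highest to lowest, the loop can stop as soon as num results are collected, mirroring the early termination described in the list above.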
Runtime behavior and performance characteristics

The algorithm described above is optimized for finding suggestions to short prefixes in a top-weights-first order. This is probably the most common use case: it allows presenting suggestions early and sorts them by the global frequency (and then alphabetically).
If there is an exact match in the automaton, it is returned first on the results list (even with by-weight sorting).
Note that the maximum lookup time for any prefix is the time of descending to the subtree, plus traversal of the subtree up to the number of requested suggestions (because they are already presorted by weight on the root level and alphabetically at any node level).
To order alphabetically only (with no ordering by priority), use identical weights for all terms. Alphabetical suggestions are returned even if non-constant weights are used, but the algorithm for doing so is suboptimal.
"Alphabetically" anywhere in the documentation above means UTF-16 code unit order (the order of raw char values), nothing else.

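To make the ordering of suggestions concrete, the following sketch sorts a few terms with Java's natural String order, which compares UTF-16 char values. The class name Utf16OrderSketch is hypothetical and the terms are made-up examples, not data from the tests below.

```java
import java.util.Arrays;

/** Hypothetical sketch showing UTF-16 ordering of terms. */
class Utf16OrderSketch {

  /** Returns a copy of the terms sorted by String.compareTo (UTF-16 char values). */
  static String[] sorted(String... terms) {
    String[] copy = terms.clone();
    Arrays.sort(copy);
    return copy;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(sorted("apple", "Zebra", "\u00e9clair", "zoo")));
  }
}
```

In this order every uppercase ASCII letter sorts before every lowercase one, and an accented letter such as U+00E9 sorts after 'z', which differs from what a locale-aware collator would typically produce.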
Examples of org.apache.lucene.search.suggest.fst.FSTLookup


    boolean exactMatchFirst = params.get(EXACT_MATCH_FIRST) != null
        ? Boolean.valueOf(params.get(EXACT_MATCH_FIRST).toString())
        : true;

    return new FSTLookup(buckets, exactMatchFirst);
  }


Examples of org.apache.lucene.search.suggest.fst.FSTLookup

        tf("four", 1),
        tf("fourty", 1),
        tf("xo", 1),
      };


      lookup = new FSTLookup();
      lookup.build(new TermFreqArrayIterator(keys));
  }


Examples of org.apache.lucene.search.suggest.fst.FSTLookup

  }


  public void testMultilingualInput() throws Exception {
    List<TermFreq> input = LookupBenchmarkTest.readTop50KWiki();


    lookup = new FSTLookup();
    lookup.build(new TermFreqArrayIterator(input));


    for (TermFreq tf : input) {
      assertTrue("Not found: " + tf.term, lookup.get(tf.term) != null);
      assertEquals(tf.term, lookup.lookup(tf.term, true, 1).get(0).key);


Examples of org.apache.lucene.search.suggest.fst.FSTLookup

      assertEquals(tf.term, lookup.lookup(tf.term, true, 1).get(0).key);
    }
  }


  public void testEmptyInput() throws Exception {
    lookup = new FSTLookup();
    lookup.build(new TermFreqArrayIterator(new TermFreq[0]));
    
    assertMatchEquals(lookup.lookup("", true, 10));
  }


Examples of org.apache.lucene.search.suggest.fst.FSTLookup

    List<TermFreq> freqs = new ArrayList<TermFreq>();
    Random rnd = random;
    for (int i = 0; i < 5000; i++) {
      freqs.add(new TermFreq("" + rnd.nextLong(), rnd.nextInt(100)));
    }
    lookup = new FSTLookup();
    lookup.build(new TermFreqArrayIterator(freqs.toArray(new TermFreq[freqs.size()])));


    for (TermFreq tf : freqs) {
      final String term = tf.term;
      for (int i = 1; i < term.length(); i++) {


Examples of org.apache.lucene.search.suggest.fst.FSTLookup

  private FSTLookup lookup;


  public void setUp() throws Exception {
    super.setUp();


    lookup = new FSTLookup();
    lookup.build(new TermFreqArrayIterator(evalKeys()));
  }


Examples of org.apache.lucene.search.suggest.fst.FSTLookup

        "four/0",
        "fourblah/1",
        "fourteen/1",
        "fourier/0");


    lookup = new FSTLookup(10, false);
    lookup.build(new TermFreqArrayIterator(evalKeys()));
    
    // 'one' is not promoted after collecting two higher ranking results.
    assertMatchEquals(lookup.lookup("one", true, 2),  
        "oneness/1",


Examples of org.apache.lucene.search.suggest.fst.FSTLookup

  }


  public void testMultilingualInput() throws Exception {
    List<TermFreq> input = LookupBenchmarkTest.readTop50KWiki();


    lookup = new FSTLookup();
    lookup.build(new TermFreqArrayIterator(input));


    for (TermFreq tf : input) {
      assertTrue("Not found: " + tf.term, lookup.get(_TestUtil.bytesToCharSequence(tf.term, random)) != null);
      assertEquals(tf.term.utf8ToString(), lookup.lookup(_TestUtil.bytesToCharSequence(tf.term, random), true, 1).get(0).key.toString());


Examples of org.apache.lucene.search.suggest.fst.FSTLookup

      assertEquals(tf.term.utf8ToString(), lookup.lookup(_TestUtil.bytesToCharSequence(tf.term, random), true, 1).get(0).key.toString());
    }
  }


  public void testEmptyInput() throws Exception {
    lookup = new FSTLookup();
    lookup.build(new TermFreqArrayIterator(new TermFreq[0]));
    
    assertMatchEquals(lookup.lookup("", true, 10));
  }


Examples of org.apache.lucene.search.suggest.fst.FSTLookup

    List<TermFreq> freqs = new ArrayList<TermFreq>();
    Random rnd = random;
    for (int i = 0; i < 5000; i++) {
      freqs.add(new TermFreq("" + rnd.nextLong(), rnd.nextInt(100)));
    }
    lookup = new FSTLookup();
    lookup.build(new TermFreqArrayIterator(freqs.toArray(new TermFreq[freqs.size()])));


    for (TermFreq tf : freqs) {
      final CharSequence term = _TestUtil.bytesToCharSequence(tf.term, random);
      for (int i = 1; i < term.length(); i++) {



