Examples of org.apache.lucene.index.memory.MemoryIndex

org.apache.lucene.index.memory.MemoryIndex
.java.net/pub/a/today/2003/07/30/LuceneIntro.html">Lucene Analyzer Intro.
Arbitrary Lucene queries can be run against this class - see Lucene Query Syntax as well as Query Parser Rules. Note that a Lucene query selects on the field names and associated (indexed) tokenized terms, not on the original fulltext(s) - the latter are not stored but rather thrown away immediately after tokenization.
For some interesting background information on search technology, see Bob Wyman's Prospective Search, Jim Gray's A Call to Arms - Custom subscriptions, and Tim Bray's On Search, the Series.
Example Usage
```
 Analyzer analyzer = PatternAnalyzer.DEFAULT_ANALYZER; //Analyzer analyzer = new SimpleAnalyzer(); MemoryIndex index = new MemoryIndex(); index.addField("content", "Readings about Salmons and other select Alaska fishing Manuals", analyzer); index.addField("author", "Tales of James", analyzer); QueryParser parser = new QueryParser("content", analyzer); float score = index.search(parser.parse("+author:james +salmon~ +fish* manual~")); if (score > 0.0f) { System.out.println("it's a match"); } else { System.out.println("no match found"); } System.out.println("indexData=" + index.toString()); 
```
Example XQuery Usage
```
 (: An XQuery that finds all books authored by James that have something to do with "salmon fishing manuals", sorted by relevance :) declare namespace lucene = "java:nux.xom.pool.FullTextUtil"; declare variable $query := "+salmon~ +fish* manual~"; (: any arbitrary Lucene query can go here :) for $book in /books/book[author="James" and lucene:match(abstract, $query) > 0.0] let $score := lucene:match($book/abstract, $query) order by $score descending return $book 
```
No thread safety guarantees
An instance can be queried multiple times with the same or different queries, but an instance is not thread-safe. If desired use idioms such as:
```
 MemoryIndex index = ... synchronized (index) { // read and/or write index (i.e. add fields and/or query) }  
```
Performance Notes
Internally there's a new data structure geared towards efficient indexing and searching, plus the necessary support code to seamlessly plug into the Lucene framework.
This class performs very well for very small texts (e.g. 10 chars) as well as for large texts (e.g. 10 MB) and everything in between. Typically, it is about 10-100 times faster than RAMDirectory. Note that RAMDirectory has particularly large efficiency overheads for small to medium sized texts, both in time and space. Indexing a field with N tokens takes O(N) in the best case, and O(N logN) in the worst case. Memory consumption is probably larger than for RAMDirectory.
Example throughput of many simple term queries over a single MemoryIndex: ~500000 queries/sec on a MacBook Pro, jdk 1.5.0_06, server VM. As always, your mileage may vary.
If you're curious about the whereabouts of bottlenecks, run java 1.5 with the non-perturbing '-server -agentlib:hprof=cpu=samples,depth=10' flags, then study the trace log and correlate its hotspot trailer with its call stack headers (see hprof tracing ).

      if(wrapToCaching && !(tokenStream instanceof CachingTokenFilter)) {
        assert !cachedTokenStream;
        tokenStream = new CachingTokenFilter(new OffsetLimitTokenFilter(tokenStream, maxDocCharsToAnalyze));
        cachedTokenStream = true;
      }
      final MemoryIndex indexer = new MemoryIndex(true);
      indexer.addField(DelegatingAtomicReader.FIELD_NAME, tokenStream);
      tokenStream.reset();
      final IndexSearcher searcher = indexer.createSearcher();
      // MEM index has only atomic ctx
      internalReader = new DelegatingAtomicReader(((AtomicReaderContext)searcher.getTopReaderContext()).reader());
    }
    return internalReader.getContext();
  }

View Full Code Here

  }


  private IndexReader getReaderForField(String field) {
    IndexReader reader = (IndexReader) readers.get(field);
    if (reader == null) {
      MemoryIndex indexer = new MemoryIndex();
      indexer.addField(field, cachedTokenFilter);
      IndexSearcher searcher = indexer.createSearcher();
      reader = searcher.getIndexReader();
      readers.put(field, reader);
    }
    return reader;
  }

View Full Code Here

      tokenStream = new CachingTokenFilter(tokenStream);
      cachedTokenStream = true;
    }
    IndexReader reader = (IndexReader) readers.get(field);
    if (reader == null) {
      MemoryIndex indexer = new MemoryIndex();
      indexer.addField(field, tokenStream);
      tokenStream.reset();
      IndexSearcher searcher = indexer.createSearcher();
      reader = searcher.getIndexReader();
      readers.put(field, reader);
    }


    return reader;

View Full Code Here

      tokenStream = new CachingTokenFilter(new OffsetLimitTokenFilter(tokenStream, maxDocCharsToAnalyze));
      cachedTokenStream = true;
    }
    IndexReader reader = readers.get(field);
    if (reader == null) {
      MemoryIndex indexer = new MemoryIndex();
      indexer.addField(field, new OffsetLimitTokenFilter(tokenStream, maxDocCharsToAnalyze));
      tokenStream.reset();
      IndexSearcher searcher = indexer.createSearcher();
      reader = searcher.getIndexReader();
      readers.put(field, reader);
    }


    return reader;

View Full Code Here

        mltQuery.setMinimumShouldMatch("100%");
        mltQuery.setMinWordLen(0);
        mltQuery.setMinDocFreq(0);


        // one document has all values
        MemoryIndex index = new MemoryIndex();
        index.addField("name.first", "apache lucene", new WhitespaceAnalyzer());
        index.addField("name.last", "1 2 3 4", new WhitespaceAnalyzer());


        // two clauses, one for items and one for like_text if set
        BooleanQuery luceneQuery = (BooleanQuery) mltQuery.rewrite(index.createSearcher().getIndexReader());
        BooleanClause[] clauses = luceneQuery.getClauses();


        // check for items
        int minNumberShouldMatch = ((BooleanQuery) (clauses[0].getQuery())).getMinimumNumberShouldMatch();
        assertThat(minNumberShouldMatch, is(4));

View Full Code Here

            return likeTexts.toArray(Fields.EMPTY_ARRAY);
        }
    }


    private static Fields generateFields(String[] fieldNames, String text) throws IOException {
        MemoryIndex index = new MemoryIndex();
        for (String fieldName : fieldNames) {
            index.addField(fieldName, text, new WhitespaceAnalyzer());
        }
        return MultiFields.getFields(index.createSearcher().getIndexReader());
    }

View Full Code Here

    public void prepare(PercolateContext context, ParsedDocument parsedDocument) {
        IndexReader[] memoryIndices = new IndexReader[parsedDocument.docs().size()];
        List<ParseContext.Document> docs = parsedDocument.docs();
        int rootDocIndex = docs.size() - 1;
        assert rootDocIndex > 0;
        MemoryIndex rootDocMemoryIndex = null;
        for (int i = 0; i < docs.size(); i++) {
            ParseContext.Document d = docs.get(i);
            MemoryIndex memoryIndex;
            if (rootDocIndex == i) {
                // the last doc is always the rootDoc, since that is usually the biggest document it make sense
                // to reuse the MemoryIndex it uses
                memoryIndex = rootDocMemoryIndex = cache.get();
            } else {
                memoryIndex = new MemoryIndex(true);
            }
            memoryIndices[i] = indexDoc(d, parsedDocument.analyzer(), memoryIndex).createSearcher().getIndexReader();
        }
        MultiReader mReader = new MultiReader(memoryIndices, true);
        try {

View Full Code Here

    }


    private Fields generateTermVectors(Collection<GetField> getFields, boolean withOffsets, @Nullable Map<String, String> perFieldAnalyzer)
            throws IOException {
        /* store document in memory index */
        MemoryIndex index = new MemoryIndex(withOffsets);
        for (GetField getField : getFields) {
            String field = getField.getName();
            Analyzer analyzer = getAnalyzerAtField(field, perFieldAnalyzer);
            for (Object text : getField.getValues()) {
                index.addField(field, text.toString(), analyzer);
            }
        }
        /* and read vectors from it */
        return MultiFields.getFields(index.createSearcher().getIndexReader());
    }

View Full Code Here

        this.cache = cache;
    }


    @Override
    public void prepare(PercolateContext context, ParsedDocument parsedDocument) {
        MemoryIndex memoryIndex = cache.get();
        for (IndexableField field : parsedDocument.rootDoc().getFields()) {
            if (field.fieldType().indexOptions() == IndexOptions.NONE && field.name().equals(UidFieldMapper.NAME)) {
                continue;
            }
            try {
                // TODO: instead of passing null here, we can have a CTL<Map<String,TokenStream>> and pass previous,
                // like the indexer does
                TokenStream tokenStream = field.tokenStream(parsedDocument.analyzer(), null);
                if (tokenStream != null) {
                    memoryIndex.addField(field.name(), tokenStream, field.boost());
                }
            } catch (IOException e) {
                throw new ElasticsearchException("Failed to create token stream", e);
            }
        }

View Full Code Here

      tokenStream = new CachingTokenFilter(tokenStream);
      cachedTokenStream = true;
    }
    IndexReader reader = readers.get(field);
    if (reader == null) {
      MemoryIndex indexer = new MemoryIndex();
      indexer.addField(field, tokenStream);
      tokenStream.reset();
      IndexSearcher searcher = indexer.createSearcher();
      reader = searcher.getIndexReader();
      readers.put(field, reader);
    }


    return reader;

View Full Code Here

0 1 2

TOP

Related Classes of org.apache.lucene.index.memory.MemoryIndex

nux.xom.pool.FullTextPool

nux.xom.sandbox.MemoryIndexBenchmark

org.apache.lucene.analysis.hebrew.RealDataTest

org.apache.lucene.analysis.Token

org.apache.lucene.analysis.tokenattributes.CharTermAttribute

org.apache.lucene.analysis.tokenattributes.OffsetAttribute

org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute

org.apache.lucene.analysis.tokenattributes.TermAttribute

org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute

org.apache.lucene.analysis.TokenStream

All source code are property of their respective owners. Java is a trademark of Sun Microsystems, Inc and owned by ORACLE Inc. Contact coftware#gmail.com.

Examples of org.apache.lucene.index.memory.MemoryIndex

Example Usage

Example XQuery Usage

No thread safety guarantees

Performance Notes

Related Classes of org.apache.lucene.index.memory.MemoryIndex