Examples of MoreLikeThis

org.apache.lucene.queries.mlt.MoreLikeThis

    IndexReader ir = ...
    IndexSearcher is = ...

    MoreLikeThis mlt = new MoreLikeThis(ir);
    Reader target = ... // source of doc you want to find similarities to
    Query query = mlt.like(target);

    Hits hits = is.search(query);
    // now the usual iteration through 'hits' - the only thing to watch for is to make sure
    // you ignore the doc if it matches your 'target' document, as it should be similar to itself

Thus you:
1. do your normal Lucene setup for searching,
2. create a MoreLikeThis,
3. get the text of the doc you want to find similarities to,
4. call one of the like() methods to generate a similarity query,
5. call the searcher to find the similar docs.
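
The caveat about the target document matching itself can be illustrated without any Lucene types. The following is a minimal sketch, assuming the search has already produced a ranked list of document ids. `SkipTargetDoc`, `withoutTarget`, and the sample ids are hypothetical names invented for this illustration; they are not part of the Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy illustration only: plain ints stand in for Lucene document ids. */
final class SkipTargetDoc {

    /** Returns the ranked ids with the target id removed, preserving order. */
    static List<Integer> withoutTarget(List<Integer> rankedDocIds, int targetDocId) {
        List<Integer> similar = new ArrayList<>();
        for (int docId : rankedDocIds) {
            if (docId != targetDocId) {
                similar.add(docId);
            }
        }
        return similar;
    }

    public static void main(String[] args) {
        // Hypothetical ranked result: the target (id 7) matches itself best.
        List<Integer> ranked = Arrays.asList(7, 3, 12, 5);
        System.out.println(withoutTarget(ranked, 7)); // prints [3, 12, 5]
    }
}
```

In a real application the same check would typically happen inside the loop over the search results, comparing each hit against whatever identifies the target document.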

More Advanced Usage

You may want to use setFieldNames(...) so you can examine multiple fields (e.g. body and title) for similarity.

Depending on the size of your index and the size and makeup of your documents you may want to call the other set methods to control how the similarity queries are generated:
- setMinTermFreq(...)
- setMinDocFreq(...)
- setMaxDocFreq(...)
- setMaxDocFreqPct(...)
- setMinWordLen(...)
- setMaxWordLen(...)
- setMaxQueryTerms(...)
- setMaxNumTokensParsed(...)
- setStopWords(...)
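
To give a feel for what several of these settings control, here is a toy sketch of term selection. It is not Lucene's implementation: `TermSelectionSketch` and `selectTerms` are hypothetical names, the tokenizer is a simple split on non-letters, the document-frequency limits are left out because they need an index, and terms are ranked by raw frequency, which is a simplification assumed for this example. A `maxWordLen` of 0 is treated as "no limit" here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy model of term selection; not Lucene's code. */
final class TermSelectionSketch {

    static List<String> selectTerms(String text, int minTermFreq, int minWordLen,
                                    int maxWordLen, Set<String> stopWords, int maxQueryTerms) {
        // Count occurrences of each lower-cased token, splitting on non-letters.
        Map<String, Integer> freq = new LinkedHashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty()) {
                continue;
            }
            freq.merge(token, 1, Integer::sum);
        }
        // Drop noise words (too short, too long, stop words) and rare terms.
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            String term = e.getKey();
            boolean noise = term.length() < minWordLen
                    || (maxWordLen > 0 && term.length() > maxWordLen)
                    || stopWords.contains(term);
            if (!noise && e.getValue() >= minTermFreq) {
                kept.add(term);
            }
        }
        // Most frequent first; the stable sort keeps first-seen order on ties.
        kept.sort(Comparator.comparing((String t) -> freq.get(t)).reversed());
        if (kept.size() > maxQueryTerms) {
            return new ArrayList<String>(kept.subList(0, maxQueryTerms));
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the"));
        String text = "lucene index lucene search the index lucene of search";
        // minTermFreq=2, minWordLen=3, maxWordLen=0 (no limit), maxQueryTerms=2
        System.out.println(selectTerms(text, 2, 3, 0, stop, 2)); // prints [lucene, index]
    }
}
```

Raising `minTermFreq` or lowering `maxQueryTerms` in this sketch shrinks the list of selected terms, which is the kind of trade-off the real setters are meant to expose.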
```
Changes: Mark Harwood 29/02/04
Some bugfixing, some refactoring, some optimisation.
- bugfix: retrieveTerms(int docNum) was not working for indexes without a termvector - added missing code
- bugfix: No significant terms being created for fields with a termvector - because was only counting one occurrence per term/field pair in calculations (i.e. not including frequency info from TermVector)
- refactor: moved common code into isNoiseWord()
- optimise: when no termvector support available - used maxNumTermsParsed to limit amount of tokenization
```

Examples of org.apache.lucene.search.similar.MoreLikeThis

      if( fields.length < 1 ) {
        throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, 
            "MoreLikeThis requires at least one similarity field: "+MoreLikeThisParams.SIMILARITY_FIELDS );
      }
      
      this.mlt = new MoreLikeThis( reader ); // TODO -- after LUCENE-896, we can use , searcher.getSimilarity() );
      mlt.setFieldNames(fields);
      mlt.setAnalyzer( searcher.getSchema().getAnalyzer() );
      
      // configurable params
      mlt.setMinTermFreq(params.getInt(MoreLikeThisParams.MIN_TERM_FREQ, MoreLikeThis.DEFAULT_MIN_TERM_FREQ));

Examples of org.apache.lucene.search.similar.MoreLikeThis

        this.analyzer=analyzer;
    }
    
    public Query rewrite(IndexReader reader) throws IOException
    {
        MoreLikeThis mlt=new MoreLikeThis(reader);
        
        mlt.setFieldNames(moreLikeFields);
        mlt.setAnalyzer(analyzer);
        mlt.setMinTermFreq(minTermFrequency);
        mlt.setMaxQueryTerms(maxQueryTerms);
        BooleanQuery bq= (BooleanQuery) mlt.like(new ByteArrayInputStream(likeText.getBytes()));        
        BooleanClause[] clauses = bq.getClauses();
        // require at least the configured fraction (percentTermsToMatch) of the terms to match
        bq.setMinimumNumberShouldMatch((int)(clauses.length*percentTermsToMatch));
        return bq;
    }
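
The last step of the snippet above turns a fraction of the clause count into a whole number with an `(int)` cast. The small sketch below isolates that arithmetic so the truncation is easy to see. `MinShouldMatchSketch` and `minimumShouldMatch` are hypothetical names, and the sample values are chosen for illustration only.

```java
/** Shows how the (int) cast turns a fraction of the clause count into a whole number. */
final class MinShouldMatchSketch {

    static int minimumShouldMatch(int clauseCount, float percentTermsToMatch) {
        // Same arithmetic as the snippet: the cast truncates toward zero.
        return (int) (clauseCount * percentTermsToMatch);
    }

    public static void main(String[] args) {
        System.out.println(minimumShouldMatch(25, 0.3f)); // prints 7 (7.5 truncated)
        System.out.println(minimumShouldMatch(3, 0.3f));  // prints 0 (0.9 truncated)
        System.out.println(minimumShouldMatch(10, 0.5f)); // prints 5
    }
}
```

Because the value is truncated rather than rounded, a query with only a few clauses can end up with a minimum of 0. If a floor of at least one matching term is wanted, something like `Math.max(1, ...)` around the result would be one way to get it.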

Examples of org.apache.lucene.search.similar.MoreLikeThis

        this.analyzer=analyzer;
    }
    
    public Query rewrite(IndexReader reader) throws IOException
    {
        MoreLikeThis mlt=new MoreLikeThis(reader);
        
        mlt.setFieldNames(moreLikeFields);
        mlt.setAnalyzer(analyzer);
        mlt.setMinTermFreq(minTermFrequency);
        if(minDocFreq>=0)
        {
          mlt.setMinDocFreq(minDocFreq);
        }        
        mlt.setMaxQueryTerms(maxQueryTerms);
        mlt.setStopWords(stopWords);
        BooleanQuery bq= (BooleanQuery) mlt.like(new ByteArrayInputStream(likeText.getBytes()));        
        BooleanClause[] clauses = bq.getClauses();
        // require at least the configured fraction (percentTermsToMatch) of the terms to match
        bq.setMinimumNumberShouldMatch((int)(clauses.length*percentTermsToMatch));
        return bq;
    }

Examples of org.apache.lucene.search.similar.MoreLikeThis

        this.analyzer=analyzer;
    }
    
    public Query rewrite(IndexReader reader) throws IOException
    {
        MoreLikeThis mlt=new MoreLikeThis(reader);
        
        mlt.setFieldNames(moreLikeFields);
        mlt.setAnalyzer(analyzer);
        mlt.setMinTermFreq(minTermFrequency);
        mlt.setMaxQueryTerms(maxQueryTerms);
        mlt.setStopWords(stopWords);
        BooleanQuery bq= (BooleanQuery) mlt.like(new ByteArrayInputStream(likeText.getBytes()));        
        BooleanClause[] clauses = bq.getClauses();
        // require at least the configured fraction (percentTermsToMatch) of the terms to match
        bq.setMinimumNumberShouldMatch((int)(clauses.length*percentTermsToMatch));
        return bq;
    }

Examples of org.apache.lucene.search.similar.MoreLikeThis

        //val filter = new FieldCacheTermsFilter(DBpediaResourceField.CONTEXT.toString,allowedUris)
        TermsFilter filter = new org.apache.lucene.search.TermsFilter();
        for (DBpediaResource u:  allowedUris) {
            filter.addTerm(new Term(DBpediaResourceField.URI.toString(),u.uri()) );
        }
        MoreLikeThis mlt = new MoreLikeThis(reader);
        String[] fields = {DBpediaResourceField.CONTEXT.toString()};
        mlt.setFieldNames(fields);
        mlt.setAnalyzer(this.mDefaultAnalyzer);
        InputStream inputStream = new ByteArrayInputStream(text.text().getBytes("UTF-8"));
        Query query = mlt.like(inputStream);
        return query;
    }
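
This snippet names the encoding explicitly when turning the text into bytes, whereas the other `rewrite` snippets on this page call `getBytes()` with no argument, which uses the platform default charset. A minimal sketch of the explicit form follows. `ExplicitCharsetSketch` and `utf8Bytes` are hypothetical names, and the choice of UTF-8 mirrors the snippet above rather than any requirement of MoreLikeThis.

```java
import java.nio.charset.StandardCharsets;

/** Encodes text with a fixed charset so the bytes do not depend on the platform. */
final class ExplicitCharsetSketch {

    static byte[] utf8Bytes(String text) {
        // StandardCharsets.UTF_8 avoids the checked UnsupportedEncodingException
        // that getBytes("UTF-8") declares.
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "café": three ASCII letters plus one two-byte character.
        System.out.println(utf8Bytes("caf\u00e9").length); // prints 5
    }
}
```

Whatever charset is used to encode the text has to match the one used when the bytes are read back, otherwise non-ASCII terms may be analyzed incorrectly.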

Examples of org.apache.lucene.search.similar.MoreLikeThis

    }
    
    @Override
    public Query rewrite(IndexReader reader) throws IOException
    {
        MoreLikeThis mlt=new MoreLikeThis(reader);
        
        mlt.setFieldNames(moreLikeFields);
        mlt.setAnalyzer(analyzer);
        mlt.setMinTermFreq(minTermFrequency);
        if(minDocFreq>=0)
        {
          mlt.setMinDocFreq(minDocFreq);
        }        
        mlt.setMaxQueryTerms(maxQueryTerms);
        mlt.setStopWords(stopWords);
        BooleanQuery bq= (BooleanQuery) mlt.like(new ByteArrayInputStream(likeText.getBytes()));        
        BooleanClause[] clauses = bq.getClauses();
        // require at least the configured fraction (percentTermsToMatch) of the terms to match
        bq.setMinimumNumberShouldMatch((int)(clauses.length*percentTermsToMatch));
        return bq;
    }
