Examples of org.apache.lucene.queries.mlt.MoreLikeThis

org.apache.lucene.queries.mlt.MoreLikeThis
    // source of doc you want to find similarities to
    Query query = mlt.like(target);
    Hits hits = is.search(query);
    // now the usual iteration through 'hits' - the only thing to watch for is to make sure
    // you ignore the doc if it matches your 'target' document, as it should be similar to itself

Thus you:
1. do your normal Lucene setup for searching,
2. create a MoreLikeThis,
3. get the text of the doc you want to find similarities to,
4. then call one of the like() calls to generate a similarity query,
5. call the searcher to find the similar docs.
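The steps above can be sketched without Lucene at all. What follows is a toy, in-memory illustration and not Lucene's implementation: the class `ToyMoreLikeThis`, the methods `terms` and `similarTo`, and the shared-term overlap score are all hypothetical simplifications made up for this example. It mainly shows the caveat mentioned earlier: the target document always matches its own query, so it has to be skipped when iterating the hits.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy stand-in for the MoreLikeThis workflow; all names here are hypothetical. */
final class ToyMoreLikeThis {

    /** Lower-cases the text and splits it on non-letter characters (a very naive "analyzer"). */
    static Set<String> terms(String text) {
        Set<String> result = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    /**
     * Returns the ids of documents sharing at least one term with docs.get(target),
     * ordered by descending number of shared terms, with the target itself excluded.
     */
    static List<Integer> similarTo(List<String> docs, int target) {
        // Steps 3 and 4: take the target's text and turn it into a "query" (here: a term set).
        Set<String> queryTerms = terms(docs.get(target));
        List<int[]> hits = new ArrayList<>(); // each entry is {docId, sharedTermCount}
        // Step 5: "search" every document with that query.
        for (int docId = 0; docId < docs.size(); docId++) {
            if (docId == target) {
                continue; // a document is always similar to itself, so ignore it
            }
            Set<String> shared = terms(docs.get(docId));
            shared.retainAll(queryTerms);
            if (!shared.isEmpty()) {
                hits.add(new int[] {docId, shared.size()});
            }
        }
        hits.sort((a, b) -> Integer.compare(b[1], a[1])); // most shared terms first
        List<Integer> ids = new ArrayList<>();
        for (int[] hit : hits) {
            ids.add(hit[0]);
        }
        return ids;
    }
}
```

For the four documents "the quick brown fox", "a quick brown dog", "lucene search library" and "brown bread", calling `ToyMoreLikeThis.similarTo(docs, 0)` returns the ids 1 and 3 in that order. Document 0 itself never appears in the result.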

More Advanced Usage

You may want to use {@link #setFieldNames setFieldNames(...)} so you can examine multiple fields (e.g. body and title) for similarity.

Depending on the size of your index and the size and makeup of your documents, you may want to call the other set methods to control how the similarity queries are generated:
- {@link #setMinTermFreq setMinTermFreq(...)}
- {@link #setMinDocFreq setMinDocFreq(...)}
- {@link #setMaxDocFreq setMaxDocFreq(...)}
- {@link #setMaxDocFreqPct setMaxDocFreqPct(...)}
- {@link #setMinWordLen setMinWordLen(...)}
- {@link #setMaxWordLen setMaxWordLen(...)}
- {@link #setMaxQueryTerms setMaxQueryTerms(...)}
- {@link #setMaxNumTokensParsed setMaxNumTokensParsed(...)}
- {@link #setStopWords setStopWords(...)}
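To make the role of these settings more concrete, here is a simplified sketch of threshold-based term selection. It is an assumption-laden model and not Lucene's code: the class `TermSelectionSketch`, its field values, and the exact tf * idf formula are hypothetical, and only the general idea (drop noise words and terms outside the frequency limits, then keep the best-scoring terms) follows the setters listed above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of how the set methods could shape the query terms. */
final class TermSelectionSketch {
    // Illustrative values only; set them to whatever suits the index at hand.
    int minTermFreq = 2;
    int minDocFreq = 5;
    int maxDocFreq = Integer.MAX_VALUE;
    int minWordLen = 0;   // 0 means "no lower limit" in this sketch
    int maxWordLen = 0;   // 0 means "no upper limit" in this sketch
    int maxQueryTerms = 25;
    Set<String> stopWords = Collections.emptySet();

    /**
     * @param termFreqs term -> number of occurrences in the source document
     * @param docFreqs  term -> number of indexed documents containing the term
     * @param numDocs   total number of documents in the index
     * @return at most maxQueryTerms terms, best tf * idf score first
     */
    List<String> select(Map<String, Integer> termFreqs, Map<String, Integer> docFreqs, int numDocs) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> entry : termFreqs.entrySet()) {
            String term = entry.getKey();
            int tf = entry.getValue();
            int df = docFreqs.getOrDefault(term, 0);
            if (isNoiseWord(term) || tf < minTermFreq || df < minDocFreq || df > maxDocFreq) {
                continue; // filtered out by one of the thresholds
            }
            double idf = Math.log((double) numDocs / (df + 1)) + 1.0; // assumed idf variant
            scores.put(term, tf * idf);
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b); // alphabetical tie-break
        });
        return new ArrayList<>(ranked.subList(0, Math.min(maxQueryTerms, ranked.size())));
    }

    /** A term is noise if it is a stop word or falls outside the configured length limits. */
    boolean isNoiseWord(String term) {
        int len = term.length();
        if (minWordLen > 0 && len < minWordLen) {
            return true;
        }
        if (maxWordLen > 0 && len > maxWordLen) {
            return true;
        }
        return stopWords.contains(term);
    }
}
```

Raising `minTermFreq` or `minDocFreq` shrinks the candidate set, lowering `maxDocFreq` removes very common terms, and `maxQueryTerms` caps how many of the surviving terms end up in the generated query.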
```
Changes: Mark Harwood 29/02/04
Some bugfixing, some refactoring, some optimisation.
- bugfix: retrieveTerms(int docNum) was not working for indexes without a termvector - added missing code
- bugfix: No significant terms being created for fields with a termvector - because was only counting one occurrence per term/field pair in calculations (ie not including frequency info from TermVector)
- refactor: moved common code into isNoiseWord()
- optimise: when no termvector support available - used maxNumTermsParsed to limit amount of tokenization
```

      id = Integer.parseInt(getString(docNum, "text"));
    } catch (NumberFormatException nfe) {
      errorMsg("Invalid document number");
      return;
    }
    MoreLikeThis mlt = new MoreLikeThis(ir);
    try {
      mlt.setFieldNames((String[])Util.fieldNames(ir, true).toArray(new String[0]));
    } catch (Exception e) {
      errorMsg("Exception collecting field names: " + e.toString());
      return;
    }
    mlt.setMinTermFreq(1);
    mlt.setMaxQueryTerms(50);
    Analyzer a = createAnalyzer(find("srchOptTabs"));
    if (a == null) {
      return;
    }
    mlt.setAnalyzer(a);
    Object[] rows = getSelectedItems(docTable);
    BooleanQuery similar = null;
    if (rows != null && rows.length > 0) {
      // collect text from fields
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < rows.length; i++) {
        Field f = (Field)getProperty(rows[i], "field");
        if (f == null) {
          continue;
        }
        String s = f.stringValue();
        if (s == null || s.trim().length() == 0) {
          continue;
        }
        if (sb.length() > 0) sb.append(" ");
        sb.append(s);
      }
      try {
        similar = (BooleanQuery)mlt.like("field", new StringReader(sb.toString()));
      } catch (Exception e) {
        e.printStackTrace();
        errorMsg("FAILED: " + e.getMessage());
        return;
      }
    } else {
      try {
        similar = (BooleanQuery)mlt.like(id);
      } catch (Exception e) {
        e.printStackTrace();
        errorMsg("FAILED: " + e.getMessage());
        return;
      }

  @Override
  public void train(AtomicReader atomicReader, String textFieldName, String classFieldName, Analyzer analyzer, Query query) throws IOException {
    this.textFieldNames = new String[]{textFieldName};
    this.classFieldName = classFieldName;
    mlt = new MoreLikeThis(atomicReader);
    mlt.setAnalyzer(analyzer);
    mlt.setFieldNames(new String[]{textFieldName});
    indexSearcher = new IndexSearcher(atomicReader);
    this.query = query;
  }

  @Override
  public void train(AtomicReader atomicReader, String[] textFieldNames, String classFieldName, Analyzer analyzer, Query query) throws IOException {
    this.textFieldNames = textFieldNames;
    this.classFieldName = classFieldName;
    mlt = new MoreLikeThis(atomicReader);
    mlt.setAnalyzer(analyzer);
    mlt.setFieldNames(textFieldNames);
    indexSearcher = new IndexSearcher(atomicReader);
    this.query = query;
  }

  @Override
  public void train(AtomicReader atomicReader, String[] textFieldNames, String classFieldName, Analyzer analyzer, Query query) throws IOException {
    this.textFieldNames = textFieldNames;
    this.classFieldName = classFieldName;
    mlt = new MoreLikeThis(atomicReader);
    mlt.setAnalyzer(analyzer);
    mlt.setFieldNames(textFieldNames);
    indexSearcher = new IndexSearcher(atomicReader);
    if (minDocsFreq > 0) {
      mlt.setMinDocFreq(minDocsFreq);

  @Override
  public void train(AtomicReader atomicReader, String textFieldName, String classFieldName, Analyzer analyzer) throws IOException {
    this.textFieldName = textFieldName;
    this.classFieldName = classFieldName;
    mlt = new MoreLikeThis(atomicReader);
    mlt.setAnalyzer(analyzer);
    mlt.setFieldNames(new String[]{textFieldName});
    indexSearcher = new IndexSearcher(atomicReader);
  }

public class MoreLikeThisHelper {


    public static Query getMoreLikeThis(IndexReader reader, Analyzer analyzer, String mltQueryString) {
        Query moreLikeThisQuery = null;
        MoreLikeThis mlt = new MoreLikeThis(reader);
        mlt.setAnalyzer(analyzer);
        try {
            String text = null;
            String[] fields = {};
            for (String param : mltQueryString.split("&")) {
                String[] keyValuePair = param.split("=");
                if (keyValuePair.length != 2 || keyValuePair[0] == null || keyValuePair[1] == null) {
                    throw new RuntimeException("Unparsable native Lucene MLT query: " + mltQueryString);
                } else {
                    if ("stream.body".equals(keyValuePair[0])) {
                        text = keyValuePair[1];
                    } else if ("mlt.fl".equals(keyValuePair[0])) {
                        fields = keyValuePair[1].split(",");
                    } else if ("mlt.mindf".equals(keyValuePair[0])) {
                        mlt.setMinDocFreq(Integer.parseInt(keyValuePair[1]));
                    } else if ("mlt.mintf".equals(keyValuePair[0])) {
                        mlt.setMinTermFreq(Integer.parseInt(keyValuePair[1]));
                    } else if ("mlt.boost".equals(keyValuePair[0])) {
                        mlt.setBoost(Boolean.parseBoolean(keyValuePair[1]));
                    } else if ("mlt.qf".equals(keyValuePair[0])) {
                        mlt.setBoostFactor(Float.parseFloat(keyValuePair[1]));
                    } else if ("mlt.maxdf".equals(keyValuePair[0])) {
                        mlt.setMaxDocFreq(Integer.parseInt(keyValuePair[1]));
                    } else if ("mlt.maxdfp".equals(keyValuePair[0])) {
                        mlt.setMaxDocFreqPct(Integer.parseInt(keyValuePair[1]));
                    } else if ("mlt.maxntp".equals(keyValuePair[0])) {
                        mlt.setMaxNumTokensParsed(Integer.parseInt(keyValuePair[1]));
                    } else if ("mlt.maxqt".equals(keyValuePair[0])) {
                        mlt.setMaxQueryTerms(Integer.parseInt(keyValuePair[1]));
                    } else if ("mlt.maxwl".equals(keyValuePair[0])) {
                        mlt.setMaxWordLen(Integer.parseInt(keyValuePair[1]));
                    } else if ("mlt.minwl".equals(keyValuePair[0])) {
                        mlt.setMinWordLen(Integer.parseInt(keyValuePair[1]));
                    }
                }
            }
            if (text != null) {
                if (FieldNames.PATH.equals(fields[0])) {
                    IndexSearcher searcher = new IndexSearcher(reader);
                    TermQuery q = new TermQuery(new Term(FieldNames.PATH, text));
                    TopDocs top = searcher.search(q, 1);
                    if (top.totalHits == 0) {
                        mlt.setFieldNames(fields);
                        moreLikeThisQuery = mlt.like(new StringReader(text), mlt.getFieldNames()[0]);
                    } else{
                        ScoreDoc d = top.scoreDocs[0];
                        Document doc = reader.document(d.doc);
                        List<String> fieldNames = new ArrayList<String>();
                        for (IndexableField f : doc.getFields()) {
                            if (!FieldNames.PATH.equals(f.name())) {
                                fieldNames.add(f.name());
                            }
                        }
                        String[] docFields = fieldNames.toArray(new String[0]);
                        mlt.setFieldNames(docFields);
                        moreLikeThisQuery = mlt.like(d.doc);
                    }
                } else {
                    mlt.setFieldNames(fields);
                    moreLikeThisQuery = mlt.like(new StringReader(text), mlt.getFieldNames()[0]);
                }
            }
            return moreLikeThisQuery;
        } catch (Exception e) {
            // Pass the original exception along as the cause so the stack trace is not lost.
            throw new RuntimeException("could not handle MLT query " + mltQueryString, e);
        }
    }
}

public class MoreLikeThisHelper {


    public static Query getMoreLikeThis(IndexReader reader, Analyzer analyzer, String mltQueryString) {
        Query moreLikeThisQuery = null;
        MoreLikeThis mlt = new MoreLikeThis(reader);
        mlt.setAnalyzer(analyzer);
        try {
            String text = null;
            for (String param : mltQueryString.split("&")) {
                String[] keyValuePair = param.split("=");
                if (keyValuePair.length != 2 || keyValuePair[0] == null || keyValuePair[1] == null) {
                    throw new RuntimeException("Unparsable native Lucene MLT query: " + mltQueryString);
                } else {
                    if ("stream.body".equals(keyValuePair[0])) {
                        text = keyValuePair[1];
                    } else if ("mlt.fl".equals(keyValuePair[0])) {
                        mlt.setFieldNames(keyValuePair[1].split(","));
                    } else if ("mlt.mindf".equals(keyValuePair[0])) {
                        mlt.setMinDocFreq(Integer.parseInt(keyValuePair[1]));
                    } else if ("mlt.mintf".equals(keyValuePair[0])) {
                        mlt.setMinTermFreq(Integer.parseInt(keyValuePair[1]));
                    } else if ("mlt.boost".equals(keyValuePair[0])) {
                        mlt.setBoost(Boolean.parseBoolean(keyValuePair[1]));
                    } else if ("mlt.qf".equals(keyValuePair[0])) {
                        mlt.setBoostFactor(Float.parseFloat(keyValuePair[1]));
                    } else if ("mlt.maxdf".equals(keyValuePair[0])) {
                        mlt.setMaxDocFreq(Integer.parseInt(keyValuePair[1]));
                    } else if ("mlt.maxdfp".equals(keyValuePair[0])) {
                        mlt.setMaxDocFreqPct(Integer.parseInt(keyValuePair[1]));
                    } else if ("mlt.maxntp".equals(keyValuePair[0])) {
                        mlt.setMaxNumTokensParsed(Integer.parseInt(keyValuePair[1]));
                    } else if ("mlt.maxqt".equals(keyValuePair[0])) {
                        mlt.setMaxQueryTerms(Integer.parseInt(keyValuePair[1]));
                    } else if ("mlt.maxwl".equals(keyValuePair[0])) {
                        mlt.setMaxWordLen(Integer.parseInt(keyValuePair[1]));
                    } else if ("mlt.minwl".equals(keyValuePair[0])) {
                        mlt.setMinWordLen(Integer.parseInt(keyValuePair[1]));
                    }
                }
            }
            if (text != null) {
                moreLikeThisQuery = mlt.like(new StringReader(text), mlt.getFieldNames()[0]);
            }
            return moreLikeThisQuery;
        } catch (Exception e) {
            // Pass the original exception along as the cause so the stack trace is not lost.
            throw new RuntimeException("could not handle MLT query " + mltQueryString, e);
        }
    }
}
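
The helper above reads all of its settings from one string of `key=value` pairs joined by `&`. As a minimal sketch of that input shape, the hypothetical class below (`MltQueryStringSketch` is not part of Lucene or Oak) applies the same split-and-validate logic but only collects the pairs into a map, so the format can be tried out without an index.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper showing the "key=value&key=value" shape the code above expects. */
final class MltQueryStringSketch {

    /** Splits the string on '&' and '=' and rejects any pair that does not have exactly two parts. */
    static Map<String, String> parse(String mltQueryString) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String param : mltQueryString.split("&")) {
            String[] keyValuePair = param.split("=");
            if (keyValuePair.length != 2) {
                throw new IllegalArgumentException("Unparsable MLT query: " + mltQueryString);
            }
            params.put(keyValuePair[0], keyValuePair[1]);
        }
        return params;
    }
}
```

A string such as `stream.body=lucene in action&mlt.fl=title,body&mlt.mindf=1` parses into three entries. Because the split is purely textual, a value that itself contains `=` or `&` is rejected or cut apart, and an empty value such as `stream.body=` is rejected as well.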


Related Classes of org.apache.lucene.queries.mlt.MoreLikeThis

org.apache.jackrabbit.oak.plugins.index.lucene.util.MoreLikeThisHelper

org.apache.lucene.analysis.tokenattributes.CharTermAttribute

org.apache.lucene.analysis.TokenStream

org.apache.lucene.classification.KNearestNeighborClassifier

org.apache.lucene.document.Document

org.apache.lucene.index.Fields

org.apache.lucene.index.IndexableField

org.apache.lucene.index.Term

org.apache.lucene.index.Terms

org.apache.lucene.index.TermsEnum
