Examples of org.apache.lucene.search.similarities.Similarity

Package org.apache.lucene.search.similarities

org.apache.lucene.search.similarities.Similarity
Similarity defines the components of Lucene scoring.
Expert: Scoring API.
This is a low-level API; you should only extend it if you want to implement an information retrieval model. If you are instead looking for a convenient way to alter Lucene's scoring, consider extending a higher-level implementation such as {@link TFIDFSimilarity}, which implements the vector space model with this API, or simply tweaking the default implementation, {@link DefaultSimilarity}.
Similarity determines how Lucene weights terms, and Lucene interacts with this class at both index time and query time.
At indexing time, the indexer calls {@link #computeNorm(FieldInvertState)}, allowing the Similarity implementation to set a per-document value for the field that will later be accessible via {@link AtomicReader#getNormValues(String)}. Lucene makes no assumption about what is in this norm, but it is most useful for encoding length normalization information.
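
As a hedged illustration of the kind of value computeNorm might produce, the sketch below follows the classic length normalization that, to our understanding, Lucene 4.x's DefaultSimilarity is based on: boost / sqrt(numTerms), where tokens sharing a position (for example injected synonyms) can optionally be excluded from the count. The class and method names are hypothetical, and the plain parameters stand in for what FieldInvertState carries.

```java
/** Hypothetical sketch of an index-time length norm; not Lucene's API. */
final class LengthNormSketch {
    private LengthNormSketch() {}

    /**
     * @param length           tokens indexed for this field in this document
     * @param numOverlap       tokens that share a position with a previous token
     * @param boost            index-time boost for the field
     * @param discountOverlaps if true, overlapping tokens do not count toward the length
     */
    static float lengthNorm(int length, int numOverlap, float boost, boolean discountOverlaps) {
        int numTerms = discountOverlaps ? length - numOverlap : length;
        // Assumption for this sketch: treat an empty field as length 1 to avoid dividing by zero.
        numTerms = Math.max(numTerms, 1);
        // Shorter fields get larger norms, so a match in them weighs more.
        return boost * (float) (1.0 / Math.sqrt(numTerms));
    }
}
```

For a 4-token field with no overlaps and a boost of 1 this yields 0.5; folding in a boost of 2 yields 1.0.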
Implementations should carefully consider how the normalization is encoded: while Lucene's classical {@link TFIDFSimilarity} encodes a combination of index-time boost and length normalization information into a single byte with {@link SmallFloat}, this might not be suitable for all purposes.
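
The precision trade-off is easy to see with a small one-byte codec. The sketch below is written from our reading of SmallFloat's floatToByte315/byte315ToFloat pair (numMantissaBits = 3, zeroExponent = 15); treat it as illustrative and check it against the SmallFloat source before relying on it. The class name is hypothetical. Because encoding truncates low-order bits rather than rounding, nearby norm values fall into the same byte.

```java
/** Sketch of a lossy float-to-byte codec in the style of SmallFloat.floatToByte315; hypothetical class. */
final class OneByteNorm {
    private static final int MANTISSA_BITS = 3;
    private static final int ZERO_EXP = 15;
    // Offset subtracted so the useful exponent range fits into one unsigned byte.
    private static final int FZERO = (63 - ZERO_EXP) << MANTISSA_BITS;

    private OneByteNorm() {}

    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        // Keep sign, exponent and the top mantissa bits; the remaining bits are truncated.
        int smallfloat = bits >> (24 - MANTISSA_BITS);
        if (smallfloat <= FZERO) {
            // Zero and negatives map to 0; tiny positive values map to the smallest non-zero code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= FZERO + 0x100) {
            return (byte) -1; // saturate at the largest representable value
        }
        return (byte) (smallfloat - FZERO);
    }

    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - MANTISSA_BITS);
        bits += (63 - ZERO_EXP) << 24;
        return Float.intBitsToFloat(bits);
    }
}
```

With a norm of 1/sqrt(length), field lengths 11 through 16 all encode to the same byte and decode to 0.25, so a model that needs finer length distinctions would have to store them some other way, for example in a NumericDocValuesField.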
Many formulas require the use of average document length, which can be computed via a combination of {@link CollectionStatistics#sumTotalTermFreq()} and {@link CollectionStatistics#maxDoc()} or {@link CollectionStatistics#docCount()}, depending upon whether the average should reflect field sparsity.
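
The two denominators can differ a lot for sparse fields. In the hypothetical helper below, plain long arguments stand in for CollectionStatistics#sumTotalTermFreq() and either maxDoc() or docCount(). As we understand it, these statistics may be reported as -1 when unavailable, so the sketch falls back to an assumed average length of 1 in that case.

```java
/** Hypothetical helper: average field length from collection-level statistics. */
final class AvgFieldLength {
    private AvgFieldLength() {}

    /**
     * @param sumTotalTermFreq total tokens in the field across all documents (-1 if unavailable)
     * @param denominator      maxDoc (every document) or docCount (only documents that have the field)
     */
    static double average(long sumTotalTermFreq, long denominator) {
        if (sumTotalTermFreq <= 0 || denominator <= 0) {
            // Assumed fallback for missing statistics: treat the average length as 1.
            return 1.0;
        }
        return sumTotalTermFreq / (double) denominator;
    }
}
```

For a field with 500 tokens that appears in only 25 of 100 documents, averaging over maxDoc gives 5.0 tokens, while averaging over docCount gives 20.0.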
Additional scoring factors can be stored in named NumericDocValuesFields and accessed at query-time with {@link AtomicReader#getNumericDocValues(String)}.
Finally, using index-time boosts (either folded into the normalization byte or stored as DocValues) is an inefficient way to boost the scores of different fields if the boost will be the same for every document. Instead, the Similarity can simply take a constant boost parameter C, and {@link PerFieldSimilarityWrapper} can return different instances with different boosts depending upon the field name.
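
The per-field alternative amounts to choosing a differently configured instance by field name. The model below uses no Lucene classes: ConstantBoostScorer is a hypothetical stand-in for a Similarity that multiplies its score by a constant C, and get(String) plays the role that PerFieldSimilarityWrapper#get(String) plays in Lucene. Fields without a configured boost fall back to C = 1 (an assumption of this sketch).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model: one scorer instance per field, each with its own constant boost C. */
final class PerFieldBoostSketch {
    /** Stand-in for a Similarity whose only customization is a constant multiplier. */
    static final class ConstantBoostScorer {
        private final float boost;

        ConstantBoostScorer(float boost) {
            this.boost = boost;
        }

        float score(float baseScore) {
            return boost * baseScore;
        }
    }

    private final Map<String, ConstantBoostScorer> byField = new HashMap<>();
    private final ConstantBoostScorer fallback = new ConstantBoostScorer(1.0f);

    PerFieldBoostSketch(Map<String, Float> boosts) {
        // Build each instance once; the same instance is returned for every lookup of its field.
        boosts.forEach((field, c) -> byField.put(field, new ConstantBoostScorer(c)));
    }

    /** Chooses the instance for a field, like a per-field wrapper would. */
    ConstantBoostScorer get(String field) {
        return byField.getOrDefault(field, fallback);
    }
}
```

Because the boost lives in the per-field instance, nothing extra has to be written per document at index time.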
At query-time, Queries interact with the Similarity via these steps:
When {@link IndexSearcher#explain(org.apache.lucene.search.Query,int)} is called, queries consult the Similarity's DocScorer for an explanation of how it computed its score. The query passes in the document id and an explanation of how the frequency was computed.
@see org.apache.lucene.index.IndexWriterConfig#setSimilarity(Similarity)
@see IndexSearcher#setSimilarity(Similarity)
@lucene.experimental
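
An explanation is a tree: each node carries a value, a description, and the sub-explanations that produced it, with the frequency explanation supplied by the query forming one of the leaves. The stand-in below is hypothetical and only mimics that shape; the toy formula sqrt(freq) * idf is chosen for illustration and is not any particular Lucene Similarity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an explanation tree; not Lucene's Explanation class. */
final class ExplNode {
    final float value;
    final String description;
    final List<ExplNode> details = new ArrayList<>();

    ExplNode(float value, String description) {
        this.value = value;
        this.description = description;
    }

    ExplNode add(ExplNode child) {
        details.add(child);
        return this;
    }

    /** Renders one node per line, indenting children by two spaces per level. */
    String render() {
        StringBuilder sb = new StringBuilder();
        render(sb, 0);
        return sb.toString();
    }

    private void render(StringBuilder sb, int depth) {
        sb.append("  ".repeat(depth)).append(value).append(" = ").append(description).append('\n');
        for (ExplNode d : details) {
            d.render(sb, depth + 1);
        }
    }

    /** Toy score sqrt(freq) * idf, explained factor by factor. */
    static ExplNode explainToyScore(float freq, float idf) {
        ExplNode tf = new ExplNode((float) Math.sqrt(freq), "tf(freq=" + freq + ")");
        ExplNode idfNode = new ExplNode(idf, "idf");
        return new ExplNode(tf.value * idf, "score, product of:").add(tf).add(idfNode);
    }
}
```

For freq = 4 and idf = 1.5, the root value is 3.0 with the tf and idf factors as its two children.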

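  @Override
  public synchronized Similarity get(String field) {
    assert field != null;
    Similarity sim = previousMappings.get(field);
    if (sim == null) {
      // Math.floorMod keeps the index non-negative even when the XOR equals Integer.MIN_VALUE,
      // where Math.abs(...) % size would still be negative.
      sim = knownSims.get(Math.floorMod(perFieldSeed ^ field.hashCode(), knownSims.size()));
      previousMappings.put(field, sim);
    }
    return sim;
  }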
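
The provider above assigns each field one of a fixed list of similarities and memoizes the choice, so a field is scored consistently for the provider's lifetime. A stdlib-only sketch of the same idea follows; SeededFieldPicker and its type parameter (standing in for Similarity) are hypothetical. It uses Math.floorMod because Math.abs(Integer.MIN_VALUE) is still negative, which would turn abs(x) % size into a negative list index.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical generic per-field picker; T stands in for Similarity. */
final class SeededFieldPicker<T> {
    private final List<T> candidates;
    private final int seed;
    private final Map<String, T> previous = new HashMap<>();

    SeededFieldPicker(List<T> candidates, int seed) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("need at least one candidate");
        }
        this.candidates = List.copyOf(candidates);
        this.seed = seed;
    }

    /** Deterministic, always non-negative index for a field name. */
    static int index(int seed, String field, int size) {
        return Math.floorMod(seed ^ field.hashCode(), size);
    }

    /** Returns the remembered choice, computing it on first use. */
    synchronized T get(String field) {
        return previous.computeIfAbsent(field, f -> candidates.get(index(seed, f, candidates.size())));
    }
}
```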

  public void testFloatNorms() throws IOException {
    Directory dir = newDirectory();
    IndexWriterConfig config = newIndexWriterConfig(TEST_VERSION_CURRENT,
        new MockAnalyzer(random()));
    Similarity provider = new MySimProvider();
    config.setSimilarity(provider);
    RandomIndexWriter writer = new RandomIndexWriter(random(), dir, config);
    final LineFileDocs docs = new LineFileDocs(random());
    int num = atLeast(100);
    for (int i = 0; i < num; i++) {
      // ... (loop body and remainder of the test not shown in this excerpt)

  private void _showDocFields(int docid, Document doc) {
    Object table = find("docTable");
    Object srchOpts = find("srchOptTabs");
    Similarity sim = createSimilarity(srchOpts);
    // instanceof is false for null, so a null similarity is also replaced by the default
    if (!(sim instanceof TFIDFSimilarity)) {
      sim = defaultSimilarity;
    }
    setString(find("docNum"), "text", String.valueOf(docid));
    removeAll(table);
    // ... (remainder of the method not shown in this excerpt)

    Object newNorm = find(dialog, "newNorm");
    Object encNorm = find(dialog, "encNorm");
    Object doc = find(dialog, "docNum");
    Object fld = find(dialog, "fld");
    Object srchOpts = find("srchOptTabs");
    Similarity sim = createSimilarity(srchOpts);
    TFIDFSimilarity s;
    // fall back to the default when the configured similarity is null or not TF-IDF based
    if (sim instanceof TFIDFSimilarity) {
      s = (TFIDFSimilarity) sim;
    } else {
      s = defaultSimilarity;
    }
    // ... (remainder not shown in this excerpt)
      return new SweetSpotSimilarity();
    } else if (getBoolean(ckSimOther, "selected")) {
      try {
        Class<?> clazz = Class.forName(getString(simClass, "text"));
        if (Similarity.class.isAssignableFrom(clazz)) {
          Similarity sim = (Similarity) clazz.newInstance();
          return sim;
        } else {
          throw new Exception("Not a subclass of Similarity: " + clazz.getName());
        }
      } catch (Exception e) {
        // ... (error handling not shown in this excerpt)
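
Loading a Similarity from a user-supplied class name follows a general reflection pattern. The helper below is a hypothetical, generic version: it checks assignability before instantiating and uses getDeclaredConstructor().newInstance(), since Class#newInstance() has been deprecated since Java 9 and can rethrow checked constructor exceptions unchecked. The types in the usage are JDK classes, not Lucene ones.

```java
/** Hypothetical helper: instantiate an implementation of {@code type} from a configured class name. */
final class ReflectiveLoader {
    private ReflectiveLoader() {}

    static <T> T newInstance(String className, Class<T> type) throws ReflectiveOperationException {
        Class<?> raw = Class.forName(className);
        if (!type.isAssignableFrom(raw)) {
            throw new IllegalArgumentException(
                "Not a subclass of " + type.getSimpleName() + ": " + raw.getName());
        }
        // Requires a public no-arg constructor; constructor failures arrive wrapped
        // in InvocationTargetException, a ReflectiveOperationException.
        return raw.asSubclass(type).getDeclaredConstructor().newInstance();
    }
}
```

For example, `ReflectiveLoader.newInstance("java.util.ArrayList", java.util.List.class)` returns a new ArrayList, while asking for `java.lang.String` as a List is rejected before any instance is created.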

      showStatus("FAILED: Empty query.");
      return;
    }
    Object srchOpts = find("srchOptTabs");
    // query parser opts
    Similarity sim = createSimilarity(srchOpts);
    AccessibleHitCollector col;
    try {
      col = createCollector(srchOpts);
    } catch (Throwable t) {
      errorMsg("ERROR creating Collector: " + t.getMessage());
      // ... (remainder not shown in this excerpt)

    if (q == null) return;
    Thread t = new Thread() {
      @Override
      public void run() {
        try {
          IndexSearcher is = new IndexSearcher(ir);
          Similarity sim = createSimilarity(find("srchOptTabs"));
          is.setSimilarity(sim);
          Explanation expl = is.explain(q, docid.intValue());
          Object dialog = addComponent(null, "/xml/explain.xml", null, null);
          Object eTree = find(dialog, "eTree");
          addNode(eTree, expl);
          // ... (remainder not shown in this excerpt)

      String cluster = BlurUtil.nullCheck(tableDescriptor.cluster, "tableDescriptor.cluster cannot be null.");
      assignTableUri(tableDescriptor);
      String uri = BlurUtil.nullCheck(tableDescriptor.tableUri, "tableDescriptor.tableUri cannot be null.");
      int shardCount = BlurUtil.zeroCheck(tableDescriptor.shardCount,
          "tableDescriptor.shardCount cannot be less than 1");
      Similarity similarity = BlurUtil.getInstance(tableDescriptor.similarityClass, Similarity.class);
      boolean blockCaching = tableDescriptor.blockCaching;
      Set<String> blockCachingFileTypes = tableDescriptor.blockCachingFileTypes;
      String blurTablePath = ZookeeperPathConstants.getTablePath(cluster, table);

      if (_zk.exists(blurTablePath, false) != null) {
        throw new IOException("Table [" + table + "] already exists.");
      }
      BlurUtil.setupFileSystem(uri, shardCount);
      BlurUtil.createPath(_zk, blurTablePath, null);
      BlurUtil.createPath(_zk, ZookeeperPathConstants.getTableColumnsToPreCache(cluster, table),
          toBytes(tableDescriptor.preCacheCols));
      BlurUtil.createPath(_zk, ZookeeperPathConstants.getTableUriPath(cluster, table), uri.getBytes());
      BlurUtil.createPath(_zk, ZookeeperPathConstants.getTableShardCountPath(cluster, table),
          Integer.toString(shardCount).getBytes());
      BlurUtil.createPath(_zk, ZookeeperPathConstants.getTableSimilarityPath(cluster, table),
          similarity.getClass().getName().getBytes());
      BlurUtil.createPath(_zk, ZookeeperPathConstants.getLockPath(cluster, table), null);
      BlurUtil.createPath(_zk, ZookeeperPathConstants.getTableFieldNamesPath(cluster, table), null);
      if (tableDescriptor.readOnly) {
        BlurUtil.createPath(_zk, ZookeeperPathConstants.getTableReadOnlyPath(cluster, table), null);
        // ... (remainder not shown in this excerpt)

  @Override
  public Similarity getSimilarity(String table) {
    checkTable(table);
    Similarity similarity = _tableSimilarity.get(table);
    if (similarity == null) {
      TableDescriptor tableDescriptor = _clusterStatus.getTableDescriptor(true, _cluster, table);
      String similarityClass = tableDescriptor.similarityClass;
      if (similarityClass == null) {
        similarity = new FairSimilarity();
        // ... (remainder not shown in this excerpt)
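
The method above lazily resolves a per-table Similarity, caches it, and falls back to a default when the table descriptor names no class. That lazy-cache-with-fallback pattern can be sketched with ConcurrentHashMap#computeIfAbsent; PerTableCache and its three function parameters are hypothetical and not part of Blur.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical per-table cache; T stands in for Similarity. Not Blur's API. */
final class PerTableCache<T> {
    private final Map<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, String> configuredClassName; // returns null when nothing is configured
    private final Function<String, T> fromClassName;             // e.g. a reflective loader
    private final Supplier<T> fallback;                          // default when no class is configured

    PerTableCache(Function<String, String> configuredClassName,
                  Function<String, T> fromClassName,
                  Supplier<T> fallback) {
        this.configuredClassName = configuredClassName;
        this.fromClassName = fromClassName;
        this.fallback = fallback;
    }

    T get(String table) {
        // computeIfAbsent applies the function at most once per key, even with concurrent callers.
        return cache.computeIfAbsent(table, t -> {
            String className = configuredClassName.apply(t);
            return className == null ? fallback.get() : fromClassName.apply(className);
        });
    }
}
```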

  public void buildIndex(Directory dir) throws IOException {
    Random random = random();
    IndexWriterConfig config = newIndexWriterConfig(TEST_VERSION_CURRENT,
        new MockAnalyzer(random));
    Similarity provider = new MySimProvider();
    config.setSimilarity(provider);
    RandomIndexWriter writer = new RandomIndexWriter(random, dir, config);
    final LineFileDocs docs = new LineFileDocs(random, defaultCodecSupportsDocValues());
    int num = atLeast(100);
    for (int i = 0; i < num; i++) {
      // ... (loop body and remainder not shown in this excerpt)

Related Classes of org.apache.lucene.search.similarities.Similarity

cc.twittertools.search.api.TrecSearchHandler

cc.twittertools.search.indexing.SearchStatuses

cc.twittertools.search.local.RunQueries

cc.twittertools.search.local.SearchStatuses

cc.twittertools.search.retrieval.QueryEnvironment

cc.twittertools.search.retrieval.TrecSearchHandler

cc.wikitools.lucene.WikipediaSearcher

org.apache.blur.manager.clusterstatus.ZookeeperClusterStatus

org.apache.blur.manager.indexserver.DistributedIndexServer

org.apache.lucene.index.memory.MemoryIndex$MemoryIndexReader

All source code is the property of its respective owners. Java is a trademark of Sun Microsystems, Inc., now owned by Oracle. Contact coftware#gmail.com.