Examples of org.apache.lucene.facet.taxonomy.writercache.cl2o.CompactLabelToOrdinal

org.apache.lucene.facet.taxonomy.writercache.cl2o.CompactLabelToOrdinal

This is a very efficient LabelToOrdinal implementation that uses a CharBlockArray to store all labels and a configurable number of HashArrays to reference the labels.

Since the HashArrays don't handle collisions, a {@link CollisionMap} is used to store the colliding labels.

This data structure grows by adding a new HashArray whenever the number of collisions in the {@link CollisionMap} exceeds {@code loadFactor} * {@link #getMaxOrdinal()}. Growing also includes reinserting all colliding labels into the HashArrays to possibly reduce the number of collisions. For setting the {@code loadFactor} see {@link #CompactLabelToOrdinal(int,float,int)}.
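The lookup and growth rules described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Lucene's implementation: the class name ToyLabelToOrdinal, the fixed slot count per array, the per-array hash mixing, and the use of a plain list in place of the CharBlockArray are all hypothetical. Unlike the real API, the toy assigns ordinals itself and uses the label count as a stand-in for getMaxOrdinal().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the layout described above (hypothetical, not Lucene code). */
class ToyLabelToOrdinal {
  static final int INVALID = -1;

  // Stand-in for the CharBlockArray: the list index is the ordinal.
  private final List<String> labels = new ArrayList<>();
  // Each slot holds an ordinal or INVALID; slots never chain, so collisions overflow.
  private final List<int[]> hashArrays = new ArrayList<>();
  // Stand-in for the CollisionMap: labels that found no free slot.
  private final Map<String, Integer> collisions = new HashMap<>();
  private final int slotsPerArray;
  private final float loadFactor;

  ToyLabelToOrdinal(int slotsPerArray, float loadFactor) {
    this.slotsPerArray = slotsPerArray;
    this.loadFactor = loadFactor;
    hashArrays.add(newArray());
  }

  private int[] newArray() {
    int[] a = new int[slotsPerArray];
    Arrays.fill(a, INVALID);
    return a;
  }

  // Assumption: mix in the array index so a label maps to a different slot per array.
  private int slot(String label, int arrayIndex) {
    int h = label.hashCode() * 31 + arrayIndex * 0x9E3779B9;
    return Math.floorMod(h, slotsPerArray);
  }

  int getOrdinal(String label) {
    // Probe one slot per hash array, then fall back to the collision map.
    for (int i = 0; i < hashArrays.size(); i++) {
      int ord = hashArrays.get(i)[slot(label, i)];
      if (ord != INVALID && labels.get(ord).equals(label)) {
        return ord;
      }
    }
    Integer ord = collisions.get(label);
    return ord == null ? INVALID : ord;
  }

  int addLabel(String label) {
    int existing = getOrdinal(label);
    if (existing != INVALID) {
      return existing;
    }
    int ord = labels.size();
    labels.add(label);
    if (!tryPlace(label, ord)) {
      collisions.put(label, ord);
      // Growth rule from the text: collisions > loadFactor * max ordinal.
      if (collisions.size() > loadFactor * labels.size()) {
        grow();
      }
    }
    return ord;
  }

  private boolean tryPlace(String label, int ord) {
    for (int i = 0; i < hashArrays.size(); i++) {
      int[] a = hashArrays.get(i);
      int s = slot(label, i);
      if (a[s] == INVALID) {
        a[s] = ord;
        return true;
      }
    }
    return false;
  }

  // Add one hash array, then reinsert every colliding label to try to shrink the map.
  private void grow() {
    hashArrays.add(newArray());
    Map<String, Integer> old = new HashMap<>(collisions);
    collisions.clear();
    for (Map.Entry<String, Integer> e : old.entrySet()) {
      if (!tryPlace(e.getKey(), e.getValue())) {
        collisions.put(e.getKey(), e.getValue());
      }
    }
  }

  int hashArrayCount() {
    return hashArrays.size();
  }
}
```

In this sketch a lookup costs at most one probe per hash array plus one collision-map lookup, which is why keeping the collision map small (by growing) matters for speed.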

This data structure has a much lower memory footprint (~30%) compared to a Java HashMap<String, Integer>. It also uses only a small fraction of the objects a HashMap would use, which limits the GC overhead. Ingestion was also ~50% faster than with a HashMap for 3M unique labels.

@lucene.experimental


  @Test
  public void testL2O() throws Exception {
    // Reference implementation to compare against; LabelToOrdinalMap is
    // defined elsewhere in the test class (not shown here).
    LabelToOrdinal map = new LabelToOrdinalMap();

    // Arguments follow the (int, float, int) constructor; 0.15f is the loadFactor.
    CompactLabelToOrdinal compact = new CompactLabelToOrdinal(2000000, 0.15f, 3);

    // atLeast, random and TEMP_DIR are presumably inherited from the test base class (not shown).
    final int n = atLeast(10 * 1000);
    final int numUniqueValues = 50 * 1000;

    String[] uniqueValues = new String[numUniqueValues];
    byte[] buffer = new byte[50];

    // Generate random labels, retrying any that contain the terminator character.
    for (int i = 0; i < numUniqueValues;) {
      random.nextBytes(buffer);
      int size = 1 + random.nextInt(50);

      uniqueValues[i] = new String(buffer, 0, size);
      if (uniqueValues[i].indexOf(CompactLabelToOrdinal.TerminatorChar) == -1) {
        i++;
      }
    }

    TEMP_DIR.mkdirs();
    File f = new File(TEMP_DIR, "CompactLabelToOrdinalTest.tmp");
    int flushInterval = 10;

    for (int i = 0; i < n * 10; i++) {
      // Periodically round-trip the structure through a file to exercise flush/open.
      if (i > 0 && i % flushInterval == 0) {
        compact.flush(f);
        compact = CompactLabelToOrdinal.open(f, 0.15f, 3);
        assertTrue(f.delete());
        if (flushInterval < (n / 10)) {
          flushInterval *= 10;
        }
      }

      int index = random.nextInt(numUniqueValues);
      CategoryPath label = new CategoryPath(uniqueValues[index], '/');

      int ord1 = map.getOrdinal(label);
      int ord2 = compact.getOrdinal(label);

      //System.err.println(ord1+" "+ord2);

      // Both implementations must agree, including on unknown labels.
      assertEquals(ord1, ord2);

      // Unknown label: assign the next ordinal and add it to both maps.
      if (ord1 == LabelToOrdinal.InvalidOrdinal) {
        ord1 = compact.getNextOrdinal();

        map.addLabel(label, ord1);
        compact.addLabel(label, ord1);
      }
    }

    // Final pass: every label must resolve identically in both maps.
    for (int i = 0; i < numUniqueValues; i++) {
      CategoryPath label = new CategoryPath(uniqueValues[i], '/');
      int ord1 = map.getOrdinal(label);
      int ord2 = compact.getOrdinal(label);
      assertEquals(ord1, ord2);
    }
  }
