Since the HashArrays don't handle collisions, a {@link CollisionMap} is used to store the colliding labels.
This data structure grows by adding a new HashArray whenever the number of collisions in the {@link CollisionMap} exceeds {@code loadFactor} * {@link #getMaxOrdinal()}. Growing also includes reinserting all colliding labels into the HashArrays to possibly reduce the number of collisions. For setting the {@code loadFactor} see {@link #CompactLabelToOrdinal(int,float,int)}.
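The growth rule described above can be illustrated with a minimal sketch. It is not the actual implementation: the class {@code GrowingLabelTableSketch}, its methods, the doubling of each new array's capacity, and the use of {@code String.hashCode()} are all assumptions made for illustration, and a plain {@code HashMap} stands in for the {@link CollisionMap}. The field {@code labelCount} plays the role of {@link #getMaxOrdinal()}, and callers are assumed to add each label only once.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the growth scheme; names and sizing policy are assumptions. */
final class GrowingLabelTableSketch {
    // Each "hash array" is a pair of parallel arrays with no collision handling.
    private final List<String[]> keys = new ArrayList<>();
    private final List<int[]> ordinals = new ArrayList<>();
    // Stand-in for the CollisionMap: holds labels that found no free slot.
    private final Map<String, Integer> collisionMap = new HashMap<>();
    private final float loadFactor;
    private int labelCount = 0; // stand-in for getMaxOrdinal()

    GrowingLabelTableSketch(int initialCapacity, float loadFactor) {
        this.loadFactor = loadFactor;
        addHashArray(initialCapacity);
    }

    private void addHashArray(int capacity) {
        keys.add(new String[capacity]);
        ordinals.add(new int[capacity]);
    }

    /** Adds a label that is assumed not to be present yet. */
    void add(String label, int ordinal) {
        labelCount++;
        if (!placeInArrays(label, ordinal)) {
            collisionMap.put(label, ordinal);
            // Grow once the collisions exceed loadFactor * labelCount.
            if (collisionMap.size() > loadFactor * labelCount) {
                grow();
            }
        }
    }

    /** Tries the label's single slot in each hash array; returns false if all are taken. */
    private boolean placeInArrays(String label, int ordinal) {
        for (int i = 0; i < keys.size(); i++) {
            String[] k = keys.get(i);
            int slot = Math.floorMod(label.hashCode(), k.length);
            if (k[slot] == null) {
                k[slot] = label;
                ordinals.get(i)[slot] = ordinal;
                return true;
            }
        }
        return false;
    }

    /** Adds one hash array, then reinserts every colliding label. */
    private void grow() {
        int lastCapacity = keys.get(keys.size() - 1).length;
        addHashArray(lastCapacity * 2); // doubling is an assumed sizing policy
        Map<String, Integer> pending = new HashMap<>(collisionMap);
        collisionMap.clear();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (!placeInArrays(e.getKey(), e.getValue())) {
                collisionMap.put(e.getKey(), e.getValue());
            }
        }
    }

    /** Returns the ordinal for the label, or -1 if it was never added. */
    int get(String label) {
        for (int i = 0; i < keys.size(); i++) {
            String[] k = keys.get(i);
            int slot = Math.floorMod(label.hashCode(), k.length);
            if (label.equals(k[slot])) {
                return ordinals.get(i)[slot];
            }
        }
        Integer ordinal = collisionMap.get(label);
        return ordinal == null ? -1 : ordinal;
    }

    int hashArrayCount() { return keys.size(); }

    int collisionCount() { return collisionMap.size(); }
}
```

Because the arrays in this sketch have different lengths, a label that collides in one array usually maps to a different slot in the next, which is why reinserting after growth tends to shrink the collision map.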
This data structure has a much lower memory footprint (~30%) compared to a Java {@code HashMap<String, Integer>}. It also uses only a small fraction of the objects a HashMap would use, thus limiting the GC overhead. Ingestion speed was also ~50% faster compared to a HashMap for 3M unique labels.
@lucene.experimental