A global index is partitioned documentally by providing a {@link DocumentalPartitioningStrategy} that specifies, for each document, a destination local index and a local document pointer. The global index is scanned, and the postings are partitioned among the local indices using the provided strategy. For instance, a {@link ContiguousDocumentalStrategy} divides an index into blocks of contiguous documents.
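As an illustration of the mapping a contiguous strategy has to provide, here is a minimal, self-contained sketch. The class `ContiguousSplitSketch`, its methods, and the cutpoint representation are hypothetical names chosen for this example; they are not the actual {@link ContiguousDocumentalStrategy} implementation, whose API may differ.

```java
import java.util.Arrays;

/** Hypothetical sketch of a contiguous split; not the MG4J strategy class. */
final class ContiguousSplitSketch {
    /** cutpoints[i] is the first global document of block i; the last entry is the total number of documents. */
    private final int[] cutpoints;

    ContiguousSplitSketch(int... cutpoints) {
        if (cutpoints.length < 2 || cutpoints[0] != 0)
            throw new IllegalArgumentException("Cutpoints must start at 0 and define at least one block");
        for (int i = 1; i < cutpoints.length; i++)
            if (cutpoints[i] <= cutpoints[i - 1])
                throw new IllegalArgumentException("Cutpoints must be strictly increasing");
        this.cutpoints = cutpoints.clone();
    }

    int numberOfLocalIndices() {
        return cutpoints.length - 1;
    }

    /** Returns the local index (block) that receives the given global document. */
    int localIndex(int globalPointer) {
        if (globalPointer < 0 || globalPointer >= cutpoints[cutpoints.length - 1])
            throw new IndexOutOfBoundsException("No such document: " + globalPointer);
        int pos = Arrays.binarySearch(cutpoints, globalPointer);
        // Exact hit: the document opens block pos. Otherwise binarySearch returns
        // -(insertionPoint) - 1, and the containing block is insertionPoint - 1.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Returns the document pointer inside its local index. */
    int localPointer(int globalPointer) {
        return globalPointer - cutpoints[localIndex(globalPointer)];
    }

    public static void main(String[] args) {
        // Seven documents split into blocks [0..3) and [3..7).
        ContiguousSplitSketch split = new ContiguousSplitSketch(0, 3, 7);
        // Global document 6 lands in block 1 with local pointer 3.
        System.out.println(split.localIndex(6) + " " + split.localPointer(6));
    }
}
```

Because the blocks are contiguous, the relative order of documents is preserved inside each local index, and a local pointer can be turned back into a global one by adding the block's first cutpoint.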
Since each local index contains a (proper) subset of the original set of documents, it contains in general a (proper) subset of the terms in the global index. Thus, the local term numbers and the global term numbers will not in general coincide. As a result, when a set of local indices is accessed transparently as a single index using a {@link it.unimi.dsi.mg4j.index.cluster.DocumentalCluster}, a call to {@link it.unimi.dsi.mg4j.index.Index#documents(int)} will throw an {@link java.lang.UnsupportedOperationException}, because there is no way to map the global term numbers to local term numbers.
On the other hand, a call to {@link it.unimi.dsi.mg4j.index.Index#documents(CharSequence)} will be passed on to each local index to build a global iterator. To speed up this phase for not-so-frequent terms, when partitioning an index you can require the construction of {@linkplain BloomFilter Bloom filters} that will be used to try to avoid querying indices that do not contain a term. The precision of the filters is settable.
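The idea of using one filter per local index to skip indices that cannot contain a term can be sketched as follows. This is a self-contained toy filter written for illustration only: the class `TermFilterSketch`, its hash function, and the `candidates` helper are assumptions of this example and do not reproduce the {@link BloomFilter} class or the way the cluster classes consult it.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical toy Bloom filter over terms; not the BloomFilter class referenced above. */
final class TermFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    TermFilterSketch(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) throw new IllegalArgumentException();
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** The i-th bit position for a term: an FNV-1a style hash with a seed derived from i. */
    private int position(CharSequence term, int i) {
        int h = 0x811C9DC5 ^ (i * 0x9E3779B9);
        for (int k = 0; k < term.length(); k++) {
            h ^= term.charAt(k);
            h *= 0x01000193;
        }
        return Math.floorMod(h, numBits);
    }

    void add(CharSequence term) {
        for (int i = 0; i < numHashes; i++) bits.set(position(term, i));
    }

    /** False means the term is certainly absent; true means it may be present. */
    boolean mightContain(CharSequence term) {
        for (int i = 0; i < numHashes; i++)
            if (!bits.get(position(term, i))) return false;
        return true;
    }

    /** Returns the numbers of the local indices that still have to be queried for a term. */
    static List<Integer> candidates(TermFilterSketch[] filters, CharSequence term) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < filters.length; i++)
            if (filters[i].mightContain(term)) result.add(i);
        return result;
    }
}
```

A filter never reports a term it contains as absent, so skipping an index on a negative answer is safe; a positive answer may be a false positive, in which case the local index is queried and simply returns no postings. In this sketch, a larger bit array (with a suitable number of hash functions) lowers the false-positive rate at the cost of space.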
The property file will use a {@link it.unimi.dsi.mg4j.index.cluster.DocumentalMergedCluster} unless you provide a {@link ContiguousDocumentalStrategy}, in which case a {@link it.unimi.dsi.mg4j.index.cluster.DocumentalConcatenatedCluster} will be used instead. Note that there might be other cases in which the latter is appropriate, in which case you can manually edit the property file.
Important: this class just partitions the index. No auxiliary files (most notably, {@linkplain StringMap term maps} or {@linkplain PrefixMap prefix maps}) will be generated. To generate them, please refer to a {@link StringMap} implementation (e.g., {@link ShiftAddXorSignedStringMap} or {@link ImmutableExternalPrefixMap}).
Warning: variable quanta are not supported by this class, as it is impossible to predict accurately the number of bits used for positions when partitioning documentally. If you want to use variable quanta, use simple interleaved indices without skips as an intermediate step, and pass them through {@link Combine}.