A visitor collecting information about terms appearing in a {@link it.unimi.dsi.mg4j.search.DocumentIterator}.
The purpose of this visitor is to explore the structure of a {@link DocumentIterator} before iteration, counting how many terms are actually used and setting up some preliminary access data. More precisely, we count the distinct index/term pairs appearing in all leaves of nonzero frequency (the latter condition is used to skip empty iterators). For this visitor to work, all leaves of nonzero frequency must return a non-null
value on a call to {@link it.unimi.dsi.mg4j.index.IndexIterator#term()}.
During the visit, we keep track of which index/term pairs have already been seen. Each pair is assigned a distinct offset (a number between zero and the overall number of distinct pairs), which is stored in each index iterator's {@linkplain it.unimi.dsi.mg4j.index.IndexIterator#id() id} and is used afterwards to quickly access data about the pair. Note that duplicate index/term pairs get the same offset. The overall number of distinct pairs is returned by {@link #numberOfPairs()} after a visit.
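The offset assignment described above can be illustrated with a minimal, self-contained sketch. It does not use the MG4J classes: the class <code>PairOffsetSketch</code>, its <code>visitLeaf</code> method, and the use of plain strings for indices are all hypothetical stand-ins, and the choice of <code>-1</code> for skipped leaves is an assumption made only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: this is NOT the MG4J implementation.
class PairOffsetSketch {
    // index name -> (term -> offset); indices are plain strings in this sketch.
    private final Map<String, Map<String, Integer>> offsets = new LinkedHashMap<>();
    private int numberOfPairs;

    /**
     * Records one leaf and returns the offset of its index/term pair,
     * or -1 (an assumption of this sketch) if the leaf has zero frequency.
     */
    int visitLeaf(String index, String term, int frequency) {
        if (frequency == 0) return -1; // empty iterators are skipped
        if (term == null) throw new NullPointerException("Nonzero-frequency leaves must expose their term");
        Map<String, Integer> termToOffset = offsets.computeIfAbsent(index, k -> new LinkedHashMap<>());
        Integer offset = termToOffset.get(term);
        if (offset == null) {
            // First time this pair is seen: assign the next free offset.
            offset = numberOfPairs++;
            termToOffset.put(term, offset);
        }
        return offset; // duplicate pairs get the offset assigned earlier
    }

    /** The number of distinct index/term pairs seen so far. */
    int numberOfPairs() {
        return numberOfPairs;
    }

    public static void main(String[] args) {
        PairOffsetSketch sketch = new PairOffsetSketch();
        System.out.println(sketch.visitLeaf("title", "cat", 3));
        System.out.println(sketch.visitLeaf("text", "cat", 5));
        System.out.println(sketch.visitLeaf("title", "cat", 3));
        System.out.println(sketch.numberOfPairs());
    }
}
```

In this sketch the same term under two different indices counts as two pairs, while a repeated pair reuses its earlier offset, so the offsets can index an array whose length is the number of distinct pairs.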
During the visit, the indices actually appearing in some nonzero-frequency leaf are gathered; they are accessible as a vector returned by {@link #indices()}, and {@link #indexMap()} returns the inverse map, from indices to their positions in this vector. If you need to force some index to appear in {@link #indices()}, there is a special {@link #prepare(ReferenceSet)} method.
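The relationship between the vector of indices and its inverse map can be sketched as follows. This is an illustration under assumptions, not the MG4J code: <code>IndexPositionsSketch</code> and its <code>register</code> method are hypothetical names, indices are modeled as strings, and calling <code>register</code> before a visit only loosely models forcing an index to appear.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: this is NOT the MG4J implementation.
class IndexPositionsSketch {
    // Indices in order of first appearance.
    private final List<String> indices = new ArrayList<>();
    // Inverse map: index -> its position in the list above.
    private final Map<String, Integer> indexMap = new HashMap<>();

    /** Adds an index if it has not been seen yet; repeated calls are no-ops. */
    void register(String index) {
        if (!indexMap.containsKey(index)) {
            indexMap.put(index, indices.size());
            indices.add(index);
        }
    }

    List<String> indices() {
        return indices;
    }

    Map<String, Integer> indexMap() {
        return indexMap;
    }

    public static void main(String[] args) {
        IndexPositionsSketch sketch = new IndexPositionsSketch();
        sketch.register("text");
        sketch.register("title");
        sketch.register("text"); // already present, ignored
        // Looking up each index in the inverse map gives back its position.
        for (int i = 0; i < sketch.indices().size(); i++) {
            String index = sketch.indices().get(i);
            System.out.println(index + " -> " + sketch.indexMap().get(index));
        }
    }
}
```

Keeping both structures in step means a position can be turned into an index and an index into a position in constant time.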
The offset assigned to each index/term pair is returned by {@link #offset(Index,String)}. Should you need to know the terms associated with each index, they are returned by {@link #terms(Index)}.
After a term collection, counters are usually set up by a visit of {@link it.unimi.dsi.mg4j.search.visitor.CounterSetupVisitor}.