Interface for an object that maintains a bidirectional mapping between docids and docnos. A docid is a globally unique String identifier for a document in the collection. Many information retrieval algorithms require the documents in a collection to be numbered sequentially, so each document must also be assigned a unique integer identifier, which is its docno. Typically, the docid/docno mappings are stored in a mappings file, which concrete objects implementing this interface load into memory.
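As a rough illustration of the bidirectional mapping described above, the following is a minimal in-memory sketch. The class name InMemoryDocnoMapping and the method names getDocno, getDocid, and size are assumptions made for this example and are not taken from the interface itself; loading from a mappings file is left out, and the docids are passed in as a list instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the class and method names are assumptions for
// illustration, not the actual API of the interface documented here.
class InMemoryDocnoMapping {
    // docids.get(i) holds the docid assigned to docno i + 1 (docnos start at one).
    private final List<String> docids = new ArrayList<>();
    // Reverse direction: docid -> docno.
    private final Map<String, Integer> docnos = new HashMap<>();

    // Assigns docnos 1, 2, 3, ... in the order the docids are given.
    InMemoryDocnoMapping(List<String> orderedDocids) {
        for (String docid : orderedDocids) {
            if (docnos.containsKey(docid)) {
                throw new IllegalArgumentException("duplicate docid: " + docid);
            }
            docids.add(docid);
            docnos.put(docid, docids.size());
        }
    }

    // Returns the docno for a docid, or throws if the docid is unknown.
    int getDocno(String docid) {
        Integer docno = docnos.get(docid);
        if (docno == null) {
            throw new IllegalArgumentException("unknown docid: " + docid);
        }
        return docno;
    }

    // Returns the docid for a docno in the range 1..size().
    String getDocid(int docno) {
        if (docno < 1 || docno > docids.size()) {
            throw new IllegalArgumentException("docno out of range: " + docno);
        }
        return docids.get(docno - 1);
    }

    int size() {
        return docids.size();
    }

    public static void main(String[] args) {
        InMemoryDocnoMapping mapping =
            new InMemoryDocnoMapping(Arrays.asList("doc-a", "doc-b", "doc-c"));
        System.out.println(mapping.getDocno("doc-b"));
        System.out.println(mapping.getDocid(3));
    }
}
```

A list indexed by docno minus one gives constant-time docno-to-docid lookup, while the hash map covers the opposite direction; a real implementation might prefer a sorted array with binary search to save memory.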
Unless there is a compelling reason to do otherwise, docnos should be numbered starting from one rather than zero, because many compression schemes commonly used in information retrieval (e.g., Golomb codes) cannot represent zero.
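To make the point about zero concrete, here is a small sketch that uses the Elias gamma code, a different and simpler scheme than the Golomb codes mentioned above, chosen only for brevity. The class name GammaCodeSketch and the method gamma are hypothetical, and the code is returned as a string of '0' and '1' characters rather than packed bits. In the convention used here, a positive integer x is written as its binary form preceded by one '0' for every bit after the leading one, so zero has no valid code.

```java
// Hypothetical sketch: names are assumptions for illustration only.
class GammaCodeSketch {
    // Returns the Elias gamma code of x as a string of '0' and '1' characters.
    static String gamma(int x) {
        if (x < 1) {
            // Zero (and negatives) have no leading one bit, so there is no code.
            throw new IllegalArgumentException(
                "gamma codes represent only positive integers: " + x);
        }
        String binary = Integer.toBinaryString(x); // starts with '1' when x >= 1
        StringBuilder code = new StringBuilder();
        // One '0' per bit after the leading one tells the decoder the length.
        for (int i = 1; i < binary.length(); i++) {
            code.append('0');
        }
        return code.append(binary).toString();
    }

    public static void main(String[] args) {
        System.out.println(gamma(1)); // prints 1
        System.out.println(gamma(5)); // prints 00101
    }
}
```

With docnos starting at one, both the first docno in a postings list and every subsequent gap between docnos are positive, so they can be fed to such a code directly.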
@author Jimmy Lin