com.sleepycat.je.incomp.INCompressor
JE compression consists of removing deleted entries from BINs, and pruning empty IN/BINs from the tree, which is also called a reverse split.

One of the reasons compression is treated specially is that slot compression cannot be performed inline as part of a delete operation. When we delete an LN, a cursor is always present on the LN. The API dictates that the cursor will remain positioned on the deleted record. In addition, if the deleting transaction aborts we must restore the slot and the possibility of a split during an abort is something we wish to avoid; for this reason, compression will not occur if the slot's LSN is locked. In principle, slot compression could be performed during transaction commit, but that would be expensive because a Btree lookup would be required, and this would negatively impact operation latency. For all these reasons, slot compression is performed after the delete operation is complete and committed, and not in the thread performing the operation or transaction commit.

Compression is of two types:

+ "Queued compression" is carried out by the INCompressor daemon thread. Both slot compression and pruning are performed.

+ "Lazy compression" is carried out as part of logging a BIN by certain operations (namely checkpointing and eviction). Only slot compression is performed by lazy compression, not pruning.

The use of BINDeltas has a big impact on slot compression because slots cannot be compressed until we know that a full BIN will next be logged. If a slot were compressed prior to logging a BINDelta, the record of the compression would be lost and the slot would "reappear" when the BIN is reconstituted; therefore, this is not permitted. Queued compression prior to logging a BINDelta is also wasteful because the dequeued entry cannot be processed. Therefore, lazy compression is used when a BINDelta will next be logged, and queued compression is used only when a full BIN will next be logged.
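
The "reappearing slot" problem can be illustrated with a toy model. The sketch below is not JE code: ToyBin, logFull, logDelta and reconstitute are hypothetical names, and a delta is assumed to be nothing more than the set of slots changed since the last full image. Under that assumption, compressing a deleted slot before the delta is written leaves no trace of the deletion in the delta, so the slot comes back from the older full image.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model only; these are not JE classes. */
final class DeltaCompressionSketch {

    /** Hypothetical BIN: maps each key to a "deleted" flag. */
    static final class ToyBin {
        final TreeMap<String, Boolean> slots = new TreeMap<>();
        // Slots changed since the last full log; this is all a delta records.
        final TreeMap<String, Boolean> dirty = new TreeMap<>();

        void insert(String key) {
            slots.put(key, false);
            dirty.put(key, false);
        }

        void delete(String key) {
            slots.put(key, true);   // the slot stays, marked deleted
            dirty.put(key, true);
        }

        /** Slot compression: physically drop every slot marked deleted. */
        void compress() {
            slots.values().removeIf(deleted -> deleted);
            dirty.values().removeIf(deleted -> deleted);
        }

        /** Log a full image and reset the dirty set. */
        Map<String, Boolean> logFull() {
            dirty.clear();
            return new TreeMap<>(slots);
        }

        /** Log only the slots changed since the last full image. */
        Map<String, Boolean> logDelta() {
            return new TreeMap<>(dirty);
        }
    }

    /** Rebuild a BIN image by applying a delta on top of the last full image. */
    static Map<String, Boolean> reconstitute(Map<String, Boolean> full,
                                             Map<String, Boolean> delta) {
        Map<String, Boolean> rebuilt = new TreeMap<>(full);
        rebuilt.putAll(delta);
        return rebuilt;
    }

    /** Returns true if deleted key "b" comes back as a live slot. */
    static boolean slotReappears(boolean compressBeforeDelta) {
        ToyBin bin = new ToyBin();
        bin.insert("a");
        bin.insert("b");
        Map<String, Boolean> full = bin.logFull();
        bin.delete("b");
        if (compressBeforeDelta) {
            bin.compress();   // removes "b" and, with it, any trace in the delta
        }
        Map<String, Boolean> rebuilt = reconstitute(full, bin.logDelta());
        return Boolean.FALSE.equals(rebuilt.get("b"));
    }

    public static void main(String[] args) {
        System.out.println(slotReappears(true));   // prints true
        System.out.println(slotReappears(false));  // prints false
    }
}
```

In this model the slot survives as "deleted" only when compression is held back until after the delta, which mirrors the rule that compression must wait for a full BIN.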
Because, in general, BINDeltas are logged more often than BINs, lazy compression is used for slot compression more often than queued compression.

You may wonder, since lazy compression is used most of the time for slot compression, why use queued compression for slot compression at all? Queued compression is useful for slot compression for the following reasons:

+ When a cursor is on a BIN, queuing has an advantage over lazy compression. If we can't compress during logging because of a cursor, we have to log anyway, and we must delay compression and log again later. If we can't compress when processing a queue entry, we requeue it and retry later, which increases the chances that we will be able to compress before logging.

+ The code to process a queue entry must do slot compression anyway, even if we only want to prune the BIN. We have to account for the case where all slots are deleted but not yet compressed. So the code to process the queue entry could not be simplified even if we were to decide not to queue entries for slot compression.

+ Because BINDeltas are not used for DeferredWrite mode, queued compression is much more appropriate and efficient in this mode.

The mainstream algorithm for compression is as follows.

1. When a delete operation occurs (CursorImpl.delete) we call Locker.addDeleteInfo, which determines whether a BINDelta will next be logged (BIN.shouldLogDelta). If so, it does nothing, meaning that lazy compression will be used. If not (a full BIN will next be logged), it adds a BINReference to the Locker.deleteInfo map.

2. When the operation is successful and the Locker releases its locks, it copies the BINReferences from the deleteInfo map to the compressor queue. For a transaction this happens at commit (Txn.commit). For a non-transaction locker this happens when the cursor moves or is closed (BasicLocker.releaseNonTxnLocks).

3. The INCompressor thread processes its queue entries periodically, based on EnvironmentConfig.COMPRESSOR_WAKEUP_INTERVAL. For each BINReference that was queued, it tries to compress all deleted slots (BIN.compress). If the BIN is then empty, it prunes the Btree (Tree.delete). If a slot cannot be compressed or an empty BIN cannot be pruned because a slot's LSN is locked or a cursor is present, the entry is requeued for retry.

4. Lazy compression occurs via the checkpointer and evictor. When logging an IN, these components pass true for the allowCompress parameter. If a full BIN is logged (BIN.shouldLogDelta returns false), the lazyCompress method is called by BIN.beforeLog. lazyCompress will attempt to compress all deleted slots (BIN.compress). If the BIN is then empty, it will queue a BINReference so that pruning will occur later. If a slot cannot be compressed (because the LSN is locked or a cursor is present), the BIN.afterLog method will queue a BINReference. In this last case, two full BINs will be logged consecutively.

Special cases are as follows.

A. Before performing a split, we call lazyCompress in order to avoid the split if possible (Tree.searchSubTreeUntilSplit). It is important to avoid splitting when compression is deferred due to BINDeltas.

B. When we undo an LN insertion (via abort, rollback or recovery undo in RecoveryManager.undo), or redo an LN deletion during recovery (RecoveryManager.redo), we queue a BINReference if a full BIN will next be logged (BIN.queueSlotDeletion). This mimics what happens during a mainstream delete operation.
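
Steps 1 through 3 of the mainstream algorithm can be sketched as a small simulation. This is a toy model, not the JE implementation: ToyBin, ToyLocker and ToyCompressor are hypothetical stand-ins, the deltaIsNext flag replaces the real decision about whether a BINDelta will next be logged, and a single cursorPresent flag stands in for both a positioned cursor and a locked slot LSN.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

/** Toy model of the queued-compression flow; none of these types are JE classes. */
final class QueuedCompressionSketch {

    static final class ToyBin {
        int liveSlots;
        int deletedSlots;
        final boolean deltaIsNext;   // assumed: "a delta will be logged next"
        boolean cursorPresent;       // a cursor or locked slot blocks compression
        boolean pruned;

        ToyBin(int liveSlots, boolean deltaIsNext) {
            this.liveSlots = liveSlots;
            this.deltaIsNext = deltaIsNext;
        }
    }

    /** Steps 1 and 2: remember BINs during the operation, hand them over on commit. */
    static final class ToyLocker {
        private final Set<ToyBin> deleteInfo = new LinkedHashSet<>();

        void delete(ToyBin bin) {
            bin.liveSlots--;
            bin.deletedSlots++;
            if (!bin.deltaIsNext) {
                deleteInfo.add(bin);   // full BIN next: use queued compression
            }                          // otherwise leave it to lazy compression
        }

        void commit(ToyCompressor compressor) {
            compressor.queue.addAll(deleteInfo);
            deleteInfo.clear();
        }
    }

    /** Step 3: one wakeup of the compressor over its queue. */
    static final class ToyCompressor {
        final Queue<ToyBin> queue = new ArrayDeque<>();

        void runOnce() {
            List<ToyBin> retry = new ArrayList<>();
            ToyBin bin;
            while ((bin = queue.poll()) != null) {
                if (bin.cursorPresent) {
                    retry.add(bin);        // blocked: requeue for a later pass
                    continue;
                }
                bin.deletedSlots = 0;      // slot compression
                if (bin.liveSlots == 0) {
                    bin.pruned = true;     // pruning (reverse split)
                }
            }
            queue.addAll(retry);
        }
    }

    /** Runs one scenario and summarizes the outcome as a string. */
    static String demo() {
        ToyCompressor compressor = new ToyCompressor();
        ToyLocker locker = new ToyLocker();
        ToyBin fullNext = new ToyBin(1, false);
        ToyBin deltaNext = new ToyBin(2, true);

        locker.delete(fullNext);
        locker.delete(deltaNext);
        fullNext.cursorPresent = true;     // cursor still on the deleted record
        locker.commit(compressor);

        compressor.runOnce();              // blocked by the cursor, so requeued
        int queuedAfterFirstPass = compressor.queue.size();

        fullNext.cursorPresent = false;    // cursor moved away or closed
        compressor.runOnce();              // compresses, then prunes the empty BIN

        return queuedAfterFirstPass + "," + compressor.queue.size() + ","
            + fullNext.pruned + "," + deltaNext.deletedSlots;
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints 1,0,true,1
    }
}
```

The BIN for which a delta is expected next is never queued and keeps its deleted slot, which corresponds to the case left for lazy compression in step 4.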