com.sleepycat.je.tree.dupConvert.DupConvert
Performs post-recovery conversion of all dup DBs during Environment construction, when upgrading from JE 4.1 and earlier. In JE 5.0, duplicates are represented by a two-part (key + data) key, and empty data. In JE 4.1 and earlier, the key and data were separate as with non-dup DBs.

Uses the DbTree.DUPS_CONVERTED_BIT to determine whether conversion of the environment is necessary. When all databases are successfully converted, this bit is set and the mapping tree is flushed. See EnvironmentImpl.convertDupDatabases.

Uses DatabaseImpl.DUPS_CONVERTED to determine whether an individual database has been converted, to handle the case where the conversion crashes and is restarted later. When a database is successfully converted, this bit is set and the entire database is flushed using Database.sync.

The conversion of each database is atomic -- either all INs or none are converted and made durable. This is accomplished by putting the database into Deferred Write mode so that splits won't log and eviction will be provisional (eviction will not flush the root IN if it is dirty). The Deferred Write mode is cleared after conversion is complete and Database.sync has been called.

The memory budget is updated during conversion and daemon eviction is invoked periodically. This provides support for arbitrarily large DBs.

Uses preload to load all dup trees (DINs/DBINs) prior to conversion, to minimize random I/O. See EnvironmentConfig.ENV_DUP_CONVERT_PRELOAD_ALL. The preload config does not specify loading of LNs, because we do not need to load LNs from DBINs. The fact that DBIN LNs are not loaded is the main reason that conversion is quick. LNs are converted lazily instead; see LNLogEntry.postFetchInit. The DBIN LNs do not need to be loaded because the DBIN slot key contains the LN 'data' that is needed to create the two-part key.

Even when LN loading is not configured, it turns out that preload does load BIN (not DBIN) LNs in a dup DB, which is what we want.
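
As a rough illustration of the two-part key idea described in this comment, the sketch below folds a record's key and data into one byte array and splits it again. The class name TwoPartKeySketch and the layout (key bytes, data bytes, then a fixed 4-byte key length) are assumptions made for this example only; JE's actual encoding is defined by the DupKeyData class and may well differ.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical illustration only; this is not JE's DupKeyData encoding. */
final class TwoPartKeySketch {

    /** Assumed layout: key bytes, data bytes, then the key length as a 4-byte int. */
    static byte[] combine(byte[] key, byte[] data) {
        ByteBuffer buf =
            ByteBuffer.allocate(key.length + data.length + Integer.BYTES);
        buf.put(key).put(data).putInt(key.length);
        return buf.array();
    }

    /** Returns the original key portion of a combined key. */
    static byte[] keyPart(byte[] twoPart) {
        return Arrays.copyOfRange(twoPart, 0, keyLength(twoPart));
    }

    /** Returns the original data portion of a combined key. */
    static byte[] dataPart(byte[] twoPart) {
        return Arrays.copyOfRange(
            twoPart, keyLength(twoPart), twoPart.length - Integer.BYTES);
    }

    /** Reads the trailing 4-byte key length. */
    private static int keyLength(byte[] twoPart) {
        return ByteBuffer
            .wrap(twoPart, twoPart.length - Integer.BYTES, Integer.BYTES)
            .getInt();
    }
}
```

With a layout like this, the record's stored data can be empty, since both parts are recoverable from the slot key alone.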
The singleton LNs must be loaded in order to get the LN data to create the two-part key. When preload has not loaded a singleton LN, it will be fetched during conversion.

The DIN, DBIN and DupCountLN LSNs are counted obsolete during conversion using a local utilization tracker. The tracker must not be flushed until the conversion of a database is complete. Inexact counting can be used, because DIN/DBIN/DupCountLN entries are automatically considered obsolete by the cleaner. Since only totals are tracked, the memory overhead of the local tracker is not substantial.

Database Conversion Algorithm
-----------------------------
1. Set Deferred Write mode for the database. Preload the database, including INs/BINs/DINs/DBINs, but not LNs except for singleton LNs (LNs with a BIN parent).

2. Convert all IN/BIN keys to "prefix keys", which are defined by the DupKeyData class. This allows tree searches and slot insertions to work correctly as the conversion is performed.

3. Traverse through the BIN slots in forward order.

4. If a singleton LN is encountered, ensure it is loaded. IN.fetchTarget automatically updates the slot key if the LNLogEntry's key is different from the one already in the slot. Because LNLogEntry's key is converted on the fly, a two-part key is set in the slot as a side effect of fetching the LN.

5. If a DIN is encountered, first delete the BIN slot containing the DIN. Then iterate through all LNs in the DBINs of this dup tree, assign each a two-part key, and insert the slot into a BIN. The LSN and state flags of the DBIN slot are copied to the new BIN slot.

6. If a deleted singleton (BIN) LN is encountered, delete the slot rather than converting the key. If a deleted DBIN LN is encountered, simply discard it.

7. Count the DIN and DupCountLN LSNs obsolete for each DIN encountered, using a local utilization tracker.

8. When all BIN slots have been processed, set the DatabaseImpl.DUPS_CONVERTED flag, call Database.sync to flush all INs and the MapLN, clear Deferred Write mode, and flush the local utilization tracker.
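
The slot-processing steps of the algorithm can be sketched with a toy in-memory model. Everything in it is hypothetical and exists only for illustration: DupConvertSketch, OldSlot, ToyDatabase, and the "/" separator used to join key and data are not JE types or formats. A sorted map stands in for the BIN slots, a null data value stands in for a deleted LN, a counter stands in for the local utilization tracker, and a boolean stands in for the per-database converted flag. Preload, prefix keys, Deferred Write mode and Database.sync are not modelled; publishing the result only after every slot has been processed is a loose stand-in for the all-or-nothing behaviour.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the per-database conversion; every type here is hypothetical. */
final class DupConvertSketch {

    /** Old-format slot: a singleton LN or a dup tree holding several data values. */
    static final class OldSlot {
        final boolean isDupTree;   // true if the slot pointed to a DIN
        final List<String> dups;   // data values; a null entry models a deleted LN

        OldSlot(boolean isDupTree, String... dups) {
            this.isDupTree = isDupTree;
            this.dups = Arrays.asList(dups);
        }
    }

    static final class ToyDatabase {
        final TreeMap<String, OldSlot> oldSlots = new TreeMap<>();
        final TreeMap<String, String> newSlots = new TreeMap<>(); // two-part key -> empty data
        boolean dupsConverted;  // stands in for the per-database converted flag
        int obsoleteDins;       // stands in for the local utilization tracker
    }

    /** Toy two-part key; the separator is an assumption of this sketch. */
    static String twoPartKey(String key, String data) {
        return key + "/" + data;
    }

    static void convert(ToyDatabase db) {
        if (db.dupsConverted) {
            return; // a restart after a completed conversion has nothing to do
        }
        TreeMap<String, String> converted = new TreeMap<>();
        int obsolete = 0;
        // Traverse the slots in forward key order.
        for (Map.Entry<String, OldSlot> e : db.oldSlots.entrySet()) {
            OldSlot slot = e.getValue();
            if (slot.isDupTree) {
                obsolete++; // count the DIN obsolete, totals only
            }
            for (String data : slot.dups) {
                if (data == null) {
                    continue; // deleted LN: drop it instead of converting
                }
                converted.put(twoPartKey(e.getKey(), data), "");
            }
        }
        // Publish the result and set the flag only after all slots are processed.
        db.newSlots.putAll(converted);
        db.oldSlots.clear();
        db.obsoleteDins += obsolete;
        db.dupsConverted = true;
    }
}
```

Calling convert a second time on the same ToyDatabase returns immediately, which mirrors how a per-database flag lets a restarted conversion skip databases that were already converted.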