com.sleepycat.persist.impl.Format
The base class for all object formats. Formats are used to define the
stored layout for all persistent classes, including simple types.

The design documentation below describes the storage format for entities
and its relationship to information stored per format in the catalog.

Requirements
------------
+ Provides EntityBinding for objects and EntryBinding for keys.
+ Provides SecondaryKeyCreator, SecondaryMultiKeyCreator and
  SecondaryMultiKeyNullifier (SecondaryKeyNullifier is redundant).
+ Works with reflection and bytecode enhancement.
+ For reflection only, works with any entity model, not just annotations.
+ Bindings are usable independently of the persist API.
+ Performance is almost equivalent to hand-coded tuple bindings.
+ Small performance penalty for compatible class changes (new fields,
  widening).
+ Secondary key create/nullify do not have to deserialize the entire
  record; in other words, store secondary keys at the start of the data.

Class Format
------------
Every distinct class format is given a unique format ID. Format IDs are not
equivalent to class version numbers (as in the version property of @Entity
and @Persistent) because the format can change when the version number does
not. Changes that cause a unique format ID to be assigned are:

+ Add field.
+ Widen field type.
+ Change primitive type to primitive wrapper class.
+ Add or drop secondary key.
+ Any incompatible class change.

The last item, incompatible class changes, also corresponds to a class
version change.

For each distinct class format the following information is conceptually
stored in the catalog, keyed by format ID.

- Class name
- Class version number
- Superclass format
- Kind: simple, enum, complex, array
- For kind == simple:
    - Primitive class
- For kind == enum:
    - Array of constant names, sorted by name.
- For kind == complex:
    - Primary key fieldInfo, or null if no primary key is declared
    - Array of secondary key fieldInfo, sorted by field name
    - Array of other fieldInfo, sorted by field name
- For kind == array:
    - Component class format
    - Number of array dimensions
- Other metadata for RawType

Where fieldInfo is:

- Field name
- Field class
- Other metadata for RawField

Data Layout
-----------
For each entity instance the data layout is as follows:

  instanceData: formatId keyFields... nonKeyFields...
  keyFields:    fieldValue...
  nonKeyFields: fieldValue...

The formatId is the (positive non-zero) ID of a class format, defined
above. This is the ID of the most derived class of the instance. It is
stored as a packed integer.

Following the format ID, zero or more sets of secondary key field values
appear, followed by zero or more sets of other class field values.

The keyFields are the sets of secondary key fields for each class in order
of the highest superclass first. Within a class, fields are ordered by
field name.

The nonKeyFields are the sets of other non-key fields for each class in
order of the highest superclass first. Within a class, fields are ordered
by field name.

A field value is:

  fieldValue:   primitiveValue
              | nullId
              | instanceRef
              | instanceData
              | simpleValue
              | enumValue
              | arrayValue

For a primitive type, a primitive value is used as defined for tuple
bindings. For float and double, sorted float and sorted double tuple values
are used.

For a non-primitive type with a null value, a nullId is used that has a
zero (illegal formatId) value. This includes String and other simple
reference types. The formatId is stored as a packed integer, meaning that
it is stored as a single zero byte.

For a non-primitive type, an instanceRef is used for a non-null instance
that appears earlier in the data byte array. An instanceRef is the negation
of the byte offset of the instanceData that appears earlier. It is stored
as a packed integer.

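The three cases above (formatId, nullId, instanceRef) share one leading
integer, so a reader can dispatch on its sign. The following is a minimal
sketch of that dispatch, not the actual implementation. The class
FieldValueHeaderSketch and all of its methods are hypothetical names, and
the variable-length encoding is a stand-in (zigzag with 7-bit groups), not
the packed integer format used by tuple bindings. It was chosen only
because it also stores zero as a single zero byte.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch; none of these names come from the persist API. */
final class FieldValueHeaderSketch {

    enum Kind { NULL_ID, INSTANCE_REF, FORMAT_ID }

    /** Classifies the leading integer of a non-primitive field value. */
    static Kind classify(int id) {
        if (id == 0) {
            return Kind.NULL_ID;          // zero is an illegal formatId
        }
        return id < 0 ? Kind.INSTANCE_REF // negated offset of earlier data
                      : Kind.FORMAT_ID;   // positive non-zero format ID
    }

    /** An instanceRef is the negation of the earlier byte offset. */
    static int toInstanceRef(int byteOffset) {
        return -byteOffset;
    }

    /** Recovers the byte offset from an instanceRef. */
    static int toByteOffset(int instanceRef) {
        return -instanceRef;
    }

    /** Stand-in encoding (assumption): zigzag, then 7 bits per byte. */
    static byte[] writeInt(int value) {
        int zz = (value << 1) ^ (value >> 31);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((zz & ~0x7F) != 0) {
            out.write((zz & 0x7F) | 0x80); // more bytes follow
            zz >>>= 7;
        }
        out.write(zz);
        return out.toByteArray();
    }

    /** Reads a value written by writeInt, starting at pos. */
    static int readInt(byte[] buf, int pos) {
        int zz = 0;
        int shift = 0;
        int b;
        do {
            b = buf[pos++] & 0xFF;
            zz |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (zz >>> 1) ^ -(zz & 1);
    }
}
```

With this stand-in, a null reference costs one byte and a reference to an
instance at byte offset 12 is written as the integer -12. Real code would
use the packed integer routines of the tuple binding classes instead.
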
The remaining rules apply only to reference types with non-null values that
do not appear earlier in the data array.

For an array type, an array formatId is used that identifies the component
type and the number of array dimensions. This is followed by an array
length (stored as a packed integer) and zero or more fieldValue elements.
For an array with N+1 dimensions where N is greater than zero, the leftmost
dimension is enumerated such that each fieldValue element is itself an
array of N dimensions or null.

  arrayValue:   formatId length fieldValue...

For an enum type, an enumValue is used, consisting of a formatId that
identifies the enum class and an enumIndex (stored as a packed integer)
that identifies the constant name in the enum constant array of the enum
class format:

  enumValue:    formatId enumIndex

For a simple type, a simpleValue is used. This consists of the formatId
that identifies the class followed by the simple type value. For a
primitive wrapper type the simple type value is the corresponding
primitive, for a Date it is the milliseconds as a long primitive, and for
BigInteger or BigDecimal it is a byte array as defined for tuple bindings
of these types.

  simpleValue:  formatId value

For all other complex types, an instanceData is used, which is defined
above.

Secondary Keys
--------------
For secondary key support we must account for writing and nullifying
specific keys. Rather than instantiating the entity and then performing
the secondary key operation, we strive to perform the secondary key
operation directly on the byte format.

To create a secondary key we skip over other fields and then copy the bytes
of the embedded key. This approach is very efficient because a) the entity
is not instantiated, and b) the secondary keys are stored at the beginning
of the byte format and can be quickly read.

To nullify we currently instantiate the raw entity, set the key field to
null (or remove it from the array/collection), and convert the raw entity
back to bytes.
Although the performance of this approach is not ideal because it requires
serialization, it avoids the complexity of modifying the packed serialized
format directly, adjusting references to key objects, etc. Plus, when we
nullify a key we are going to write the record, so the serialization
overhead may not be significant. For the record, I tried implementing
nullification of the bytes directly and found it was much too complex.

Lifecycle
---------
Formats are managed by a Catalog class. Simple formats are managed by
SimpleCatalog, and are copied from the SimpleCatalog by PersistCatalog.
Other formats are managed by PersistCatalog. The lifecycle of a format
instance is:

- Constructed by the catalog when a format is requested for a Class that
  currently has no associated format.

- The catalog calls setId() and adds the format to its format list
  (indexed by format id) and map (keyed by class name).

- The catalog calls collectRelatedFormats(), where a format can create
  additional formats that it needs, or that should also be persistent.

- The catalog calls initializeIfNeeded(), which calls the initialize()
  method of the format class.

- initialize() should initialize any transient fields in the format.
  initialize() can assume that all related formats are available in the
  catalog. It may call initializeIfNeeded() for those related formats, if
  it needs to interact with an initialized related format; this does not
  cause a cycle, because initializeIfNeeded() does nothing for an already
  initialized format.

- The catalog creates a group of related formats at one time, and then
  writes its entire list of formats to the catalog DB as a single record.
  This grouping reduces the number of writes.

- When a catalog is opened, the list of existing formats is read. After a
  format is deserialized, its initializeIfNeeded() method is called.
  setId() and collectRelatedFormats() are not called, since the ID and
  related formats are stored in serialized fields.

- There are two modes for opening an existing catalog: raw mode and normal
  mode. In raw mode, the old format is used regardless of whether it
  matches the current class definition; in fact the class is not accessed
  and does not need to be present.

- In normal mode, for each existing format that is initialized, a new
  format is also created based on the current class and metadata
  definition. If the two formats are equal, the new format is discarded.
  If they are unequal, the new format becomes the current format and the
  old format's evolve() method is called. evolve() is responsible for
  adjusting the old format for class evolution. Any number of non-current
  formats may exist for a given class, and are set up to evolve to the
  single current format for the class.

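The claim in the lifecycle list that initializeIfNeeded() cannot cause a
cycle can be illustrated with a small sketch. SketchFormat and
SketchCatalog are hypothetical stand-ins, not the real Format or catalog
classes, and the related formats are wired by hand rather than through
collectRelatedFormats(). The sketch also assumes the initialized flag is
set before initialize() runs; the text above does not say when the real
class sets it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FormatLifecycleSketch {

    /** Hypothetical stand-in for a format; not the real Format class. */
    static class SketchFormat {
        final String className;
        final List<SketchFormat> related = new ArrayList<>();
        int id;
        boolean initialized;
        int initializeCalls;

        SketchFormat(String className) {
            this.className = className;
        }

        void setId(int id) {
            this.id = id;
        }

        /** Does nothing for an already initialized format. */
        void initializeIfNeeded() {
            if (initialized) {
                return;
            }
            initialized = true; // assumption: flag is set before initialize()
            initialize();
        }

        /** May touch related formats, which must be initialized first. */
        void initialize() {
            initializeCalls++;
            for (SketchFormat f : related) {
                f.initializeIfNeeded();
            }
        }
    }

    /** Hypothetical stand-in for a catalog with a format list and map. */
    static class SketchCatalog {
        final List<SketchFormat> formatList = new ArrayList<>();
        final Map<String, SketchFormat> formatMap = new HashMap<>();

        SketchCatalog() {
            formatList.add(null); // index 0 unused: format IDs are positive
        }

        /** Constructs a format, assigns its ID, and registers it. */
        SketchFormat add(String className) {
            SketchFormat f = new SketchFormat(className);
            f.setId(formatList.size());
            formatList.add(f);
            formatMap.put(className, f);
            return f;
        }
    }

    public static void main(String[] args) {
        SketchCatalog catalog = new SketchCatalog();
        SketchFormat a = catalog.add("A");
        SketchFormat b = catalog.add("B");
        a.related.add(b); // A and B refer to each other
        b.related.add(a);
        a.initializeIfNeeded();
        // prints "1 1": each format is initialized exactly once
        System.out.println(a.initializeCalls + " " + b.initializeCalls);
    }
}
```

Because the guard returns early for an initialized format, the mutual
reference between A and B terminates after one initialize() call each.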
@author Mark Hayes