Storage Engine which saves records directly into a file.
It has zero protection from data corruption and must be closed properly after modifications.
It is used when Write-Ahead-Log transactions are disabled.

Storage format
----------------
`StoreDirect` is composed of two files: an index file and a physical file. The index file is a sequence of 8-byte longs; it translates `recid` (offset in the index file) to record size and offset in the physical file. A record's position may change, but its ID must stay stable, so the index file is used for translation.
This store uses a data structure called `Long Stack` to manage (and reuse) free space; it is a linked LIFO queue of 8-byte longs.

Index file
--------------
The index file is a translation table between the permanent record ID (recid) and the mutable location in the physical file.
It is a sequence of 8-byte longs, one for each record. It also has some extra longs to manage free space and other metainfo.
The index table and physical data could be stored in a single file, but keeping the index table separate simplifies compaction.

Basic **structure of index file** is below. Each slot is 8 bytes long, so `offset=slot*8`.

slot | in code | description |
--- | --- | --- |
0 | {@link StoreDirect#HEADER} | File header, format version and flags |
1 | {@link StoreDirect#IO_INDEX_SIZE} | Allocated file size of index file in bytes. |
2 | {@link StoreDirect#IO_PHYS_SIZE} | Allocated file size of physical file in bytes. |
3 | {@link StoreDirect#IO_FREE_SIZE} | Space occupied by free records in physical file in bytes. |
4 | {@link StoreDirect#IO_INDEX_SUM} | Checksum of all Index file headers. Checks if store was closed correctly |
5..9 | | Reserved for future use |
10..14 | | For usage by user |
15 | {@link StoreDirect#IO_FREE_RECID} | Long Stack of deleted recids, those will be reused and returned by {@link Engine#put(Object,Serializer)} |
16..4111 | | Long Stacks of free physical records. These contain free space released by record update or delete. Each slot corresponds to one free record size. TODO check 4111 is right |
4112 | {@link StoreDirect#IO_USER_START} | Record size and offset in physical file for recid=1 |
4113 | | Record size and offset in physical file for recid=2 |
... | ... | ... snip ... |
N+4111 | | Record size and offset in physical file for recid=N |
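
The slot arithmetic from the table can be sketched as follows. This is a minimal illustration, not code from `StoreDirect`: the class `IndexSlots` and its constant `FIRST_RECID_SLOT` are hypothetical names, and the value 4112 is simply taken from the table above.

```java
// Hypothetical helper illustrating the index file layout described in the table.
final class IndexSlots {
    // Assumption: slot holding the pointer for recid=1, as listed in the table.
    static final long FIRST_RECID_SLOT = 4112;
    // Every slot is one 8-byte long.
    static final int SLOT_SIZE = 8;

    // recid=N is stored in slot N+4111.
    static long slotOf(long recid) {
        if (recid < 1) {
            throw new IllegalArgumentException("recid must be >= 1: " + recid);
        }
        return recid + FIRST_RECID_SLOT - 1;
    }

    // Byte offset of a slot in the index file: offset = slot * 8.
    static long offsetOfSlot(long slot) {
        return slot * SLOT_SIZE;
    }

    // Byte offset in the index file of the pointer for the given recid.
    static long offsetOfRecid(long recid) {
        return offsetOfSlot(slotOf(recid));
    }
}
```

For example, `IndexSlots.offsetOfRecid(1)` gives 4112 * 8 = 32896.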

Long Stack
------------
Long Stack is a data structure used to store free records. It is a LIFO queue which uses linked records to store 8-byte longs.
A Long Stack is identified by a slot in the Index File, which stores a pointer to the Long Stack head. The structure of the index pointer is as follows:

{@code
byte | description
---  |---
0..1 | relative offset in head Long Stack Record to take value from. This value decreases by 8 each take
2..7 | physical file offset of head Long Stack Record, zero if Long Stack is empty
}
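
A minimal sketch of splitting such an index pointer into its two fields. It assumes the 8-byte value is read in big-endian order (the Java default), so that bytes 0..1 are the top 16 bits of the long. The class `LongStackHead` and its methods are hypothetical names used only for illustration.

```java
// Hypothetical helper; assumes byte 0 is the most significant byte of the long.
final class LongStackHead {
    // Lower 48 bits (bytes 2..7) hold the physical file offset.
    static final long MASK_48_BITS = 0x0000FFFFFFFFFFFFL;

    // Bytes 0..1: relative offset inside the head record to take the value from.
    static int relativeOffset(long indexPointer) {
        return (int) (indexPointer >>> 48);
    }

    // Bytes 2..7: physical file offset of the head Long Stack Record.
    static long recordOffset(long indexPointer) {
        return indexPointer & MASK_48_BITS;
    }

    // A zero physical offset means the Long Stack is empty.
    static boolean isEmpty(long indexPointer) {
        return recordOffset(indexPointer) == 0;
    }

    // Combines both fields back into one 8-byte index pointer.
    static long pack(int relativeOffset, long recordOffset) {
        if (relativeOffset < 0 || relativeOffset > 0xFFFF) {
            throw new IllegalArgumentException("relative offset out of range");
        }
        if ((recordOffset & ~MASK_48_BITS) != 0) {
            throw new IllegalArgumentException("record offset does not fit 6 bytes");
        }
        return ((long) relativeOffset << 48) | recordOffset;
    }
}
```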

Each Long Stack Record is a sequence of 8-byte longs; the first slot is the header. The Long Stack Record structure is as follows:

{@code
byte   | description
---    |---
0..1   | length of current Long Stack Record in bytes
2..7   | physical file offset of next Long Stack Record, zero if this record is last
8..15  | Long Stack value
16..23 | Long Stack value
...    | and so on until end of Long Stack Record
}
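
The record layout can be illustrated by reading the header and peeking at the top value without modifying anything. This is only a sketch under stated assumptions: the physical file is stood in for by an in-memory big-endian `ByteBuffer`, offsets fit in an `int`, and the class `LongStackRecord` with its methods is hypothetical, not part of `StoreDirect`.

```java
import java.nio.ByteBuffer;
import java.util.OptionalLong;

// Hypothetical read-only view of a Long Stack Record held in a ByteBuffer.
final class LongStackRecord {
    // Lower 48 bits (bytes 2..7) of a header or index pointer.
    static final long MASK_48_BITS = 0x0000FFFFFFFFFFFFL;

    // Header bytes 0..1: length of the record in bytes.
    static int length(ByteBuffer phys, long recordOffset) {
        return (int) (phys.getLong((int) recordOffset) >>> 48);
    }

    // Header bytes 2..7: offset of the next record, zero if this one is last.
    static long next(ByteBuffer phys, long recordOffset) {
        return phys.getLong((int) recordOffset) & MASK_48_BITS;
    }

    // Returns the value a take would return, or empty if the stack is empty.
    static OptionalLong peek(long indexPointer, ByteBuffer phys) {
        long recordOffset = indexPointer & MASK_48_BITS;
        if (recordOffset == 0) {
            return OptionalLong.empty();
        }
        int relativeOffset = (int) (indexPointer >>> 48);
        return OptionalLong.of(phys.getLong((int) (recordOffset + relativeOffset)));
    }
}
```

An actual take would also decrease the relative offset by 8 and move to the next record once the head record is exhausted; that bookkeeping is left out here because the text does not spell it out.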

Physical pointer
----------------
An index slot value typically contains a physical pointer (information about record location and size in the physical file). The first 2 bytes are the record size (max 65535). Then there is a 6-byte offset in the physical file (max store size is 281 TB).
A physical file offset must always be a multiple of 16, so the last 4 bits are used to flag extra record information.
Structure of **physical pointer**:

bit | in code | description
--- | --- | ---
0-15 | `val>>>48` | record size
16-59 | `val & {@link StoreDirect#MASK_OFFSET}` | physical offset
60 | `(val & {@link StoreDirect#MASK_LINKED}) != 0` | linked record flag
61 | `(val & {@link StoreDirect#MASK_DISCARD}) != 0` | to be discarded while storage is offline flag
62 | `(val & {@link StoreDirect#MASK_ARCHIVE}) != 0` | record modified since last backup flag
63 | | not used yet
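
Decoding a physical pointer could look like the sketch below. The mask values are assumptions derived from the bit table, reading bit 0 as the most significant bit of the long; they are not copied from `StoreDirect`. The class `PhysPointer` is a hypothetical name.

```java
// Hypothetical decoder for the physical pointer layout described above.
final class PhysPointer {
    // Assumed values: bits 16-59 counted from the most significant bit.
    static final long MASK_OFFSET = 0x0000FFFFFFFFFFF0L;
    // Assumed values: bits 60, 61 and 62 counted from the most significant bit.
    static final long MASK_LINKED = 0x8L;
    static final long MASK_DISCARD = 0x4L;
    static final long MASK_ARCHIVE = 0x2L;

    // Top 16 bits: record size in bytes.
    static int size(long val) {
        return (int) (val >>> 48);
    }

    // Offset in the physical file, always a multiple of 16.
    static long offset(long val) {
        return val & MASK_OFFSET;
    }

    // Parentheses are required: in Java, != binds tighter than &.
    static boolean isLinked(long val) {
        return (val & MASK_LINKED) != 0;
    }

    static boolean isDiscard(long val) {
        return (val & MASK_DISCARD) != 0;
    }

    static boolean isArchive(long val) {
        return (val & MASK_ARCHIVE) != 0;
    }

    // Builds a pointer from size, offset and the linked flag.
    static long pack(int size, long offset, boolean linked) {
        if (size < 0 || size > 0xFFFF) {
            throw new IllegalArgumentException("size out of range: " + size);
        }
        if ((offset & ~MASK_OFFSET) != 0) {
            throw new IllegalArgumentException("offset must be a multiple of 16 and fit 6 bytes");
        }
        return ((long) size << 48) | offset | (linked ? MASK_LINKED : 0L);
    }
}
```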

Records in Physical File
---------------------------
Records are stored in the physical file. Maximal record size is 64KB, so larger records must be stored as a linked list.
Reading a record starts with its Physical Pointer from the Index File. A flag in the Physical Pointer indicates whether the record is linked.
If the record is not linked, you may just read a ByteBuffer of the given size at the given offset.
If the record is linked, it starts with an 8-byte Physical Pointer to the next record, so the actual data payload is record size-8.
The last linked record has no Physical Pointer header. The MASK_LINKED flag in the pointer to a record tells whether that record is followed by another one, and it is unset in the pointer to the last record.

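
Reading a possibly linked record could be sketched as below. It rests on several assumptions: the physical file is stood in for by an in-memory big-endian `ByteBuffer`, offsets fit in an `int`, and the mask values are derived from the physical pointer bit layout rather than copied from `StoreDirect`. The class `LinkedRecordReader` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical reader that follows the chain of linked records.
final class LinkedRecordReader {
    // Assumed mask values, derived from the physical pointer bit layout.
    static final long MASK_OFFSET = 0x0000FFFFFFFFFFF0L;
    static final long MASK_LINKED = 0x8L;

    // Collects the payload of a record, starting from its index file pointer.
    static byte[] read(long pointer, ByteBuffer phys) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (true) {
            int size = (int) (pointer >>> 48);
            int offset = (int) (pointer & MASK_OFFSET);
            boolean linked = (pointer & MASK_LINKED) != 0;
            // A linked record spends its first 8 bytes on the next pointer.
            int header = linked ? 8 : 0;
            if (size < header) {
                throw new IllegalStateException("linked record shorter than its header");
            }
            byte[] chunk = new byte[size - header];
            // Use a duplicate so the caller's buffer position is untouched.
            ByteBuffer view = phys.duplicate();
            view.position(offset + header);
            view.get(chunk);
            out.write(chunk, 0, chunk.length);
            if (!linked) {
                // Last record of the chain: no header, nothing more to follow.
                return out.toByteArray();
            }
            pointer = phys.getLong(offset);
        }
    }
}
```

A non-linked record goes through the loop exactly once, so the same method covers both cases.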
@author Jan Kotek