The default string-encoding algorithm does not support localized collation. TODO - add collation information here - TODO
Persistit offers built-in support for encoding and decoding of a few commonly used Object types. These include:
java.lang.String
java.math.BigInteger
java.math.BigDecimal
java.util.Date
byte[]
(Note that for a byte array, zero-valued bytes are converted to two-byte sequences, as described for strings.)
The default encoding for these types, plus any additional Object types that are required to be used as key values, can be overridden by a custom {@link com.persistit.encoding.KeyCoder}. For consistency, the application should register all custom KeyCoder objects immediately after initializing Persistit, for example:
    Persistit.initialize();
    KeyCoder coder = new MyKeyCoder();
    Persistit.getInstance().getCoderManager().registerKeyCoder(MyClass.class, coder);
All overridden object types sort after all other value types. Ordering among various custom types is determined by the custom encoding algorithm's implementation. See {@link com.persistit.encoding.CoderManager} for details.
An application may append multiple values to a Key, each of which is called a key segment. Applications use multiple segments to form concatenated keys. A concatenated key uniquely identifies a particular record by a combination of data values rather than one simple value. The number of segments in a Persistit concatenated key is bounded only by the architectural limitation on the length of the underlying byte array.
Key encodes strings, byte arrays, and all other data types that might naturally contain a zero-valued byte by inserting escape sequences in place of the zero values. Specifically, a NUL (character code 0) in a string, or a zero in a byte array element, is replaced by the two-byte sequence (0x01, 0x20). An SOH (character code 1), or a one in a byte array element, is replaced by the two-byte sequence (0x01, 0x21). This scheme is handled automatically, and only those applications that manipulate the raw byte buffer using the low-level API need to be aware of it.
This encoding ensures that the key ordering preserves the natural ordering of the underlying values. For example, the encoded forms of the two strings "AB" and "ABC" are (0x80, 0x41, 0x42, 0x00) and (0x80, 0x41, 0x42, 0x43, 0x00), respectively. The two encodings differ in the fourth byte (the zero that terminates the shorter string), which correctly causes the shorter string to collate before the longer one.
Segments fall naturally into the ordering scheme. If two keys are different, then the first segment that differs between the two keys controls their ordering. For example, the code fragment
    key1.clear().append(1).append(1);
    key2.clear().append(1).append(2);
    key3.clear().append(2).append(1);
    key4.clear().append(2).append(1).append(0);
sets the four keys so that their ordering sequence is key1 < key2 < key3 < key4.
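The escaping rule and its effect on ordering can be illustrated with a small standalone sketch. It is not Persistit's implementation: the class name EscapedSegmentSketch and its methods are hypothetical, the 0x80 type byte is taken only from the "AB" example in the text, and the use of UTF-8 for the character bytes is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch only; names and the UTF-8 choice are assumptions.
final class EscapedSegmentSketch {
    // Type byte used for strings in the "AB" / "ABC" example above.
    static final int STRING_TYPE = 0x80;

    /** Encodes one string segment: type byte, escaped payload, 0x00 terminator. */
    static byte[] encodeString(String s) {
        byte[] raw = s.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[raw.length * 2 + 2]; // worst case: every byte escaped
        int n = 0;
        out[n++] = (byte) STRING_TYPE;
        for (byte b : raw) {
            if (b == 0) {          // NUL becomes (0x01, 0x20)
                out[n++] = 0x01;
                out[n++] = 0x20;
            } else if (b == 1) {   // SOH becomes (0x01, 0x21)
                out[n++] = 0x01;
                out[n++] = 0x21;
            } else {
                out[n++] = b;
            }
        }
        out[n++] = 0x00;           // terminator; never occurs inside the payload
        return Arrays.copyOf(out, n);
    }

    /** Compares two encoded keys byte by byte, treating bytes as unsigned. */
    static int compare(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // "AB" sorts before "ABC" because the terminator 0x00 is below 0x43.
        System.out.println(compare(encodeString("AB"), encodeString("ABC")) < 0);     // true
        // An embedded NUL still sorts after the bare prefix.
        System.out.println(compare(encodeString("A"), encodeString("A\u0000B")) < 0); // true
    }
}
```

Because an escaped zero always begins with 0x01, it compares greater than the 0x00 terminator, which is why a string remains ordered before any longer string that extends it.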
The ability to append multiple key segments to a Key supports a method of grouping records logically by hierarchical key values. Much as a paper filing system uses cabinets, drawers, and folders to organize documents, segmented keys can be used to impose a hierarchical organization on data. For example, the key for a purchase order record might have a single segment representing the purchase order number. The purchase order's subsidiary line items might then be stored with keys that are logical children of the purchase order number, as suggested in this code snippet:
    PurchaseOrderSummary poSummary = ...
    List poLineItems = ...
    ...
    // Store the summary under a key holding only the purchase order number.
    exchange.getValue().put(poSummary);
    exchange.clear().append(poSummary.getPurchaseOrderNumber()).store();
    ...
    // Add a placeholder second segment, then replace it for each line item.
    exchange.append(Key.BEFORE);
    for (Iterator items = poLineItems.iterator(); items.hasNext();) {
        LineItem item = (LineItem) items.next();
        exchange.getValue().put(item);
        exchange.to(item.getLineItemId()).store();
    }
    ...
This example would store the PurchaseOrderSummary under a key containing just the purchase order number, and then store each of the line items from the poLineItems List in a key containing the purchase order number and the line item number as separate segments. (The {@link #to(Object)} method is a convenience method that replaces the final segment of the key with a new value.)
Logical child relationships between keys are represented solely by the way in which keys are encoded and ordered within the physical tree; there is no direct physical representation of the logical hierarchy. However, because of the way keys are physically ordered within a Tree, logical child keys fall closer to their parents in key sort order than other keys, and are therefore more likely to be located on physical database pages that have already been read into the buffer pool.
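The way segment ordering keeps logical children next to their parent can be sketched with an ordinary sorted map. This is only an analogy under stated assumptions: a key is modeled as a list of integer segments, a TreeMap stands in for the physical tree, and the class name HierarchicalKeySketch and the sample purchase order numbers are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Illustrative sketch only; a TreeMap stands in for the physical tree.
final class HierarchicalKeySketch {
    // The first differing segment decides the order; a key that is a
    // prefix of another key (a logical parent) sorts before it.
    static final Comparator<List<Integer>> SEGMENT_ORDER = (a, b) -> {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a.get(i), b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.size(), b.size());
    };

    static TreeMap<List<Integer>, String> sampleTree() {
        TreeMap<List<Integer>, String> tree = new TreeMap<>(SEGMENT_ORDER);
        // Inserted out of order on purpose; the map sorts by key.
        tree.put(Arrays.asList(101), "summary 101");
        tree.put(Arrays.asList(100, 2), "order 100, line item 2");
        tree.put(Arrays.asList(100), "summary 100");
        tree.put(Arrays.asList(100, 1), "order 100, line item 1");
        return tree;
    }

    /** Returns the stored values in key order. */
    static List<String> orderedValues() {
        return new ArrayList<>(sampleTree().values());
    }

    public static void main(String[] args) {
        // The line items of order 100 appear directly after its summary
        // and before the summary of order 101.
        for (String value : orderedValues()) {
            System.out.println(value);
        }
    }
}
```

Nothing in the map records that the line items belong to order 100; the grouping is purely a consequence of the sort order, which mirrors the point made above about the absence of a physical hierarchy.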
Two families of methods in {@link Exchange} incorporate logic to handle logical child keys in a special way:
Traversal: the result can be the next (or previous) physical key in the Tree, regardless of whether it is a logical child key. However, if shallow traversal is requested, all logical children are skipped and the result is the next (or previous) logical sibling key value.
Removal: in the purchase order example, the remove method can remove just the line items, just the purchase order summary, or both.
At times it is convenient to represent a Key value as a String, for example, to display it or enter it for editing. This class provides an implementation of {@link #toString} that creates a canonical String representation of a key. The {@link KeyParser#parseKey} method provides the inverse functionality, parsing a canonical string representation to create a key value.
The String representation is of the form:
    { segment,... }
where each segment value is one of the following:
null
false
true
For example, the code fragment
    Key key = new Key();
    key.append("xyz").append(1.23).append((long) 456).append(new Date());
    System.out.println(key.toString());
would produce a string representation such as
    { "xyz", 1.23, (long) 456, (java.util.Date) 20040901114722.563 + 0500 }
All numeric types other than double and int use a cast segment representation so as to permit exact translation to and from the String representation and the underlying internal key value. The canonical representation of a Date is designed to allow exact translation to and from the internal segment value while being somewhat legible.
Applications do two fundamental things with Keys:
Methods used to construct key values are {@link #clear}, {@link #setDepth}, {@link #cut}, {@link #append(boolean)}, {@link #append(byte)}, {@link #append(short)} ... {@link #append(Object)}, {@link #to(boolean)}, {@link #to(byte)}, {@link #to(short)} ... {@link #to(Object)}. These methods all modify the current state of the key. As a convenience, these methods all return the Key to support method call chaining.
Methods used to decode key values are {@link #reset}, {@link #indexTo}, {@link #decodeBoolean}, {@link #decodeByte}, {@link #decodeShort} ... {@link #decode}. These methods do not modify the value represented by the key. Each decodeTTTT method returns a value of type TTTT. The {@link #decode} method returns a value of type Object. The reset and indexTo methods control which segment the next value will be decoded from.
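The division between construction methods, which change the key and return it for chaining, and decoding methods, which only move a read position, can be sketched as follows. This is a toy model, not the Key class: the name SegmentCursorSketch is hypothetical, segments are held as plain Java objects rather than encoded bytes, and the exact semantics given to to, cut, reset, and indexTo are assumptions drawn from the descriptions above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; segments are kept as objects, not encoded bytes.
final class SegmentCursorSketch {
    private final List<Object> segments = new ArrayList<>();
    private int index; // position of the next segment to decode

    // Construction methods: each modifies the key and returns it for chaining.
    SegmentCursorSketch clear() {
        segments.clear();
        index = 0;
        return this;
    }

    SegmentCursorSketch append(Object value) {
        segments.add(value);
        return this;
    }

    /** Replaces the final segment (assumed to append when the key is empty). */
    SegmentCursorSketch to(Object value) {
        if (!segments.isEmpty()) {
            segments.remove(segments.size() - 1);
        }
        segments.add(value);
        return this;
    }

    /** Drops the final segment, if any. */
    SegmentCursorSketch cut() {
        if (!segments.isEmpty()) {
            segments.remove(segments.size() - 1);
        }
        index = Math.min(index, segments.size());
        return this;
    }

    // Decoding methods: these move the read position but leave the value intact.
    SegmentCursorSketch reset() {
        index = 0;
        return this;
    }

    SegmentCursorSketch indexTo(int depth) {
        index = depth;
        return this;
    }

    Object decode() {
        return segments.get(index++);
    }

    int depth() {
        return segments.size();
    }

    public static void main(String[] args) {
        SegmentCursorSketch key = new SegmentCursorSketch();
        key.clear().append("po").append(7).to(8); // segments are now "po", 8
        key.reset();
        System.out.println(key.decode());            // po
        System.out.println(key.decode());            // 8
        System.out.println(key.indexTo(1).decode()); // 8
    }
}
```

Returning the receiver from every construction method is what makes one-line key building possible, while keeping the read position separate lets a key be decoded repeatedly without rebuilding it.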
The low-level API allows an application to bypass the encoding and decoding operations described above and instead to operate directly on the byte array used as the physical B-Tree key. This might be appropriate for an existing application that has already implemented its own serialization mechanisms, for example, or to accommodate special key manipulation requirements. Applications should use these methods only if there is a compelling requirement.
The low-level API methods are:
byte[] {@link #getEncodedBytes}
int {@link #getEncodedSize}
void {@link #setEncodedSize(int)}
@version 1.0