Layout of a Kiji table.
Kiji uses the term layout to describe the structure of a table. Kiji does not use the term schema to avoid confusion with Avro schemas or XML schemas.
KijiTableLayout wraps a layout descriptor represented as a {@link org.kiji.schema.avro.TableLayoutDesc TableLayoutDesc} Avro record. KijiTableLayout provides strict validation and accessors to navigate through the layout.
KijiTableLayouts can be created via one of two methods: from a concrete layout with {@link #newLayout(TableLayoutDesc)}, or as a layout update from a preexisting KijiTableLayout, with {@link #createUpdatedLayout(TableLayoutDesc,KijiTableLayout)}. For the format requirements of layout descriptors for these methods, see the "Layout descriptors" section below.
Overall structure
At the top level, a table contains:
- the table name and description;
- how row keys are encoded;
- the table locality groups.
Each locality group has:
- a primary name, unique within the table, a description and some name aliases;
- whether the data is to be stored in memory or on disk;
- data retention lifetime;
- maximum number of versions to keep;
- type of compression;
- column families stored in this locality group.
Each column family has:
- a primary name, globally unique within the table, a description and some name aliases;
- for map-type families, the Avro schema of the cell values;
- for group-type families, the collection of columns in the group.
Each column in a group-type family has:
- a primary name, unique within the family, a description and some name aliases;
- an Avro schema.
Layout descriptors
Layout descriptors are represented using {@link org.kiji.schema.avro.TableLayoutDesc TableLayoutDesc} Avro records. Layout descriptors come in two flavors:
- concrete layouts;
- layout updates.
Concrete layout descriptors
A concrete layout descriptor is an absolute, standalone description of a table layout, which does not reference or build upon any previous version of the table layout. Column IDs have been assigned to all locality groups, families and columns.
Names of tables, locality groups, families and column qualifiers must be valid identifiers. Name validation occurs in {@link org.kiji.schema.util.KijiNameValidator KijiNameValidator}.
Validation rules
- Table names, locality group names, family names, and column names in a group-type family must be valid identifiers (no punctuation or symbols). Note: map-type family qualifiers are free-form, but never appear in a table layout.
- Locality group names and aliases must be unique within the table.
- Family names and aliases must be unique within the table.
- Group-type family qualifiers must be unique within the family.
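The naming rules above can be sketched as two small checks. This is an illustration only: the class {@code LayoutNameChecks} and its methods are hypothetical, and the identifier pattern (letters, digits and underscores, not starting with a digit) is an assumption. The authoritative rule is whatever KijiNameValidator enforces.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical stand-in for the checks listed above; not the Kiji implementation. */
final class LayoutNameChecks {
  // Assumed identifier rule: letters, digits and underscores, not starting with a digit.
  private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

  private LayoutNameChecks() {}

  /** Returns true if the name matches the assumed identifier pattern. */
  static boolean isValidIdentifier(String name) {
    return name != null && IDENTIFIER.matcher(name).matches();
  }

  /**
   * Returns the names that occur more than once within one scope (for example,
   * all families of a table). Each inner list holds the primary name of one
   * entity followed by its aliases.
   */
  static List<String> findDuplicates(List<List<String>> namesPerEntity) {
    Set<String> seen = new HashSet<>();
    List<String> duplicates = new ArrayList<>();
    for (List<String> names : namesPerEntity) {
      for (String name : names) {
        if (!seen.add(name)) {
          duplicates.add(name); // already used by an earlier name or alias
        }
      }
    }
    return duplicates;
  }
}
```

Calling {@code findDuplicates} once per scope (locality groups of the table, families of the table, qualifiers of each group-type family) mirrors the three uniqueness rules.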
Layout update descriptors
A table layout update descriptor builds on a reference table layout and describes the modifications to apply to it. The reference layout is specified by writing its ID ({@link TableLayoutDesc#layout_id}) into the {@link TableLayoutDesc#reference_layout} field of the update. This mechanism prevents race conditions when updating the layout of a table. The first layout of a newly created table has no reference layout.
During a layout update, the user may delete or declare new locality groups, families and/or columns, or modify existing entities, by specifying the new layout. Update validation rules are enforced to ensure compatibility (see Validation rules for updates below).
Entities may also be renamed, as long as uniqueness requirements are met. Primary name updates must be explicitly annotated by setting the {@code renamedFrom} field of the entity being renamed. The name of a table cannot be changed.
For example, suppose the reference layout contains one family {@code Info} with a column {@code Name}, and the user wishes to add a new {@code Address} column to the {@code Info} family. To perform this update, the user would create a layout update by starting with the existing layout, setting the {@code reference_layout} field to the {@code layout_id} of the current layout, and adding a new {@link ColumnDesc} record describing the {@code Address} column to the {@code columns} field of the {@link FamilyDesc} for the {@code Info} family.
The result of applying a layout update on top of a concrete reference layout is a new concrete layout.
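The way a reference layout ID guards against concurrent updates can be sketched as a compare-and-set on the current layout ID. The class {@code LayoutUpdateGuard} below is hypothetical and keeps the ID in memory; it only illustrates the idea and is not how Kiji stores or applies layouts.

```java
import java.util.Objects;

/** Hypothetical in-memory holder for a table's current layout ID; not a Kiji class. */
final class LayoutUpdateGuard {
  private String currentLayoutId; // null until the first layout is installed

  /**
   * Accepts an update only if it was written against the layout currently installed.
   *
   * @param referenceLayoutId the update's reference_layout value (null for a first layout)
   * @param newLayoutId the update's layout_id value
   * @return true if the update was accepted, false if the reference is stale
   */
  synchronized boolean tryApply(String referenceLayoutId, String newLayoutId) {
    if (!Objects.equals(referenceLayoutId, currentLayoutId)) {
      return false; // another update was applied first; the caller must rebuild its update
    }
    currentLayoutId = newLayoutId;
    return true;
  }

  /** Returns the ID of the layout currently installed, or null if there is none. */
  synchronized String currentLayoutId() {
    return currentLayoutId;
  }
}
```

If two writers both prepare an update against layout "1", only the first {@code tryApply("1", ...)} succeeds; the second sees a stale reference and is rejected rather than silently overwriting the first.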
Validation rules for updates
Updates are subject to the same restrictions as concrete layout descriptors. In addition:
- The type of a family (map-type or group-type) cannot be changed.
- A family cannot be moved into a different locality group.
- The encoding of Kiji cells (hash, UID, final) cannot be modified.
- The schema of a Kiji cell can only be changed to a schema that is compatible with all the former schemas of the column. Schema compatibility requires that the new schema can decode data written with every former schema associated with the column or the map-type family.
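The first two update rules can be sketched as a comparison between the reference and updated description of one family. {@code FamilySketch} is a hypothetical, simplified stand-in for the real descriptor records; it ignores renames and does not cover cell encoding or schema compatibility.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified family description; not the Kiji FamilyDesc record. */
final class FamilySketch {
  final String name;
  final boolean mapType;      // true for map-type, false for group-type
  final String localityGroup; // primary name of the enclosing locality group

  FamilySketch(String name, boolean mapType, String localityGroup) {
    this.name = name;
    this.mapType = mapType;
    this.localityGroup = localityGroup;
  }

  /** Lists the update rules violated when {@code updated} replaces {@code reference}. */
  static List<String> validateUpdate(FamilySketch reference, FamilySketch updated) {
    List<String> errors = new ArrayList<>();
    if (reference.mapType != updated.mapType) {
      errors.add("family type cannot change: " + reference.name);
    }
    if (!reference.localityGroup.equals(updated.localityGroup)) {
      errors.add("family cannot move to another locality group: " + reference.name);
    }
    return errors;
  }
}
```

Returning a list of violations instead of throwing on the first one lets a caller report every problem in an update at once.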
Row keys encoding
A row in a Kiji table is identified by its Kiji row key. Kiji row keys are converted into HBase row keys according to the row key encoding specified in the table layout:
- Raw encoding: the user has direct control over the encoding of row keys in the HBase table. In other words, the HBase row key is exactly the Kiji row key. Raw encoding is used when the user would like to use arrays of bytes as row keys.
- Hashed: Deprecated! The HBase row key is computed as a hash of a single String or byte array component.
- Hash-prefixed: the HBase row key is computed as the concatenation of the hash of a single String or byte array component and the component itself.
- Formatted: the row key is composed of one or more components. Each component can be a string, a number or a hash of another component. The user specifies the size of this hash and the order of the components in the key.
Hashing spreads the rows evenly across all the regions in the table. Specifying the size of the hash gives the user fine-grained control over how the data is distributed.
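A hash-prefixed key with a configurable hash size can be sketched as follows. The class {@code HashPrefixSketch} is hypothetical, and both the use of MD5 and the exact byte layout (truncated digest followed by the raw component) are assumptions made for illustration; the actual encoding is defined by the Kiji row key encoders.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical illustration of a hash-prefixed row key; not the Kiji encoder. */
final class HashPrefixSketch {
  private HashPrefixSketch() {}

  /**
   * Builds a key made of the first {@code hashSize} bytes of the component's
   * MD5 digest, followed by the component bytes themselves.
   */
  static byte[] hashPrefixed(byte[] component, int hashSize) {
    if (hashSize < 1 || hashSize > 16) {
      throw new IllegalArgumentException("hashSize must be within 1..16 for MD5");
    }
    byte[] digest;
    try {
      digest = MessageDigest.getInstance("MD5").digest(component);
    } catch (NoSuchAlgorithmException e) {
      // MD5 is one of the digests every Java platform must provide.
      throw new IllegalStateException("MD5 unavailable", e);
    }
    byte[] key = new byte[hashSize + component.length];
    System.arraycopy(digest, 0, key, 0, hashSize);                  // hash prefix
    System.arraycopy(component, 0, key, hashSize, component.length); // original component
    return key;
  }
}
```

A larger {@code hashSize} spreads keys over more distinct prefixes at the cost of longer row keys, which is the trade-off the hash size setting exposes.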
Cell schema
Kiji cells are encoded according to a schema specified via {@link org.kiji.schema.avro.CellSchema CellSchema} Avro records. Kiji provides various cell encoding schemes:
- Hash: each Kiji cell is encoded as a hash of the Avro schema, followed by the binary encoding of the Avro value.
- UID: each Kiji cell is encoded as the unique ID of the Avro schema, followed by the binary encoding of the Avro value.
- Final: each Kiji cell is encoded as the binary encoding of the Avro value.
See {@link org.kiji.schema.impl.AvroCellEncoder KijiCellEncoder} and {@link org.kiji.schema.impl.AvroCellDecoder KijiCellDecoder} for more implementation details.
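The difference between the three encodings is what precedes the Avro-encoded value in each cell. The sketch below frames an already-encoded value in each style. {@code CellFramingSketch} is hypothetical, and the fixed 8-byte representation of the schema ID is an assumption chosen for simplicity; it is not the Kiji wire format.

```java
import java.nio.ByteBuffer;

/** Hypothetical framing of the three cell encodings; not the Kiji wire format. */
final class CellFramingSketch {
  private CellFramingSketch() {}

  /** Hash: the schema hash bytes, then the Avro-encoded value. */
  static byte[] hashEncoded(byte[] schemaHash, byte[] avroValue) {
    return ByteBuffer.allocate(schemaHash.length + avroValue.length)
        .put(schemaHash)
        .put(avroValue)
        .array();
  }

  /** UID: the schema ID (assumed here to be a fixed 8-byte long), then the value. */
  static byte[] uidEncoded(long schemaUid, byte[] avroValue) {
    return ByteBuffer.allocate(Long.BYTES + avroValue.length)
        .putLong(schemaUid)
        .put(avroValue)
        .array();
  }

  /** Final: the Avro-encoded value only; no per-cell schema information is written. */
  static byte[] finalEncoded(byte[] avroValue) {
    return avroValue.clone();
  }
}
```

With a 16-byte schema hash and a 3-byte value, the three methods produce 19, 11 and 3 bytes respectively, which illustrates the per-cell overhead each scheme carries.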
Column IDs
Kiji column names can be represented in HBase in several modes, selected via the {@link org.kiji.schema.avro.ColumnNameTranslator ColumnNameTranslator} Avro enumeration. By default, Kiji uses the shortened (SHORT) column name translation for space efficiency. Depending on compatibility requirements with other HBase tools, it may be desirable to use the IDENTITY or HBASE_NATIVE column name translators.
SHORT Kiji column name translation:
For storage efficiency purposes, Kiji family and column names are translated into short HBase column names by default. This translation happens in {@link org.kiji.schema.layout.impl.hbase.ShortColumnNameTranslator ShortColumnNameTranslator} and relies on {@link org.kiji.schema.layout.impl.ColumnId ColumnId}. Column IDs are assigned automatically by KijiTableLayout. The user may specify column IDs manually. KijiTableLayout checks the consistency of column IDs.
Column IDs cannot be changed (a column ID change is equivalent to deleting the existing column and then re-creating it as a new empty column).
IDENTITY Kiji column name translation:
For compatibility with other HBase tools, Kiji family and column names can be written to HBase directly. This translation happens in {@link org.kiji.schema.layout.impl.hbase.IdentityColumnNameTranslator}. In this mode:
- Kiji locality groups are translated into HBase families.
- Kiji column families and qualifiers are combined to form the HBase qualifier ("family:qualifier").
HBASE_NATIVE Kiji column name translation:
For compatibility with existing HBase tables, the notion of a Kiji locality group can be ignored, mapping Kiji family and column names directly to their HBase equivalents. This translation happens in {@link org.kiji.schema.layout.impl.hbase.HBaseNativeColumnNameTranslator}. In this mode:
- Kiji locality groups and column families are translated into HBase families.
- Additionally, Kiji locality groups must match the Kiji column families. As a side effect, this requires a one-to-one mapping between Kiji locality groups and column families.
- Kiji column qualifiers are combined to form the HBase qualifier.
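The IDENTITY and HBASE_NATIVE mappings described above can be sketched as two functions from a Kiji column to an HBase family and qualifier pair. {@code ColumnNameSketch} is hypothetical and not one of the Kiji translators. The SHORT mode is omitted because it depends on column IDs, and the sketch assumes that in HBASE_NATIVE mode the Kiji qualifier is used unchanged as the HBase qualifier.

```java
/** Hypothetical sketch of two translation modes; not the Kiji translators. */
final class ColumnNameSketch {
  private ColumnNameSketch() {}

  /**
   * IDENTITY: the locality group becomes the HBase family and
   * "family:qualifier" becomes the HBase qualifier.
   *
   * @return a two-element array {hbaseFamily, hbaseQualifier}
   */
  static String[] identity(String localityGroup, String family, String qualifier) {
    return new String[] {localityGroup, family + ":" + qualifier};
  }

  /**
   * HBASE_NATIVE: the Kiji family, which must match its locality group,
   * becomes the HBase family. Assumption: the qualifier is passed through as-is.
   *
   * @return a two-element array {hbaseFamily, hbaseQualifier}
   */
  static String[] hbaseNative(String localityGroup, String family, String qualifier) {
    if (!localityGroup.equals(family)) {
      throw new IllegalArgumentException(
          "HBASE_NATIVE requires matching locality group and family: "
              + localityGroup + " vs " + family);
    }
    return new String[] {family, qualifier};
  }
}
```

For a column {@code info:name} in locality group {@code default}, {@code identity} yields the HBase family {@code default} and qualifier {@code info:name}, while {@code hbaseNative} rejects the column because the locality group and family differ.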