Examples of org.apache.hadoop.hive.ql.io.AcidInputFormat

org.apache.hadoop.hive.ql.io.AcidInputFormat
The interface required for input formats that want to support ACID transactions.
The goal is to provide ACID transactions to Hive. There are several primary use cases:
- Streaming ingest: Allow Flume or Storm to stream data into Hive tables with relatively low latency (~30 seconds).
- Dimension table update: Allow updates of dimension tables without overwriting the entire partition (or table) using standard SQL syntax.
- Fact table inserts: Insert data into fact tables at granularity other than entire partitions using standard SQL syntax.
- Fact table update: Update large fact tables to correct data that was previously loaded.
It is important to support batch updates and maintain read consistency within a query. It is a non-goal to support many simultaneous updates or to replace online transaction systems.
The design changes the layout of data within a partition from being in files at the top level to having base and delta directories. Each write operation will be assigned a sequential global transaction id and each read operation will request the list of valid transaction ids.
- Old format -
```
 $partition/$bucket 
```
- New format -
```
 $partition/base_$tid/$bucket
            delta_$tid_$tid/$bucket
```
With each new write operation a new delta directory is created with events that correspond to inserted, updated, or deleted rows. Each of the files is stored sorted by the original transaction id (ascending), bucket (ascending), row id (ascending), and current transaction id (descending). Thus the files can be merged by advancing through the files in parallel.
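The sort order makes a streaming merge possible. The following is a minimal sketch of that idea, not Hive's reader: `EventKey`, `SortedMerge`, and their fields are hypothetical names introduced for illustration, and each "file" is modeled as an in-memory list that is already sorted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical row-event key; the field names are illustrative, not Hive's classes.
final class EventKey {
    final long originalTxn;
    final int bucket;
    final long rowId;
    final long currentTxn;

    EventKey(long originalTxn, int bucket, long rowId, long currentTxn) {
        this.originalTxn = originalTxn;
        this.bucket = bucket;
        this.rowId = rowId;
        this.currentTxn = currentTxn;
    }

    // Original transaction id, bucket, and row id ascending; current transaction id
    // descending, so the newest event for a given row sorts first.
    static final Comparator<EventKey> ORDER =
        Comparator.comparingLong((EventKey k) -> k.originalTxn)
            .thenComparingInt(k -> k.bucket)
            .thenComparingLong(k -> k.rowId)
            .thenComparing(Comparator.comparingLong((EventKey k) -> k.currentTxn).reversed());
}

final class SortedMerge {
    // Pairs the current head of one sorted file with the rest of that file.
    private static final class Cursor {
        final EventKey head;
        final Iterator<EventKey> rest;

        Cursor(EventKey head, Iterator<EventKey> rest) {
            this.head = head;
            this.rest = rest;
        }
    }

    // Merges already-sorted inputs by repeatedly taking the smallest head.
    static List<EventKey> merge(List<List<EventKey>> sortedFiles) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>(Comparator.comparing((Cursor c) -> c.head, EventKey.ORDER));
        for (List<EventKey> file : sortedFiles) {
            Iterator<EventKey> it = file.iterator();
            if (it.hasNext()) {
                heap.add(new Cursor(it.next(), it));
            }
        }
        List<EventKey> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            merged.add(c.head);
            if (c.rest.hasNext()) {
                heap.add(new Cursor(c.rest.next(), c.rest));
            }
        }
        return merged;
    }
}
```

Because the current transaction id sorts in descending order, a consumer of the merged stream sees the most recent event for a row before any older ones and can skip the rest.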
The base files include all transactions from the beginning of time (transaction id 0) to the transaction in the directory name. Delta directories include transactions (inclusive) between the two transaction ids.
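As an illustration of how the directory names encode transaction ranges, here is a small sketch. `TxnRange` is a hypothetical helper, and it assumes the names follow exactly the `base_$tid` and `delta_$tid_$tid` patterns shown above.

```java
// Hypothetical helper: the inclusive transaction range covered by one directory.
final class TxnRange {
    final long min;
    final long max;

    TxnRange(long min, long max) {
        this.min = min;
        this.max = max;
    }

    // Parses "base_N" as [0, N] and "delta_M_N" as [M, N].
    static TxnRange parse(String dirName) {
        if (dirName.startsWith("base_")) {
            return new TxnRange(0, Long.parseLong(dirName.substring("base_".length())));
        }
        if (dirName.startsWith("delta_")) {
            String[] parts = dirName.substring("delta_".length()).split("_");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Unexpected delta name: " + dirName);
            }
            return new TxnRange(Long.parseLong(parts[0]), Long.parseLong(parts[1]));
        }
        throw new IllegalArgumentException("Not a base or delta directory: " + dirName);
    }

    // Both ends of the range are inclusive.
    boolean contains(long txn) {
        return min <= txn && txn <= max;
    }
}
```

For example, `TxnRange.parse("delta_6_10").contains(10)` is true because the upper bound is inclusive.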
Because read operations get the list of valid transactions when they start, all reads are performed on that snapshot, regardless of any transactions that are committed afterwards.
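A snapshot check of this kind can be sketched as follows. `Snapshot` is a hypothetical class, not Hive's `ValidTxnList`; it assumes the snapshot is represented by the highest transaction id known at read start plus a set of ids at or below it that must be ignored (for example, transactions still open at that moment).

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical snapshot of the transactions a reader is allowed to see.
final class Snapshot {
    private final long highWaterMark;
    private final Set<Long> excluded;

    Snapshot(long highWaterMark, Set<Long> excluded) {
        this.highWaterMark = highWaterMark;
        this.excluded = new HashSet<>(excluded); // defensive copy: the snapshot never changes
    }

    // Transactions started after the snapshot, or excluded from it, stay invisible
    // even if they commit while the read is still running.
    boolean isVisible(long txn) {
        return txn <= highWaterMark && !excluded.contains(txn);
    }
}
```

A reader would apply `isVisible` to the current transaction id of each event and drop events that fail the check.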
The base and the delta directories have the transaction ids so that major (merge all deltas into the base) and minor (merge several deltas together) compactions can happen while readers continue their processing.
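To make the naming consequence of compaction concrete, the sketch below computes the directory name that a compaction of a set of deltas would produce. `CompactionNames` is a hypothetical helper; it assumes the unpadded `delta_$tid_$tid` names shown above and does not touch any files.

```java
import java.util.List;

// Hypothetical helper that derives the output directory name of a compaction.
final class CompactionNames {
    // Minor compaction: several deltas become one delta spanning their combined range.
    static String minor(List<String> deltas) {
        long[] range = range(deltas);
        return "delta_" + range[0] + "_" + range[1];
    }

    // Major compaction: everything up to the highest delta transaction goes into a new base.
    static String major(List<String> deltas) {
        return "base_" + range(deltas)[1];
    }

    // Returns {lowest first txn, highest last txn} over names of the form delta_M_N.
    private static long[] range(List<String> deltas) {
        if (deltas.isEmpty()) {
            throw new IllegalArgumentException("Nothing to compact");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (String name : deltas) {
            String[] parts = name.split("_"); // "delta", first txn, last txn
            if (parts.length != 3 || !parts[0].equals("delta")) {
                throw new IllegalArgumentException("Unexpected delta name: " + name);
            }
            min = Math.min(min, Long.parseLong(parts[1]));
            max = Math.max(max, Long.parseLong(parts[2]));
        }
        return new long[] {min, max};
    }
}
```

Since the compacted output gets a new name, a reader that already listed `delta_6_6` and `delta_7_9` can keep using them while `delta_6_9` or `base_9` is being written.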
To support transitions from non-ACID layouts to ACID layouts, the input formats are expected to support both layouts and detect the correct one.
@param <V> The row type
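One simple way to tell the two layouts apart is to look at the names of a partition's children. This is only a sketch under that assumption: `LayoutDetector` is a hypothetical class, and a real input format may need to handle more cases than this check covers.

```java
import java.util.List;

// Hypothetical detector that classifies a partition by the names of its children.
final class LayoutDetector {
    enum Layout { ORIGINAL, ACID }

    // Treats the partition as ACID if any child is a base or delta directory;
    // otherwise assumes bucket files sit directly under the partition.
    static Layout detect(List<String> childNames) {
        for (String name : childNames) {
            if (name.startsWith("base_") || name.startsWith("delta_")) {
                return Layout.ACID;
            }
        }
        return Layout.ORIGINAL;
    }
}
```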

                    Reporter reporter) throws IOException {
      // This will only get called once, since CompactRecordReader only returns one record,
      // the input split.
      // Based on the split we're passed we go instantiate the real reader and then iterate on it
      // until it finishes.
      AcidInputFormat aif =
          instantiate(AcidInputFormat.class, jobConf.get(INPUT_FORMAT_CLASS_NAME));
      ValidTxnList txnList =
          new ValidTxnListImpl(jobConf.get(ValidTxnList.VALID_TXNS_KEY));


      AcidInputFormat.RawReader<V> reader =
          aif.getRawReader(jobConf, jobConf.getBoolean(IS_MAJOR, false), split.getBucket(),
              txnList, split.getBaseDir(), split.getDeltaDirs());
      RecordIdentifier identifier = reader.createKey();
      V value = reader.createValue();
      getWriter(reporter, reader.getObjectInspector(), split.getBucket());
      while (reader.next(identifier, value)) {

Related Classes of org.apache.hadoop.hive.ql.io.AcidInputFormat

org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap
