the tuple object for reuse.
  // indices of various fields in the input Tuple.
  int idxName, idxSalary, idxBonusPct;

  @Override
  public void configure(JobConf job) {
    Schema projection = TableInputFormat.getProjection(job);
    // determine the field indices once, instead of on every map() call.
    idxName = projection.getColumnIndex("Name");
    idxSalary = projection.getColumnIndex("Salary");
    idxBonusPct = projection.getColumnIndex("BonusPct");
  }

  @Override
  public void map(BytesWritable key, Tuple value,
      OutputCollector<K, V> output, Reporter reporter)
      throws IOException {
    try {
      // fields are accessed by position, using the cached indices.
      String name = (String) value.get(idxName);
      int salary = (Integer) value.get(idxSalary);
      double bonusPct = (Double) value.get(idxBonusPct);
      // do something with the input data
    } catch (ExecException e) {
      e.printStackTrace();
    }
  }

  @Override
  public void close() throws IOException {
    // no-op
  }
}

A bit more explanation of PIG {@link Tuple} objects: a Tuple is an ordered list of PIG datum objects. The permitted PIG datum types fall into two categories: Scalar types and Composite types.
Supported Scalar types include seven native Java types (Boolean, Byte, Integer, Long, Float, Double, and String) as well as one PIG class, {@link DataByteArray}, which represents a typeless byte array.
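As a rough illustration of the scalar rule, the sketch below checks whether a Java object is one of the eight permitted scalar types. It is plain JDK code, not PIG's API: `ScalarCheck`, `isScalar`, and the `ByteArray` stand-in for {@link DataByteArray} are hypothetical names introduced here, and treating null as "not a scalar" is a simplification of this sketch only.

```java
import java.util.Arrays;
import java.util.List;

class ScalarCheck {
    // Hypothetical stand-in for PIG's DataByteArray: a typeless wrapper around raw bytes.
    static final class ByteArray {
        private final byte[] data;
        ByteArray(byte[] data) { this.data = data.clone(); }
        byte[] get() { return data.clone(); }
    }

    // The seven native Java scalar classes listed above.
    private static final List<Class<?>> NATIVE_SCALARS = Arrays.asList(
        Boolean.class, Byte.class, Integer.class, Long.class,
        Float.class, Double.class, String.class);

    // True if the datum is one of the seven Java classes or the byte-array stand-in.
    static boolean isScalar(Object datum) {
        if (datum == null) {
            return false; // simplification: null is left unclassified in this sketch
        }
        return datum instanceof ByteArray || NATIVE_SCALARS.contains(datum.getClass());
    }

    public static void main(String[] args) {
        System.out.println(isScalar(42));                               // true (Integer)
        System.out.println(isScalar("Alice"));                          // true (String)
        System.out.println(isScalar(new ByteArray(new byte[] {1, 2}))); // true (byte array)
        System.out.println(isScalar(new int[] {1, 2}));                 // false (raw Java array)
        System.out.println(isScalar((short) 3));                        // false (Short is not listed)
    }
}
```

Matching on the exact class rather than `instanceof Number` keeps types such as Short or BigDecimal out, mirroring the closed list in the text.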
Supported Composite types include:
- {@link Map} : The same as the Java Map interface, with the additional restriction that the key type must be one of the scalar types PIG recognizes, while the value type can be any of the scalar or composite types PIG understands.
- {@link DataBag} : A DataBag is a collection of Tuples.
- {@link Tuple} : Yes, Tuple itself can be a datum in another Tuple.
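The composite rules nest recursively: a Map's values, a DataBag's Tuples, and a Tuple's fields can themselves be composites. The following sketch validates a datum tree against the rules listed above. It uses plain JDK stand-ins (`Tuple`, `Bag`, `ByteArray`, `DatumValidator`, `isValidDatum` are all hypothetical names, not PIG classes); real PIG code would build tuples and bags through `TupleFactory` and `BagFactory` instead. Rejecting null fields is a simplification of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DatumValidator {
    // Hypothetical stand-ins for PIG's DataByteArray, Tuple and DataBag.
    static final class ByteArray {
        final byte[] data;
        ByteArray(byte[] data) { this.data = data.clone(); }
    }

    static final class Tuple {
        final List<Object> fields; // ordered list of datums
        Tuple(Object... fields) { this.fields = Arrays.asList(fields); }
    }

    static final class Bag {
        final List<Tuple> tuples = new ArrayList<>(); // a bag holds only tuples
        Bag add(Tuple t) { tuples.add(t); return this; }
    }

    static boolean isScalar(Object d) {
        return d instanceof Boolean || d instanceof Byte || d instanceof Integer
            || d instanceof Long || d instanceof Float || d instanceof Double
            || d instanceof String || d instanceof ByteArray;
    }

    // Recursively checks that d is a scalar or a well-formed Map, Bag or Tuple.
    static boolean isValidDatum(Object d) {
        if (isScalar(d)) {
            return true;
        }
        if (d instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) d).entrySet()) {
                // keys must be scalar; values may be any scalar or composite datum
                if (!isScalar(e.getKey()) || !isValidDatum(e.getValue())) {
                    return false;
                }
            }
            return true;
        }
        if (d instanceof Bag) {
            for (Tuple t : ((Bag) d).tuples) {
                if (!isValidDatum(t)) return false;
            }
            return true;
        }
        if (d instanceof Tuple) {
            for (Object f : ((Tuple) d).fields) {
                if (!isValidDatum(f)) return false; // a Tuple may nest inside a Tuple
            }
            return true;
        }
        return false; // anything else, including null in this sketch, is rejected
    }

    public static void main(String[] args) {
        Map<String, Object> attrs = new HashMap<>();
        attrs.put("dept", "R&D");
        attrs.put("history", new Bag().add(new Tuple(2019, 90000)));

        Tuple employee = new Tuple("Alice", 100000, 0.15, attrs, new Tuple(true, (byte) 1));
        System.out.println(isValidDatum(employee)); // true

        Map<Object, Object> badKey = new HashMap<>();
        badKey.put(new Tuple("k"), "v"); // a Tuple key violates the scalar-key rule
        System.out.println(isValidDatum(new Tuple(badKey))); // false
    }
}
```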