Examples of cc.mallet.pipe.Pipe

cc.mallet.pipe.Pipe
The abstract superclass of all Pipes, which transform one data type to another. Pipes are most often used for feature extraction.
Although Pipe does not have any "abstract methods", in order to use a Pipe subclass you must override either the {@link pipe} method or the {@link newIteratorFrom} method. The former is appropriate when the pipe's processing of an Instance is strictly one-to-one: for every Instance coming in, there is exactly one Instance coming out. The latter is appropriate when the pipe's processing may result in more or fewer Instances than arrive through its source iterator.
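The two override points can be illustrated with a small stand-alone sketch. It does not use MALLET itself: `SketchInstance`, `SketchPipe`, `LowercasePipe`, `DropEmptyPipe` and `PipeOverrideSketch` are hypothetical names invented for this illustration, and only the "data" field is modeled.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for an Instance: only the "data" field is modeled.
class SketchInstance {
    Object data;
    SketchInstance(Object data) { this.data = data; }
}

// Hypothetical stand-in for Pipe: no abstract methods, two override points.
class SketchPipe {
    // One-to-one hook: by default, return the instance unchanged.
    SketchInstance pipe(SketchInstance carrier) { return carrier; }

    // Stream hook: by default, apply pipe() to each instance from the source.
    Iterator<SketchInstance> newIteratorFrom(final Iterator<SketchInstance> source) {
        return new Iterator<SketchInstance>() {
            public boolean hasNext() { return source.hasNext(); }
            public SketchInstance next() { return pipe(source.next()); }
        };
    }
}

// Strictly one-to-one processing: override pipe().
class LowercasePipe extends SketchPipe {
    @Override
    SketchInstance pipe(SketchInstance carrier) {
        carrier.data = carrier.data.toString().toLowerCase();
        return carrier;
    }
}

// Fewer instances out than in: override newIteratorFrom().
class DropEmptyPipe extends SketchPipe {
    @Override
    Iterator<SketchInstance> newIteratorFrom(Iterator<SketchInstance> source) {
        // Buffers eagerly for brevity; a lazy iterator would also work.
        List<SketchInstance> kept = new ArrayList<>();
        while (source.hasNext()) {
            SketchInstance inst = source.next();
            if (!inst.data.toString().isEmpty()) kept.add(inst);
        }
        return kept.iterator();
    }
}

class PipeOverrideSketch {
    // Wrap the inputs as instances, run them through the pipe, collect the data fields.
    static List<String> run(SketchPipe p, String... inputs) {
        List<SketchInstance> in = new ArrayList<>();
        for (String s : inputs) in.add(new SketchInstance(s));
        List<String> out = new ArrayList<>();
        Iterator<SketchInstance> it = p.newIteratorFrom(in.iterator());
        while (it.hasNext()) out.add(it.next().data.toString());
        return out;
    }
}
```

In this sketch the caller always goes through `newIteratorFrom`, so a subclass that overrides only `pipe` still works because the default iterator delegates to it.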
A pipe operates on an {@link cc.mallet.types.Instance}, which is a carrier of data. A pipe reads from and writes to fields in the Instance when it is requested to process the instance. It is up to the pipe which fields in the Instance it reads from and writes to, but usually a pipe will read its input from and write its output to the "data" field of an instance.
A pipe doesn't have any direct notion of input or output - it merely modifies instances that are handed to it. A set of helper classes, which implement the interface {@link Iterator}, iterate over commonly encountered input data structures and feed the elements of these data structures to a pipe as instances.
A pipe is frequently used in conjunction with an {@link cc.mallet.types.InstanceList}. As instances are added to the list, they are processed by the pipe associated with the instance list and the processed Instance is kept in the list.
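A minimal sketch of that relationship, assuming a much simpler model than the real classes: `PipedList` is a hypothetical name, the pipe is represented by a plain `UnaryOperator<String>`, and instances are plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical list that owns a pipe and applies it to every added element.
class PipedList {
    private final UnaryOperator<String> pipe;            // stand-in for a Pipe
    private final List<String> processed = new ArrayList<>();

    PipedList(UnaryOperator<String> pipe) { this.pipe = pipe; }

    // Pull raw items from the iterator, process each one, keep only the result.
    void addThruPipe(Iterator<String> source) {
        while (source.hasNext()) processed.add(pipe.apply(source.next()));
    }

    int size() { return processed.size(); }
    String get(int i) { return processed.get(i); }

    // Usage example: a list whose pipe trims whitespace.
    static PipedList trimmedExample() {
        PipedList list = new PipedList(String::trim);
        list.addThruPipe(Arrays.asList("  a ", "b  ").iterator());
        return list;
    }
}
```

The point of the design is that the list never stores raw input: whatever is retrieved later has already been through the pipe the list was constructed with.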
In one common usage, a {@link cc.mallet.pipe.iterator.FileIterator} is given a list of directories to operate over. The FileIterator walks through each directory, creating an instance for each file and putting the data from the file in the data field of the instance. The directory of the file is stored in the target field of the instance. The FileIterator feeds instances to an InstanceList, which processes the instances through its associated pipe and keeps the results.
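The directory walk described above can be approximated with the standard library alone. This is only a sketch under stated assumptions: `FileRecord` and `DirectoryWalkSketch` are hypothetical names, the file contents are read eagerly as UTF-8 text, subdirectories are not descended into, and the target is taken to be the directory's last path component, which may differ from what FileIterator actually stores.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical carrier with just the two fields the paragraph mentions.
class FileRecord {
    final String data;    // contents of the file
    final String target;  // name of the directory the file was found in
    FileRecord(String data, String target) { this.data = data; this.target = target; }
}

class DirectoryWalkSketch {
    // Create one record per regular file directly inside each given directory.
    static List<FileRecord> read(List<Path> directories) throws IOException {
        List<FileRecord> out = new ArrayList<>();
        for (Path dir : directories) {
            List<Path> files;
            try (Stream<Path> listing = Files.list(dir)) {
                // Sort so the order of records is deterministic.
                files = listing.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
            }
            for (Path file : files) {
                String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                out.add(new FileRecord(text, dir.getFileName().toString()));
            }
        }
        return out;
    }
}
```

With one directory per class label, the directory name doubles as the label, which is why the target field is filled from the path rather than from the file contents.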
Pipes can be composed hierarchically. In a typical usage, a SerialPipes is created, which holds other pipes in an ordered list. Piping an instance through a SerialPipes means piping the instance through each of the child pipes in sequence.
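Serial composition amounts to a fold over the ordered list of children. The following sketch shows the idea without MALLET: `SerialSketch` is a hypothetical name, and each child pipe is modeled as a `UnaryOperator<String>`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical composite: holds child stages in order and applies them in sequence.
class SerialSketch implements UnaryOperator<String> {
    private final List<UnaryOperator<String>> children;

    @SafeVarargs
    SerialSketch(UnaryOperator<String>... children) {
        this.children = Arrays.asList(children);
    }

    // The output of each child becomes the input of the next one.
    @Override
    public String apply(String input) {
        String current = input;
        for (UnaryOperator<String> child : children) {
            current = child.apply(current);
        }
        return current;
    }

    // Usage example: the composite is itself a stage, so it can be nested.
    static String example() {
        SerialSketch inner = new SerialSketch(String::trim, String::toLowerCase);
        SerialSketch outer = new SerialSketch(inner, s -> s + "!");
        return outer.apply("  Hello ");
    }
}
```

Because the composite implements the same interface as its children, nesting gives the hierarchical composition the paragraph describes, and the order of the children determines the result.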
A pipe holds two separate Alphabets: one for the symbols (feature names) encountered in the data fields of the instances processed through the pipe, and one for the symbols (e.g. class labels) encountered in the target fields.
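As a rough illustration of why two separate symbol tables are kept, the sketch below maps feature names and class labels to independent integer index spaces. `SymbolTable` and `TwoTableSketch` are hypothetical names for this example and are not the MALLET Alphabet class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical symbol table: maps each distinct symbol to a dense integer index.
class SymbolTable {
    private final Map<String, Integer> indexOf = new HashMap<>();
    private final List<String> symbols = new ArrayList<>();

    // Return the symbol's index, assigning the next free index on first sight.
    int lookupIndex(String symbol) {
        Integer idx = indexOf.get(symbol);
        if (idx == null) {
            idx = symbols.size();
            indexOf.put(symbol, idx);
            symbols.add(symbol);
        }
        return idx;
    }

    String lookupSymbol(int index) { return symbols.get(index); }
    int size() { return symbols.size(); }
}

// Hypothetical holder with one table for data symbols and one for target symbols.
class TwoTableSketch {
    final SymbolTable dataTable = new SymbolTable();
    final SymbolTable targetTable = new SymbolTable();

    // Convert feature names to indices in the data table.
    int[] indexFeatures(String[] tokens) {
        int[] indices = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            indices[i] = dataTable.lookupIndex(tokens[i]);
        }
        return indices;
    }

    // Convert a class label to an index in the target table.
    int indexLabel(String label) { return targetTable.lookupIndex(label); }
}
```

Keeping the tables apart means a string that appears both as a feature and as a label gets an index in each space, and the number of labels can be read off without counting features.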
@author Andrew McCallum mccallum@cs.umass.edu

    super (name);
  }


  public void testFromSerialization () throws IOException, ClassNotFoundException
  {
    Pipe p = new GenericAcrfData2TokenSequence ();
    InstanceList training = new InstanceList (p);
    training.addThruPipe (new LineGroupIterator (new StringReader (sampleData), Pattern.compile ("^$"), true));


    Pipe p2 = (Pipe) TestSerializable.cloneViaSerialization (p);


    InstanceList l1 = new InstanceList (p);
    l1.addThruPipe (new LineGroupIterator (new StringReader (sampleData2), Pattern.compile ("^$"), true));
    InstanceList l2 = new InstanceList (p2);
    l2.addThruPipe (new LineGroupIterator (new StringReader (sampleData2), Pattern.compile ("^$"), true));


    // the readResolve alphabet thing doesn't kick in on first deserialization
    assertTrue (p.getTargetAlphabet () != p2.getTargetAlphabet ());


    assertEquals (1, l1.size ());
    assertEquals (1, l2.size ());


    Instance inst1 = l1.get (0);


    }
  }


  public void testFixedNumLabels () throws IOException, ClassNotFoundException
  {
    Pipe p = new GenericAcrfData2TokenSequence (2);
    InstanceList training = new InstanceList (p);
    training.addThruPipe (new LineGroupIterator (new StringReader (sampleFixedData), Pattern.compile ("^$"), true));


    assertEquals (1, training.size ());


  public void testTrain() {
    doTestSpacePrediction(false);
  }


  public void doTestSpacePrediction(boolean testValueAndGradient) {
    Pipe p = makeSpacePredictionPipe();
    Pipe p2 = new TestCRF2String();


    InstanceList instances = new InstanceList(p);
    instances.addThruPipe(new ArrayIterator(data));
    InstanceList[] lists = instances.split(new Random(1), new double[] {
        .5, .5 });


    }
  }


  public void doTestSpacePrediction(boolean testValueAndGradient,
      boolean useSaved, boolean useSparseWeights) {
    Pipe p = makeSpacePredictionPipe();


    CRF savedCRF;
    File f = new File("TestObject.obj");
    InstanceList instances = new InstanceList(p);
    instances.addThruPipe(new ArrayIterator(data));
    InstanceList[] lists = instances.split(new double[] { .5, .5 });
    CRF crf = new CRF(p.getDataAlphabet(), p.getTargetAlphabet());
    crf.addFullyConnectedStatesForLabels();
    CRFTrainerByLabelLikelihood crft = new CRFTrainerByLabelLikelihood(crf);
    crft.setUseSparseWeights(useSparseWeights);
    if (testValueAndGradient) {
      Optimizable.ByGradientValue minable = crft


      }
    }
  }


  private Pipe makeSpacePredictionPipe() {
    Pipe p = new SerialPipes(new Pipe[] {
        new CharSequence2TokenSequence("."),
        new TokenSequenceLowercase(),
        new TestCRFTokenSequenceRemoveSpaces(),
        new TokenText(),
        new OffsetConjunctions(true, new int[][] { { 0 }, { 1 },


        new TokenSequence2FeatureVectorSequence() });
    return p;
  }


  public void testAddOrderNStates() {
    Pipe p = makeSpacePredictionPipe();


    InstanceList instances = new InstanceList(p);
    instances.addThruPipe(new ArrayIterator(data));
    InstanceList[] lists = instances.split(new java.util.Random(678),
        new double[] { .5, .5 });


    // Compare 3 CRFs trained with addOrderNStates, and make sure
    // that having more features leads to a higher likelihood


    CRF crf1 = new CRF(p.getDataAlphabet(), p.getTargetAlphabet());
    crf1.addOrderNStates(lists[0], new int[] { 1, },
        new boolean[] { false, }, "START", null, null, false);
    new CRFTrainerByLabelLikelihood(crf1).trainIncremental(lists[0]);


    CRF crf2 = new CRF(p.getDataAlphabet(), p.getTargetAlphabet());
    crf2.addOrderNStates(lists[0], new int[] { 1, 2, }, new boolean[] {
        false, true }, "START", null, null, false);
    new CRFTrainerByLabelLikelihood(crf2).trainIncremental(lists[0]);


    CRF crf3 = new CRF(p.getDataAlphabet(), p.getTargetAlphabet());
    crf3.addOrderNStates(lists[0], new int[] { 1, 2, }, new boolean[] {
        false, false }, "START", null, null, false);
    new CRFTrainerByLabelLikelihood(crf3).trainIncremental(lists[0]);


    // Prevent cached values


    mcrf.setParameters(params);
    return mcrf.getValue();
  }


  public void testFrozenWeights() {
    Pipe p = makeSpacePredictionPipe();


    InstanceList instances = new InstanceList(p);
    instances.addThruPipe(new ArrayIterator(data));


    CRF crf1 = new CRF(p.getDataAlphabet(), p.getTargetAlphabet());
    crf1.addFullyConnectedStatesForLabels();
    CRFTrainerByLabelLikelihood crft1 = new CRFTrainerByLabelLikelihood(
        crf1);
    crft1.trainIncremental(instances);


    CRF crf2 = new CRF(p.getDataAlphabet(), p.getTargetAlphabet());
    crf2.addFullyConnectedStatesForLabels();
    // Freeze some weights, before training
    for (int i = 0; i < crf2.getWeights().length; i += 2)
      crf2.freezeWeights(i);
    CRFTrainerByLabelLikelihood crft2 = new CRFTrainerByLabelLikelihood(


  public void testDenseTrain() {
    doTestSpacePrediction(false, false, false);
  }


  public void testTrainStochasticGradient() {
    Pipe p = makeSpacePredictionPipe();
    Pipe p2 = new TestCRF2String();


    InstanceList instances = new InstanceList(p);
    instances.addThruPipe(new ArrayIterator(data));
    InstanceList[] lists = instances.split(new double[] { .5, .5 });
    CRF crf = new CRF(p, p2);


    System.out.println("Testing  Accuracy after training = "
        + crf.averageTokenAccuracy(lists[1]));
  }


  public void testSumLatticeImplementations() {
    Pipe p = makeSpacePredictionPipe();
    Pipe p2 = new TestCRF2String();


    // first do normal training for getting weights
    InstanceList instances = new InstanceList(p);
    instances.addThruPipe(new ArrayIterator(data));
    InstanceList[] lists = instances.split(new double[] { .5, .5 });


  public void testDenseSerialization() {
    doTestSpacePrediction(false, true, false);
  }


  public void testTokenAccuracy() {
    Pipe p = makeSpacePredictionPipe();


    InstanceList instances = new InstanceList(p);
    instances.addThruPipe(new ArrayIterator(data));
    InstanceList[] lists = instances.split(new Random(777), new double[] {
        .5, .5 });


    CRF crf = new CRF(p.getDataAlphabet(), p.getTargetAlphabet());
    crf.addFullyConnectedStatesForLabels();
    CRFTrainerByLabelLikelihood crft = new CRFTrainerByLabelLikelihood(crf);
    crft.setUseSparseWeights(true);


    crft.trainIncremental(lists[0]);



Related Classes of cc.mallet.pipe.Pipe

cc.mallet.classify.tui.SvmLight2Classify

cc.mallet.classify.WinnowTrainer

cc.mallet.cluster.examples.FirstOrderClusterExample

cc.mallet.cluster.tui.Clusterings2Clusterer

cc.mallet.cluster.tui.Clusterings2Clusterings

cc.mallet.extract.CRFExtractor

cc.mallet.extract.test.TestDocumentViewer

cc.mallet.extract.test.TestLatticeViewer

cc.mallet.fst.semi_supervised.tui.SimpleTaggerWithConstraints

cc.mallet.fst.SimpleTagger
