* @param test The instances for which to cache predictions.
* @throws Exception if something goes wrong
*/
private void cachePredictions(Instances test) throws Exception {
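// Maps the string form of each test instance (class value set to
// missing) to an array of per-model predictions for that instance.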
m_cachedPredictions = new HashMap();
Evaluation evalModel = null;
Instances originalInstances = null;
// If the verbose flag is set, we'll also print out the performances of
// all the individual models w.r.t. this test set while we're at it.
boolean printModelPerformances = getVerboseOutput();
if (printModelPerformances) {
// To get performances, we need to keep the class attribute.
originalInstances = new Instances(test);
}
// For each model, we'll go through the dataset and get predictions.
// The idea is that we only want one model in memory at a time, so we
// load one model into memory, get all its predictions, and add them
// to the hash map. Then we can release it from memory and move on to
// the next.
for (int i = 0; i < m_chosen_models.length; ++i) {
if (printModelPerformances) {
// If we're going to print performances, we need a fresh
// Evaluation object for this model.
evalModel = new Evaluation(originalInstances);
}
Date startTime = new Date();
// Load the model into memory by rehydrating it from the working directory.
m_chosen_models[i].rehydrateModel(m_workingDirectory.getAbsolutePath());
// Now loop through all the instances and get this model's predictions.
for (int j = 0; j < test.numInstances(); ++j) {
Instance currentInstance = test.instance(j);
// When we look up a cached prediction later, we'll only have the
// non-class attributes, so we set the class to missing here to make
// the string representations match up properly.
currentInstance.setClassMissing();
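// Note that this mutates the instance inside the test set itself;
// originalInstances was copied above so that evaluation still sees
// the class values.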
String stringInstance = currentInstance.toString();
// When we get here with the first model, the instance will not yet
// be part of the map.
if (!m_cachedPredictions.containsKey(stringInstance)) {
// The instance isn't in the map yet, so add it. For each instance,
// we store a two-dimensional array: the first index is over all the
// models in the ensemble, and the second index is over the class
// values (i.e., a typical prediction array).
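// For example, with 10 chosen models on a 3-class problem, each
// map entry holds a double[10][3]; for a numeric class the second
// dimension has length 1.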
int predSize = test.classAttribute().isNumeric() ? 1 : test.classAttribute().numValues();
double[][] predictionArray = new double[m_chosen_models.length][predSize];
m_cachedPredictions.put(stringInstance, predictionArray);
}
// Get the array from the map that is associated with this instance.
double[][] predictions = (double[][]) m_cachedPredictions.get(stringInstance);
// And add our model's prediction for it.
predictions[i] = m_chosen_models[i].getAveragePrediction(currentInstance);
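// (getAveragePrediction is expected to return an array of length
// predSize: one value per class, or a single value for a numeric
// class.)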
if (printModelPerformances) {
evalModel.evaluateModelOnceAndRecordPrediction(
predictions[i], originalInstances.instance(j));
}
}
// Now we're done with model #i, so we can release it.
m_chosen_models[i].releaseModel();
Date endTime = new Date();
long diff = endTime.getTime() - startTime.getTime();
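// diff is the elapsed wall-clock test time in milliseconds.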
if (m_Debug) {
System.out.println("Test time for "
+ m_chosen_models[i].getStringRepresentation() + " was: " + diff);
}
if (printModelPerformances) {
String output = m_chosen_models[i].getStringRepresentation() + ": ";
output += "\tRMSE:" + evalModel.rootMeanSquaredError();
output += "\tACC:" + evalModel.pctCorrect();
if (test.numClasses() == 2) {
// For multiclass problems, we could print these too, but it's not
// clear which class we should use in that case, so instead we only
// print these metrics for binary classification problems.
output += "\tROC:" + evalModel.areaUnderROC(1);
output += "\tPREC:" + evalModel.precision(1);
output += "\tFSCR:" + evalModel.fMeasure(1);
}
System.out.println(output);
}
}
}