Package cc.mallet.pipe

Examples of cc.mallet.pipe.CharSequenceLowercase


       
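    // Imports this fragment relies on
    import java.io.File;
    import java.util.ArrayList;
    import java.util.regex.Pattern;
    import cc.mallet.pipe.*;

    // Build the pipe list that turns raw document text into feature sequences
    ArrayList<Pipe> pipeList = new ArrayList<Pipe>();

    // Pipes: lowercase, tokenize, remove stopwords, map to features
    pipeList.add( new CharSequenceLowercase() );
    // Tokenize with a regular expression: a token starts and ends with a letter,
    // with one or more letters or punctuation marks in between (at least 3 characters)
    pipeList.add( new CharSequence2TokenSequence(Pattern.compile("\\p{L}[\\p{L}\\p{P}]+\\p{L}")) );
    // Stoplist file and encoding; the flags are includeDefault, caseSensitive, markDeletions
    pipeList.add( new TokenSequenceRemoveStopwords(new File("stoplist/en.txt"), "UTF-8", false, false, false) );
    // Optional: also emit bigrams as tokens
    //pipeList.add(new TokenSequenceNGrams(new int[] {2} ));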
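
The effect of the first two pipes can be approximated in plain Java without MALLET on the classpath. The sketch below is an illustration only, not MALLET's implementation: the class name TokenPatternDemo and the method tokenize are hypothetical, and it assumes that CharSequenceLowercase behaves like String.toLowerCase and that CharSequence2TokenSequence emits each successive regex match as one token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the lowercase + tokenize pipes; not part of MALLET.
public class TokenPatternDemo {
    // Same pattern as in the snippet above: letter, 1+ letters/punctuation, letter.
    static final Pattern TOKEN = Pattern.compile("\\p{L}[\\p{L}\\p{P}]+\\p{L}");

    static List<String> tokenize(String text) {
        // Assumed equivalent of CharSequenceLowercase
        String lower = text.toLowerCase(Locale.ROOT);
        List<String> tokens = new ArrayList<>();
        // Assumed equivalent of CharSequence2TokenSequence: one token per match
        Matcher m = TOKEN.matcher(lower);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [the, state-of-the-art, parser, isn't]
        System.out.println(tokenize("The state-of-the-art Parser isn't OK."));
    }
}
```

Because the pattern needs at least three characters, two-letter words such as "ok" are dropped, while internal hyphens and apostrophes are kept inside a token.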

