Examples of uk.ac.cam.ch.wwmm.ptclib.misc.Stemmer.wordsToStems()

Class uk.ac.cam.ch.wwmm.ptclib.misc.Stemmer

Examples of uk.ac.cam.ch.wwmm.ptclib.misc.Stemmer.wordsToStems()

uk.ac.cam.ch.wwmm.ptclib.misc.Stemmer.wordsToStems()
Takes a list of words, finds the stems, and groups the words together by stem. This also works for multi-word terms; the last word will be stemmed, as whitespace is treated like any other character. @param words The words to stem. @return A mapping from the stems to the lists of words that correspondto the stems.

      double excess = df.getCount(s) - expected;
      score = excess / clusterSize;        
      if(score > threshold) scores.put(s, score);
    }
    Stemmer st = new Stemmer(new EnglishStemmer());
    Map<String,List<String>> stems = st.wordsToStems(df.getSet());
    for(String stem : stems.keySet()) {
      List<String> words = stems.get(stem);
      if(words.size() > 1) {
        BooleanQuery bq = new BooleanQuery(true);
        for(String word : words) {

View Full Code Here

      


      clusterFiles.add(new File(ir.document(i).getField("filename").stringValue().replaceAll("markedup", "source")));
    }
    Stemmer st = new Stemmer(new EnglishStemmer());
    Map<String,List<String>> stems = st.wordsToStems(dfs.getSet());


    dfs.discardInfrequent(2);
    NGramTfDf ngtd = NGramTfDf.analyseFiles(clusterFiles);
    ngtd.calculateNGrams();
    Bag<String> bs = ngtd.getDfBag(2);

View Full Code Here

TOP

All source code are property of their respective owners. Java is a trademark of Sun Microsystems, Inc and owned by ORACLE Inc. Contact coftware#gmail.com.