Examples of edu.washington.cs.knowitall.sequence.LayeredTokenPattern

Package edu.washington.cs.knowitall.sequence

edu.washington.cs.knowitall.sequence.LayeredTokenPattern
A class that defines a regular expression over the tokens appearing in a {@link LayeredSequence} object.

For example, suppose we want to find parts of sentences that match the pattern "DT cow", where "DT" is the part-of-speech tag for a determiner. Assume that sentences are represented as {@link LayeredSequence} objects in which the word layer is named "word" and the part-of-speech layer is named "pos". The pattern can then be constructed by calling {@code new LayeredTokenPattern("DT_pos cow_word")}. Given a test sentence {@code sent}, the {@link #matcher(LayeredSequence)} method returns a {@link LayeredTokenMatcher} object that gives access to the match ranges and groups.

The patterns are expressed in the standard {@link java.util.regex.Pattern} language, with the following changes.

The basic unit of a match is not a character but a token. A token consists of two parts, a value and a layer name, written with an underscore between them. For example, {@code Foo_bar} matches when the value {@code Foo} appears on the layer named {@code bar}. In the example above, the token {@code DT_pos} matches a word-POS pair {@code (w, p)} whenever {@code p = DT}; the value of {@code w} can be anything. Currently there is no way to match the values of multiple layers at once (e.g., to match all occurrences of "bank" that are nouns).
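To make the token semantics concrete, here is a small self-contained sketch (not the library's code) that models each sentence position as a map from layer name to value and checks a plain, operator-free token sequence such as "DT_pos cow_word" at every start position. The class LiteralTokenMatchSketch, its Token helper and the map-based sentence are hypothetical stand-ins for LayeredSequence. Each token inspects only its own layer, so the word paired with DT can be anything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Illustrative sketch only: each sentence position is modelled as a map from
 * layer name to value. This is a hypothetical stand-in, not the knowitall
 * LayeredSequence API, and it handles plain token sequences without regex operators.
 */
public class LiteralTokenMatchSketch {

    /** One pattern token such as "DT_pos": value "DT" on layer "pos". */
    static final class Token {
        final String value;
        final String layer;

        Token(String text) {
            int underscore = text.lastIndexOf('_'); // neither part may contain '_'
            this.value = text.substring(0, underscore);
            this.layer = text.substring(underscore + 1);
        }

        /** Only the token's own layer is inspected; every other layer may hold anything. */
        boolean matches(Map<String, String> position) {
            return value.equals(position.get(layer));
        }
    }

    /** Returns the start index of every window that matches the space-separated tokens. */
    public static List<Integer> findStarts(String pattern, List<Map<String, String>> sentence) {
        List<Token> tokens = new ArrayList<>();
        for (String t : pattern.trim().split("\\s+")) {
            tokens.add(new Token(t));
        }
        List<Integer> starts = new ArrayList<>();
        for (int start = 0; start + tokens.size() <= sentence.size(); start++) {
            boolean ok = true;
            for (int k = 0; k < tokens.size() && ok; k++) {
                ok = tokens.get(k).matches(sentence.get(start + k));
            }
            if (ok) {
                starts.add(start);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> sent = List.of(
            Map.of("word", "I", "pos", "PRP"),
            Map.of("word", "saw", "pos", "VBD"),
            Map.of("word", "the", "pos", "DT"),
            Map.of("word", "cow", "pos", "NN"));
        System.out.println(findStarts("DT_pos cow_word", sent)); // prints [2]
    }
}
```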

The value of a token can only have characters from this set: {@code [a-zA-Z0-9\\-.,:;?!"'`]}. The layer name can only have characters from this set: {@code [a-zA-Z0-9\\-]}.
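A minimal sketch, assuming a token is exactly one value, one underscore and one layer name drawn from the character sets above, shows how a single token can be validated and split. TokenSyntaxSketch and split are hypothetical names. Because neither set contains "_", the separating underscore is unambiguous.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch only: validates one pattern token against the character sets above and
 * splits it into value and layer name. Class and method names are hypothetical.
 */
public class TokenSyntaxSketch {

    // Value: letters, digits and - . , : ; ? ! " ' `   Layer: letters, digits and -
    // Neither set contains '_', so exactly one underscore separates the two parts.
    private static final Pattern TOKEN =
        Pattern.compile("([a-zA-Z0-9\\-.,:;?!\"'`]+)_([a-zA-Z0-9\\-]+)");

    /** Returns {value, layer}, or null if the string is not a well-formed token. */
    public static String[] split(String token) {
        Matcher m = TOKEN.matcher(token);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        String[] parts = split("DT_pos");
        System.out.println(parts[0] + " / " + parts[1]);     // DT / pos
        System.out.println(split("._pos")[0]);               // .
        System.out.println(split("New York_word") == null);  // true: spaces are not allowed
    }
}
```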

When expressing a pattern, tokens must be space separated.

In the following examples, {@code pos} refers to a part-of-speech layer and {@code word} refers to a word layer.
- {@code ^John_word lives_word in_word NNP_pos+} - matches sentences that start with "John lives in" followed by at least one proper noun.
- {@code ^(NNP_pos+) lives_word in_word (NNP_pos+) ._pos$} - matches sentences that start with at least one proper noun, followed by "lives in", followed by at least one proper noun, and ending with a period. The two proper-noun sequences are captured as groups (see {@link LayeredTokenMatcher}).
@author afader
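One way to get this behaviour from the standard java.util.regex engine is sketched below. It is an assumed technique, not necessarily how LayeredTokenPattern is implemented: each position is encoded as "< word=John pos=NNP >", each token is rewritten into a regex that consumes exactly one such position, a bare "." becomes a one-position wildcard, and character offsets are mapped back to token indices. The class TokenRegexSketch and its methods are hypothetical, and bracketed token classes such as [B-NP_n I-NP_n] are not supported.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Minimal sketch, not the knowitall implementation: every sentence position is
 * encoded as "< layer=value layer=value >" and each pattern token is rewritten into
 * a regex that matches exactly one such position. Supports tokens, the bare "."
 * wildcard, groups, quantifiers, "|" and ^/$ anchors. Bracketed token classes and
 * "(?:" written directly before a token (no space) are not handled.
 */
public class TokenRegexSketch {

    private static final Pattern TOKEN =
        Pattern.compile("([a-zA-Z0-9\\-.,:;?!\"'`]+)_([a-zA-Z0-9\\-]+)");
    private static final String ANY_POSITION = "(?:<[^<>]*>)";

    /** Rewrites each token into a non-capturing group, so user group numbers are unchanged. */
    public static Pattern compile(String tokenPattern) {
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(tokenPattern);
        int last = 0;
        while (m.find()) {
            out.append(operators(tokenPattern.substring(last, m.start())));
            // The spaces around "layer=value" stop "pos=NN" from matching inside "pos=NNP".
            out.append("(?:<[^<>]* ")
               .append(Pattern.quote(m.group(2) + "=" + m.group(1)))
               .append(" [^<>]*>)");
            last = m.end();
        }
        out.append(operators(tokenPattern.substring(last)));
        return Pattern.compile(out.toString());
    }

    /** Drops the separating whitespace and turns a bare "." into a one-position wildcard. */
    private static String operators(String s) {
        return s.replaceAll("\\s+", "").replace(".", ANY_POSITION);
    }

    /** Encodes the sentence; offsets[i] is the character offset where position i starts. */
    static String encode(List<Map<String, String>> sentence, List<String> layers, int[] offsets) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < sentence.size(); i++) {
            offsets[i] = sb.length();
            sb.append('<');
            for (String layer : layers) {
                sb.append(' ').append(layer).append('=').append(sentence.get(i).get(layer));
            }
            sb.append(" >");
        }
        offsets[sentence.size()] = sb.length();
        return sb.toString();
    }

    /** Returns the [start, end) token range of group g for every non-empty match. */
    public static List<int[]> findAll(String tokenPattern, List<Map<String, String>> sentence,
                                      List<String> layers, int g) {
        int[] offsets = new int[sentence.size() + 1];
        Matcher m = compile(tokenPattern).matcher(encode(sentence, layers, offsets));
        List<int[]> ranges = new ArrayList<>();
        while (m.find()) {
            if (m.start() == m.end()) {
                continue; // empty matches need not fall on a position boundary
            }
            // Group boundaries sit on entries of offsets (or are -1 if group g did not participate).
            ranges.add(new int[] {
                Arrays.binarySearch(offsets, m.start(g)), Arrays.binarySearch(offsets, m.end(g))});
        }
        return ranges;
    }

    /** Builds a sentence with a "word" layer and a "pos" layer. */
    public static List<Map<String, String>> sentence(String[] words, String[] tags) {
        List<Map<String, String>> sent = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            sent.add(Map.of("word", words[i], "pos", tags[i]));
        }
        return sent;
    }

    public static void main(String[] args) {
        List<Map<String, String>> sent = sentence(
            new String[] {"John", "Smith", "lives", "in", "New", "York", "."},
            new String[] {"NNP", "NNP", "VBZ", "IN", "NNP", "NNP", "."});
        List<String> layers = List.of("word", "pos");
        String p = "^(NNP_pos+) lives_word in_word (NNP_pos+) ._pos$";
        System.out.println(Arrays.toString(findAll(p, sent, layers, 1).get(0))); // [0, 2]
        System.out.println(Arrays.toString(findAll(p, sent, layers, 2).get(0))); // [4, 6]
    }
}
```

Wrapping each token in a non-capturing group lets a quantifier such as NNP_pos+ apply to a whole position and keeps the pattern's own group numbers unchanged, so group 2 in the second example still refers to the second run of proper nouns.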


  @Test
  public void testMatcher1() throws SequenceException {
    String patternStr = "There_w are_w CD_p [B-NP_n I-NP_n]+ (IN_p [B-NP_n I-NP_n]+)*";
    LayeredTokenPattern pat = new LayeredTokenPattern(patternStr);
    LayeredTokenMatcher m = pat.matcher(seq);
    assertTrue(m.find());
    assertEquals(0, m.start());
    assertEquals(6, m.end());
  }

  @Test
  public void testMatcher2() throws SequenceException {
    String patternStr = "B-NP_n I-NP_n*";
    LayeredTokenPattern pat = new LayeredTokenPattern(patternStr);
    LayeredTokenMatcher m = pat.matcher(seq);
    assertTrue(m.find());
    assertEquals(2, m.start());
    assertEquals(4, m.end());
    assertTrue(m.find());
    assertEquals(5, m.start());
    // ... (remainder of test not shown)
  }

  @Test
  public void testMatcher3() throws SequenceException {
    String patternStr = "B-NP_n I-NP_n* ._p?$";
    LayeredTokenPattern pat = new LayeredTokenPattern(patternStr);
    LayeredTokenMatcher m = pat.matcher(seq);
    assertTrue(m.find());
    assertEquals(5, m.start());
    assertEquals(7, m.end());
    assertFalse(m.find());
  }

  @Test
  public void testMatcher4() throws SequenceException {
    String patternStr = "...";
    LayeredTokenPattern pat = new LayeredTokenPattern(patternStr);
    LayeredTokenMatcher m = pat.matcher(seq);
    assertTrue(m.find());
    assertEquals(0, m.start());
    assertEquals(3, m.end());
    assertTrue(m.find());
    assertEquals(3, m.start());
    // ... (remainder of test not shown)
  }

  @Test(expected=SequenceException.class)
  public void testMatcher5() throws SequenceException {
    String patternStr = "^ [^A_x B_x] C_x $";
    @SuppressWarnings("unused")
    LayeredTokenPattern pat = new LayeredTokenPattern(patternStr);
  }
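testMatcher1 uses a bracketed list of tokens, [B-NP_n I-NP_n], which reads as "one token that is either B-NP or I-NP on layer n", and testMatcher5 expects a SequenceException for the negated form [^A_x B_x]. The sketch below shows one way to handle this, assuming (the source does not confirm it) that a token class is simply an alternation of its tokens and that negated classes are rejected. TokenClassSketch and expandClasses are hypothetical names, and IllegalArgumentException stands in for the library's SequenceException.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: rewrite bracketed token classes into alternations and reject
 * negated classes. Not the knowitall implementation.
 */
public class TokenClassSketch {

    private static final Pattern CLASS = Pattern.compile("\\[([^\\]]*)\\]");

    /** Replaces each "[A_x B_x]" with "(?: A_x | B_x )"; throws on "[^...]". */
    public static String expandClasses(String tokenPattern) {
        Matcher m = CLASS.matcher(tokenPattern);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = m.group(1).trim();
            if (body.startsWith("^")) {
                throw new IllegalArgumentException("negated token class: " + m.group());
            }
            String alternation = "(?: " + String.join(" | ", body.split("\\s+")) + " )";
            m.appendReplacement(out, Matcher.quoteReplacement(alternation));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expandClasses("There_w are_w CD_p [B-NP_n I-NP_n]+ (IN_p [B-NP_n I-NP_n]+)*"));
        // There_w are_w CD_p (?: B-NP_n | I-NP_n )+ (IN_p (?: B-NP_n | I-NP_n )+)*
    }
}
```

The spaces kept around "(?:" and "|" leave every token surrounded by whitespace, so the rewritten string is still a space-separated token pattern.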

  @Test(expected=SequenceException.class)
  public void testMatcher6() throws Exception {
    String patternStr = "B-NP_np I-NP_np* from_word the_word B-NP_np I-NP_np*";
    LayeredTokenPattern pattern = new LayeredTokenPattern(patternStr);
    OpenNlpSentenceChunker chunker = new OpenNlpSentenceChunker();
    pattern.matcher(chunker.chunkSentence("Hello, world."));
  }

    // ... (beginning of helper method not shown)
    for (String str: split) results.add(str);
    return results;
  }
  
  public List<String> extract(String patternStr, String test) throws SequenceException {
    LayeredTokenPattern pattern = new LayeredTokenPattern(patternStr);
    RegexTagger tagger = new RegexTagger(pattern, "R");
    List<String> testList = listize(test);
    SimpleLayeredSequence seq = new SimpleLayeredSequence(testList.size());
    seq.addLayer("w", testList);
    return tagger.tag(seq);
  }

    public RegexGroupExtractor(LayeredTokenPattern pattern) {
        this.pattern = pattern;
    }


    public RegexGroupExtractor(String patternStr) throws SequenceException {
        this(new LayeredTokenPattern(patternStr));
    }

     * @throws SequenceException
     *             if unable to compile pattern
     */
    public RegexExtractor(String patternString) throws SequenceException {
        this.patternString = patternString;
        this.pattern = new LayeredTokenPattern(patternString);
    }

    private Predicate<ChunkedBinaryExtraction> relIsVWP()
            throws SequenceException {
        final String patternStr = String.format("(%s (%s+ (%s)+))+", VERB,
                WORD, PREP);
        final LayeredTokenPattern pattern = new LayeredTokenPattern(patternStr);
        return new Predicate<ChunkedBinaryExtraction>() {
            public boolean apply(ChunkedBinaryExtraction e) {
                try {
                    LayeredTokenMatcher m = pattern.matcher(e.getRelation());
                    int n = 0;
                    while (m.find())
                        n++;
                    return n == 1;
                } catch (SequenceException ex) {
                    // ... (remainder of method not shown)

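The relIsVWP predicate accepts a relation only when the pattern matches exactly once. The same find() counting idiom works with java.util.regex.Matcher, so the sketch below runs on plain strings, with single letters V, W and P as hypothetical stand-ins for the VERB, WORD and PREP token patterns. As a small variation on the original loop, it returns as soon as a second match is found instead of counting every match; the result is the same.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch of an "exactly one match" test written against java.util.regex so it runs
 * on its own. V, W and P are hypothetical stand-ins for VERB, WORD and PREP.
 */
public class ExactlyOneMatchSketch {

    /** True iff the pattern has exactly one non-overlapping match in the text. */
    public static boolean matchesExactlyOnce(Pattern pattern, CharSequence text) {
        Matcher m = pattern.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
            if (n > 1) {
                return false; // stop early: a second match already rules the input out
            }
        }
        return n == 1;
    }

    public static void main(String[] args) {
        // Same shape as "(%s (%s+ (%s)+))+", with one character per token.
        Pattern p = Pattern.compile("(V(W+(P)+))+");
        System.out.println(matchesExactlyOnce(p, "VWP"));     // true
        System.out.println(matchesExactlyOnce(p, "VWPVWP"));  // true: the outer + joins both runs
        System.out.println(matchesExactlyOnce(p, "VWPxVWP")); // false: two separate matches
        System.out.println(matchesExactlyOnce(p, "WP"));      // false: no match
    }
}
```

In this sketch, the outer + merges adjacent verb-word-preposition runs into a single match, so only runs separated by some other token produce a second match and cause rejection.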
Related Classes of edu.washington.cs.knowitall.sequence.LayeredTokenPattern

edu.washington.cs.knowitall.extractor.conf.ReVerbFeatures

edu.washington.cs.knowitall.extractor.RegexExtractor

edu.washington.cs.knowitall.extractor.RegexGroupExtractor

edu.washington.cs.knowitall.sequence.LayeredTokenPatternTest

edu.washington.cs.knowitall.sequence.RegexTaggerTest

java.util.regex.Matcher
