Examples of edu.stanford.nlp.semgraph.semgrex.SemgrexPattern

Package edu.stanford.nlp.semgraph.semgrex

edu.stanford.nlp.semgraph.semgrex.SemgrexPattern

A SemgrexPattern is a tgrep-type pattern for matching node configurations in one of the SemanticGraph structures. Unlike tgrep but like Unix grep, there is no pre-indexing of the data to be searched. Rather there is a linear scan through the graph where matches are sought.

SemgrexPattern instances can be matched against instances of the IndexedWord class.

A node is represented by a set of attributes and their values enclosed in curly braces: {attr1:value1;attr2:value2;...}. Therefore, {} represents any node in the graph. Attributes must be plain strings; values can be strings or regular expressions delimited by "/". (I think regular expressions must match the whole attribute value; so that /NN/ matches "NN" only, while /NN.*/ matches "NN", "NNS", "NNP", etc. --wcmac)
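If the note above is right that a regular expression must cover the whole attribute value, the behaviour corresponds to Matcher.matches() rather than Matcher.find() in java.util.regex. The following is a minimal sketch using only the JDK, not Semgrex itself; the class and method names are hypothetical and chosen for illustration.

```java
import java.util.regex.Pattern;

final class WholeValueRegexDemo {
    // Assumption: a /regex/ value is tested against the entire attribute
    // value, which is what Matcher.matches() does.
    static boolean matchesWholeValue(String regex, String attributeValue) {
        return Pattern.compile(regex).matcher(attributeValue).matches();
    }

    // For contrast: Matcher.find() succeeds on any matching substring.
    static boolean matchesAnywhere(String regex, String attributeValue) {
        return Pattern.compile(regex).matcher(attributeValue).find();
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"NN", "NNS", "NNP"}) {
            System.out.println(tag
                + ": /NN/ whole=" + matchesWholeValue("NN", tag)
                + ", /NN.*/ whole=" + matchesWholeValue("NN.*", tag)
                + ", /NN/ anywhere=" + matchesAnywhere("NN", tag));
        }
    }
}
```

Under whole-value matching, /NN/ accepts only "NN", so a trailing .* is needed to cover "NNS" and "NNP".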

For example, {lemma:slice;tag:/VB.*/} represents any verb node with "slice" as its lemma. Attributes are extracted using edu.stanford.nlp.ling.AnnotationLookup.

The root of the graph can be marked by the $ sign, that is {$} represents the root node.

Relations are defined by a symbol representing the type of relationship and a string or regular expression representing the value of the relationship. A relationship string of % means any relationship. It is also OK simply to omit the relationship symbol altogether.

Currently supported node relations and their symbols:

Symbol	Meaning
A <reln B	A is the dependent of a relation reln with B
A >reln B	A is the governor of a relation reln with B
A <<reln B	A is the dependent of a relation reln in a chain to B following dep->gov paths
A >>reln B	A is the governor of a relation reln in a chain to B following gov->dep paths
A x,y<<reln B	A is the dependent of a relation reln in a chain to B following dep->gov paths between distances of x and y
A x,y>>reln B	A is the governor of a relation reln in a chain to B following gov->dep paths between distances of x and y
A == B	A and B are the same nodes in the same graph
A @ B	A is aligned to B

In a chain of relations, all relations are relative to the first node in the chain. For example, "{} >nsubj {} >dobj {}" means "any node that is the governor of both a nsubj and a dobj relation". If instead what you want is a node that is the governor of a nsubj relation with a node that is itself the governor of a dobj relation, you should write: "{} >nsubj ({} >dobj {})".
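The difference between the flat chain and the parenthesised form can be sketched on a toy edge list. This is a model of the semantics described above written against the JDK only, not the Semgrex implementation; the Edge class, the sample graph, and the method names are hypothetical, and the sketch reports only which first node would match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class ChainVersusNestedDemo {
    /** Hypothetical stand-in for one typed dependency edge: gov -reln-> dep. */
    static final class Edge {
        final String gov, reln, dep;
        Edge(String gov, String reln, String dep) {
            this.gov = gov; this.reln = reln; this.dep = dep;
        }
    }

    /** Made-up graph: "likes" governs nsubj and dobj; "a" governs "b", which governs "c". */
    static List<Edge> sampleEdges() {
        return Arrays.asList(
            new Edge("likes", "nsubj", "dog"),
            new Edge("likes", "dobj", "bone"),
            new Edge("a", "nsubj", "b"),
            new Edge("b", "dobj", "c"));
    }

    /** Dependents reached from gov through a single edge labelled reln. */
    static List<String> dependents(List<Edge> edges, String gov, String reln) {
        List<String> out = new ArrayList<>();
        for (Edge e : edges) {
            if (e.gov.equals(gov) && e.reln.equals(reln)) out.add(e.dep);
        }
        return out;
    }

    /** Models "{} >nsubj {} >dobj {}": both relations start at the first node. */
    static Set<String> flatChain(List<Edge> edges) {
        Set<String> matches = new TreeSet<>();
        for (Edge e : edges) {
            if (!dependents(edges, e.gov, "nsubj").isEmpty()
                    && !dependents(edges, e.gov, "dobj").isEmpty()) {
                matches.add(e.gov);
            }
        }
        return matches;
    }

    /** Models "{} >nsubj ({} >dobj {})": dobj starts at the nsubj dependent. */
    static Set<String> nested(List<Edge> edges) {
        Set<String> matches = new TreeSet<>();
        for (Edge e : edges) {
            if (e.reln.equals("nsubj") && !dependents(edges, e.dep, "dobj").isEmpty()) {
                matches.add(e.gov);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println("flat:   " + flatChain(sampleEdges()));
        System.out.println("nested: " + nested(sampleEdges()));
    }
}
```

On the sample graph the flat form selects "likes" and the nested form selects "a", since only "a" has an nsubj dependent that itself governs a dobj.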

If a relation type is specified for the << relation, the relation type is only used for the first relation in the sequence. Therefore, if B depends on A with the relation type foo, the pattern {} <<foo {} will then match B and everything that depends on B.

Similarly, if a relation type is specified for the >> relation, the relation type is only used for the last relation in the sequence. Therefore, if A governs B with the relation type foo, the pattern {} >>foo {} will then match A and all of the nodes which have a sequence leading to A.
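The two paragraphs above can be modelled on a toy edge list: for <<reln, only the edge adjacent to the ancestor has to carry the label, and for >>reln, only the edge adjacent to the descendant. The sketch below is one reading of that description using the JDK only; it is not the Semgrex implementation, and the Edge class, the sample graph, and the method names are hypothetical. It reports which nodes could fill the first {} of the pattern.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class TypedAncestorDemo {
    /** Hypothetical stand-in for one typed dependency edge: gov -reln-> dep. */
    static final class Edge {
        final String gov, reln, dep;
        Edge(String gov, String reln, String dep) {
            this.gov = gov; this.reln = reln; this.dep = dep;
        }
    }

    /** Made-up graph for "my dog likes bone". */
    static List<Edge> sampleEdges() {
        return Arrays.asList(
            new Edge("likes", "nsubj", "dog"),
            new Edge("dog", "poss", "my"),
            new Edge("likes", "dobj", "bone"));
    }

    /**
     * Returns start plus everything reachable from it, walking gov->dep
     * when downward is true and dep->gov otherwise. The seen set guards
     * against cycles.
     */
    static Set<String> closure(List<Edge> edges, String start, boolean downward) {
        Set<String> seen = new TreeSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.add(start);
        while (!todo.isEmpty()) {
            String node = todo.poll();
            if (!seen.add(node)) continue;
            for (Edge e : edges) {
                if (downward && e.gov.equals(node)) todo.add(e.dep);
                if (!downward && e.dep.equals(node)) todo.add(e.gov);
            }
        }
        return seen;
    }

    /** Models "{} <<reln {}": the reln dependent and everything below it. */
    static Set<String> firstNodesOfTypedAncestor(List<Edge> edges, String reln) {
        Set<String> matches = new TreeSet<>();
        for (Edge e : edges) {
            if (e.reln.equals(reln)) matches.addAll(closure(edges, e.dep, true));
        }
        return matches;
    }

    /** Models "{} >>reln {}": the reln governor and everything above it. */
    static Set<String> firstNodesOfTypedDescendant(List<Edge> edges, String reln) {
        Set<String> matches = new TreeSet<>();
        for (Edge e : edges) {
            if (e.reln.equals(reln)) matches.addAll(closure(edges, e.gov, false));
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println("{} <<nsubj {}: " + firstNodesOfTypedAncestor(sampleEdges(), "nsubj"));
        System.out.println("{} >>poss {}:  " + firstNodesOfTypedDescendant(sampleEdges(), "poss"));
    }
}
```

In this model "my" satisfies {} <<nsubj {} even though its own incoming edge is poss, because the labelled edge only has to be the one that attaches the chain to the ancestor.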

Boolean relational operators

Relations can be combined using the '&' and '|' operators, negated with the '!' operator, and made optional with the '?' operator.

Relations can be grouped using brackets '[' and ']'. So the expression

{} [<subj {} | <agent {}] & @ {}

matches a node that is the dependent of either a subj or an agent relation and is also aligned to some other node.

Relations can be negated with the '!' operator, in which case the expression will match only if there is no node satisfying the relation.

Relations can be made optional with the '?' operator. This way the expression will match even if the optional relation is not satisfied.

The operator ":" partitions a pattern into separate patterns, each of which must be matched. For example, the following is a pattern where the matched node must have both "foo" and "bar" as descendants:

{}=a >> {word:foo} : {}=a >> {word:bar}

This pattern could have been written

{}=a >> {word:foo} >> {word:bar}

However, for more complex examples, partitioning a pattern may make it more readable.

Naming nodes

Nodes can be given names (a.k.a. handles) using '='. A named node is stored in a map from names to nodes, so that once a match is found the corresponding node can be extracted from the map. For example, ({tag:NN}=noun) will match a singular noun node; after a match is found, the matched node can be retrieved with SemgrexMatcher#getNode(String) using the argument "noun" (not "=noun"). Note that you are not allowed to name a node that is under the scope of a negation operator (the semantics would be unclear, since you can't store a node that never gets matched). Trying to do so will cause a ParseException to be thrown. Named nodes can be put within the scope of an optionality operator.

Named nodes that refer back to previously named nodes need not have a node description -- this is known as "backreferencing". In this case, the expression will match only when all instances of the same name get matched to the same node. For example: the pattern {} >dobj ({} > {}=foo) >mod ({} > {}=foo) will match a graph in which there are two nodes, X and Y, for which X is the grandparent of Y and there are two paths to Y, one of which goes through a dobj and one of which goes through a mod.
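The backreference constraint in the example pattern can be sketched on a toy edge list: both branches must end at the same node for the name foo to bind. This is an illustrative model using the JDK only, not the Semgrex matcher; the Edge class, the sample graph, and the method name are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class BackreferenceDemo {
    /** Hypothetical stand-in for one typed dependency edge: gov -reln-> dep. */
    static final class Edge {
        final String gov, reln, dep;
        Edge(String gov, String reln, String dep) {
            this.gov = gov; this.reln = reln; this.dep = dep;
        }
    }

    /** Made-up graph: y is reachable through both branches, z only through mod. */
    static List<Edge> sampleEdges() {
        return Arrays.asList(
            new Edge("x", "dobj", "a"),
            new Edge("x", "mod", "b"),
            new Edge("a", "det", "y"),
            new Edge("b", "amod", "y"),
            new Edge("b", "amod", "z"));
    }

    /**
     * Models "{} >dobj ({} > {}=foo) >mod ({} > {}=foo)" and returns
     * "X -> Y" strings for each grandparent X and shared grandchild Y.
     */
    static Set<String> grandparentPairs(List<Edge> edges) {
        Set<String> out = new TreeSet<>();
        for (Edge viaDobj : edges) {
            if (!viaDobj.reln.equals("dobj")) continue;
            for (Edge viaMod : edges) {
                // Both relations must start at the same first node.
                if (!viaMod.reln.equals("mod") || !viaMod.gov.equals(viaDobj.gov)) continue;
                for (Edge left : edges) {
                    if (!left.gov.equals(viaDobj.dep)) continue;
                    for (Edge right : edges) {
                        // Backreference: the second =foo must be the node
                        // already bound by the first =foo.
                        if (right.gov.equals(viaMod.dep) && right.dep.equals(left.dep)) {
                            out.add(viaDobj.gov + " -> " + left.dep);
                        }
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(grandparentPairs(sampleEdges()));
    }
}
```

Here "z" is not reported because it is reachable only through the mod branch, so the two occurrences of foo cannot be bound to it.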

Naming relations

It is also possible to name relations. For example, you can write the pattern {idx:1} >=reln {idx:2}. The name of the relation will then be stored in the matcher and can be extracted with getRelnString("reln"). At present, though, there is no backreferencing capability such as with the named nodes; this is only useful when using the API to extract the name of the relation used when making the match.

In the case of ancestor and descendant relations, the last relation in the sequence of relations is the name used.

@author Chloe Kiddon

  @SuppressWarnings("unchecked")
  public static SsurgeonPattern ssurgeonPatternFromXML(Element elt) throws Exception {
    String uid = getTagText(elt, SsurgeonPattern.UID_ELEM_TAG);
    String notes = getTagText(elt, SsurgeonPattern.NOTES_ELEM_TAG);
    String semgrexString = getTagText(elt, SsurgeonPattern.SEMGREX_ELEM_TAG);
    SemgrexPattern semgrexPattern = SemgrexPattern.compile(semgrexString);
    SsurgeonPattern retPattern = new SsurgeonPattern(uid, semgrexPattern);
    retPattern.setNotes(notes);
    NodeList editNodes = elt.getElementsByTagName(SsurgeonPattern.EDIT_LIST_ELEM_TAG);
    for (int i=0; i<editNodes.getLength(); i++) {
      Node node = editNodes.item(i);



    GrammaticalStructure gs = gsf.newGrammaticalStructure(tree);


    System.err.println(graph);


    SemgrexPattern semgrex = SemgrexPattern.compile("{}=A <<nsubj {}=B");
    SemgrexMatcher matcher = semgrex.matcher(graph);
    // This will produce two results on the given tree: "likes" is an
    // ancestor of both "dog" and "my" via the nsubj relation
    while (matcher.find()) {
      System.err.println(matcher.getNode("A") + " <<nsubj " + matcher.getNode("B"));
    }


    if (m.dependency.getRoots().size() == 0) {
      return new Pair<IndexedWord, String>();
    }
    // would be nice to condense this pattern, but sadly =reln
    // always uses the last relation in the sequence, not the first
    SemgrexPattern pattern = SemgrexPattern.compile("{idx:" + (m.headIndex+1) + "} [ <=reln {tag:/^V.*/}=verb | <=reln ({} << {tag:/^V.*/}=verb) ]");
    SemgrexMatcher matcher = pattern.matcher(m.dependency);
    while (matcher.find()) {
      return Pair.makePair(matcher.getNode("verb"), matcher.getRelnString("reln"));
    }
    return new Pair<IndexedWord, String>();
  }


   * semgrex match.
   */
  @Test
  public void simpleTest() throws Exception {
    SemanticGraph sg = SemanticGraph.valueOf("[mixed/VBN nsubj:[Joe/NNP appos:[bartender/NN det:the/DT]]  dobj:[drink/NN det:a/DT]]");
    SemgrexPattern semgrexPattern = SemgrexPattern.compile("{}=a1 >appos=e1 {}=a2 <nsubj=e2 {}=a3");
    SsurgeonPattern pattern = new SsurgeonPattern(semgrexPattern);


    System.out.println("Start = "+sg.toCompactString());


    // Find and snip the appos and root to nsubj links



Related Classes of edu.stanford.nlp.semgraph.semgrex.SemgrexPattern

edu.stanford.nlp.dcoref.Mention

edu.stanford.nlp.semgraph.SemanticGraph

edu.stanford.nlp.semgraph.semgrex.demo.SemgrexDemo

edu.stanford.nlp.semgraph.semgrex.ssurgeon.Ssurgeon

edu.stanford.nlp.semgraph.semgrex.ssurgeon.SsurgeonTest

edu.stanford.nlp.trees.MemoryTreebank

edu.stanford.nlp.trees.TreeNormalizer
