Generic Sequence Pattern for regular expressions.
Similar to Java's {@link java.util.regex.Pattern}, except that it matches sequences over an arbitrary type T instead of just characters.
A regular expression must first be compiled into an instance of this class. The resulting pattern can then be used to create a {@link SequenceMatcher} object that can match arbitrary sequences of type T against the regular expression. All of the state involved in performing a match resides in the matcher, so many matchers can share the same pattern.
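The split between an immutable compiled pattern and a stateful matcher can be illustrated with a minimal, self-contained sketch. Every name here (ToySequencePattern, ToySequenceMatcher, ToyMatcherDemo) is hypothetical and is not part of this library. The toy pattern supports only a fixed concatenation of per-node tests, which is far less than the real class offers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical immutable pattern: a fixed concatenation of per-node tests. */
final class ToySequencePattern<T> {
  private final List<Predicate<T>> nodeTests;

  ToySequencePattern(List<Predicate<T>> nodeTests) {
    this.nodeTests = new ArrayList<>(nodeTests); // defensive copy, never mutated afterwards
  }

  int length() { return nodeTests.size(); }

  /** True if every node test accepts the corresponding element starting at 'start'. */
  boolean matchesAt(List<? extends T> seq, int start) {
    if (start + nodeTests.size() > seq.size()) return false;
    for (int k = 0; k < nodeTests.size(); k++) {
      if (!nodeTests.get(k).test(seq.get(start + k))) return false;
    }
    return true;
  }

  ToySequenceMatcher<T> getMatcher(List<? extends T> seq) {
    return new ToySequenceMatcher<>(this, seq);
  }
}

/** Hypothetical matcher: owns all scan state (next position, last match start). */
final class ToySequenceMatcher<T> {
  private final ToySequencePattern<T> pattern;
  private final List<? extends T> seq;
  private int next = 0;    // where the next find() starts scanning
  private int start = -1;  // start of the most recent match, or -1

  ToySequenceMatcher(ToySequencePattern<T> pattern, List<? extends T> seq) {
    this.pattern = pattern;
    this.seq = seq;
  }

  /** Finds the next non-overlapping match, scanning left to right. */
  boolean find() {
    for (int i = next; i + pattern.length() <= seq.size(); i++) {
      if (pattern.matchesAt(seq, i)) {
        start = i;
        next = i + Math.max(1, pattern.length()); // always advance, even for an empty pattern
        return true;
      }
    }
    start = -1;
    return false;
  }

  int start() {
    if (start < 0) throw new IllegalStateException("No match available");
    return start;
  }
}

final class ToyMatcherDemo {
  public static void main(String[] args) {
    Predicate<Integer> even = n -> n % 2 == 0;
    Predicate<Integer> odd = n -> n % 2 != 0;
    ToySequencePattern<Integer> p = new ToySequencePattern<>(Arrays.asList(even, odd));
    // Two independent matchers share one pattern; each keeps its own position.
    ToySequenceMatcher<Integer> m1 = p.getMatcher(Arrays.asList(2, 3, 4, 5));
    ToySequenceMatcher<Integer> m2 = p.getMatcher(Arrays.asList(1, 1, 6, 7));
    while (m1.find()) System.out.println("m1 match at " + m1.start());
    while (m2.find()) System.out.println("m2 match at " + m2.start());
  }
}
```

Because the toy pattern holds no mutable fields, advancing one matcher cannot affect another matcher created from the same pattern.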
To support sequence matching on a new type T, the following is needed:
- Implement a {@link NodePattern} for matching type T
- Optionally define a language for node matches and implement {@link SequencePattern.Parser} to compile a regular expression into a SequencePattern.
- Optionally implement a {@link MultiPatternMatcher.NodePatternTrigger} for optimizing matches across multiple patterns
- Optionally implement a {@link NodesMatchChecker} to support backreferences
See {@link TokenSequencePattern} for an example of how this class can be extended to support a specific type {@code T}.
To use:
    SequencePattern p = SequencePattern.compile("....");
    SequenceMatcher m = p.getMatcher(tokens);
    while (m.find()) ....
To support a new type {@code T}:
- For a type {@code T} to be matchable, it has to have a corresponding NodePattern that indicates whether a node is matched or not (see CoreMapNodePattern for example)
- To compile a string into the corresponding pattern, you will need to create a parser (see inner class Parser, TokenSequencePattern and TokenSequenceParser.jj)
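The two requirements above (a per-node test for {@code T}, plus a parser that turns a string into a pattern) can be sketched in miniature as follows. This is an illustration under stated assumptions, not this library's API: ToyNodePattern, ToyIntPatternParser, the EVEN/ODD/ANY vocabulary and the whitespace-separated pattern syntax are all invented for the example, and a real parser would typically be generated from a grammar such as TokenSequenceParser.jj.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a node pattern over T: decides whether one node matches. */
abstract class ToyNodePattern<T> {
  abstract boolean match(T node);
}

/** Hypothetical parser: maps each whitespace-separated name to a node pattern over Integer. */
final class ToyIntPatternParser {
  private final Map<String, ToyNodePattern<Integer>> vocabulary = new HashMap<>();

  ToyIntPatternParser() {
    vocabulary.put("EVEN", new ToyNodePattern<Integer>() {
      @Override boolean match(Integer n) { return n % 2 == 0; }
    });
    vocabulary.put("ODD", new ToyNodePattern<Integer>() {
      @Override boolean match(Integer n) { return n % 2 != 0; }
    });
    vocabulary.put("ANY", new ToyNodePattern<Integer>() {
      @Override boolean match(Integer n) { return true; }
    });
  }

  /** Compiles an expression such as "EVEN ODD ANY" into one node pattern per position. */
  List<ToyNodePattern<Integer>> parse(String expr) {
    List<ToyNodePattern<Integer>> compiled = new ArrayList<>();
    String trimmed = expr.trim();
    if (trimmed.isEmpty()) return compiled;
    for (String name : trimmed.split("\\s+")) {
      ToyNodePattern<Integer> np = vocabulary.get(name);
      if (np == null) throw new IllegalArgumentException("Unknown node pattern: " + name);
      compiled.add(np);
    }
    return compiled;
  }

  /** True if the compiled patterns accept the whole sequence, position by position. */
  static boolean matchesWhole(List<ToyNodePattern<Integer>> compiled, List<Integer> seq) {
    if (compiled.size() != seq.size()) return false;
    for (int i = 0; i < seq.size(); i++) {
      if (!compiled.get(i).match(seq.get(i))) return false;
    }
    return true;
  }

  public static void main(String[] args) {
    List<ToyNodePattern<Integer>> compiled = new ToyIntPatternParser().parse("EVEN ODD ANY");
    System.out.println(matchesWhole(compiled, Arrays.asList(2, 3, 10)));
    System.out.println(matchesWhole(compiled, Arrays.asList(1, 3, 10)));
  }
}
```

The sketch keeps node-level matching (ToyNodePattern) separate from the pattern language (ToyIntPatternParser), which mirrors the division of labor described above.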
SequencePattern supports the following standard regex features:
- Concatenation
- Or
- Groups (capturing / noncapturing)
- Quantifiers (greedy / nongreedy)
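To make concatenation, alternation, and greedy versus nongreedy quantifiers concrete for sequences of an arbitrary type, here is a small backtracking sketch written in continuation-passing style. It is a toy under stated assumptions, not how this class is implemented: ToyPat, ToyRegex and all method names are hypothetical, and groups are omitted.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntUnaryOperator;
import java.util.function.Predicate;

/**
 * Hypothetical combinator. match() tries to match at index i and, on success,
 * hands the index after the match to 'next'. It returns the final end index or -1.
 */
interface ToyPat<T> {
  int match(List<T> seq, int i, IntUnaryOperator next);
}

final class ToyRegex {
  /** Matches exactly one node accepted by the predicate. */
  static <T> ToyPat<T> node(Predicate<T> test) {
    return (seq, i, next) ->
        (i < seq.size() && test.test(seq.get(i))) ? next.applyAsInt(i + 1) : -1;
  }

  /** Concatenation: a followed by b. */
  static <T> ToyPat<T> concat(ToyPat<T> a, ToyPat<T> b) {
    return (seq, i, next) -> a.match(seq, i, j -> b.match(seq, j, next));
  }

  /** Or: try a first, fall back to b if the rest of the match fails. */
  static <T> ToyPat<T> or(ToyPat<T> a, ToyPat<T> b) {
    return (seq, i, next) -> {
      int r = a.match(seq, i, next);
      return r >= 0 ? r : b.match(seq, i, next);
    };
  }

  /** Zero-or-more quantifier; greedy prefers more repetitions, nongreedy prefers fewer. */
  static <T> ToyPat<T> star(ToyPat<T> p, boolean greedy) {
    return new ToyPat<T>() {
      @Override public int match(List<T> seq, int i, IntUnaryOperator next) {
        if (greedy) {
          int r = more(seq, i, next);
          return r >= 0 ? r : next.applyAsInt(i);
        }
        int r = next.applyAsInt(i);
        return r >= 0 ? r : more(seq, i, next);
      }

      // One more repetition of p, then loop; requiring progress (j > i) avoids infinite recursion.
      private int more(List<T> seq, int i, IntUnaryOperator next) {
        return p.match(seq, i, j -> j > i ? this.match(seq, j, next) : -1);
      }
    };
  }

  /** End index of a match anchored at position 0, or -1 if there is none. */
  static <T> int matchEnd(ToyPat<T> p, List<T> seq) {
    return p.match(seq, 0, j -> j);
  }

  public static void main(String[] args) {
    List<String> seq = Arrays.asList("x", "a", "y", "b", "y");
    ToyPat<String> x = node(s -> s.equals("x"));
    ToyPat<String> y = node(s -> s.equals("y"));
    ToyPat<String> any = node(s -> true);
    System.out.println(matchEnd(concat(x, concat(star(any, true), y)), seq));  // 5: greedy runs to the last "y"
    System.out.println(matchEnd(concat(x, concat(star(any, false), y)), seq)); // 3: nongreedy stops at the first "y"
  }
}
```

The only difference between the greedy and nongreedy quantifier in this sketch is the order in which the two alternatives (repeat again, or continue with the rest of the pattern) are tried.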
SequencePattern also supports the following less standard features:
- Environment (see {@link Env}) with respect to which the patterns are compiled
- Binding of variables
  Use {@link Env} to bind variables for use when compiling patterns.
  Names can also be bound to groups (see {@link SequenceMatchResult} for accessor methods to retrieve matched groups).
- Backreference matches - need to specify how back references are to be matched using {@link NodesMatchChecker}
- Multinode matches - for matching of multiple nodes using non-regex (at least not regex over nodes) patterns (need to have corresponding {@link MultiNodePattern}, see {@link MultiCoreMapNodePattern} for example)
- Conjunctions - conjunctions of sequence patterns (works for some cases)
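Backreferences over an arbitrary type need an explicit notion of when two nodes "match", because plain equality is often too strict or too loose. The sketch below shows the idea with a pluggable pairwise checker. It is an assumption-laden illustration: ToyBackref and backrefMatchesAt are hypothetical names, and a standard BiPredicate stands in for the role that {@link NodesMatchChecker} plays here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

final class ToyBackref {
  /**
   * Returns true if the nodes captured in seq[groupStart, groupEnd) reappear
   * starting at index 'at', comparing node pairs with the supplied checker.
   */
  static <T> boolean backrefMatchesAt(List<T> seq, int groupStart, int groupEnd,
                                      int at, BiPredicate<T, T> checker) {
    int len = groupEnd - groupStart;
    if (at + len > seq.size()) return false;
    for (int k = 0; k < len; k++) {
      if (!checker.test(seq.get(groupStart + k), seq.get(at + k))) return false;
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> seq = Arrays.asList("New", "York", "loves", "new", "york");
    BiPredicate<String, String> exact = String::equals;
    BiPredicate<String, String> ignoreCase = String::equalsIgnoreCase;
    // Group = first two tokens; test whether it recurs at index 3.
    System.out.println(backrefMatchesAt(seq, 0, 2, 3, exact));
    System.out.println(backrefMatchesAt(seq, 0, 2, 3, ignoreCase));
  }
}
```

Swapping the checker changes what counts as a repeated group without touching the matching loop.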
@author Angel Chang
@see SequenceMatcher