A regular expression is zero or more branches, separated by "|". It matches anything that matches one of the branches.
A branch is zero or more pieces, concatenated. It matches a match for the first piece, followed by a match for the second piece, etc.
A piece is an atom, possibly followed by "*", "+", or "?".
An atom can be a range (see below).
A range is a sequence of characters enclosed in "[]". The range normally matches any single character from the sequence. If the sequence begins with "^", the range matches any single character not from the rest of the sequence. If two characters in the sequence are separated by "-", this is shorthand for the full list of characters between them (e.g. "[0-9]" matches any decimal digit). To include a literal "]" in the sequence, make it the first character (following a possible "^"). To include a literal "-", make it the first or last character.
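The range rules above can be tried out with a short sketch. It uses the standard java.util.regex package as a stand-in, on the assumption that it treats these particular ranges the same way as the Regexp class described here; the class name RangeDemo and the helper accepts are hypothetical names introduced only for illustration.

```java
import java.util.regex.Pattern;

// Stand-in demo: uses java.util.regex, NOT the Regexp class documented here.
class RangeDemo {
    // True if the whole input is exactly one character accepted by the range.
    static boolean accepts(String range, String input) {
        return Pattern.matches(range, input);
    }

    public static void main(String[] args) {
        System.out.println(accepts("[0-9]", "7"));   // "-" shorthand: any decimal digit
        System.out.println(accepts("[^0-9]", "7"));  // leading "^" negates the range
        System.out.println(accepts("[^0-9]", "x"));  // any character that is not a digit
        System.out.println(accepts("[a-]", "-"));    // "-" as last character is literal
        System.out.println(accepts("[-a]", "-"));    // "-" as first character is literal
    }
}
```

Each call tests a single character, since a range always matches exactly one character of input.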
In general there may be more than one way to match a regular expression to an input string. For example, consider the command
    String[] match = new String[2];
    new Regexp("(a*)b*").match("aabaaabb", match);
Considering only the rules given so far, match[0] and match[1] could end up with more than one combination of values. Under the disambiguation rules, "(a*)b*" in the example above matches exactly "aab"; the "(a*)" portion of the pattern is matched first and it consumes the leading "aa", then the "b*" portion of the pattern consumes the next "b". Or, consider the following example:
    String[] match = new String[3];
    new Regexp("(ab|a)(b*)c").match("abc", match);
After this command, match[0] will be "abc", match[1] will be "ab", and match[2] will be an empty string. Rule 4 specifies that the "(ab|a)" component gets first shot at the input string and Rule 2 specifies that the "ab" sub-expression is checked before the "a" sub-expression. Thus the "b" has already been claimed before the "(b*)" component is checked, and therefore "(b*)" must match an empty string.
Regular expression substitution matches a string against a regular expression, transforming the string by replacing the matched region(s) with new substring(s).
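Returning to the two matching examples above, "(a*)b*" and "(ab|a)(b*)c", the following sketch reproduces the stated results. It is written against java.util.regex as a stand-in, on the assumption that its leftmost, ordered-alternative matching agrees with the Regexp class for these two patterns; MatchOrderDemo and firstMatch are hypothetical names, not part of the documented API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in demo: uses java.util.regex, NOT the Regexp class documented here.
class MatchOrderDemo {
    // Returns {whole match, group 1, ..., group n} for the first match,
    // mirroring the layout of the match[] array in the text; null if no match.
    static String[] firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] first = firstMatch("(a*)b*", "aabaaabb");
        System.out.println(first[0] + " / " + first[1]);  // prints "aab / aa"

        String[] second = firstMatch("(ab|a)(b*)c", "abc");
        // "(ab|a)" claims "ab", so "(b*)" is left with the empty string.
        System.out.println(second[0] + " / " + second[1] + " / [" + second[2] + "]");
    }
}
```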
What gets substituted into the result is controlled by a subspec. The subspec is a formatting string that specifies what portions of the matched region should be substituted into the result.
"\n", where n is a digit from 1 to 9, is replaced with a copy of the nth subexpression.
Here a string such as "\2" denotes the two characters backslash and "2", not the Unicode character 0002.
    public static void main(String[] args) throws Exception {
        Regexp re;
        String[] matches;
        String s;

        /*
         * A regular expression to match the first line of an HTTP request.
         *
         * 1. ^               - starting at the beginning of the line
         * 2. ([A-Z]+)        - match and remember some upper case characters
         * 3. [ \t]+          - skip blank space
         * 4. ([^ \t]+)       - match and remember up to the next blank space
         * 5. [ \t]+          - skip more blank space
         * 6. (HTTP/1\\.[01]) - match and remember HTTP/1.0 or HTTP/1.1
         * 7. $               - end of string - no chars left.
         */
        s = "GET http://a.b.com:1234/index.html HTTP/1.1";
        re = new Regexp("^([A-Z]+)[ \t]+([^ \t]+)[ \t]+(HTTP/1\\.[01])$");
        matches = new String[4];
        if (re.match(s, matches)) {
            System.out.println("METHOD " + matches[1]);
            System.out.println("URL " + matches[2]);
            System.out.println("VERSION " + matches[3]);
        }

        /*
         * A regular expression to extract some simple comma-separated data,
         * reorder some of the columns, and discard column 2.
         */
        s = "abc,def,ghi,klm,nop,pqr";
        re = new Regexp("^([^,]+),([^,]+),([^,]+),(.*)");
        System.out.println(re.sub(s, "\\3,\\1,\\4"));
    }
@author Colin Stevens (colin.stevens@sun.com)
@version 2.3
@see Regsub
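To make the "\n" subspec rule concrete, here is a minimal sketch of how such a subspec could be expanded. It is not the implementation of Regexp.sub: it uses java.util.regex for the matching step, handles only the "\1" to "\9" rule stated above, and assumes that every other subspec character is copied through unchanged and that an input with no match is returned as-is. SubspecDemo, expand and sub are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; NOT the Regexp.sub implementation.
class SubspecDemo {
    // Expands "\n" (n = 1..9) in subspec with the n-th subexpression of m.
    // Assumption: all other characters are copied through unchanged.
    static String expand(Matcher m, String subspec) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < subspec.length(); i++) {
            char c = subspec.charAt(i);
            if (c == '\\' && i + 1 < subspec.length()) {
                char d = subspec.charAt(i + 1);
                if (d >= '1' && d <= '9' && d - '0' <= m.groupCount()) {
                    String group = m.group(d - '0');
                    if (group != null) {
                        sb.append(group);
                    }
                    i++;  // skip the digit that was just consumed
                    continue;
                }
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Replaces the first matched region of input with the expanded subspec.
    // Assumption: when nothing matches, the input is returned unchanged.
    static String sub(String regex, String input, String subspec) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return input;
        }
        return input.substring(0, m.start()) + expand(m, subspec)
                + input.substring(m.end());
    }

    public static void main(String[] args) {
        String s = "abc,def,ghi,klm,nop,pqr";
        // In Java source, "\\3" is the two characters backslash and "3".
        System.out.println(sub("^([^,]+),([^,]+),([^,]+),(.*)", s, "\\3,\\1,\\4"));
        // prints "ghi,abc,klm,nop,pqr"
    }
}
```

Column 2 ("def") is dropped simply because the subspec never refers to "\2".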