Examples of org.apache.regexp.RE

org.apache.regexp.RE

ile expression
    boolean matched = r.match("xaaaab");          // Match against "xaaaab"

    String wholeExpr = r.getParen(0);             // wholeExpr will be 'aaaab'
    String insideParens = r.getParen(1);          // insideParens will be 'aaaa'

    int startWholeExpr = r.getParenStart(0);      // startWholeExpr will be index 1
    int endWholeExpr = r.getParenEnd(0);          // endWholeExpr will be index 6
    int lenWholeExpr = r.getParenLength(0);       // lenWholeExpr will be 5

    int startInside = r.getParenStart(1);         // startInside will be index 1
    int endInside = r.getParenEnd(1);             // endInside will be index 5
    int lenInside = r.getParenLength(1);          // lenInside will be 4

You can also refer to the contents of a parenthesized expression within a regular expression itself. This is called a 'backreference'. The first backreference in a regular expression is denoted by \1, the second by \2 and so on. So the expression:

 ([0-9]+)=\1

will match any string of the form n=n (like 0=0 or 2=2).
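The backreference behaviour described above can be tried out with a short sketch. Note the assumption: this uses the JDK's java.util.regex classes as a stand-in, not org.apache.regexp.RE itself, because the \1 syntax is the same in both and the JDK classes need no extra jar. The class name BackreferenceDemo and the helper isNEqualsN are hypothetical names chosen for illustration.

```java
import java.util.regex.Pattern;

// Illustration only: java.util.regex stands in for org.apache.regexp.RE here.
class BackreferenceDemo {
    // Same expression as in the text. The backslash is doubled only because
    // this is a Java string literal; the regex itself is ([0-9]+)=\1
    static final Pattern N_EQUALS_N = Pattern.compile("([0-9]+)=\\1");

    /** Returns true when the whole input has the form n=n. */
    static boolean isNEqualsN(String input) {
        return N_EQUALS_N.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isNEqualsN("0=0"));   // true
        System.out.println(isNEqualsN("42=42")); // true
        System.out.println(isNEqualsN("1=2"));   // false: \1 must repeat group 1 exactly
    }
}
```

The third call fails because \1 does not mean "any number again"; it means "the exact text that group 1 captured".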

The full regular expression syntax accepted by RE is described here:

 
 Characters

    unicodeChar          Matches any identical unicode character
    \                    Used to quote a meta-character (like '*')
    \\                   Matches a single '\' character
    \0nnn                Matches a given octal character
    \xhh                 Matches a given 8-bit hexadecimal character
    \uhhhh               Matches a given 16-bit hexadecimal character
    \t                   Matches an ASCII tab character
    \n                   Matches an ASCII newline character
    \r                   Matches an ASCII return character
    \f                   Matches an ASCII form feed character

 Character Classes

    [abc]                Simple character class
    [a-zA-Z]             Character class with ranges
    [^abc]               Negated character class

 Standard POSIX Character Classes

    [:alnum:]            Alphanumeric characters.
    [:alpha:]            Alphabetic characters.
    [:blank:]            Space and tab characters.
    [:cntrl:]            Control characters.
    [:digit:]            Numeric characters.
    [:graph:]            Characters that are printable and are also visible.
                         (A space is printable, but not visible, while an
                         'a' is both.)
    [:lower:]            Lower-case alphabetic characters.
    [:print:]            Printable characters (characters that are not
                         control characters).
    [:punct:]            Punctuation characters (characters that are not
                         letters, digits, control characters, or space
                         characters).
    [:space:]            Space characters (such as space, tab, and formfeed,
                         to name a few).
    [:upper:]            Upper-case alphabetic characters.
    [:xdigit:]           Characters that are hexadecimal digits.

 Non-standard POSIX-style Character Classes

    [:javastart:]        Start of a Java identifier
    [:javapart:]         Part of a Java identifier

 Predefined Classes

    .                    Matches any character other than newline
    \w                   Matches a "word" character (alphanumeric plus "_")
    \W                   Matches a non-word character
    \s                   Matches a whitespace character
    \S                   Matches a non-whitespace character
    \d                   Matches a digit character
    \D                   Matches a non-digit character

 Boundary Matchers

    ^                    Matches only at the beginning of a line
    $                    Matches only at the end of a line
    \b                   Matches only at a word boundary
    \B                   Matches only at a non-word boundary

 Greedy Closures

    A*                   Matches A 0 or more times (greedy)
    A+                   Matches A 1 or more times (greedy)
    A?                   Matches A 1 or 0 times (greedy)
    A{n}                 Matches A exactly n times (greedy)
    A{n,}                Matches A at least n times (greedy)
    A{n,m}               Matches A at least n but not more than m times (greedy)

 Reluctant Closures

    A*?                  Matches A 0 or more times (reluctant)
    A+?                  Matches A 1 or more times (reluctant)
    A??                  Matches A 0 or 1 times (reluctant)

 Logical Operators

    AB                   Matches A followed by B
    A|B                  Matches either A or B
    (A)                  Used for subexpression grouping

 Backreferences

    \1                   Backreference to 1st parenthesized subexpression
    \2                   Backreference to 2nd parenthesized subexpression
    \3                   Backreference to 3rd parenthesized subexpression
    \4                   Backreference to 4th parenthesized subexpression
    \5                   Backreference to 5th parenthesized subexpression
    \6                   Backreference to 6th parenthesized subexpression
    \7                   Backreference to 7th parenthesized subexpression
    \8                   Backreference to 8th parenthesized subexpression
    \9                   Backreference to 9th parenthesized subexpression

All closure operators (+, *, ?, {m,n}) are greedy by default, meaning that they match as many elements of the string as possible without causing the overall match to fail. If you want a closure to be reluctant (non-greedy), you can simply follow it with a '?'. A reluctant closure will match as few elements of the string as possible when finding matches. {m,n} closures don't currently support reluctancy.
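The difference between a greedy and a reluctant closure is easiest to see on one input. The following is a minimal sketch, again assuming the JDK's java.util.regex as a stand-in for RE (the +, * and trailing-? forms read the same in both). The class ClosureDemo and the helper firstMatch are hypothetical names. It deliberately avoids reluctant {m,n} closures, since the text above says RE does not support them.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: java.util.regex stands in for org.apache.regexp.RE here.
class ClosureDemo {
    /** Returns the first substring of input matched by regex, or null if none. */
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        // Greedy: .+ grabs as much as it can while still letting '>' match.
        System.out.println(firstMatch("<.+>", input));  // <a><b>
        // Reluctant: .+? grabs as little as it can before '>' matches.
        System.out.println(firstMatch("<.+?>", input)); // <a>
    }
}
```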

RE runs programs compiled by the RECompiler class. But the RE matcher class does not include the actual regular expression compiler for reasons of efficiency. In fact, if you want to pre-compile one or more regular expressions, the 'recompile' class can be invoked from the command line to produce compiled output like this:

    // Pre-compiled regular expression "a*b"
    char[] re1Instructions =
    {
        0x007c, 0x0000, 0x001a, 0x007c, 0x0000, 0x000d, 0x0041,
        0x0001, 0x0004, 0x0061, 0x007c, 0x0000, 0x0003, 0x0047,
        0x0000, 0xfff6, 0x007c, 0x0000, 0x0003, 0x004e, 0x0000,
        0x0003, 0x0041, 0x0001, 0x0004, 0x0062, 0x0045, 0x0000,
        0x0000,
    };

    REProgram re1 = new REProgram(re1Instructions);

You can then construct a regular expression matcher (RE) object from the pre-compiled expression re1 and thus avoid the overhead of compiling the expression at runtime. If you require more dynamic regular expressions, you can construct a single RECompiler object and re-use it to compile each expression. Similarly, you can change the program run by a given matcher object at any time. However, RE and RECompiler are not threadsafe (for efficiency reasons, and because requiring thread safety in this class is deemed to be a rare requirement), so you will need to construct a separate compiler or matcher object for each thread (unless you do thread synchronization yourself).
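The "compile once, one matcher per thread" pattern described above can be sketched as follows. This is an analogy built on the JDK's java.util.regex, not the Jakarta Regexp API: the shared Pattern plays the role of the pre-compiled REProgram, and the per-thread Matcher plays the role of the non-thread-safe RE object. The class PerThreadMatcherDemo and the method contains are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: an analogy using java.util.regex, not org.apache.regexp.
class PerThreadMatcherDemo {
    // Compiled once and shared by all threads (analogous to an REProgram).
    static final Pattern PROGRAM = Pattern.compile("a*b");

    // One matcher per thread (analogous to one RE object per thread),
    // so no synchronization is needed around the matching call.
    static final ThreadLocal<Matcher> MATCHER =
            ThreadLocal.withInitial(() -> PROGRAM.matcher(""));

    /** Returns true if input contains a match for the shared program. */
    static boolean contains(String input) {
        return MATCHER.get().reset(input).find();
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> System.out.println(
                Thread.currentThread().getName() + ": " + contains("xaaaab"));
        Thread t1 = new Thread(task, "worker-1");
        Thread t2 = new Thread(task, "worker-2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

With RE itself, the equivalent choice would be to keep the compiled programs shared and create a new RE (or call setProgram on a thread-local one) inside each thread, as the text suggests.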

ISSUES:

com.weusours.util.re is not currently compatible with all standard POSIX regcomp flags

com.weusours.util.re does not support POSIX equivalence classes ([=foo=] syntax) (I18N/locale issue)

com.weusours.util.re does not support nested POSIX character classes (definitely should, but not completely trivial)

com.weusours.util.re does not support POSIX character collation concepts ([.foo.] syntax) (I18N/locale issue)

Should there be different matching styles (simple, POSIX, Perl etc?)

Should RE support character iterators (for backwards RE matching!)?

Should RE support reluctant {m,n} closures (does anyone care)?

Not *all* possibilities are considered for greediness when backreferences are involved (as POSIX suggests should be the case). The POSIX RE "(ac*)c*d[ac]*\1", when matched against "acdacaa" should yield a match of acdacaa where \1 is "a". This is not the case in this RE package, and actually Perl doesn't go to this extent either! Until someone actually complains about this, I'm not sure it's worth "fixing". If it ever is fixed, test #137 in RETest.txt should be updated.

@see recompile
@see RECompiler
@author Jonathan Locke
@version $Id: RE.java,v 1.1 2000/04/27 01:22:33 jon Exp $

            throw new PatternException(rse.getMessage(), rse);
        }
    }


    protected boolean preparedMatch(REProgram preparedPattern, String match) {
        RE re = new RE(preparedPattern);


        if (match == null) {
            return false;
        }


        return re.match(match);
    }


     */
    protected boolean preparedMatch(REProgram preparedPattern, String match) {
        boolean result = false;
        
        if (match != null) {
            RE re = new RE(preparedPattern);
            result = re.match(match);
        }
        return result;
    }


            int comma = list.indexOf(',');
            if (comma < 0)
                break;
            String pattern = list.substring(0, comma).trim();
            try {
                reProgramList.add(new RE(pattern).getProgram());
            } catch (RESyntaxException e) {
                throw new IllegalArgumentException
                    (sm.getString("requestFilterValve.syntax", pattern));
            }
            list = list.substring(comma + 1);


            return;
        }


        
        // Create local RE since RE is not thread safe
        RE re = new RE();
        
        // Check the deny patterns, if any
        for (int i = 0; i < denies.length; i++) {
            re.setProgram(denies[i]);
            if (re.match(property)) {
                ServletResponse sres = response.getResponse();
                if (sres instanceof HttpServletResponse) {
                    HttpServletResponse hres = (HttpServletResponse) sres;
                    hres.sendError(HttpServletResponse.SC_FORBIDDEN);
                    return;
                }
            }
        }


        // Check the allow patterns, if any
        for (int i = 0; i < allows.length; i++) {
            re.setProgram(allows[i]);
            if (re.match(property)) {
                context.invokeNext(request, response);
                return;
            }
        }


            int comma = list.indexOf(',');
            if (comma < 0)
                break;
            String pattern = list.substring(0, comma).trim();
            try {
                reList.add(new RE(pattern));
            } catch (RESyntaxException e) {
                throw new IllegalArgumentException
                    ("Syntax error in request filter pattern");
            }
            list = list.substring(comma + 1);
        }


        RE reArray[] = new RE[reList.size()];
        return ((RE[]) reList.toArray(reArray));


    }


     *
     * @param userAgent user-agent string
     */
    public void addNoCompressionUserAgent(String userAgent) {
        try {
            RE nRule = new RE(userAgent);
            noCompressionUserAgents =
                addREArray(noCompressionUserAgents, nRule);
        } catch (RESyntaxException pse) {
            log.error(sm.getString("http11processor.regexp.error", userAgent), pse);
        }


     *
     * @param userAgent user-agent string
     */
    public void addRestrictedUserAgent(String userAgent) {
        try {
            RE nRule = new RE(userAgent);
            restrictedUserAgents = addREArray(restrictedUserAgents, nRule);
        } catch (RESyntaxException pse) {
            log.error(sm.getString("http11processor.regexp.error", userAgent), pse);
        }
    }


    /**
     * Match the prepared pattern against the value returned by {@link #getMatchString(Map, Parameters)}.
     */
    public Map preparedMatch(Object preparedPattern, Map objectModel, Parameters parameters) {
        
        RE re = new RE((REProgram)preparedPattern);
        String match = getMatchString(objectModel, parameters);
        
        if (match == null)
            return null;
        
        if(re.match(match)) {
            /* Handle parenthesised subexpressions. XXX: could be faster if we count
             * parens *outside* the generated code.
             * Note: *ONE* based, not zero.
             */
            int parenCount = re.getParenCount();
            Map map = new HashMap();
            for (int paren = 1; paren <= parenCount; paren++) {
                map.put(Integer.toString(paren), re.getParen(paren));
            }


            return map;
        }



            // configure the factory
            _setup( resolver );


            // setup everything for the current locale
            String[] matches = new RE( "_" ).split( lc );


            String l = matches.length > 0
                    ? matches[0] : Locale.getDefault().getLanguage();
            String c = matches.length > 1 ? matches[1] : "";
            String v = matches.length > 2 ? matches[2] : "";


        // the specific locale value
        String lc = (String) params.get( attribute );
        if ( lc != null )
            try {


                String[] matches = new RE( "_" ).split( lc );
                String l = matches.length > 0
                        ? matches[0] : Locale.getDefault().getLanguage();
                String c = matches.length > 1 ? matches[1] : "";
                String v = matches.length > 2 ? matches[2] : "";
                locale = new Locale( l, c, v );


Related Classes of org.apache.regexp.RE

ch.ethz.prose.filter.NameExpression

com.gftech.util.GFString

com.volantis.mcs.eclipse.ab.core.DeviceRepositoryAccessorManagerTestCase

com.volantis.mcs.servlet.CachingXDIMERequestProcessor

net.sf.regain.crawler.config.XmlCrawlerConfig

net.sf.regain.crawler.Crawler

net.sf.regain.crawler.document.AbstractPreparator

net.sf.regain.crawler.document.DocumentFactory

org.apache.catalina.valves.RequestFilterValve

org.apache.cocoon.acting.AbstractValidatorAction
