Examples of RE

  • juzu.impl.router.regex.RE
    @author Julien Viet
  • org.apache.regexp.RE
    // [...]ile expression
    boolean matched = r.match("xaaaab");       // Match against "xaaaab"

    String wholeExpr = r.getParen(0);          // wholeExpr will be 'aaaab'
    String insideParens = r.getParen(1);       // insideParens will be 'aaaa'

    int startWholeExpr = r.getParenStart(0);   // startWholeExpr will be index 1
    int endWholeExpr = r.getParenEnd(0);       // endWholeExpr will be index 6
    int lenWholeExpr = r.getParenLength(0);    // lenWholeExpr will be 5

    int startInside = r.getParenStart(1);      // startInside will be index 1
    int endInside = r.getParenEnd(1);          // endInside will be index 5
    int lenInside = r.getParenLength(1);       // lenInside will be 4

    You can also refer to the contents of a parenthesized expression within a regular expression itself. This is called a 'backreference'. The first backreference in a regular expression is denoted by \1, the second by \2 and so on. So the expression:
     ([0-9]+)=\1 
    will match any string of the form n=n (like 0=0 or 2=2).
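
    The backreference pattern above can be tried out with a short sketch. Note that the example uses the JDK's java.util.regex package as a stand-in, not org.apache.regexp.RE; it assumes both engines treat capturing groups and \1 the same way for this pattern. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper; java.util.regex stands in for org.apache.regexp.RE here.
final class BackreferenceSketch {
    // "([0-9]+)=\1": one or more digits, an '=', then the same digits again.
    private static final Pattern SAME_ON_BOTH_SIDES = Pattern.compile("([0-9]+)=\\1");

    // Returns true if the pattern matches somewhere in the input.
    static boolean containsRepeat(String input) {
        return SAME_ON_BOTH_SIDES.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsRepeat("0=0"));   // true
        System.out.println(containsRepeat("2=2"));   // true
        System.out.println(containsRepeat("1=2"));   // false
    }
}
```

    Because find() searches anywhere in the input, a string such as "x7=7y" would also be accepted; anchoring with ^ and $ would restrict the match to the whole string.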

    The full regular expression syntax accepted by RE is described here:

     
    Characters
      unicodeChar   Matches any identical unicode character
      \             Used to quote a meta-character (like '*')
      \\            Matches a single '\' character
      \0nnn         Matches a given octal character
      \xhh          Matches a given 8-bit hexadecimal character
      \uhhhh        Matches a given 16-bit hexadecimal character
      \t            Matches an ASCII tab character
      \n            Matches an ASCII newline character
      \r            Matches an ASCII return character
      \f            Matches an ASCII form feed character
    Character Classes
      [abc]         Simple character class
      [a-zA-Z]      Character class with ranges
      [^abc]        Negated character class
    Standard POSIX Character Classes
      [:alnum:]     Alphanumeric characters.
      [:alpha:]     Alphabetic characters.
      [:blank:]     Space and tab characters.
      [:cntrl:]     Control characters.
      [:digit:]     Numeric characters.
      [:graph:]     Characters that are printable and are also visible. (A space is printable, but not visible, while an 'a' is both.)
      [:lower:]     Lower-case alphabetic characters.
      [:print:]     Printable characters (characters that are not control characters).
      [:punct:]     Punctuation characters (characters that are not letters, digits, control characters, or space characters).
      [:space:]     Space characters (such as space, tab, and formfeed, to name a few).
      [:upper:]     Upper-case alphabetic characters.
      [:xdigit:]    Characters that are hexadecimal digits.
    Non-standard POSIX-style Character Classes
      [:javastart:] Start of a Java identifier
      [:javapart:]  Part of a Java identifier
    Predefined Classes
      .             Matches any character other than newline
      \w            Matches a "word" character (alphanumeric plus "_")
      \W            Matches a non-word character
      \s            Matches a whitespace character
      \S            Matches a non-whitespace character
      \d            Matches a digit character
      \D            Matches a non-digit character
    Boundary Matchers
      ^             Matches only at the beginning of a line
      $             Matches only at the end of a line
      \b            Matches only at a word boundary
      \B            Matches only at a non-word boundary
    Greedy Closures
      A*            Matches A 0 or more times (greedy)
      A+            Matches A 1 or more times (greedy)
      A?            Matches A 1 or 0 times (greedy)
      A{n}          Matches A exactly n times (greedy)
      A{n,}         Matches A at least n times (greedy)
      A{n,m}        Matches A at least n but not more than m times (greedy)
    Reluctant Closures
      A*?           Matches A 0 or more times (reluctant)
      A+?           Matches A 1 or more times (reluctant)
      A??           Matches A 0 or 1 times (reluctant)
    Logical Operators
      AB            Matches A followed by B
      A|B           Matches either A or B
      (A)           Used for subexpression grouping
    Backreferences
      \1            Backreference to 1st parenthesized subexpression
      \2            Backreference to 2nd parenthesized subexpression
      \3            Backreference to 3rd parenthesized subexpression
      \4            Backreference to 4th parenthesized subexpression
      \5            Backreference to 5th parenthesized subexpression
      \6            Backreference to 6th parenthesized subexpression
      \7            Backreference to 7th parenthesized subexpression
      \8            Backreference to 8th parenthesized subexpression
      \9            Backreference to 9th parenthesized subexpression

    All closure operators (+, *, ?, {m,n}) are greedy by default, meaning that they match as many elements of the string as possible without causing the overall match to fail. If you want a closure to be reluctant (non-greedy), you can simply follow it with a '?'. A reluctant closure will match as few elements of the string as possible when finding matches. {m,n} closures don't currently support reluctancy.
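
    The difference between greedy and reluctant closures is easiest to see side by side. The sketch below again uses java.util.regex as a stand-in for RE, on the assumption that `+` and `+?` behave the same way in both engines; the helper name firstMatch is hypothetical. The caveat about {m,n} closures applies to RE only and is not exercised here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; java.util.regex stands in for org.apache.regexp.RE here.
final class ClosureSketch {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<b>bold</b>";
        // Greedy: .+ takes as much as it can while still letting '>' match.
        System.out.println(firstMatch("<.+>", input));   // <b>bold</b>
        // Reluctant: .+? stops at the first '>' that completes the match.
        System.out.println(firstMatch("<.+?>", input));  // <b>
    }
}
```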

    RE runs programs compiled by the RECompiler class. But the RE matcher class does not include the actual regular expression compiler for reasons of efficiency. In fact, if you want to pre-compile one or more regular expressions, the 'recompile' class can be invoked from the command line to produce compiled output like this:

     // Pre-compiled regular expression "a*b"
     char[] re1Instructions = {
         0x007c, 0x0000, 0x001a, 0x007c, 0x0000, 0x000d, 0x0041,
         0x0001, 0x0004, 0x0061, 0x007c, 0x0000, 0x0003, 0x0047,
         0x0000, 0xfff6, 0x007c, 0x0000, 0x0003, 0x004e, 0x0000,
         0x0003, 0x0041, 0x0001, 0x0004, 0x0062, 0x0045, 0x0000,
         0x0000,
     };

     REProgram re1 = new REProgram(re1Instructions);

    You can then construct a regular expression matcher (RE) object from the pre-compiled expression re1 and thus avoid the overhead of compiling the expression at runtime. If you require more dynamic regular expressions, you can construct a single RECompiler object and re-use it to compile each expression. Similarly, you can change the program run by a given matcher object at any time. However, RE and RECompiler are not threadsafe (for efficiency reasons, and because requiring thread safety in this class is deemed to be a rare requirement), so you will need to construct a separate compiler or matcher object for each thread (unless you do thread synchronization yourself).
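
    The "compile once, one matcher per thread" arrangement described above can be sketched as follows. This is an analogy rather than the RE API: it uses java.util.regex, where a shared Pattern plays the role of the pre-compiled program and a per-thread Matcher plays the role of the RE matcher object. The class name and the ThreadLocal approach are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; java.util.regex stands in for REProgram/RE here.
final class PerThreadMatcherSketch {
    // Compiled once and shared by all threads (analogous to a pre-compiled program).
    private static final Pattern PROGRAM = Pattern.compile("a*b");

    // One matcher per thread, since a matcher holds mutable match state.
    private static final ThreadLocal<Matcher> MATCHER =
            ThreadLocal.withInitial(() -> PROGRAM.matcher(""));

    // Returns true if the pattern matches somewhere in the input.
    static boolean matches(String input) {
        // reset(CharSequence) points this thread's matcher at the new input.
        return MATCHER.get().reset(input).find();
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> System.out.println(matches("xaaaab"))); // true
        worker.start();
        worker.join();
        System.out.println(matches("ccc")); // false
    }
}
```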


    ISSUES:

    @see recompile
    @see RECompiler
    @author Jonathan Locke
    @version $Id: RE.java,v 1.1 2000/04/27 01:22:33 jon Exp $
  • org.jostraca.comp.gnu.regexp.RE
    RE provides the user interface for compiling and matching regular expressions.

    A regular expression object (class RE) is compiled by constructing it from a String, StringBuffer or character array, with optional compilation flags (below) and an optional syntax specification (see RESyntax; if not specified, RESyntax.RE_SYNTAX_PERL5 is used).

    Various methods attempt to match input text against a compiled regular expression. These methods are:

      • isMatch: returns true if the input text in its entirety matches the regular expression pattern.
      • getMatch: returns the first match found in the input text, or null if no match is found.
      • getAllMatches: returns an array of all non-overlapping matches found in the input text. If no matches are found, the array is zero-length.
      • substitute: substitutes the first occurrence of the pattern in the input text with a replacement string (which may include metacharacters $0-$9, see REMatch.substituteInto).
      • substituteAll: same as above, but repeats for each match before returning.
      • getMatchEnumeration: returns an REMatchEnumeration object that allows iteration over the matches (see REMatchEnumeration for some reasons why you may want to do this instead of using getAllMatches).

    These methods all have similar argument lists. The input can be a String, a character array, a StringBuffer, a Reader or an InputStream of some sort. Note that when using a Reader or InputStream, the stream read position cannot be guaranteed after attempting a match (this is not a bug, but a consequence of the way regular expressions work). Using an REMatchEnumeration can eliminate most positioning problems.
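
    To make the roles of the matching methods listed above concrete, here is a rough sketch of comparable operations. It does not call gnu.regexp; it uses java.util.regex as a stand-in, and the method names below are hypothetical wrappers chosen to mirror isMatch, getMatch, getAllMatches, substitute and substituteAll. Only String input is covered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrappers; java.util.regex stands in for gnu.regexp.RE here.
final class MatchMethodsSketch {
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Like isMatch: true only if the entire input matches the pattern.
    static boolean isWholeMatch(String input) {
        return DIGITS.matcher(input).matches();
    }

    // Like getMatch: the first match found, or null if there is none.
    static String firstMatch(String input) {
        Matcher m = DIGITS.matcher(input);
        return m.find() ? m.group() : null;
    }

    // Like getAllMatches: every non-overlapping match; empty if there are none.
    static List<String> allMatches(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = DIGITS.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Like substitute: replaces only the first match.
    static String substituteFirst(String input, String replacement) {
        return DIGITS.matcher(input).replaceFirst(replacement);
    }

    // Like substituteAll: replaces every match; $0 refers to the whole match.
    static String substituteAll(String input, String replacement) {
        return DIGITS.matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String text = "a1b22c333";
        System.out.println(isWholeMatch(text));           // false
        System.out.println(firstMatch(text));             // 1
        System.out.println(allMatches(text));             // [1, 22, 333]
        System.out.println(substituteFirst(text, "#"));   // a#b22c333
        System.out.println(substituteAll(text, "<$0>"));  // a<1>b<22>c<333>
    }
}
```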

    The optional index argument specifies the offset from the beginning of the text at which the search should start (see the descriptions of some of the execution flags for how this can affect positional pattern operators). For a Reader or InputStream, this means an offset from the current read position, so subsequent calls with the same index argument on a Reader or an InputStream will not necessarily be accessing the same position on the stream, whereas repeated searches at a given index in a fixed string will return consistent results.
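
    The effect of a start index on a fixed string can be illustrated with a small sketch. As before, java.util.regex stands in for gnu.regexp, and the class and method names are hypothetical. Matcher.find(int) resets the matcher and begins searching at the given offset, so repeated calls with the same index on the same string give the same answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; java.util.regex stands in for gnu.regexp.RE here.
final class StartIndexSketch {
    private static final Pattern WORD = Pattern.compile("[a-z]+");

    // Returns the start offset of the first match at or after fromIndex, or -1.
    static int indexOfMatch(String text, int fromIndex) {
        Matcher m = WORD.matcher(text);
        return m.find(fromIndex) ? m.start() : -1;
    }

    public static void main(String[] args) {
        String text = "ab cd ef";
        System.out.println(indexOfMatch(text, 0)); // 0
        System.out.println(indexOfMatch(text, 2)); // 3
        System.out.println(indexOfMatch(text, 2)); // 3 again: same index, same result
        System.out.println(indexOfMatch(text, 8)); // -1
    }
}
```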

    You can optionally affect the execution environment by using a combination of execution flags (constants listed below).

    All operations on a regular expression are performed in a thread-safe manner.

    @author Wes Biggs
    @version 1.1.3, 18 June 2001

  • org.renjin.primitives.text.regex.RE
    Compiled regular expression.

  • Examples of org.apache.regexp.RE

       * @param rawDocument The document to search.
       * @throws RegainException If the document could not be read.
       */
      private void parseHtmlDocument(RawDocument rawDocument) throws RegainException {
        for (int i = 0; i < mHtmlParserPatternReArr.length; i++) {
          RE re = mHtmlParserPatternReArr[i];
          int urlGroup = mHtmlParserUrlPatternArr[i].getRegexUrlGroup();
          boolean shouldBeParsed = mHtmlParserUrlPatternArr[i].getShouldBeParsed();
          boolean shouldBeIndexed = mHtmlParserUrlPatternArr[i].getShouldBeIndexed();

          int offset = 0;
          String contentAsString = rawDocument.getContentAsString();
          try {
            while (re.match(contentAsString, offset)) {
              offset = re.getParenEnd(0);

              String parentUrl = rawDocument.getUrl();
              String url = re.getParen(urlGroup);

              if (url != null) {
                // Convert the URL to an absolute URL
                url = CrawlerToolkit.toAbsoluteUrl(url, parentUrl);


    Examples of org.apache.regexp.RE

          return null;
        }

        String regex = "\\." + extention + "$";
        try {
          return new RE(regex, RE.MATCH_CASEINDEPENDENT);
        } catch (RESyntaxException exc) {
          throw new RegainException("Creating accept regex for preparator failed: "
                  + regex, exc);
        }
      }

    Examples of org.apache.regexp.RE

        }
        buffer.append(")$");

        String urlRegex = buffer.toString();
        try {
          return new RE(urlRegex, RE.MATCH_CASEINDEPENDENT);
        } catch (RESyntaxException exc) {
          throw new RegainException("Creating accept regex for preparator failed: "
                  + urlRegex, exc);
        }
      }

    Examples of org.apache.regexp.RE

          return null;
        }

        if (mValueRegex == null) {
          try {
            mValueRegex = new RE("^\\s+(.*)\\s+REG_SZ\\s+(.*)$");
          } catch (RESyntaxException exc) {
            throw new RegainException("Creating registry value regex failed", exc);
          }
        }


    Examples of org.apache.regexp.RE

          }

          String openInNewWindowRegex = indexConfigs[0].getOpenInNewWindowRegex();
          if (openInNewWindowRegex != null) {
            try {
              mOpenInNewWindowRegex = new RE(openInNewWindowRegex);
            } catch (RESyntaxException exc) {
              throw new RegainException("Syntax error in openInNewWindowRegex: '" + openInNewWindowRegex + "'", exc);
            }
          }

    Examples of org.apache.regexp.RE

       {
         Hashtable myParameters = new Hashtable();

    try {
      String contentType = this.myRequest.getContentType();
      RE r = new RE("multipart/form-data");
      if ( r.match (" " + contentType) ) {
        // We are dealing with a multipart form
        MultipartRequest formHandler = new MultipartRequest
              (this.myRequest, tmpDir);
        Enumeration paramList = formHandler.getParameterNames();
        for (; paramList.hasMoreElements() ;) {

    Examples of org.apache.regexp.RE

        throws RegainException
      {
        super(prefix, pathStartRegex, pathEndRegex);

        try {
          mPathNodeRE = new RE(pathNodeRegex, RE.MATCH_CASEINDEPENDENT);
        }
        catch (RESyntaxException exc) {
          throw new RegainException("Syntax error in regular expression", exc);
        }

    Examples of org.apache.regexp.RE

      {
        super(prefix, contentStartRegex, contentEndRegex);

        try {
          if ((headlineRegex != null) && (headlineRegex.length() != 0)) {
            mHeadlineRE = new RE(headlineRegex, RE.MATCH_CASEINDEPENDENT | RE.MATCH_MULTILINE);
            mHeadlineRegexGroup = headlineRegexGroup;
          }
        }
        catch (RESyntaxException exc) {
          throw new RegainException("Syntax error in regular expression", exc);

    Examples of org.apache.regexp.RE

      {
        mPrefix = prefix;

        try {
          if ((fragmentStartRegex != null) && (fragmentStartRegex.length() != 0)) {
            mFragmentStartRE = new RE(fragmentStartRegex, RE.MATCH_CASEINDEPENDENT);
            mFragmentStartRegex = fragmentStartRegex;
          }
          if ((fragmentEndRegex != null) && (fragmentEndRegex.length() != 0)) {
            mFragmentEndRE = new RE(fragmentEndRegex, RE.MATCH_CASEINDEPENDENT);
            mFragmentEndRegex = fragmentEndRegex;
          }
        }
        catch (RESyntaxException exc) {
          throw new RegainException("Syntax error in regular expression", exc);

    Examples of org.apache.regexp.RE

            throw new RegainException("Error in ExternalPreparator config: No " +
                    "commandLine defined in command section #" + (i + 1));
          }

          try {
            mUrlRegexArr[i] = new RE(urlPattern);
          }
          catch (RESyntaxException exc) {
            throw new RegainException("Error in ExternalPreparator config: " +
                    "urlPattern has wrong syntax: " + urlPattern, exc);
          }