Examples of org.modeshape.common.text.TokenStream

org.modeshape.common.text.TokenStream
e the columns empty to signal wildcard } else { // Read names until we see a ',' do { String columnName = tokens.consume(); if (tokens.canConsume("AS")) { String columnAlias = tokens.consume(); columns.add(new Column(columnName, columnAlias)); } else { columns.add(new Column(columnName, null)); } } while (tokens.canConsume(',')); } return columns; } protected Delete parseDelete( TokenStream tokens ) throws ParsingException { tokens.consume("DELETE", "FROM"); String tableName = tokens.consume(); tokens.consume("WHERE"); String lhs = tokens.consume(); tokens.consume('='); String rhs = tokens.consume(); return new Delete(tableName, new Criteria(lhs, rhs)); } } public abstract class Statement { ... } public class Query extends Statement { ... } public class Delete extends Statement { ... } public class Column { ... } This example shows an idiomatic way of writing a parser that is stateless and thread-safe. The parse(...) method takes the input as a parameter, and returns the domain-specific representation that resulted from the parsing. All other methods are utility methods that simply encapsulate common logic or make the code more readable.

In the example, the parse(...) first creates a TokenStream object (using a Tokenizer implementation that is not shown), and then loops as long as there are more tokens to read. As it loops, if the next token is "SELECT", the parser calls the parseSelect(...) method which immediately consumes a "SELECT" token, the names of the columns separated by commas (or a '*' if there all columns are to be selected), a "FROM" token, and the name of the table being queried. The parseSelect(...) method returns a Select object, which then added to the list of statements in the parse(...) method. The parser handles the "DELETE" statements in a similar manner.

Case sensitivity

Very often grammars to not require the case of keywords to match. This can make parsing a challenge, because all combinations of case need to be used. The TokenStream framework provides a very simple solution that requires no more effort than providing a boolean parameter to the constructor.

When a false value is provided for the the caseSensitive parameter, the TokenStream performs all matching operations as if each token's value were in uppercase only. This means that the arguments supplied to the match(...), canConsume(...), and consume(...) methods should be upper-cased. Note that the actual value of each token remains the actual case as it appears in the input.

Of course, when the TokenStream is created with a true value for the caseSensitive parameter, the matching is performed using the actual value as it appears in the input content

Whitespace

Many grammars are independent of lines breaks or whitespace, allowing a lot of flexibility when writing the content. The TokenStream framework makes it very easy to ignore line breaks and whitespace. To do so, the Tokenizer implementation must simply not include the line break character sequences and whitespace in the token ranges. Since none of the tokens contain whitespace, the parser never has to deal with them.

Of course, many parsers will require that some whitespace be included. For example, whitespace within a quoted string may be needed by the parser. In this case, the Tokenizer should simply include the whitespace characters in the tokens.

Writing a Tokenizer

Each parser will likely have its own {@link Tokenizer} implementation that contains the parser-specific logic about how tobreak the content into token objects. Generally, the easiest way to do this is to simply iterate through the character sequence passed into the {@link Tokenizer#tokenize(CharacterStream,Tokens) tokenize(...)} method, and use a switch statement to decidewhat to do.

Here is the code for a very basic Tokenizer implementation that ignores whitespace, line breaks and Java-style (multi-line and end-of-line) comments, while constructing single tokens for each quoted string.
```
 public class BasicTokenizer implements Tokenizer { public void tokenize( CharacterStream input, Tokens tokens ) throws ParsingException { while (input.hasNext()) { char c = input.next(); switch (c) { case ' ': case '\t': case '\n': case '\r': // Just skip these whitespace characters ... break; case '-': case '(': case ')': case '{': case '}': case '*': case ',': case ';': case '+': case '%': case '?': case '$': case '[': case ']': case '!': case '<': case '>': case '|': case '=': case ':': tokens.addToken(input.index(), input.index() + 1, SYMBOL); break; case '.': tokens.addToken(input.index(), input.index() + 1, DECIMAL); break; case '\"': case '\"': int startIndex = input.index(); Position startingPosition = input.position(); boolean foundClosingQuote = false; while (input.hasNext()) { c = input.next(); if (c == '\\' && input.isNext('"')) { c = input.next(); // consume the ' character since it is escaped } else if (c == '"') { foundClosingQuote = true; break; } } if (!foundClosingQuote) { throw new ParsingException(startingPosition, "No matching closing double quote found"); } int endIndex = input.index() + 1; // beyond last character read tokens.addToken(startIndex, endIndex, DOUBLE_QUOTED_STRING); break; case '\'': startIndex = input.index(); startingPosition = input.position(); foundClosingQuote = false; while (input.hasNext()) { c = input.next(); if (c == '\\' && input.isNext('\'')) { c = input.next(); // consume the ' character since it is escaped } else if (c == '\'') { foundClosingQuote = true; break; } } if (!foundClosingQuote) { throw new ParsingException(startingPosition, "No matching closing single quote found"); } endIndex = input.index() + 1; // beyond last character read tokens.addToken(startIndex, endIndex, SINGLE_QUOTED_STRING); break; case '/': startIndex = input.index(); if (input.isNext('/')) { // End-of-line comment ... boolean foundLineTerminator = false; while (input.hasNext()) { c = input.next(); if (c == '\n' || c == '\r') { foundLineTerminator = true; break; } } endIndex = input.index(); // the token won't include the '\n' or '\r' character(s) if (!foundLineTerminator) ++endIndex; // must point beyond last char if (c == '\r' && input.isNext('\n')) input.next(); if (useComments) { tokens.addToken(startIndex, endIndex, COMMENT); } } else if (input.isNext('*')) { // Multi-line comment ... while (input.hasNext() && !input.isNext('*', '/')) { c = input.next(); } if (input.hasNext()) input.next(); // consume the '*' if (input.hasNext()) input.next(); // consume the '/' if (useComments) { endIndex = input.index() + 1; // the token will include the '/' and '*' characters tokens.addToken(startIndex, endIndex, COMMENT); } } else { // just a regular slash ... tokens.addToken(startIndex, startIndex + 1, SYMBOL); } break; default: startIndex = input.index(); // Read until another whitespace/symbol/decimal/slash is found while (input.hasNext() && !(input.isNextWhitespace() || input.isNextAnyOf("/.-(){}*,;+%?$[]!<>|=:"))) { c = input.next(); } endIndex = input.index() + 1; // beyond last character that was included tokens.addToken(startIndex, endIndex, WORD); } } } } 
```
Tokenizers with exactly this behavior can actually be created using the {@link #basicTokenizer(boolean)} method. So while this verybasic implementation is not meant to be used in all situations, it may be useful in some situations.

     * @param content the content
     * @throws ParsingException if there is a problem parsing the content
     */
    protected void parse( String content ) {
        Tokenizer tokenizer = new CndTokenizer(false, true);
        TokenStream tokens = new TokenStream(content, tokenizer, false);
        tokens.start();
        while (tokens.hasNext()) {
            // Keep reading while we can recognize one of the two types of statements ...
            if (tokens.matches("<", ANY_VALUE, "=", ANY_VALUE, ">")) {
                parseNamespaceMapping(tokens);
            } else if (tokens.matches("[", ANY_VALUE, "]")) {
                parseNodeTypeDefinition(tokens);
            } else {
                Position position = tokens.previousPosition();
                throw new ParsingException(position, CndI18n.expectedNamespaceOrNodeDefinition.text(tokens.consume(),
                                                                                                    position.getLine(),
                                                                                                    position.getColumn()));
            }
        }
    }

View Full Code Here

        return new OrderBySpec(Order.DESCENDING, attributeName);
    }


    protected TokenStream tokenize( String xpath ) {
        Tokenizer tokenizer = new XPathParser.XPathTokenizer(false); // skip comments
        return new TokenStream(xpath, tokenizer, true).start(); // case sensitive!!
    }

View Full Code Here

    protected Path path( String path ) {
        return (Path)typeSystem.getTypeFactory(PropertyType.PATH.getName()).create(path);
    }


    protected TokenStream tokens( String content ) {
        return new TokenStream(content, new BasicSqlQueryParser.SqlTokenizer(false), false).start();
    }

View Full Code Here

    protected Path path( String path ) {
        return (Path)typeSystem.getTypeFactory(PropertyType.PATH.getName()).create(path);
    }


    protected TokenStream tokens( String content ) {
        return new TokenStream(content, new BasicSqlQueryParser.SqlTokenizer(false), false).start();
    }

View Full Code Here


    @Override
    public QueryCommand parseQuery( String query,
                                    TypeSystem typeSystem ) {
        Tokenizer tokenizer = new SqlTokenizer(false);
        TokenStream tokens = new TokenStream(query, tokenizer, false);
        tokens.start();
        return parseQueryCommand(tokens, typeSystem);
    }

View Full Code Here

     * @throws IllegalArgumentException if the expression is null
     */
    public Term parse( String fullTextSearchExpression ) {
        CheckArg.isNotNull(fullTextSearchExpression, "fullTextSearchExpression");
        Tokenizer tokenizer = new TermTokenizer();
        TokenStream stream = new TokenStream(fullTextSearchExpression, tokenizer, false);
        return parse(stream.start());
    }

View Full Code Here

        this.typeSystem = context;
    }


    public Component parseXPath( String xpath ) {
        Tokenizer tokenizer = new XPathTokenizer(false); // skip comments
        TokenStream tokens = new TokenStream(xpath, tokenizer, true).start(); // case sensitive!!
        return parseXPath(tokens);
    }

View Full Code Here


    @FixFor( "MODE-869" )
    @Test
    public void shouldParseStaticOperandWithSubqueryWithoutConsumingExtraTokens() {
        QueryCommand expected = parser.parseQuery(tokens("SELECT * FROM tableA"), typeSystem);
        TokenStream tokens = tokens("SELECT * FROM tableA)");
        StaticOperand operand = parser.parseStaticOperand(tokens, typeSystem);
        assertThat(operand, is(instanceOf(Subquery.class)));
        Subquery subquery = (Subquery)operand;
        assertThat(subquery.getQuery(), is(expected));
        assertThat(tokens.canConsume(')'), is(true));
    }

View Full Code Here

    protected DateTime date( String dateTime ) {
        return (DateTime)typeSystem.getDateTimeFactory().create(dateTime);
    }


    protected TokenStream tokens( String content ) {
        return new TokenStream(content, new BasicSqlQueryParser.SqlTokenizer(false), false).start();
    }

View Full Code Here

TOP

Related Classes of org.modeshape.common.text.TokenStream

org.modeshape.jcr.CndImporter

org.modeshape.jcr.query.parse.BasicSqlQueryParser

org.modeshape.jcr.query.parse.FullTextSearchParser

org.modeshape.jcr.query.parse.JcrSql2QueryParserTest

org.modeshape.jcr.query.parse.JcrSqlQueryParserTest

org.modeshape.jcr.query.parse.SqlQueryParserTest

org.modeshape.jcr.query.xpath.XPathParser

org.modeshape.jcr.query.xpath.XPathParserTest

java.util.NoSuchElementException

All source code are property of their respective owners. Java is a trademark of Sun Microsystems, Inc and owned by ORACLE Inc. Contact coftware#gmail.com.

Examples of org.modeshape.common.text.TokenStream

Case sensitivity

Whitespace

Writing a Tokenizer

Related Classes of org.modeshape.common.text.TokenStream