3.org/TR/CSS2/syndata.html#parsing-errors"> "Rules for handling parsing errors" but we also attempt to recover from single malformed tokens, by employing a variety of error recovery strategies.
In no case do we return a parse tree node for a construct that is not well-formed in isolation. When parsing a group of items (a selector list or a declaration group), we discard a malformed item rather than repair it whenever repairing it would cause more styles to apply. E.g. in {@code a, b##id, p { color: blue; color::red }} we throw away the {@code b##id} instead of turning it into {@code b}, since that would result in more styles being applied. We throw away {@code color::red} since it is not well-formed in isolation. We don't need to throw away anything else, since we assume that properties are disjoint and that selectors in a comma list are disjoint. The result after discarding malformed content is {@code a, p { color: blue }}.
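The discard-rather-than-repair policy for a selector list can be sketched as follows. This is only an illustration under stated assumptions: the class {@code SelectorListFilter}, the regular expression used as a well-formedness check, and the string-based representation are hypothetical and are not part of this parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class SelectorListFilter {
  // Assumed, greatly simplified well-formedness check: an element name
  // followed by any number of #id or .class suffixes. Real selectors are
  // far richer than this.
  private static final Pattern SIMPLE_SELECTOR =
      Pattern.compile("[A-Za-z][A-Za-z0-9-]*([#.][A-Za-z_][A-Za-z0-9_-]*)*");

  /** Returns only the well-formed items of a comma-separated selector list. */
  static List<String> keepWellFormed(String selectorList) {
    List<String> kept = new ArrayList<>();
    for (String item : selectorList.split(",")) {
      String selector = item.trim();
      if (SIMPLE_SELECTOR.matcher(selector).matches()) {
        kept.add(selector);
      }
      // A malformed item such as b##id is dropped whole. It is never
      // repaired into b, because b would match more elements.
    }
    return kept;
  }

  public static void main(String[] args) {
    System.out.println(keepWellFormed("a, b##id, p"));  // prints [a, p]
  }
}
```

Dropping an item can only narrow the set of elements that the ruleset applies to, which is why it is the safe choice here.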
When, in tolerant mode, we expect a token that is not there, we report an error and proceed to the next token that signals the start of a similar chunk or the end of the containing block. Usually this means finding the next {@code ;} or {@code }}.
E.g. in {@code color red; background-color: blue} we expect a {@code :} after {@code color}, but since none is forthcoming, we skip to the semicolon and start parsing the {@code background-color} property.
Mismatched curly brackets are harder to deal with, so those are handled in the outer loops.
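A minimal sketch of this skip-ahead recovery inside a declaration group might look like the code below. It assumes the input has already been split into tokens, and the class {@code TolerantDeclarations} and its {@code errors} list are hypothetical stand-ins for the real parser and its message queue. Validation of the property value is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TolerantDeclarations {
  /** Stand-in for the message queue: one entry per skipped run of tokens. */
  final List<String> errors = new ArrayList<>();

  private static boolean endsChunk(String tok) {
    return ";".equals(tok) || "}".equals(tok);
  }

  /** Parses pre-split tokens of the form: name ':' value ';' ... */
  Map<String, String> parse(List<String> toks) {
    Map<String, String> decls = new LinkedHashMap<>();
    int i = 0;
    while (i < toks.size() && !"}".equals(toks.get(i))) {
      if (";".equals(toks.get(i))) { ++i; continue; }  // separator or empty declaration
      int start = i;
      String name = toks.get(i++);
      if (i < toks.size() && ":".equals(toks.get(i))) {
        ++i;  // consume the colon
        int valueStart = i;
        while (i < toks.size() && !endsChunk(toks.get(i))) { ++i; }
        decls.put(name, String.join(" ", toks.subList(valueStart, i)));
      } else {
        // The expected ':' is missing: skip to the next ';' or '}' and
        // report the skipped tokens exactly once.
        while (i < toks.size() && !endsChunk(toks.get(i))) { ++i; }
        errors.add("skipped: " + String.join(" ", toks.subList(start, i)));
      }
    }
    return decls;
  }

  public static void main(String[] args) {
    TolerantDeclarations parser = new TolerantDeclarations();
    Map<String, String> decls = parser.parse(Arrays.asList(
        "color", "red", ";", "background-color", ":", "blue"));
    System.out.println(decls + " " + parser.errors);
  }
}
```

For the example above, only {@code background-color} survives and a single message covers the skipped {@code color red}.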
Recovery Strategies
We employ several recovery strategies when we encounter a parsing problem.
- Skipping an item in a list, as in selector lists: {@code a.myclass, "borken", h6} → {@code a.myclass, h6}.
- Skipping a chunk inside a block, as in property pairs: {@code color:red; background -color: blue} → {@code color: red}.
- Skipping a chunk that may end with a block, as in undefined symbols: {@code @unknown { p { color: blue } }}.
We define several generic syntactic constructs and group the actual CSS grammar into these.
- list item — a minimal run of tokens that does not include a comma, semicolon, curly bracket, or symbol.
- inner chunk — a minimal run of tokens that does not contain a symbol or a close curly bracket or semicolon outside a balanced curly bracket block.
- outer chunk — a run of tokens terminated by a curly bracket that closes a balanced block, a semicolon outside a balanced curly bracket block, or the end-of-file marker.
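The outer chunk definition above can be sketched as a scan that tracks curly bracket depth. The class {@code OuterChunks} and the pre-split token list are assumptions made for illustration; the real parser works on its own token stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OuterChunks {
  /**
   * Returns the index just past the outer chunk that starts at start.
   * The chunk ends at the curly bracket that closes a balanced block, at a
   * semicolon outside any block, or at the end of input.
   */
  static int endOfOuterChunk(List<String> toks, int start) {
    int depth = 0;
    for (int i = start; i < toks.size(); ++i) {
      String tok = toks.get(i);
      if ("{".equals(tok)) {
        ++depth;
      } else if ("}".equals(tok)) {
        if (depth <= 1) { return i + 1; }  // closes the block (or is a stray bracket)
        --depth;
      } else if (";".equals(tok) && depth == 0) {
        return i + 1;
      }
    }
    return toks.size();  // end-of-file marker
  }

  /** Splits a whole stylesheet into its outer chunks. */
  static List<List<String>> split(List<String> toks) {
    List<List<String>> chunks = new ArrayList<>();
    int i = 0;
    while (i < toks.size()) {
      int end = endOfOuterChunk(toks, i);
      chunks.add(new ArrayList<>(toks.subList(i, end)));
      i = end;
    }
    return chunks;
  }

  public static void main(String[] args) {
    List<String> toks = Arrays.asList(
        "@unknown", "{", "p", "{", "color", ":", "blue", "}", "}",
        "@import", "'x'", ";",
        "a", "{", "}");
    for (List<String> chunk : split(toks)) {
      System.out.println(chunk);
    }
  }
}
```

Skipping an unknown construct such as {@code @unknown { p { color: blue } }} then amounts to jumping to the index returned by {@code endOfOuterChunk}.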
We then group the constructs defined in the CSS grammar into these new syntactic constructs.
- list items include selectors and mediums, so a ruleset is a list of list items followed by an outer chunk.
- inner chunks include property/expression pairs, which are separated by semicolons. A declaration group is a list of inner chunks surrounded by curly brackets, i.e. an outer chunk.
- outer chunks include most of the symbol based productions and the ruleset production. A stylesheet is a list of outer chunks.
Coding Conventions around error recovery
All ignored tokens not specified in CSS2.1 as ignorable must be reported. We only apply the error recovery strategies where we make a decision about how to proceed. So the functions that parse list items and parts of list items return {@code null} to indicate a tolerable failure, and the functions that parse lists of list items will examine their return values, and apply one of the recovery strategies. The recovery strategies are written to make sure that they enqueue a message if the malformed construct contained tokens.
All the parsing functions below should obey these conventions:
- public parsing functions never return null.
- private parse* functions return null to indicate a tolerable failure to parse, or throw a {@link ParseException} to indicate an intolerable failure.
- When a parse* function delegates parsing to another function, one of the following is true: the delegator returns null when the delegate returns null, or the delegate reports its failure to parse and the delegator does not, or the delegate does not report failure to parse and the delegator does. This does not constrain the delegate from reporting messages about individual tokens -- only about ranges of skipped tokens.
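These conventions can be illustrated with a small sketch for a comma-separated list of mediums. Everything named here is hypothetical: {@code MediaListParser}, the unchecked {@code SketchParseException} standing in for {@link ParseException}, the {@code messages} list standing in for the message queue, and the assumption that a malformed item is intolerable only when the parser is not in tolerant mode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MediaListParser {
  /** Hypothetical stand-in for ParseException; unchecked to keep the sketch short. */
  static final class SketchParseException extends RuntimeException {
    SketchParseException(String message) { super(message); }
  }

  private final List<String> toks;
  private final boolean tolerant;
  private int pos = 0;
  /** Stand-in for the message queue. */
  final List<String> messages = new ArrayList<>();

  MediaListParser(List<String> toks, boolean tolerant) {
    this.toks = toks;
    this.tolerant = tolerant;
  }

  /** Public entry point: never returns null, even if every item is malformed. */
  public List<String> parseMediaList() {
    List<String> media = new ArrayList<>();
    while (pos < toks.size()) {
      int start = pos;
      String medium = parseMedium();
      if (medium != null) {
        media.add(medium);
      } else {
        // The delegate failed without reporting, so the delegator applies the
        // "skip the list item" strategy and reports the skipped range once.
        while (pos < toks.size() && !",".equals(toks.get(pos))) { ++pos; }
        if (pos > start) {  // only report if tokens were actually skipped
          messages.add("skipped: " + String.join(" ", toks.subList(start, pos)));
        }
      }
      if (pos < toks.size()) { ++pos; }  // consume the separating comma
    }
    return media;
  }

  /** Returns null on a tolerable failure; throws on an intolerable one. */
  private String parseMedium() {
    String tok = toks.get(pos);
    boolean atItemEnd = pos + 1 == toks.size() || ",".equals(toks.get(pos + 1));
    if (tok.matches("[a-z]+") && atItemEnd) {
      ++pos;
      return tok;
    }
    if (!tolerant) {
      throw new SketchParseException("malformed medium at token " + pos);
    }
    return null;  // pos is left untouched so the caller can see what to skip
  }

  public static void main(String[] args) {
    MediaListParser parser = new MediaListParser(
        Arrays.asList("screen", ",", "3d", "glasses", ",", "print"), true);
    System.out.println(parser.parseMediaList() + " " + parser.messages);
  }
}
```

Here the delegate stays silent and the delegator reports, which is one of the three permitted arrangements, so the skipped range is reported exactly once.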
Differences from CSS grammar
This class parses a few extensions to the CSS grammar.
- IE Filters and Transformations are parsed using the grammar below to a {@link CssTree.ProgId special node class}. These filters and transformations are documented on the MSDN though the grammar below is made up.
  {@code
  ProgId ::== 'progid' ':' ? ')'
  DottedFunctionName ::== // Includes an open parenthesis
  | '.'
  ProgIdAttributeList ::==
  | ','
  ProgIdAttribute ::== '='
  }
  See the test file "cssparseinput-filters.css" for examples.
- CSS frequently contains CSS hacks to make a style apparent to some user-agents and not others. These are represented using special node-types so that clients which only want to deal with standards-compliant CSS can filter out those nodes.
The star hack described in the wiki article is very widely used, and we handle it by adding a new node type, {@link CssTree.UserAgentHack}, which has a set of user agent IDs and the node that would be visible to those user-agents. Clients that care about filters can transform the tree to remove inappropriate filters, or so that those filters are visible only in the appropriate contexts using CSS.
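How a client might filter such nodes can be sketched with a toy tree. The types below ({@code HackFilter}, {@code Node}, {@code Declaration}, {@code Hack}, {@code UserAgent}) are hypothetical and deliberately do not use the real {@code CssTree} API; they only model a hack node as a set of user agent IDs plus the node visible to those user agents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

final class HackFilter {
  enum UserAgent { IE6, IE7, OTHER }

  interface Node {}

  static final class Declaration implements Node {
    final String property;
    final String value;
    Declaration(String property, String value) {
      this.property = property;
      this.value = value;
    }
    @Override public String toString() { return property + ": " + value; }
  }

  /** Toy analogue of a hack node: a declaration only some user agents see. */
  static final class Hack implements Node {
    final Set<UserAgent> agents;
    final Declaration visible;
    Hack(Set<UserAgent> agents, Declaration visible) {
      this.agents = agents;
      this.visible = visible;
    }
  }

  /** Keeps only standards-compliant declarations, dropping every hack. */
  static List<Declaration> standardOnly(List<Node> nodes) {
    List<Declaration> out = new ArrayList<>();
    for (Node n : nodes) {
      if (n instanceof Declaration) { out.add((Declaration) n); }
    }
    return out;
  }

  /** Returns the declarations that the given user agent would apply. */
  static List<Declaration> visibleTo(List<Node> nodes, UserAgent agent) {
    List<Declaration> out = new ArrayList<>();
    for (Node n : nodes) {
      if (n instanceof Declaration) {
        out.add((Declaration) n);
      } else if (n instanceof Hack && ((Hack) n).agents.contains(agent)) {
        out.add(((Hack) n).visible);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Models the declaration group: color: blue; *color: red
    List<Node> nodes = Arrays.<Node>asList(
        new Declaration("color", "blue"),
        new Hack(EnumSet.of(UserAgent.IE6, UserAgent.IE7),
                 new Declaration("color", "red")));
    System.out.println(standardOnly(nodes));
    System.out.println(visibleTo(nodes, UserAgent.IE7));
  }
}
```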
@author mikesamuel@gmail.com