Examples of sunlabs.brazil.util.LexML

sunlabs.brazil.util.LexML
This class breaks angle-bracket-separated markup languages like SGML, XML, and HTML into tokens. It understands three types of tokens:

tags
Formally known as "entities", tags are delimited by "<" and ">". The first word in the tag is the tag name and the rest of the tag consists of the attributes, a set of "name=value" or "name" data. Spaces in tags are not significant except for quoted values in the attributes.
string
Plain strings that are not in angle-brackets. Spaces are significant and preserved.
comments
Delimited by "<!--" and "-->". All text between the delimiters is part of the comment. However, by convention, some comments actually contain data, so the methods that extract the fields from tags can also be used to attempt to extract fields from comments. Spaces are significant and preserved in a comment, unless the comment is treated as a tag, in which case the tag rules apply.
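
To make the three token types concrete, here is a minimal stand-alone sketch of a splitter that classifies input as tags, strings, or comments. It is not the LexML implementation and does not use the Brazil library; the names TokenSketch, Type, Token, and tokenize are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not the LexML implementation.
class TokenSketch {
    enum Type { TAG, STRING, COMMENT }

    // One token: its type and the raw text it covers.
    static final class Token {
        final Type type;
        final String text;

        Token(Type type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + ":" + text;
        }
    }

    static List<Token> tokenize(String src) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            int stop;
            Type type;
            if (src.startsWith("<!--", i)) {
                // Comment: everything up to the next "-->" (or end of input).
                int end = src.indexOf("-->", i + 4);
                stop = (end < 0) ? src.length() : end + 3;
                type = Type.COMMENT;
            } else if (src.charAt(i) == '<') {
                // Tag: everything up to the next ">" (or end of input).
                int end = src.indexOf('>', i);
                stop = (end < 0) ? src.length() : end + 1;
                type = Type.TAG;
            } else {
                // String: plain text up to the next "<", spaces preserved.
                int end = src.indexOf('<', i);
                stop = (end < 0) ? src.length() : end;
                type = Type.STRING;
            }
            out.add(new Token(type, src.substring(i, stop)));
            i = stop;
        }
        return out;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("<b>hi there</b><!-- note -->")) {
            System.out.println(t);
        }
    }
}
```

The sketch keeps the raw text of each token; extracting the tag name and attributes, as the class described here does, is left out.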

This class is intended to parse markup languages, not to validate them. "Malformed" data is interpreted as graciously as possible, in order to extract as much information as possible. For instance: spaces are allowed between the "<" and the tag name, values in tags do not need to be quoted, and unbalanced quotes are accepted.
One type of "malformed" data specifically not handled is a quoted ">" character occurring within the body of a tag. Even if it is quoted, a ">" in the attributes of a tag will be interpreted as the end of the tag. For example, the single tag <img src='foo.jpg' alt='xyz > abc'> will be erroneously broken by this parser into two tokens:
- the tag <img src='foo.jpg' alt='xyz >
- the string "abc'>" (and possibly whatever text follows after).
Unfortunately, this type of "malformed" data is known to occur regularly.
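
The effect described above can be reproduced with a few lines of plain Java. This is only a sketch of the "first > wins" rule, assuming nothing about how LexML is written internally; NaiveCloseDemo and splitAtFirstClose are hypothetical names.

```java
// Hypothetical stand-alone sketch of ending a tag at the first '>',
// with no regard for quotes. Not the LexML implementation.
class NaiveCloseDemo {
    // Returns { text up to and including the first '>', remaining text }.
    // If there is no '>', the whole input is returned as the first part.
    static String[] splitAtFirstClose(String src) {
        int close = src.indexOf('>');
        if (close < 0) {
            return new String[] { src, "" };
        }
        return new String[] {
            src.substring(0, close + 1),
            src.substring(close + 1)
        };
    }

    public static void main(String[] args) {
        String[] parts =
            splitAtFirstClose("<img src='foo.jpg' alt='xyz > abc'>");
        System.out.println("tag:  " + parts[0]);
        System.out.println("rest: " + parts[1]);
    }
}
```

Because the quoted ">" is the first one in the input, the tag part ends inside the alt value and the remainder is left over as string data.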
This class also may not properly parse all well-formed XML tags, such as tags with extended paired delimiters <& and &>, <? and ?>, or <![CDATA[ and ]]>. Additionally, XML tags that have embedded comments containing the ">" character will not be parsed correctly (for example: <!DOCTYPE foo SYSTEM -- a > b -- foo.dtd>), since the ">" in the comment will be interpreted as the end of the declaration tag, for the same reason mentioned above.
Note: this behavior may be changed on a per-application basis by overriding the findClose method in a subclass.
@author Colin Stevens (colin.stevens@sun.com)
@version 2.6
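
As an illustration of the kind of logic an overriding findClose might contain, the sketch below scans for the closing ">" while skipping any ">" that sits inside a quoted attribute value. The signature of findClose is not shown on this page, so this is written as a stand-alone helper rather than as a real override; QuoteAwareClose and quoteAwareClose are hypothetical names.

```java
// Hypothetical stand-alone sketch of a quote-aware search for the end
// of a tag. The real findClose signature is not shown in this document,
// so this is not written as a LexML subclass.
class QuoteAwareClose {
    // Returns the index of the '>' that closes the tag whose '<' is at
    // 'open', ignoring any '>' inside single or double quotes.
    // Returns -1 if no unquoted '>' is found.
    static int quoteAwareClose(String src, int open) {
        char quote = 0; // 0 means "not inside a quoted value"
        for (int i = open + 1; i < src.length(); i++) {
            char c = src.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0; // matching quote ends the value
                }
            } else if (c == '"' || c == '\'') {
                quote = c; // start of a quoted value
            } else if (c == '>') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String src = "<img src='foo.jpg' alt='xyz > abc'>rest";
        int close = quoteAwareClose(src, 0);
        System.out.println(src.substring(0, close + 1));
    }
}
```

One trade-off to keep in mind: with unbalanced quotes, which the class described here accepts, a scan like this never leaves the quoted state and returns -1. A caller would probably want to fall back to the first ">" in that case.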

                    hr.request.props, hr.prefix, init);
        } catch (IOException e) {
            hr.request.log(Server.LOG_WARNING, hr.prefix,
                    "Can't find macro init file: " + init);
        }
        LexML lex = new LexML(src);
        while (lex.nextToken()) {
            if (lex.getType() == LexML.TAG &&
                    lex.getTag().equals("definemacro")) {
                String name = lex.getAttributes().get("name");
                boolean doSubst = (lex.getAttributes().get("subst") != null);
                String value = snarfTillClose(lex, "definemacro").trim();
                if (doSubst) {
                    value = Format.subst(hr.server.props, value);
                }
                if (!value.equals("")) {


        nodes = 0;
        root = new Node("ROOT", false, null, null, Node.ROOT, 0); // dummy root
        Stack stack = new Stack();    // parent stack
        stack.push(new StackInfo(root, 0));
        Node parent = root;
        LexML lex = new LexML(src);
        Node current = root;
        IllegalXmlException ex = null;

        while (lex.nextToken()) {
            switch (lex.getType()) {
            case LexML.COMMENT:
                // System.out.println("Got comment: " + lex.getBody());
                break;
            case LexML.STRING:
                current.appendCdata(lex.getBody());
                break;
            case LexML.TAG:
                String name = lex.getTag().toLowerCase();
                if (name.startsWith("/")) {  // pop stack if proper nesting
                    name = name.substring(1);
                    if (tags != null && !tags.containsKey(name)) {
                        // System.out.println("Skipping /" + name);
                        continue;
                    }

                    // parse error, what should we do?

                    if (!name.equals(parent.getTag())) {
                        int sl = line(lex.getString(),
                                ((StackInfo) stack.peek()).position);
                        ex = IllegalXmlException.getEx(ex, ident, sl,
                                parent.getTag(), line(lex), "</" + name + ">");

                        /*
                         * if matching tag is on the stack, pop until we
                         * get there.  Otherwise ignore the closing tag
                         */

                        for (int i = stack.size() - 2; i > 0; i--) {
                            Node node = ((StackInfo) stack.elementAt(i)).parent;
                            String tag = node.getTag();
                            if (tag.equals(name)) {
                                while (++i <= stack.size()) {
                                    /*
                                    System.out.println("popping " +
                                        stack.peek());
                                    */
                                    stack.pop();
                                }
                                stack.pop();
                                current = parent = ((StackInfo) stack.peek()).parent;
                                break;
                            }
                        }
                        continue; // ignore it?
                    } else {
                        stack.pop();
                        current = parent = ((StackInfo) stack.peek()).parent;
                    }
                } else {
                    boolean single = lex.isSingleton();
                    if (!single && tags != null && !tags.containsKey(name)) {
                        // System.out.println(name + ": setting to single");
                        single = true;
                    }
                    int count = ((StackInfo) stack.peek()).getCount(name);
                    Node n = new Node(name, single,
                            lex.getAttributes(), parent, Node.TAG, count);
                    current = n;
                    nodes++;
                    parent.addChild(n);
                    if (!single) {
                        stack.push(new StackInfo(n, lex.getLocation()));
                        parent = n;
                    }
                }
                break;
            default:
                System.out.println("Oops, invalid type!");
                break;
            }
        }
        /*
         * if we still have stuff on the stack, add an error for each tag
         */

        stack.pop();
        while (stack.size() > 2) {
            stack.pop();
            parent = ((StackInfo) stack.peek()).parent;
            int sl = line(lex.getString(), ((StackInfo) stack.peek()).position);
            ex = IllegalXmlException.getEx(ex, ident, sl, parent.getTag(),
                    1 + line(lex.getString(), lex.getLocation()), "eof");
        }
        if (ex != null) {
            throw ex;
        }
    }


Related Classes of sunlabs.brazil.util.LexML

sunlabs.brazil.sunlabs.XmlTree

sunlabs.brazil.template.MacroTemplate
