Builders use a NodeFactory object to construct each Node object (Element, Text, Attribute, etc.) they add to the tree. The default implementation simply calls the relevant constructor, stuffs the resulting Node object in a length-one Nodes object, and returns it.
Subclassing this class allows builders to produce instances of subclasses (for example, HTMLElement) instead of the base classes.
Subclasses can also filter content while building. For example, namespaces could be added to or changed on all elements. Comments could be deleted. Processing instructions could be changed into elements. An xinclude:include element could be replaced with the content it references. All such changes must be consistent with the usual rules of well-formedness. For example, the makeDocType() method should not return a list containing two DocType objects because an XML document can have at most one document type declaration. Nor should it return a list containing an element, because an element cannot appear in a document prolog. However, it could return a list containing any number of comments and processing instructions, and not more than one DocType object.
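The subclassing and filtering described above can be sketched in a few lines. This is only an illustration under stated assumptions: Node, Element, Comment, HTMLElement, SketchNodeFactory and FilteringFactory below are simplified stand-ins written for this example, a plain List plays the role of the Nodes object, and the method signatures are hypothetical rather than those of the real library.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration only; NOT the real library classes.
abstract class Node {}

class Comment extends Node {
    final String text;
    Comment(String text) { this.text = text; }
}

class Element extends Node {
    final String name;
    Element(String name) { this.name = name; }
}

// Hypothetical Element subclass that a custom factory might produce.
class HTMLElement extends Element {
    HTMLElement(String name) { super(name); }
}

// Default behaviour: call the constructor and wrap the node in a length-one list.
class SketchNodeFactory {
    List<Node> makeElement(String name) {
        List<Node> result = new ArrayList<>();
        result.add(new Element(name));
        return result;
    }

    List<Node> makeComment(String text) {
        List<Node> result = new ArrayList<>();
        result.add(new Comment(text));
        return result;
    }
}

// Custom behaviour: build subclass instances and filter out comments.
class FilteringFactory extends SketchNodeFactory {
    @Override
    List<Node> makeElement(String name) {
        List<Node> result = new ArrayList<>();
        result.add(new HTMLElement(name)); // subclass instead of the base class
        return result;
    }

    @Override
    List<Node> makeComment(String text) {
        return new ArrayList<>(); // empty list: the builder adds nothing to the tree
    }
}

class NodeFactorySketch {
    public static void main(String[] args) {
        SketchNodeFactory factory = new FilteringFactory();
        System.out.println(factory.makeElement("p").get(0) instanceof HTMLElement);
        System.out.println(factory.makeComment("dropped").size());
    }
}
```

Returning a list rather than a single node is what makes filtering possible in this sketch: an empty list deletes the content, and a list with several entries replaces one node with many.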
There is expected to be only one of these configured per database. @author Rick Hillegas
The factory is used when lexing to generate the nodes passed back to the caller. By implementing this interface and setting the resulting object as the node factory for the {@link org.htmlparser.lexer.Lexer#setNodeFactory lexer} (perhaps via the {@link Parser#setNodeFactory parser}), the way that nodes are generated can be customized.
In general, replacing the factory with a custom factory is not required because of the flexibility of the {@link PrototypicalNodeFactory}.
Creation of Text and Remark nodes is straightforward, because essentially they are just sequences of characters extracted from the page. Creation of a Tag node requires that the attributes from the tag be remembered as well. @see PrototypicalNodeFactory
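As a rough sketch of the idea above, the following self-contained example models a lexer-side node factory. All names here (TextNode, RemarkNode, TagNode, LexNodeFactory, DefaultLexNodeFactory, CollapsingTextFactory) are hypothetical stand-ins invented for illustration, the page is assumed to be a plain String, and attributes are assumed to be plain strings; the real interface and its parameter types may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-ins; NOT the org.htmlparser types.
class TextNode {
    final String text;
    TextNode(String text) { this.text = text; }
}

class RemarkNode {
    final String text;
    RemarkNode(String text) { this.text = text; }
}

class TagNode {
    final String text;
    final List<String> attributes; // a tag must also remember its attributes
    TagNode(String text, List<String> attributes) {
        this.text = text;
        this.attributes = new ArrayList<>(attributes);
    }
}

// The factory receives the page and the character range the lexer matched.
interface LexNodeFactory {
    TextNode createText(String page, int start, int end);
    RemarkNode createRemark(String page, int start, int end);
    TagNode createTag(String page, int start, int end, List<String> attributes);
}

// Default: text and remark nodes are just character sequences from the page.
class DefaultLexNodeFactory implements LexNodeFactory {
    public TextNode createText(String page, int start, int end) {
        return new TextNode(page.substring(start, end));
    }
    public RemarkNode createRemark(String page, int start, int end) {
        return new RemarkNode(page.substring(start, end));
    }
    public TagNode createTag(String page, int start, int end, List<String> attributes) {
        return new TagNode(page.substring(start, end), attributes);
    }
}

// Customization: collapse runs of whitespace in every text node.
class CollapsingTextFactory extends DefaultLexNodeFactory {
    @Override
    public TextNode createText(String page, int start, int end) {
        return new TextNode(page.substring(start, end).replaceAll("\\s+", " "));
    }
}

class LexFactorySketch {
    public static void main(String[] args) {
        String page = "<b id=x>Hi   there</b>";
        int textStart = page.indexOf("Hi");
        int textEnd = page.indexOf("</b>");
        LexNodeFactory factory = new CollapsingTextFactory();
        System.out.println(factory.createText(page, textStart, textEnd).text);
        System.out.println(factory.createTag(page, 0, textStart, Arrays.asList("id=x")).attributes);
    }
}
```

Only createText is overridden here, so remark and tag nodes are still produced by the default implementation.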