DOMWriter takes a DOM4J tree and outputs it as a W3C DOM object.
If the namespaceURI of a Node is the empty string, the serialization treats it as null, ignoring the prefix if any. Editorial note: should the remark on DOM Level 2 namespace URIs be included in the namespace algorithm in Core instead?
DOMWriter accepts any node type for serialization. For nodes of type Document or Entity, well-formed XML will be created if possible. The serialized output for these node types is either a Document or an External Entity, respectively, and is acceptable input for an XML parser. For all other types of nodes the serialized form is not specified, but should be something useful to a human for debugging or diagnostic purposes. Note: rigorously designing an external (source) form for stand-alone node types that don't already have one defined in [XML 1.0] seems a bit much to take on here.
Within a Document, DocumentFragment, or Entity being serialized, Nodes are processed as follows:
Document nodes are written, including the XML declaration and a DTD subset, if one exists in the DOM. Writing a Document node serializes the entire document.
Entity nodes, when written directly by DOMWriter.writeNode, output the entity expansion, but no namespace fixup is done. The resulting output will be valid as an external entity.
EntityReference nodes are serialized as an entity reference of the form "&entityName;" in the output. Child nodes (the expansion) of the entity reference are ignored.
CDATA sections containing content characters that cannot be represented in the specified output encoding are handled according to the "split-cdata-sections" boolean parameter. If the boolean parameter is true, CDATA sections are split, and the unrepresentable characters are serialized as numeric character references in ordinary content. The exact position and number of splits is not specified. If the boolean parameter is false, unrepresentable characters in a CDATA section are reported as errors. The error is not recoverable: there is no mechanism for supplying alternative characters and continuing with the serialization.
DocumentFragment nodes are serialized by serializing the children of the document fragment in the order they appear in the document fragment.
All other node types (Element, Text, etc.) are serialized to their corresponding XML source form.
The serialization of a Node does not always generate a well-formed XML document, i.e. a DOMBuilder might throw fatal errors when parsing the resulting serialization.
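The handling of CDATA sections under "split-cdata-sections" can be sketched as follows. This is a minimal illustration in Python, not the DOMWriter implementation: the function name `serialize_cdata` and its parameters are hypothetical, and because the position and number of splits is not specified, the split points chosen here are only one possible outcome. The sketch does not handle a literal "]]>" inside the content.

```python
def serialize_cdata(text: str, encoding: str, split_cdata_sections: bool) -> str:
    """Serialize CDATA content for the given output encoding (illustrative only)."""
    pieces = []  # finished output fragments
    run = []     # representable characters collected for the current CDATA section

    def flush() -> None:
        # Close the current CDATA section, if it has any content.
        if run:
            pieces.append("<![CDATA[" + "".join(run) + "]]>")
            run.clear()

    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if not split_cdata_sections:
                # Parameter is false: report an error, with no recovery.
                raise ValueError(
                    f"character U+{ord(ch):04X} cannot be represented in {encoding}"
                )
            # Parameter is true: split the section and emit the character
            # as a numeric character reference in ordinary content.
            flush()
            pieces.append(f"&#x{ord(ch):X};")
        else:
            run.append(ch)
    flush()
    return "".join(pieces)
```

With an encoding that can represent every character, the content stays in a single section; the split only happens around characters the encoding cannot express.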
Within the character data of a document (outside of markup), any characters that cannot be represented directly are replaced with character references. Occurrences of '<' and '&' are replaced by the predefined entities &lt; and &amp;. The other predefined entities (&gt;, &apos;, and &quot;) are not used; these characters can be included directly. Any character that cannot be represented directly in the output character encoding is serialized as a numeric character reference.
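As a rough illustration of the character data rules, the following Python sketch escapes only '<' and '&' and falls back to numeric character references for characters the output encoding cannot express. The function name `escape_character_data` is hypothetical and is not part of the DOMWriter API; the choice of hexadecimal references is also an assumption, since the text only requires a numeric character reference.

```python
def escape_character_data(text: str, encoding: str) -> str:
    """Escape character data for serialization (illustrative only)."""
    out = []
    for ch in text:
        if ch == "<":
            out.append("&lt;")
        elif ch == "&":
            out.append("&amp;")
        else:
            try:
                # Characters the encoding can express are written directly,
                # including '>', apostrophes, and quotes.
                ch.encode(encoding)
                out.append(ch)
            except UnicodeEncodeError:
                # Anything else becomes a numeric character reference.
                out.append(f"&#x{ord(ch):X};")
    return "".join(out)
```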
Attributes not containing quotes are serialized in quotes. Attributes containing quotes but no apostrophes are serialized in apostrophes (single quotes). Attributes containing both forms of quotes are serialized in quotes, with quotes within the value represented by the predefined entity &quot;. Any character that cannot be represented directly in the output character encoding is serialized as a numeric character reference.
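The delimiter selection for attribute values can be sketched like this. The function name `serialize_attribute_value` is hypothetical. Escaping '<' and '&' inside the value is an assumption carried over from XML well-formedness rules; the paragraph above does not state it explicitly.

```python
def serialize_attribute_value(value: str, encoding: str = "utf-8") -> str:
    """Return the attribute value wrapped in its delimiters (illustrative only)."""
    has_quote = '"' in value
    has_apos = "'" in value
    # Quotes but no apostrophes: delimit with apostrophes.
    # Every other case (no quotes, or both forms): delimit with quotes.
    delimiter = "'" if has_quote and not has_apos else '"'

    out = []
    for ch in value:
        if ch == '"' and delimiter == '"':
            out.append("&quot;")
        elif ch == "<":
            out.append("&lt;")   # assumption: needed for well-formedness
        elif ch == "&":
            out.append("&amp;")  # assumption: needed for well-formedness
        else:
            try:
                ch.encode(encoding)
                out.append(ch)
            except UnicodeEncodeError:
                out.append(f"&#x{ord(ch):X};")
    return delimiter + "".join(out) + delimiter
```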
Within markup, but outside of attributes, any occurrence of a character that cannot be represented in the output character encoding is reported as an error. An example would be serializing the element <LaCañada/> with encoding="us-ascii".
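Unlike character data, names in markup have no escape mechanism, which is why the only option is an error. A minimal sketch of such a check, using a hypothetical helper `check_markup_name` that is not part of the DOMWriter API:

```python
def check_markup_name(name: str, encoding: str) -> str:
    """Return the name unchanged, or fail if the encoding cannot express it."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError as exc:
        # A character reference is not allowed inside an element or
        # attribute name, so there is nothing to fall back to.
        raise ValueError(
            f"name {name!r} cannot be represented in {encoding}"
        ) from exc
    return name
```

Under this sketch, the element name from the example above is accepted for "utf-8" and rejected for "us-ascii".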
When requested by setting the normalize-characters boolean parameter on DOMWriter, all data to be serialized, both markup and character data, is W3C Text normalized according to the rules defined in [CharModel]. The W3C Text normalization process affects only the data as it is being written; it does not alter the DOM's view of the document after serialization has completed.
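The point that normalization affects only the written data can be illustrated with a small sketch. It uses Unicode Normalization Form C from Python's standard `unicodedata` module as a stand-in; this is an approximation, since W3C Text normalization as defined in [CharModel] includes requirements beyond NFC. The function name `write_text` is hypothetical.

```python
import unicodedata


def write_text(data: str, normalize_characters: bool) -> str:
    """Return the text as it would be written; the input is left untouched."""
    if normalize_characters:
        # Approximation only: NFC stands in for W3C Text normalization.
        return unicodedata.normalize("NFC", data)
    return data


in_memory = "Can\u0303ada"              # 'n' followed by a combining tilde
written = write_text(in_memory, True)   # composed form, as written out
```

After the call, `in_memory` still holds the decomposed sequence, mirroring the rule that the DOM's view of the document is not altered.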
Namespaces are fixed up during serialization: the serialization process verifies that namespace declarations, namespace prefixes, and the namespace URIs associated with elements and attributes are consistent. If inconsistencies are found, the serialized form of the document will be altered to remove them. The method used for doing the namespace fixup while serializing a document is the algorithm defined in Appendix B.1 "Namespace normalization" of [DOM Level 3 Core]. Editorial note: previous paragraph to be defined closer here.
Any changes made affect only the namespace prefixes and declarations appearing in the serialized data. The DOM's view of the document is not altered by the serialization operation, and does not reflect any changes made to namespace declarations or prefixes in the serialized output.
While serializing a document, the serializer will write out non-specified values (such as attributes whose specified is false) if the discard-default-content boolean parameter is set to false. If the discard-default-content parameter is set to true and a schema is used for validation, the schema will also be used to determine whether a value is specified or not. If no schema is used, the specified flag on attribute nodes is used to determine whether attribute values should be written out.
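For the case where no schema is used, the filtering on the specified flag can be sketched as below. The `Attr` class and the function `attributes_to_write` are hypothetical stand-ins, not DOM types; the boolean `keep_non_specified` represents whatever decision the serializer derives from discard-default-content.

```python
from dataclasses import dataclass


@dataclass
class Attr:
    """Hypothetical stand-in for a DOM attribute node."""
    name: str
    value: str
    specified: bool  # False when the value came from a default


def attributes_to_write(attrs, keep_non_specified: bool):
    """Select the attributes that would appear in the serialized output."""
    return [a for a in attrs if keep_non_specified or a.specified]


attrs = [Attr("id", "x1", True), Attr("lang", "en", False)]
only_specified = [a.name for a in attributes_to_write(attrs, False)]
everything = [a.name for a in attributes_to_write(attrs, True)]
```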
Editorial note: reference the Core spec (1.1.9, XML namespaces, 5th paragraph) entity reference description and its warning about unbound entity references. Entity references are always serialized in the form "&foo;". Also mention this in the load part of this spec.
See also the Document Object Model (DOM) Level 3 Load and Save Specification.