The primary purpose of the Catalog is to associate resources in the document with local system identifiers. Some entities (document types, XML entities, and notations) have names and all of them can have either public or system identifiers or both. (In XML, only a notation can have a public identifier without a system identifier, but the methods implemented in this class obey the Catalog semantics from the SGML days when system identifiers were optional.)
The system identifiers returned by the resolution methods in this class are valid in the sense that they are usable by, and in fact constructed by, the java.net.URL class. Unfortunately, java.net.URL seems to behave in somewhat non-standard ways, so the system identifiers returned may not be directly usable in a browser or filesystem context.
This class recognizes all of the Catalog entries defined in TR9401:1997.
Note that BASE entries are treated as described by RFC2396. In particular, this has the counter-intuitive property that after a BASE entry identifying "http://example.com/a/b/c" as the base URI, the relative URI "foo" is resolved to the absolute URI "http://example.com/a/b/foo". The final component of the base path ("c") is discarded, just as a filename would be in a URI for a resource. If you do not want it to be discarded, you must provide the trailing slash: "http://example.com/a/b/c/".
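The resolution rule can be checked with a small sketch. It uses java.net.URI from the JDK, which follows the RFC2396 rules for relative references, rather than this class or java.net.URL; the class name BaseResolutionSketch and the helper resolve are hypothetical names chosen for illustration.

```java
import java.net.URI;

// Hypothetical illustration class; not part of the Catalog API.
final class BaseResolutionSketch {

    // Resolve a relative URI against a base URI using RFC2396-style rules.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        // Without a trailing slash, the last path component "c" is dropped.
        System.out.println(resolve("http://example.com/a/b/c", "foo"));
        // With a trailing slash, "c" is kept as a directory.
        System.out.println(resolve("http://example.com/a/b/c/", "foo"));
    }
}
```

The first call yields "http://example.com/a/b/foo" and the second yields "http://example.com/a/b/c/foo", matching the behavior described for BASE entries.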
Note that subordinate catalogs (all catalogs except the first, including CATALOG and DELEGATE* catalogs) are only loaded if and when they are required.
This class relies on classes which implement the CatalogReader interface to actually load catalog files. This allows the catalog semantics to be implemented for TR9401 text-based catalogs, XML catalogs, or any number of other storage formats.
Additional catalogs may also be loaded with the {@link #parseCatalog(String)} method.
Change Log:
Rewrite to use CatalogReaders.
Allow quoted components in xml.catalog.files so that URLs containing colons can be used on Unix. The string passed to xml.catalog.files can now have the form:
unquoted-path-with-no-sep-chars:"double-quoted path with or without sep chars":'single-quoted path with or without sep chars'
(Where ":" is the separator character in this example.)
If an unquoted path contains an embedded double or single quote character, no special processing is performed on that character. No path can contain separator characters, double quotes, and single quotes simultaneously.
Fix bug in calculation of BASE entries: if a catalog contains multiple BASE entries, each is relative to the preceding base, not the default base URI of the catalog.
Fixed a bug in the calculation of the list of subordinate catalogs. This bug caused an infinite loop where parsing would alternately process two catalogs indefinitely.
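The quoted-component format for xml.catalog.files described in the change log can be illustrated with a minimal sketch. This is not the parsing code used by this class; the class name CatalogFilesSplitter, the method split, and the handling of an unterminated quote (taking the rest of the string) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of the Catalog API.
final class CatalogFilesSplitter {

    // Split a catalog file list on 'sep', honoring double- and single-quoted components.
    static List<String> split(String value, char sep) {
        List<String> paths = new ArrayList<>();
        int i = 0;
        int n = value.length();
        while (i < n) {
            char c = value.charAt(i);
            if (c == sep) {
                i++; // skip separators and empty components
                continue;
            }
            if (c == '"' || c == '\'') {
                // Quoted component: runs to the matching quote and may contain separators.
                int end = value.indexOf(c, i + 1);
                if (end < 0) {
                    end = n; // assumption: an unterminated quote takes the rest of the string
                }
                paths.add(value.substring(i + 1, end));
                i = end + 1;
            } else {
                // Unquoted component: runs to the next separator; embedded quotes are literal.
                int end = value.indexOf(sep, i);
                if (end < 0) {
                    end = n;
                }
                paths.add(value.substring(i, end));
                i = end;
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        String files = "/etc/xml/catalog:\"http://example.com/catalog\":'file:///opt/catalog'";
        for (String path : split(files, ':')) {
            System.out.println(path);
        }
    }
}
```

Because a quoted component ends at the first matching quote, a path that contains the separator and both kinds of quote character cannot be expressed, which is the limitation noted in the change log.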
Derived from public domain code originally published by Arbortext, Inc.