The other enhancement provides support for the almost-hierarchical form used for files within archives, such as the JAR scheme, defined for the Java Platform in the documentation for {@link java.net.JarURLConnection}. By default, this support is enabled for absolute URIs whose scheme equals "jar", "zip", or "archive" (ignoring case). It is implemented by a hierarchical URI whose authority includes the entire URI of the archive, up to and including the ! character. The URI of the archive must have no fragment. The whole archive URI must have no device and must have an absolute path. Special handling is supported for {@link #createURI(String) creating}, {@link #validArchiveAuthority validating}, {@link #devicePath getting the path} from, and {@link #toString() displaying} archive URIs. In all other operations, including {@link #resolve(URI) resolving} and {@link #deresolve(URI) deresolving}, they are handled like any ordinary URI. The schemes that identify archive URIs can be changed from their default by setting the org.eclipse.emf.common.util.URI.archiveSchemes system property. Multiple schemes should be space separated, and the test of whether a URI's scheme matches is always case-insensitive.
This implementation does not impose all of the restrictions on character validity that are specified in the RFC. Static methods whose names begin with "valid" are used to test whether a given string is a valid value for the various URI components. Presently, these tests place no restrictions beyond what would have been required in order for {@link #createURI(String) createURI} to have parsed them correctly from a single URI string. If necessary in the future, these tests may be made more strict, to better conform to the RFC.
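The archive form described above can be illustrated with a small standalone sketch. It is not this class's implementation: the class and method names (ArchiveUriSketch, archiveSchemes, splitArchiveUri) are hypothetical, and it assumes that the archive URI ends at the last "!/" in the string. It reads the space-separated scheme list from the system property, matches schemes without regard to case, and splits an archive URI into scheme, authority (up to and including the ! character), and path. The fragment, device, and absolute-path restrictions are not checked here.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical illustration only; not the code used by this class. */
final class ArchiveUriSketch {
    static final String PROPERTY = "org.eclipse.emf.common.util.URI.archiveSchemes";
    // Defaults named in the text: "jar", "zip", and "archive".
    static final String DEFAULT_SCHEMES = "jar zip archive";

    /** Reads the space-separated scheme list, falling back to the defaults. */
    static Set<String> archiveSchemes() {
        String value = System.getProperty(PROPERTY, DEFAULT_SCHEMES);
        Set<String> schemes = new HashSet<>();
        for (String scheme : value.trim().split("\\s+")) {
            if (!scheme.isEmpty()) {
                schemes.add(scheme.toLowerCase(Locale.ROOT));
            }
        }
        return schemes;
    }

    /** Case-insensitive test of whether a scheme identifies an archive URI. */
    static boolean isArchiveScheme(String scheme) {
        return scheme != null
            && archiveSchemes().contains(scheme.toLowerCase(Locale.ROOT));
    }

    /**
     * Splits an archive URI into { scheme, authority, path }, where the
     * authority is the archive's own URI up to and including '!'.
     * Returns null if the string is not an archive URI in this sketch's sense.
     * Assumption: the archive URI ends at the last "!/" in the string.
     */
    static String[] splitArchiveUri(String uri) {
        int colon = uri.indexOf(':');
        if (colon <= 0 || !isArchiveScheme(uri.substring(0, colon))) {
            return null;
        }
        int bang = uri.lastIndexOf("!/");
        if (bang <= colon + 1) {
            return null; // no separator, or an empty archive URI
        }
        return new String[] {
            uri.substring(0, colon),        // scheme
            uri.substring(colon + 1, bang + 1), // authority, including '!'
            uri.substring(bang + 1)         // path, starting with '/'
        };
    }

    public static void main(String[] args) {
        String[] parts =
            splitArchiveUri("JAR:file:/tmp/lib.jar!/com/example/Foo.class");
        // Prints: JAR | file:/tmp/lib.jar! | /com/example/Foo.class
        System.out.println(String.join(" | ", parts));
    }
}
```

With the property unset, the scheme "JAR" matches the default "jar". Starting the JVM with -Dorg.eclipse.emf.common.util.URI.archiveSchemes="jar war" would, in this sketch, make "war" an archive scheme and "zip" an ordinary one.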
Another group of static methods, whose names begin with "encode", uses percent escaping to encode any characters that are not permitted in the various URI components. Another static method is provided to {@link #decode decode} encoded strings. An escaped character is represented as a percent symbol (%), followed by two hex digits that specify the character code. These encoding methods are more strict than the validation methods described above. They ensure validity according to the RFC, with one exception: non-ASCII characters.
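The escaping format described above (a percent symbol followed by two hex digits) can be sketched as follows. This is an illustration, not the code behind the "encode" methods or {@link #decode decode}: the class name PercentEscapeSketch and the allowed-character set in isAllowed are assumptions chosen for the example, and each real URI component permits a different set. The sketch accepts ASCII input only.

```java
/** Hypothetical illustration only; not the code used by this class. */
final class PercentEscapeSketch {
    private static final String HEX = "0123456789ABCDEF";

    /**
     * Assumed allowed set for this example: ASCII letters, digits, and a
     * few marks. Real URI components each permit a different set.
     */
    static boolean isAllowed(char c) {
        return (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || "-_.!~*'()".indexOf(c) >= 0;
    }

    /** Replaces every disallowed ASCII character with %XX. */
    static String encode(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c > 0x7F) {
                throw new IllegalArgumentException("ASCII input only in this sketch");
            }
            if (isAllowed(c)) {
                out.append(c);
            } else {
                out.append('%').append(HEX.charAt(c >> 4)).append(HEX.charAt(c & 0xF));
            }
        }
        return out.toString();
    }

    /** Turns each %XX back into the character with that code. */
    static String decode(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '%' && i + 2 < value.length()) {
                int hi = Character.digit(value.charAt(i + 1), 16);
                int lo = Character.digit(value.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.append((char) (hi * 16 + lo));
                    i += 2; // skip the two hex digits just consumed
                    continue;
                }
            }
            out.append(c); // not an escape sequence; copy unchanged
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String encoded = encode("my file#1.txt");
        System.out.println(encoded);         // my%20file%231.txt
        System.out.println(decode(encoded)); // my file#1.txt
    }
}
```

Because the percent symbol itself is outside the allowed set, encode escapes it as %25, so decoding an encoded string returns the original input.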
The RFC allows only characters that can be mapped to 7-bit US-ASCII representations. Non-ASCII, single-byte characters can be used only via percent escaping, as described above. This implementation uses Java's Unicode char and String representations, and makes no attempt to encode characters 0xA0 and above. Characters in the range 0x80-0x9F are still escaped. In this respect, EMF's notion of a URI is actually more like an IRI (Internationalized Resource Identifier), for which an RFC is now in draft form.
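The non-ASCII rule stated above can be shown in isolation with a minimal sketch. The class and method names (NonAsciiEscapeSketch, escapeHighControls) are hypothetical, and the sketch deliberately leaves ASCII characters alone, since their treatment depends on the URI component. It escapes characters in the range 0x80-0x9F and passes characters 0xA0 and above through unchanged.

```java
/** Hypothetical illustration only; not the code used by this class. */
final class NonAsciiEscapeSketch {
    private static final String HEX = "0123456789ABCDEF";

    /**
     * Escapes only characters in 0x80-0x9F as %XX. Characters 0xA0 and
     * above are left as-is; ASCII is out of scope for this sketch.
     */
    static String escapeHighControls(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c >= 0x80 && c <= 0x9F) {
                out.append('%').append(HEX.charAt(c >> 4)).append(HEX.charAt(c & 0xF));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // 0xE9 (e with acute accent) is above 0xA0; 0x85 is inside 0x80-0x9F.
        String input = "caf" + (char) 0xE9 + (char) 0x85;
        String expected = "caf" + (char) 0xE9 + "%85";
        System.out.println(escapeHighControls(input).equals(expected)); // true
    }
}
```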