Represents a source HTML document.
The first step in parsing an HTML document is always to construct a Source object from the source data, which can be a String, Reader, InputStream, URLConnection or URL. Each constructor uses all the evidence available to determine the original {@linkplain #getEncoding() character encoding} of the data.
Once the Source object has been created, you can immediately start searching for {@linkplain Tag tags} or {@linkplain Element elements} within the document using the tag search methods.
In certain circumstances you may be able to improve performance by calling the {@link #fullSequentialParse()} method before calling any tag search methods. See the documentation of the {@link #fullSequentialParse()} method for details.
Any issues encountered while parsing are logged to a {@link Logger} object. The {@link #setLogger(Logger)} method can be used to explicitly set a Logger implementation for a particular Source instance, otherwise the static {@link Config#LoggerProvider} property determines how the logger is set by default for all Source instances. See the documentation of the {@link Config#LoggerProvider} property for information about how the default logging provider is determined.
Note that many of the useful functions which can be performed on the source document are defined in its superclass, {@link Segment}. The source object is itself a segment which spans the entire document.
Most of the methods defined in this class are useful for determining the elements and tags surrounding or neighbouring a particular character position in the document.
For information on how to create a modified version of this source document, see the {@link OutputDocument} class.
Source objects are not thread-safe, and should therefore not be shared between multiple threads unless all access is synchronized using some mechanism external to the library.
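As an illustration of external synchronization, the following is a minimal sketch of one possible approach, not something provided by the library. The Guarded and GuardedDemo classes and the with method are hypothetical names introduced for this example, and an ArrayList stands in for the shared object so that the sketch runs with the standard library alone. The same wrapper could in principle hold a Source, provided every read and search goes through it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of the library): funnels every access
 * to a non-thread-safe object through a single private lock.
 */
final class Guarded<T> {
    private final T target;
    private final Object lock = new Object();

    Guarded(T target) {
        this.target = target;
    }

    /** Runs the action while holding the lock and returns its result. */
    <R> R with(Function<? super T, ? extends R> action) {
        synchronized (lock) {
            return action.apply(target);
        }
    }
}

class GuardedDemo {
    public static void main(String[] args) throws InterruptedException {
        // Assumption: an ArrayList stands in for the shared, non-thread-safe object.
        Guarded<List<Integer>> shared = new Guarded<>(new ArrayList<>());

        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    final int value = n;
                    // Each mutation happens inside the lock.
                    shared.with(list -> list.add(value));
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }

        // Reads also go through the lock; 4 threads x 1000 adds each.
        Integer total = shared.with(list -> list.size());
        System.out.println(total);
    }
}
```

The wrapper never exposes the guarded object directly, so callers cannot bypass the lock by accident. The cost is that all access is serialized; if threads do not actually need to share state, giving each thread its own instance avoids the lock entirely.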
If memory usage is a major concern, consider using the {@link StreamedSource} class instead of the Source class.
@see Segment
@see StreamedSource