Use one of the following methods to obtain the output:
The process removes all of the tags and {@linkplain CharacterReference#decodeCollapseWhiteSpace(CharSequence) decodes the result, collapsing all white space}. A space character is included in the output where a normal tag is present in the source, unless the tag belongs to an {@linkplain HTMLElements#getInlineLevelElementNames() inline-level} element. An exception to this is the {@link HTMLElementName#BR BR} element, which is also converted to a space despite being an inline-level element.
Text inside {@link HTMLElementName#SCRIPT SCRIPT} and {@link HTMLElementName#STYLE STYLE} elements contained within this segment is ignored.
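The spacing rules above can be illustrated with a small, library-independent sketch. It is not the implementation used by this class: it relies on regular expressions instead of a real parser, skips character reference decoding, ignores attributes, and uses a hypothetical, abbreviated list of inline-level element names. The class and method names are invented for illustration.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified illustration of the spacing rules; not the library's parser. */
final class SpacingRuleSketch {
    // Hypothetical, abbreviated list of inline-level element names (assumption).
    private static final Set<String> INLINE =
            Set.of("a", "b", "i", "em", "strong", "span", "br");

    // SCRIPT and STYLE elements together with their content.
    private static final Pattern IGNORED =
            Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");

    // A start or end tag; group 1 captures the element name.
    private static final Pattern TAG =
            Pattern.compile("</?([a-zA-Z][a-zA-Z0-9]*)\\b[^>]*>");

    static String extract(String html) {
        // Drop script and style content entirely.
        String markup = IGNORED.matcher(html).replaceAll("");
        Matcher tag = TAG.matcher(markup);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (tag.find()) {
            out.append(markup, last, tag.start());
            String name = tag.group(1).toLowerCase(Locale.ROOT);
            // A tag becomes a space unless it is inline-level; BR is the exception.
            if (!INLINE.contains(name) || name.equals("br")) {
                out.append(' ');
            }
            last = tag.end();
        }
        out.append(markup, last, markup.length());
        // Collapse runs of white space and trim the ends.
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(extract(
                "<p>Hello <b>W</b>orld</p><div>Line<br>break</div><script>x()</script>"));
    }
}
```

In this sketch the B tags add no space, so "World" stays in one piece, while the P, DIV and BR tags each contribute a space that is later collapsed.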
Setting the {@link #setExcludeNonHTMLElements(boolean) ExcludeNonHTMLElements} property results in the exclusion of any content within a non-HTML element.
See the {@link #excludeElement(StartTag)} method for details on how to implement a more complex mechanism to determine whether the {@linkplain Element#getContent() content} of each {@link Element} is to be excluded from the output.
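As a minimal sketch of such a mechanism, the extractor can be subclassed anonymously and its exclusion method overridden. The rule used here (skip any element whose class attribute is "control") and the class and method names of the sketch are hypothetical, and the code assumes the Jericho library is on the classpath.

```java
import net.htmlparser.jericho.Source;
import net.htmlparser.jericho.StartTag;
import net.htmlparser.jericho.TextExtractor;

final class ExcludeElementSketch {
    static String extract(String html) {
        Source source = new Source(html);
        TextExtractor extractor = new TextExtractor(source) {
            @Override
            public boolean excludeElement(StartTag startTag) {
                // Hypothetical rule: drop the content of elements marked class="control".
                // getAttributeValue returns null when the attribute is absent,
                // in which case equalsIgnoreCase yields false.
                return "control".equalsIgnoreCase(startTag.getAttributeValue("class"));
            }
        };
        return extractor.toString();
    }

    public static void main(String[] args) {
        System.out.println(extract("<p>Keep</p><div class=\"control\">Drop</div>"));
    }
}
```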
All tags that are not normal tags, such as {@linkplain TagType#isServerTag() server tags}, {@linkplain StartTagType#COMMENT comments} etc., are removed from the output without adding white space to the output.
Note that segments on which the {@link Segment#ignoreWhenParsing()} method has been called are treated as text rather than markup, resulting in their inclusion in the output. To remove specific segments before extracting the text, create an {@link OutputDocument} and call its {@link OutputDocument#remove(Segment) remove(Segment)} or {@link OutputDocument#replaceWithSpaces(int,int) replaceWithSpaces(int begin, int end)} method for each segment to be removed. Then create a new source document using {@link Source#Source(CharSequence) new Source(outputDocument.toString())} and perform the text extraction on this new source object.
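The removal workflow described above might look like the following sketch. Selecting the segments by element name is only an assumption made for illustration, as are the class and method names; any {@link Segment} can be passed to remove(Segment). The sketch assumes the removed elements do not overlap one another and that the Jericho library is on the classpath.

```java
import net.htmlparser.jericho.Element;
import net.htmlparser.jericho.OutputDocument;
import net.htmlparser.jericho.Source;

final class RemoveBeforeExtractSketch {
    static String extractWithout(String html, String elementName) {
        Source source = new Source(html);
        OutputDocument outputDocument = new OutputDocument(source);
        // Mark every matching element for removal (assumes no nesting of elementName).
        for (Element element : source.getAllElements(elementName)) {
            outputDocument.remove(element);
        }
        // Re-parse the modified markup and extract text from the new source.
        Source cleaned = new Source(outputDocument.toString());
        return cleaned.getTextExtractor().toString();
    }

    public static void main(String[] args) {
        System.out.println(extractWithout("<p>Keep</p><pre>Drop</pre>", "pre"));
    }
}
```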
Extracting the text from an entire {@link Source} object performs a {@linkplain Source#fullSequentialParse() full sequential parse} automatically.
To perform a simple rendering of HTML markup into text, which is more readable than the output of this class, use the {@link Renderer} class instead.
For example, the source segment:
<div><b>O</b>ne</div><div title="Two"><b>Th</b><script>//a script </script>ree</div>
produces the text "One Two Three".