Decodes (unescapes) HTML entities, with the complication that they are received one character at a time and hence must be stored temporarily. We may also receive some "junk" characters before the actual entity, which we discard.
This class is designed to be 100% compatible with the corresponding logic in the C version of the {@link com.google.security.streamhtmlparser.HtmlParser}, found in {@code htmlparser.c}. There are, however, a few intentional differences, outlined below:
In the C version, {@code processChar} returns the output {@code String}, whereas in Java we return a status code and then provide the {@code String} through a separate method, {@code getEntity}. This is cleaner, as it avoids the need to return empty {@code String}s during incomplete processing.
Valid HTML entities have one of the following three forms:
{@code &#dd;} where dd is a number in decimal (base 10) form.
{@code &#x|Xyy;} where yy is a hex number (base 16).
{@code &<html-entity>;} where {@code <html-entity>} is one of {@code lt}, {@code gt}, {@code amp}, {@code quot} or {@code apos}.
A {@code reset} method is provided to facilitate object re-use.
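The calling pattern described above (feed one character, check a status code, then fetch the result separately) can be illustrated with a minimal sketch. It is not the real implementation: the class name {@code SketchEntityResolver}, the {@code Status} values, the {@code resolve} helper, the rule that everything before {@code &} counts as junk, and the pass-through of unknown entities are all assumptions made for illustration. Only the method names {@code processChar}, {@code getEntity} and {@code reset} come from the description.

```java
/** Hypothetical sketch of a streaming entity decoder; not the real class. */
final class SketchEntityResolver {
  /** Assumed status codes; the description only says a status code is returned. */
  enum Status { NOT_STARTED, IN_PROGRESS, COMPLETED }

  private final StringBuilder buffer = new StringBuilder();
  private Status status = Status.NOT_STARTED;
  private String entity = "";

  /** Consumes one character and reports progress instead of returning text. */
  Status processChar(char c) {
    switch (status) {
      case NOT_STARTED:
        // Assumption: anything before '&' is junk and is discarded.
        if (c == '&') {
          status = Status.IN_PROGRESS;
        }
        break;
      case IN_PROGRESS:
        if (c == ';') {
          entity = decode(buffer.toString());
          status = Status.COMPLETED;
        } else {
          buffer.append(c); // Store the partial entity until ';' arrives.
        }
        break;
      case COMPLETED:
        // Further input is ignored until reset() is called.
        break;
    }
    return status;
  }

  /** Returns the decoded text once COMPLETED, otherwise the empty string. */
  String getEntity() {
    return status == Status.COMPLETED ? entity : "";
  }

  /** Clears all state so the same object can decode another entity. */
  void reset() {
    buffer.setLength(0);
    status = Status.NOT_STARTED;
    entity = "";
  }

  /** Decodes the text between '&' and ';' for the three documented forms. */
  private static String decode(String name) {
    try {
      if (name.startsWith("#x") || name.startsWith("#X")) {
        return new String(Character.toChars(Integer.parseInt(name.substring(2), 16)));
      }
      if (name.startsWith("#")) {
        return new String(Character.toChars(Integer.parseInt(name.substring(1), 10)));
      }
    } catch (IllegalArgumentException e) {
      // Covers malformed numbers and invalid code points.
      return "&" + name + ";";
    }
    switch (name) {
      case "lt": return "<";
      case "gt": return ">";
      case "amp": return "&";
      case "quot": return "\"";
      case "apos": return "'";
      default: return "&" + name + ";"; // Assumption: unknown names pass through.
    }
  }

  /** Usage helper: feeds the input one character at a time, as a streaming parser would. */
  static String resolve(String input) {
    SketchEntityResolver resolver = new SketchEntityResolver();
    for (int i = 0; i < input.length(); i++) {
      if (resolver.processChar(input.charAt(i)) == Status.COMPLETED) {
        break;
      }
    }
    return resolver.getEntity();
  }
}
```

In this sketch the caller never has to handle partial output: it polls the status after each character and calls {@code getEntity} only once the status reports completion, then calls {@code reset} before decoding the next entity.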