Decodes (unescapes) HTML entities, with the complication that they are received one character at a time and hence must be buffered temporarily. Some "junk" characters may also arrive before the actual entity; these are discarded.
This class is designed to be 100% compatible with the corresponding logic in the C version of the {@link com.google.security.streamhtmlparser.HtmlParser}, found in htmlparser.c. There are, however, a few intentional differences, outlined below:
In the C version, processChar returns the output {@code String}, whereas in Java we return a status code and then provide the {@code String} through a separate method, getEntity. This is cleaner, as it avoids the need to return empty {@code String}s while processing is incomplete.

Valid HTML entities have one of the following three forms:
&#dd; where dd is a number in decimal (base 10) form.
&#xyy; or &#Xyy; where yy is a hex number (base 16).
&<html-entity>; where <html-entity> is one of lt, gt, amp, quot or apos.

A reset method is provided to facilitate object re-use.
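The character-at-a-time contract described above can be illustrated with a minimal sketch. It is not the actual implementation of this class: the class name EntityDecoderSketch, the Status enum and its values, and the choice to return the raw text for unrecognized entities are all assumptions made for illustration. Only the method names processChar, getEntity and reset come from the description above.

```java
import java.util.Map;

/** Hypothetical sketch of a streaming entity decoder; not the real class. */
final class EntityDecoderSketch {

  /** Assumed status values; the real status codes may differ. */
  enum Status { NOT_STARTED, IN_PROGRESS, COMPLETED }

  // The five named entities listed in the description.
  private static final Map<String, String> NAMED = Map.of(
      "lt", "<", "gt", ">", "amp", "&", "quot", "\"", "apos", "'");

  private final StringBuilder buffer = new StringBuilder();
  private Status status = Status.NOT_STARTED;
  private String entity = "";

  /** Feeds one character and reports how far decoding has progressed. */
  Status processChar(char c) {
    if (status == Status.COMPLETED) {
      return status; // caller is expected to call reset() before re-use
    }
    if (status == Status.NOT_STARTED) {
      // Junk characters before the '&' are discarded.
      if (c == '&') {
        status = Status.IN_PROGRESS;
      }
      return status;
    }
    if (c == ';') {
      entity = decode(buffer.toString());
      status = Status.COMPLETED;
    } else {
      buffer.append(c); // store the character until the entity is complete
    }
    return status;
  }

  /** Returns the decoded text; empty until the status is COMPLETED. */
  String getEntity() {
    return entity;
  }

  /** Clears all state so the object can be re-used. */
  void reset() {
    buffer.setLength(0);
    status = Status.NOT_STARTED;
    entity = "";
  }

  // Assumption: unrecognized input is returned unchanged as raw text.
  private static String decode(String body) {
    String raw = "&" + body + ";";
    try {
      if (body.startsWith("#x") || body.startsWith("#X")) {
        return fromCodePoint(Integer.parseInt(body.substring(2), 16), raw);
      }
      if (body.startsWith("#")) {
        return fromCodePoint(Integer.parseInt(body.substring(1), 10), raw);
      }
    } catch (NumberFormatException e) {
      return raw;
    }
    return NAMED.getOrDefault(body, raw);
  }

  private static String fromCodePoint(int codePoint, String raw) {
    return Character.isValidCodePoint(codePoint)
        ? new String(Character.toChars(codePoint))
        : raw;
  }
}
```

A caller would feed characters one by one until processChar returns COMPLETED, read the result with getEntity, and then call reset before decoding the next entity. Returning a status code rather than the text keeps intermediate calls from having to produce empty strings, which is the design point made above.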