A forgiving HTML parser interface.
It is useful for extracting information from the web, because many sites serve HTML that is not quite standard.
To parse a file into a DOM Document, use:
Document doc = new Html().parseDocument("foo.html");
To parse a string into a DOM Document, use:
String html = "<h1>small test</h1>";
Document doc = new Html().parseDocumentString(html);
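Once a Document is available, pulling information out of it is ordinary DOM traversal. The sketch below assumes the returned Document is an org.w3c.dom.Document, which the text above suggests but does not state. DomText, textOf and parseWellFormed are hypothetical names introduced for illustration, and the JDK XML parser stands in for Html so that the example runs without the library; unlike the forgiving parser, it accepts only well-formed markup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the parser: collects the text of every
// element with a given tag name from an org.w3c.dom.Document.
class DomText {
    // Returns the trimmed text content of each element named `tag`, in
    // document order. Names are compared case-insensitively because HTML
    // parsers differ in whether they report "h1" or "H1".
    static List<String> textOf(Document doc, String tag) {
        List<String> out = new ArrayList<>();
        collect(doc.getDocumentElement(), tag, out);
        return out;
    }

    // Depth-first walk over the node and all of its descendants.
    private static void collect(Node node, String tag, List<String> out) {
        if (node == null) {
            return;
        }
        if (node.getNodeType() == Node.ELEMENT_NODE
                && node.getNodeName().equalsIgnoreCase(tag)) {
            out.add(node.getTextContent().trim());
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, tag, out);
        }
    }

    // Stand-in (assumption) for new Html().parseDocumentString(html):
    // the JDK XML parser, which rejects markup that is not well-formed.
    static Document parseWellFormed(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseWellFormed("<html><body><h1>small test</h1></body></html>");
        System.out.println(textOf(doc, "h1")); // prints [small test]
    }
}
```

With the real parser, the Document returned by parseDocumentString could presumably be passed to textOf in place of the stand-in.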
To parse a file using the SAX API, use:
Html html = new Html();
html.setContentHandler(myContentHandler);
html.parse("foo.html");
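The SAX snippet leaves myContentHandler undefined. A minimal sketch of one possible handler follows, assuming setContentHandler accepts a standard org.xml.sax.ContentHandler. LinkCollector is a hypothetical class, not part of the parser; it records the href of each anchor element reported to it. The main method feeds it one event by hand so the example runs without the library.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: remembers the href attribute of every <a> element.
class LinkCollector extends DefaultHandler {
    private final List<String> links = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // qName is checked because parsers that are not namespace-aware
        // may leave localName empty. The attribute lookup is case-sensitive,
        // so an "HREF" attribute would be missed by this sketch.
        if ("a".equalsIgnoreCase(qName)) {
            String href = atts.getValue("href");
            if (href != null) {
                links.add(href);
            }
        }
    }

    // Links seen so far, in the order they were reported.
    List<String> links() {
        return links;
    }

    public static void main(String[] args) {
        // Simulate the single event a parser would emit for
        // <a href="http://example.com/">.
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "href", "href", "CDATA", "http://example.com/");
        LinkCollector collector = new LinkCollector();
        collector.startElement("", "a", "a", atts);
        System.out.println(collector.links()); // prints [http://example.com/]
    }
}
```

Under that assumption, an instance would be passed as html.setContentHandler(new LinkCollector()) before calling html.parse("foo.html"), and links() read afterwards.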