Package net.sf.regain.crawler.preparator.html

Examples of net.sf.regain.crawler.preparator.html.LinkVisitor


    // Store the cleaned text that results from parsing the HTML content
    setCleanedContent(cleanedContent);

    // Extract links
    LinkVisitor linkVisitor = new LinkVisitor();
    if (isContentCutted) {
      // This means a new parser run, which is expensive but necessary
      htmlPage = new Page(rawDocument.getContentAsString(), "UTF-8");
      parser = new Parser(new Lexer(htmlPage));
    } else {
      parser.reset();
    }

    try {
      // Parse the content
      parser.visitAllNodesWith(linkVisitor);
      ArrayList<Tag> links = linkVisitor.getLinks();
      htmlPage.setBaseUrl(rawDocument.getUrl());

      // Iterate over all links found
      Iterator<Tag> linksIter = links.iterator();
      while (linksIter.hasNext()) {
        // ...
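
The excerpt hands a `LinkVisitor` to `parser.visitAllNodesWith(...)` and then reads the collected tags back with `getLinks()`. Regain's real class builds on the HTML Parser library, which is not reproduced here. The standalone sketch below only illustrates that visitor idea using the JDK. The `Tag` record, the `TagVisitor` interface, `visitAll`, `LinkCollectingVisitor`, and the set of tag names treated as links are all hypothetical stand-ins, not Regain's or HTML Parser's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class LinkCollectorSketch {

    // Hypothetical stand-in for a parsed HTML tag: its name plus the URL-bearing attribute (or null).
    record Tag(String name, String url) {}

    // Hypothetical minimal visitor interface: one callback per tag.
    interface TagVisitor {
        void visitTag(Tag tag);
    }

    // Collects tags whose names are ASSUMED to carry links; Regain's actual selection may differ.
    static class LinkCollectingVisitor implements TagVisitor {
        private static final Set<String> LINK_TAGS = Set.of("a", "frame", "iframe", "img");
        private final List<Tag> links = new ArrayList<>();

        @Override
        public void visitTag(Tag tag) {
            boolean isLinkTag = LINK_TAGS.contains(tag.name().toLowerCase(Locale.ROOT));
            if (isLinkTag && tag.url() != null) {
                links.add(tag);
            }
        }

        List<Tag> getLinks() {
            return links;
        }
    }

    // Hands every tag to the visitor, loosely mirroring a parser's "visit all nodes" traversal.
    static void visitAll(List<Tag> tags, TagVisitor visitor) {
        for (Tag tag : tags) {
            visitor.visitTag(tag);
        }
    }

    public static void main(String[] args) {
        List<Tag> tags = List.of(
                new Tag("P", null),
                new Tag("A", "docs/index.html"),
                new Tag("IMG", "logo.png"),
                new Tag("DIV", null));

        LinkCollectingVisitor visitor = new LinkCollectingVisitor();
        visitAll(tags, visitor);

        // Print each collected link tag and its target
        for (Tag link : visitor.getLinks()) {
            System.out.println(link.name() + " -> " + link.url());
        }
    }
}
```

Separating traversal (`visitAll`) from the per-tag decision (`visitTag`) is what lets the same parse be reused with different visitors, which is the design the excerpt relies on.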

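Before iterating over the links, the excerpt calls `htmlPage.setBaseUrl(rawDocument.getUrl())`, presumably so that relative link targets can be turned into absolute URLs. A minimal sketch of that resolution step, assuming `java.net.URI` as a stand-in for whatever the parser does internally (`BaseUrlResolver` and `resolveAll` are hypothetical names):

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class BaseUrlResolver {

    // Resolves each (possibly relative) href against the page's base URL.
    // URI.resolve also normalizes "." and ".." path segments.
    static List<String> resolveAll(String baseUrl, List<String> hrefs) {
        URI base = URI.create(baseUrl);
        List<String> absolute = new ArrayList<>();
        for (String href : hrefs) {
            absolute.add(base.resolve(href).toString());
        }
        return absolute;
    }

    public static void main(String[] args) {
        List<String> hrefs = List.of("docs/index.html", "../img/logo.png", "http://example.org/");
        for (String url : resolveAll("http://www.example.com/site/page.html", hrefs)) {
            System.out.println(url);
        }
    }
}
```

Already-absolute hrefs come back unchanged. `URI.create` throws `IllegalArgumentException` for hrefs containing characters such as unescaped spaces, so a real crawler would need to handle malformed links.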


Copyright © 2018 www.massapi.com. All rights reserved.
All source code is the property of its respective owners. Java is a trademark of Sun Microsystems, Inc. and is owned by ORACLE Inc. Contact coftware#gmail.com.