Examples of de.jungblut.crawl.extraction.ArticleContentExtrator.ContentFetchResult

Package de.jungblut.crawl.extraction.ArticleContentExtrator

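      // Excerpt from inside a try block in ArticleContentExtrator; the
      // enclosing method signature is not part of this listing.
      // Decode HTML entities such as &amp; before the page is parsed.
      html = StringEscapeUtils.unescapeHtml(html);
      // Collect the outgoing links of the page identified by site.
      final HashSet<String> outlinkSet = extractOutlinks(html, site);
      String title = extractTitle(html);
      // Let the content extractor pick out the main article text.
      String extractedLargestText = extractor.getText(html);
      // Bundle URL, outlinks, title and article text into one result.
      return new ContentFetchResult(site, outlinkSet, title,
          extractedLargestText);
    } catch (ParserException pEx) {
      // ignore parser exceptions; pages that fail to parse are mostly garbage
    } catch (RuntimeException rEx) {
      // print unexpected runtime failures instead of propagating them
      rEx.printStackTrace();
    }
    // (remainder of the method is not shown in this excerpt)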
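
The excerpt runs a four-step pipeline: decode entities, collect outlinks, read the title, extract the main text, then wrap everything in a ContentFetchResult. The following is a minimal, self-contained sketch of that flow, not the library's implementation. Assumptions: FetchResult is a hypothetical record that mirrors only the four constructor arguments visible above; regular expressions stand in for the real HTML parser and text extractor; and the "largest text" step is approximated by a hypothetical heuristic (longest <p> block), inferred only from the variable name extractedLargestText.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentFetchSketch {

  // Hypothetical stand-in for ContentFetchResult (site, outlinks, title, text).
  record FetchResult(String site, Set<String> outlinks, String title, String text) {}

  private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;
  private static final Pattern TITLE = Pattern.compile("<title[^>]*>(.*?)</title>", FLAGS);
  private static final Pattern HREF = Pattern.compile("<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"", FLAGS);
  private static final Pattern PARAGRAPH = Pattern.compile("<p(?:\\s[^>]*)?>(.*?)</p>", FLAGS);
  private static final Pattern TAG = Pattern.compile("<[^>]+>");

  // Decodes a few common entities (stand-in for StringEscapeUtils.unescapeHtml).
  // "&amp;" is decoded last so "&amp;lt;" becomes "&lt;", not "<".
  static String unescape(String s) {
    return s.replace("&lt;", "<").replace("&gt;", ">")
        .replace("&quot;", "\"").replace("&#39;", "'").replace("&amp;", "&");
  }

  // Resolves each <a href="..."> against the page URL and keeps http(s) links only.
  static Set<String> extractOutlinks(String html, String site) {
    Set<String> links = new LinkedHashSet<>();
    URI base = URI.create(site);
    Matcher m = HREF.matcher(html);
    while (m.find()) {
      try {
        URI link = base.resolve(m.group(1).trim());
        String scheme = link.getScheme();
        if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
          links.add(link.toString());
        }
      } catch (IllegalArgumentException e) {
        // skip a malformed href instead of failing the whole page
      }
    }
    return links;
  }

  static String extractTitle(String html) {
    Matcher m = TITLE.matcher(html);
    return m.find() ? TAG.matcher(m.group(1)).replaceAll("").trim() : "";
  }

  // Hypothetical "largest text" heuristic: the longest <p> block with tags stripped.
  static String extractLargestText(String html) {
    String best = "";
    Matcher m = PARAGRAPH.matcher(html);
    while (m.find()) {
      String text = TAG.matcher(m.group(1)).replaceAll("").trim();
      if (text.length() > best.length()) {
        best = text;
      }
    }
    return best;
  }

  static Optional<FetchResult> fetch(String site, String html) {
    try {
      String decoded = unescape(html);
      return Optional.of(new FetchResult(site, extractOutlinks(decoded, site),
          extractTitle(decoded), extractLargestText(decoded)));
    } catch (RuntimeException e) {
      // e.g. an invalid site URL: report it and treat the page as unusable
      System.err.println("extraction failed for " + site + ": " + e);
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    String html = "<html><head><title>Crawl &amp; Extract</title></head><body>"
        + "<a href=\"/about\">About</a> <a href=\"mailto:me@example.com\">Mail</a>"
        + "<p>Short.</p><p>This is the <b>longest</b> paragraph on the page.</p>"
        + "</body></html>";
    fetch("https://example.com/news/article.html", html).ifPresent(r -> {
      System.out.println(r.title());
      System.out.println(r.outlinks());
      System.out.println(r.text());
    });
  }
}
```

Running main prints "Crawl & Extract", then [https://example.com/about] (the mailto link is dropped), then "This is the longest paragraph on the page.". Like the excerpt, the sketch decodes entities before extracting anything. Note that this order can turn escaped text such as &lt;p&gt; into live markup, so decoding only the extracted strings is a possible alternative. Returning Optional.empty() on failure is an assumption here; the excerpt does not show what its method returns after the catch blocks.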

Related Classes of de.jungblut.crawl.extraction.ArticleContentExtrator.ContentFetchResult

de.jungblut.crawl.extraction.ArticleContentExtrator