Package de.jungblut.crawl.extraction, class ArticleContentExtrator

Examples of de.jungblut.crawl.extraction.ArticleContentExtrator.ContentFetchResult


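    try {
      // ... (the code that obtains the raw HTML of `site` is not shown in this excerpt)
      // decode HTML entities such as &amp; before parsing
      html = StringEscapeUtils.unescapeHtml(html);
      // collect the links found on the page
      final HashSet<String> outlinkSet = extractOutlinks(html, site);
      String title = extractTitle(html);

      // main text of the page, as returned by the configured content extractor
      String extractedLargestText = extractor.getText(html);
      return new ContentFetchResult(site, outlinkSet, title,
          extractedLargestText);
    } catch (ParserException pEx) {
      // ignore parser exceptions, they contain mostly garbage
    } catch (RuntimeException rEx) {
      rEx.printStackTrace();
    }
    // ... (the rest of the method is not shown in this excerpt)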
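The excerpt builds a ContentFetchResult from four pieces: the site URL, its outlinks, its title and its extracted text. The sketch below reproduces that flow in a self-contained way. It is a hypothetical illustration, not the project's code: `ContentFetchSketch`, `Result` and `extractText` are invented names, the regex-based `extractTitle`/`extractOutlinks` only stand in for the project's own helpers (whose implementations are not shown here), and `extractText` is a naive tag stripper standing in for `extractor.getText(html)`. The HTML-unescaping step (presumably Apache Commons Lang's `StringEscapeUtils`) is left out to avoid an external dependency.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentFetchSketch {

  // Hypothetical stand-in for ContentFetchResult: the excerpt only shows that
  // it is built from (site, outlinks, title, extracted text).
  static final class Result {
    final String site;
    final Set<String> outlinks;
    final String title;
    final String text;

    Result(String site, Set<String> outlinks, String title, String text) {
      this.site = site;
      this.outlinks = outlinks;
      this.title = title;
      this.text = text;
    }
  }

  private static final Pattern TITLE = Pattern.compile(
      "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
  private static final Pattern HREF = Pattern.compile(
      "<a\\s[^>]*href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

  // Regex stand-in for extractTitle(html); returns "" if there is no <title>.
  static String extractTitle(String html) {
    Matcher m = TITLE.matcher(html);
    return m.find() ? m.group(1).trim() : "";
  }

  // Regex stand-in for extractOutlinks(html, site): resolves each href against
  // the page URL and keeps only http(s) links.
  static Set<String> extractOutlinks(String html, String site) {
    Set<String> out = new HashSet<>();
    URI base = URI.create(site); // throws IllegalArgumentException on a bad URL
    Matcher m = HREF.matcher(html);
    while (m.find()) {
      try {
        URI resolved = base.resolve(m.group(1).trim());
        String scheme = resolved.getScheme();
        if ("http".equals(scheme) || "https".equals(scheme)) {
          out.add(resolved.toString());
        }
      } catch (IllegalArgumentException badLink) {
        // malformed href: skip it, similar to the excerpt ignoring garbage input
      }
    }
    return out;
  }

  // Naive stand-in for extractor.getText(html): drops script/style blocks,
  // strips the remaining tags and collapses whitespace.
  static String extractText(String html) {
    return html.replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ")
        .replaceAll("<[^>]+>", " ")
        .replaceAll("\\s+", " ")
        .trim();
  }

  // Mirrors the excerpt: build the result, or print the error and return null.
  static Result extract(String site, String html) {
    try {
      Set<String> outlinks = extractOutlinks(html, site);
      String title = extractTitle(html);
      String text = extractText(html);
      return new Result(site, outlinks, title, text);
    } catch (RuntimeException rEx) {
      rEx.printStackTrace();
      return null;
    }
  }

  public static void main(String[] args) {
    String html = "<html><head><title>Demo</title></head><body>"
        + "<p>Hello</p><a href=\"/about\">About</a></body></html>";
    Result r = extract("http://example.com/index.html", html);
    System.out.println(r.title + " | " + r.outlinks + " | " + r.text);
  }
}
```

Returning null on failure follows the excerpt's behavior; a caller therefore has to null-check every result before using it.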

Copyright © 2018 www.massapicom. All rights reserved.
All source code is the property of its respective owners. Java is a trademark of Sun Microsystems, Inc. and is owned by ORACLE Inc. Contact coftware#gmail.com.