Examples of CrawlDatum

cn.edu.hfut.dmic.webcollector.model.CrawlDatum
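A class that stores crawl tasks and is the core class of WebCollector. It records the crawl information for a single URL and can also serve as a crawl task in its own right. @author hu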
org.apache.nutch.crawl.CrawlDatum
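
The WebCollector description above says that a crawl datum both records the crawl information for one URL and doubles as a unit of work. A minimal sketch of that idea follows. `CrawlTaskSketch`, its `Status` values and its methods are hypothetical names chosen for illustration; they are not the API of the WebCollector or Nutch `CrawlDatum` classes.

```java
import java.util.Objects;

// Hypothetical stand-in for illustration only; NOT the WebCollector or Nutch CrawlDatum.
final class CrawlTaskSketch {
    // Assumed status values, invented for this sketch.
    enum Status { UNFETCHED, FETCHED, FAILED }

    private final String url;
    private Status status = Status.UNFETCHED;
    private int attempts = 0;

    CrawlTaskSketch(String url) {
        this.url = Objects.requireNonNull(url, "url");
    }

    String getUrl() { return url; }
    Status getStatus() { return status; }
    int getAttempts() { return attempts; }

    // Record the outcome of one fetch attempt on this URL.
    void recordAttempt(boolean success) {
        attempts++;
        status = success ? Status.FETCHED : Status.FAILED;
    }

    // The record still counts as pending work until a fetch has succeeded.
    boolean isPending() { return status != Status.FETCHED; }

    public static void main(String[] args) {
        CrawlTaskSketch task = new CrawlTaskSketch("http://nutch.apache.org/index.html");
        task.recordAttempt(false); // first attempt fails, task stays pending
        task.recordAttempt(true);  // second attempt succeeds
        System.out.println(task.getUrl() + " " + task.getStatus()
            + " attempts=" + task.getAttempts());
    }
}
```

Keeping the URL, its state and its attempt count in a single object is what lets the same record act as both bookkeeping and a schedulable task: a scheduler only needs to check `isPending()` to decide whether to fetch again.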

Examples of org.apache.nutch.crawl.CrawlDatum

    Inlinks inlinks = new Inlinks();
    inlinks.add(new Inlink("http://test1.com/", "text1"));
    inlinks.add(new Inlink("http://test2.com/", "text2"));
    inlinks.add(new Inlink("http://test3.com/", "text2"));
    try {
      filter.filter(doc, parse, new Text("http://nutch.apache.org/index.html"), new CrawlDatum(), inlinks);
    } catch (Exception e) {
      e.printStackTrace();
      Assert.fail(e.getMessage());
    }
    Assert.assertNotNull(doc);


Examples of org.apache.nutch.crawl.CrawlDatum

                }

                reporter.incrCounter("FetcherOutlinks", "outlinks_following", 1);

                // Create new FetchItem with depth incremented
                FetchItem fit = FetchItem.create(new Text(followUrl),
                    new CrawlDatum(CrawlDatum.STATUS_LINKED, interval),
                    queueMode, outlinkDepth + 1);
                fetchQueues.addFetchItem(fit);

                outlinkCounter++;
              }
            }
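
The excerpt above queues each followed outlink one level deeper than the page it came from. The self-contained sketch below shows the same bookkeeping without any Nutch dependency. `OutlinkFollowSketch`, `QueuedUrl`, `enqueueOutlinks` and the `maxDepth` cutoff are assumptions made for illustration; they are not Nutch classes or settings.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class OutlinkFollowSketch {
    // Hypothetical stand-in for a queued fetch item: a URL plus the depth at which it was found.
    static final class QueuedUrl {
        final String url;
        final int depth;

        QueuedUrl(String url, int depth) {
            this.url = url;
            this.depth = depth;
        }
    }

    // Queue every outlink at parentDepth + 1, unless that would exceed maxDepth (an assumed limit).
    // Returns the number of items added to the queue.
    static int enqueueOutlinks(List<String> outlinks, int parentDepth,
                               int maxDepth, Deque<QueuedUrl> queue) {
        int outlinkCounter = 0;
        if (parentDepth + 1 > maxDepth) {
            return outlinkCounter; // too deep: follow nothing
        }
        for (String followUrl : outlinks) {
            queue.addLast(new QueuedUrl(followUrl, parentDepth + 1));
            outlinkCounter++;
        }
        return outlinkCounter;
    }

    public static void main(String[] args) {
        Deque<QueuedUrl> queue = new ArrayDeque<>();
        List<String> outlinks = Arrays.asList("http://test1.com/", "http://test2.com/");
        int queued = enqueueOutlinks(outlinks, 0, 1, queue);
        System.out.println("queued=" + queued + " firstDepth=" + queue.peekFirst().depth);
    }
}
```

Carrying the depth on each queued item, rather than in a global counter, means the cutoff can be checked per item when its own outlinks are later processed.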


