Examples of Fetcher


Examples of org.apache.nutch.fetcher.Fetcher

     
    for (int i = 0; i < depth; i++) {             // generate new segment
      Path segment =
        new Generator(job).generate(crawlDb, segments, -1,
                                     topN, System.currentTimeMillis());
      new Fetcher(job).fetch(segment, threads, Fetcher.isParsing(job));  // fetch it
      if (!Fetcher.isParsing(job)) {
        new ParseSegment(job).parse(segment);    // parse it, if needed
      }
      new CrawlDb(job).update(crawlDb, segment); // update crawldb
    }
View Full Code Here
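
The loop above (and its variants below) is the classic Nutch 1.x crawl cycle: inject seeds into the crawldb, then repeat generate, fetch, optionally parse, and updatedb up to the requested depth. A minimal, self-contained sketch of that cycle, assuming the Path[]-returning Generator and the three-argument Fetcher.fetch(...) shown in these examples; the paths, depth, topN and thread counts are illustrative only:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.nutch.crawl.CrawlDb;
    import org.apache.nutch.crawl.Generator;
    import org.apache.nutch.crawl.Injector;
    import org.apache.nutch.fetcher.Fetcher;
    import org.apache.nutch.parse.ParseSegment;
    import org.apache.nutch.util.NutchConfiguration;

    public class SimpleCrawlLoop {
      public static void main(String[] args) throws Exception {
        Configuration conf = NutchConfiguration.create();
        Path crawlDb    = new Path("crawl/crawldb");
        Path segments   = new Path("crawl/segments");
        Path rootUrlDir = new Path("urls");              // directory of seed URL lists (assumed)
        int depth = 3, threads = 10;
        long topN = 1000;

        new Injector(conf).inject(crawlDb, rootUrlDir);  // seed the crawldb
        for (int i = 0; i < depth; i++) {
          Path[] segs = new Generator(conf).generate(crawlDb, segments, -1,
              topN, System.currentTimeMillis());         // generate a new segment
          if (segs == null) break;                       // no more URLs to fetch
          new Fetcher(conf).fetch(segs[0], threads,
              Fetcher.isParsing(conf));                  // fetch it
          if (!Fetcher.isParsing(conf)) {
            new ParseSegment(conf).parse(segs[0]);       // parse it, if needed
          }
          new CrawlDb(conf).update(crawlDb, segs, true, true);  // update the crawldb
        }
      }
    }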

Examples of org.apache.nutch.fetcher.Fetcher

    Path index = new Path(dir + "/index");

    Path tmpDir = job.getLocalPath("crawl"+Path.SEPARATOR+getDate());
    Injector injector = new Injector(getConf());
    Generator generator = new Generator(getConf());
    Fetcher fetcher = new Fetcher(getConf());
    ParseSegment parseSegment = new ParseSegment(getConf());
    CrawlDb crawlDbTool = new CrawlDb(getConf());
    LinkDb linkDbTool = new LinkDb(getConf());
     
    // initialize crawlDb
    injector.inject(crawlDb, rootUrlDir);
    int i;
    for (i = 0; i < depth; i++) {             // generate new segment
      Path[] segs = generator.generate(crawlDb, segments, -1, topN, System
          .currentTimeMillis());
      if (segs == null) {
        LOG.info("Stopping at depth=" + i + " - no more URLs to fetch.");
        break;
      }
      fetcher.fetch(segs[0], threads, org.apache.nutch.fetcher.Fetcher.isParsing(getConf()));  // fetch it
      if (!Fetcher.isParsing(job)) {
        parseSegment.parse(segs[0]);    // parse it, if needed
      }
      crawlDbTool.update(crawlDb, segs, true, true); // update crawldb
    }
View Full Code Here

Examples of org.apache.nutch.fetcher.Fetcher

    Path linkDb = new Path(dir + "/linkdb");
    Path segments = new Path(dir + "/segments");
    res.elapsed = System.currentTimeMillis();
    Injector injector = new Injector(getConf());
    Generator generator = new Generator(getConf());
    Fetcher fetcher = new Fetcher(getConf());
    ParseSegment parseSegment = new ParseSegment(getConf());
    CrawlDb crawlDbTool = new CrawlDb(getConf());
    LinkDb linkDbTool = new LinkDb(getConf());
     
    // initialize crawlDb
    long start = System.currentTimeMillis();
    injector.inject(crawlDb, rootUrlDir);
    long delta = System.currentTimeMillis() - start;
    res.addTiming("inject", "0", delta);
    int i;
    for (i = 0; i < depth; i++) {             // generate new segment
      start = System.currentTimeMillis();
      Path[] segs = generator.generate(crawlDb, segments, -1, topN, System
          .currentTimeMillis());
      delta = System.currentTimeMillis() - start;
      res.addTiming("generate", i + "", delta);
      if (segs == null) {
        LOG.info("Stopping at depth=" + i + " - no more URLs to fetch.");
        break;
      }
      start = System.currentTimeMillis();
      fetcher.fetch(segs[0], threads, org.apache.nutch.fetcher.Fetcher.isParsing(getConf()));  // fetch it
      delta = System.currentTimeMillis() - start;
      res.addTiming("fetch", i + "", delta);
      if (!Fetcher.isParsing(job)) {
        start = System.currentTimeMillis();
        parseSegment.parse(segs[0]);    // parse it, if needed
View Full Code Here

Examples of org.apache.nutch.fetcher.Fetcher

    Path linkDb = new Path(dir + "/linkdb");
    Path segments = new Path(dir + "/segments");
    res.elapsed = System.currentTimeMillis();
    Injector injector = new Injector(getConf());
    Generator generator = new Generator(getConf());
    Fetcher fetcher = new Fetcher(getConf());
    ParseSegment parseSegment = new ParseSegment(getConf());
    CrawlDb crawlDbTool = new CrawlDb(getConf());
    LinkDb linkDbTool = new LinkDb(getConf());
     
    // initialize crawlDb
    long start = System.currentTimeMillis();
    injector.inject(crawlDb, rootUrlDir);
    long delta = System.currentTimeMillis() - start;
    res.addTiming("inject", "0", delta);
    int i;
    for (i = 0; i < depth; i++) {             // generate new segment
      start = System.currentTimeMillis();
      Path[] segs = generator.generate(crawlDb, segments, -1, topN, System
          .currentTimeMillis());
      delta = System.currentTimeMillis() - start;
      res.addTiming("generate", i + "", delta);
      if (segs == null) {
        LOG.info("Stopping at depth=" + i + " - no more URLs to fetch.");
        break;
      }
      start = System.currentTimeMillis();
      fetcher.fetch(segs[0], threads);  // fetch it
      delta = System.currentTimeMillis() - start;
      res.addTiming("fetch", i + "", delta);
      if (!Fetcher.isParsing(job)) {
        start = System.currentTimeMillis();
        parseSegment.parse(segs[0]);    // parse it, if needed
View Full Code Here

Examples of org.apache.nutch.fetcher.Fetcher

    Path index = new Path(dir + "/index");

    Path tmpDir = job.getLocalPath("crawl"+Path.SEPARATOR+getDate());
    Injector injector = new Injector(getConf());
    Generator generator = new Generator(getConf());
    Fetcher fetcher = new Fetcher(getConf());
    ParseSegment parseSegment = new ParseSegment(getConf());
    CrawlDb crawlDbTool = new CrawlDb(getConf());
    LinkDb linkDbTool = new LinkDb(getConf());
     
    // initialize crawlDb
    injector.inject(crawlDb, rootUrlDir);
    int i;
    for (i = 0; i < depth; i++) {             // generate new segment
      Path[] segs = generator.generate(crawlDb, segments, -1, topN, System
          .currentTimeMillis());
      if (segs == null) {
        LOG.info("Stopping at depth=" + i + " - no more URLs to fetch.");
        break;
      }
      fetcher.fetch(segs[0], threads);  // fetch it
      if (!Fetcher.isParsing(job)) {
        parseSegment.parse(segs[0]);    // parse it, if needed
      }
      crawlDbTool.update(crawlDb, segs, true, true); // update crawldb
    }
View Full Code Here

Examples of org.apache.nutch.fetcher.Fetcher

     
    for (int i = 0; i < depth; i++) {             // generate new segment
      Path segment =
        new Generator(job).generate(crawlDb, segments, -1,
                                     topN, System.currentTimeMillis());
      new Fetcher(job).fetch(segment, threads);  // fetch it
      if (!Fetcher.isParsing(job)) {
        new ParseSegment(job).parse(segment);    // parse it, if needed
      }
      new CrawlDb(job).update(crawlDb, segment); // update crawldb
    }
View Full Code Here
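
The Fetcher.isParsing(...) check that appears throughout these loops decides whether content is parsed inside the fetch job itself or in a separate ParseSegment pass afterwards. In Nutch 1.x this is controlled by the boolean "fetcher.parse" property; a small, hedged sketch of toggling it (property name and default as assumed from nutch-default.xml):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.nutch.fetcher.Fetcher;
    import org.apache.nutch.util.NutchConfiguration;

    public class ParsingCheck {
      public static void main(String[] args) {
        Configuration conf = NutchConfiguration.create();
        conf.setBoolean("fetcher.parse", false);   // fetch only; run ParseSegment separately
        System.out.println("parse during fetch? " + Fetcher.isParsing(conf));  // expected: false
      }
    }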

Examples of org.apache.tez.runtime.library.shuffle.common.Fetcher

              if (LOG.isDebugEnabled()) {
                LOG.debug("Processing pending host: " + inputHost.toDetailedString());
              }
              if (inputHost.getNumPendingInputs() > 0 && !isShutdown.get()) {
                LOG.info("Scheduling fetch for inputHost: " + inputHost.getIdentifier());
                Fetcher fetcher = constructFetcherForHost(inputHost, conf);
                runningFetchers.add(fetcher);
                if (isShutdown.get()) {
                  LOG.info("hasBeenShutdown, Breaking out of ShuffleScheduler Loop");
                }
                ListenableFuture<FetchResult> future = fetcherExecutor
View Full Code Here

Examples of org.apache.tez.runtime.library.shuffle.common.Fetcher

              if (LOG.isDebugEnabled()) {
                LOG.debug("Processing pending host: " + inputHost.toDetailedString());
              }
              if (inputHost.getNumPendingInputs() > 0) {
                LOG.info("Scheduling fetch for inputHost: " + inputHost.getHost());
                Fetcher fetcher = constructFetcherForHost(inputHost);
                numRunningFetchers.incrementAndGet();
                if (isShutdown.get()) {
                  LOG.info("hasBeenShutdown, Breaking out of BroadcastScheduler Loop");
                }
                ListenableFuture<FetchResult> future = fetcherExecutor
View Full Code Here

Examples of org.apache.tez.runtime.library.shuffle.common.Fetcher

              if (LOG.isDebugEnabled()) {
                LOG.debug("Processing pending host: " + inputHost.toDetailedString());
              }
              if (inputHost.getNumPendingInputs() > 0) {
                LOG.info("Scheduling fetch for inputHost: " + inputHost.getHost());
                Fetcher fetcher = constructFetcherForHost(inputHost);
                numRunningFetchers.incrementAndGet();
                ListenableFuture<FetchResult> future = fetcherExecutor
                    .submit(fetcher);
                Futures.addCallback(future, fetchFutureCallback);
                if (++count >= maxFetchersToRun) {
View Full Code Here
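
The three Tez snippets above all follow the same scheduling pattern: for each input host with pending inputs, construct a Fetcher, submit it to a Guava ListeningExecutorService, and attach a callback that reacts to the FetchResult. A self-contained sketch of that pattern using plain Guava (the Fetcher/FetchResult stand-ins and host names here are illustrative, not the Tez classes):

    import java.util.concurrent.Callable;
    import java.util.concurrent.Executors;
    import com.google.common.util.concurrent.FutureCallback;
    import com.google.common.util.concurrent.Futures;
    import com.google.common.util.concurrent.ListenableFuture;
    import com.google.common.util.concurrent.ListeningExecutorService;
    import com.google.common.util.concurrent.MoreExecutors;

    public class FetchSchedulerSketch {
      static class FetchResult {                       // stand-in for the real FetchResult
        final String host;
        FetchResult(String host) { this.host = host; }
      }

      // Stand-in for constructFetcherForHost(...): fetch everything pending on one host.
      static Callable<FetchResult> fetcherFor(final String host) {
        return new Callable<FetchResult>() {
          public FetchResult call() { return new FetchResult(host); }
        };
      }

      public static void main(String[] args) {
        ListeningExecutorService fetcherExecutor =
            MoreExecutors.listeningDecorator(Executors.newFixedThreadPool(2));
        for (String host : new String[] { "host-a:13562", "host-b:13562" }) {
          ListenableFuture<FetchResult> future = fetcherExecutor.submit(fetcherFor(host));
          Futures.addCallback(future, new FutureCallback<FetchResult>() {
            public void onSuccess(FetchResult result) {
              System.out.println("fetched from " + result.host);   // record completed inputs here
            }
            public void onFailure(Throwable t) {
              System.err.println("fetch failed: " + t);            // re-queue or report failures here
            }
          }, MoreExecutors.directExecutor());
        }
        fetcherExecutor.shutdown();
      }
    }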