Examples of Robotstxt


Examples of de.anomic.crawler.RobotsTxt

        }.start();
        */

        // load the robots.txt db
        this.log.logConfig("Initializing robots.txt DB");
        this.robots = new RobotsTxt(this.tables.getHeap(WorkTables.TABLE_ROBOTS_NAME));
        this.log.logConfig("Loaded robots.txt DB: " + this.robots.size() + " entries");

        // start a cache manager
        this.log.logConfig("Starting HT Cache Manager");


Examples of de.anomic.crawler.RobotsTxt

        }.start();
        */

        // load the robots.txt db
        this.log.logConfig("Initializing robots.txt DB");
        this.robots = new RobotsTxt(this.tables);
        this.log.logConfig("Loaded robots.txt DB: " + this.robots.size() + " entries");

        // start a cache manager
        this.log.logConfig("Starting HT Cache Manager");


Examples of org.archive.modules.net.Robotstxt

                CrawlServer s = getServerCache().getServerFor(curi.getUURI());
                String ua = curi.getUserAgent();
                if (ua == null) {
                    ua = metadata.getUserAgent();
                }
                Robotstxt rep = s.getRobotstxt();
                if (rep != null) {
                    long crawlDelay = (long)(1000 * rep.getDirectivesFor(ua).getCrawlDelay());
                    crawlDelay =
                        (crawlDelay > respectThreshold)
                            ? respectThreshold
                            : crawlDelay;
                    if (crawlDelay > durationToWait) {
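
The snippet above is cut off inside the final `if`, so its body is unknown. As a minimal standalone sketch of the same idea, the class below converts a Crawl-delay given in seconds to milliseconds, caps it at a threshold, and then takes the longer of that value and the wait already computed. The class and method names are hypothetical, the `double` parameter type is an assumption, and raising the wait to the capped delay is only a guess at what the truncated branch does.

```java
public class CrawlDelayExample {

    /** Converts a Crawl-delay in seconds to milliseconds, capped at respectThreshold. */
    static long cappedDelayMillis(double crawlDelaySeconds, long respectThreshold) {
        long crawlDelay = (long) (1000 * crawlDelaySeconds);
        return Math.min(crawlDelay, respectThreshold);
    }

    /**
     * Returns the longer of the already computed wait and the capped robots.txt delay.
     * Assumption: the truncated branch in the original raises the wait to the crawl delay.
     */
    static long effectiveWaitMillis(long durationToWait, double crawlDelaySeconds,
                                    long respectThreshold) {
        long crawlDelay = cappedDelayMillis(crawlDelaySeconds, respectThreshold);
        return Math.max(durationToWait, crawlDelay);
    }

    public static void main(String[] args) {
        // Crawl-delay of 5 s exceeds the 2 s wait, so it wins.
        System.out.println(effectiveWaitMillis(2000L, 5.0, 300_000L));   // prints 5000
        // Crawl-delay of 600 s is capped at the 300 s threshold.
        System.out.println(effectiveWaitMillis(2000L, 600.0, 300_000L)); // prints 300000
        // Crawl-delay of 0.5 s is shorter than the existing wait, so the wait is kept.
        System.out.println(effectiveWaitMillis(2000L, 0.5, 300_000L));   // prints 2000
    }
}
```

Using `Math.min` and `Math.max` expresses the same clamping as the ternary in the original while keeping the two steps (cap, then compare) separate.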