To use this class as the URL normalizer, set the urlnormalizer.class property in nutch-site.xml or nutch-default.xml to org.apache.nutch.net.RegexUrlNormalizer. Also set the urlnormalizer.regex.file property to the name of an XML file that contains the patterns and substitutions to apply to encountered URLs.
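As a rough illustration of the two settings described here, the fragment below shows how they might look in nutch-site.xml. It assumes the usual property/name/value layout of Nutch configuration files and that the entries go inside the file's existing root element. The file name regex-normalize.xml is only a placeholder chosen for this sketch; substitute the name of your own rules file.

```xml
<!-- Sketch of nutch-site.xml entries; place inside the file's existing root element. -->
<property>
  <name>urlnormalizer.class</name>
  <value>org.apache.nutch.net.RegexUrlNormalizer</value>
</property>

<property>
  <name>urlnormalizer.regex.file</name>
  <!-- Placeholder file name (an assumption for this example). -->
  <value>regex-normalize.xml</value>
</property>
```

Values set in nutch-site.xml are normally intended to override the defaults in nutch-default.xml, so site-specific settings such as these are usually better placed there.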
@author Luke Baker