The document may be segmented.
Parsing is deferred until it is actually needed, so operations such as getting the document URI do not require parsing.
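The deferred-parsing idea can be illustrated with a minimal sketch. It is not the WARCDocumentCollection implementation: the class `LazyDocument`, its fields, and the whitespace tokenizer standing in for real HTML parsing are all hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of deferred parsing; not part of the original source.
final class LazyDocument {
    private final String uri;
    private final String rawContent;
    private List<String> tokens; // stays null until the first call to tokens()
    private int parseCount;      // counts how often the content was parsed

    LazyDocument(String uri, String rawContent) {
        this.uri = uri;
        this.rawContent = rawContent;
    }

    // Metadata access: returns immediately and never triggers parsing.
    String uri() {
        return uri;
    }

    // Content access: parses on first use, then reuses the cached result.
    List<String> tokens() {
        if (tokens == null) {
            // Whitespace splitting is a stand-in for a real (expensive) parser.
            tokens = Arrays.asList(rawContent.trim().split("\\s+"));
            parseCount++;
        }
        return tokens;
    }

    int parseCount() {
        return parseCount;
    }
}
```

In this sketch, calling `uri()` leaves `parseCount()` at zero, and repeated calls to `tokens()` parse the content only once. The sketch is not thread-safe; a shared instance would need synchronization around the first parse.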
 * @param compression
 *            true if the files are gzipped.
 */
public WARCDocumentCollection(String[] file, int bufferSize, Compression compression, File metadataFile) throws IOException {
    super(file, new HTMLDocumentFactory(), bufferSize, compression, metadataFile);
}