This may be used to parallelize batch indexing. A large document collection can be broken into sub-collections. Each sub-collection can be indexed in parallel, on a different thread, process or machine. The complete index can then be created by merging sub-collection indexes with this method.
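The split, index and merge pattern described above can be sketched in plain Java. This is only an illustration of the workflow, not Lucene code: the class `ParallelIndexSketch` and its methods are hypothetical, a toy term-to-document-ids map stands in for each sub-collection index, and `merge` stands in for the call to this method on the final writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical toy model: NOT the Lucene API, only the shape of the workflow.
class ParallelIndexSketch {

    // Builds a toy inverted index (term -> sorted doc ids) for one sub-collection.
    // firstDocId keeps doc ids unique across sub-collections.
    static Map<String, SortedSet<Integer>> indexSubCollection(List<String> docs, int firstDocId) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (int i = 0; i < docs.size(); i++) {
            for (String term : docs.get(i).toLowerCase().split("\\s+")) {
                if (!term.isEmpty()) {
                    index.computeIfAbsent(term, t -> new TreeSet<>()).add(firstDocId + i);
                }
            }
        }
        return index;
    }

    // Stand-in for the merge step: combines the finished sub-collection indexes.
    static Map<String, SortedSet<Integer>> merge(List<Map<String, SortedSet<Integer>>> parts) {
        Map<String, SortedSet<Integer>> merged = new TreeMap<>();
        for (Map<String, SortedSet<Integer>> part : parts) {
            part.forEach((term, ids) ->
                merged.computeIfAbsent(term, t -> new TreeSet<>()).addAll(ids));
        }
        return merged;
    }

    // Indexes each sub-collection on its own thread, then merges the results.
    static Map<String, SortedSet<Integer>> buildInParallel(List<List<String>> subCollections)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, subCollections.size()));
        try {
            List<Future<Map<String, SortedSet<Integer>>>> futures = new ArrayList<>();
            int firstDocId = 0;
            for (List<String> sub : subCollections) {
                final int base = firstDocId;
                futures.add(pool.submit(() -> indexSubCollection(sub, base)));
                firstDocId += sub.size();
            }
            // Wait for every sub-collection to finish before merging, mirroring
            // the requirement that input indexes are no longer being modified.
            List<Map<String, SortedSet<Integer>>> parts = new ArrayList<>();
            for (Future<Map<String, SortedSet<Integer>>> f : futures) {
                parts.add(f.get());
            }
            return merge(parts);
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real application each worker would write its sub-collection to its own {@link Directory}, close its writer, and only then would the directories be passed to this method on the final writer.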
NOTE: the index in each {@link Directory} must not be changed (opened by a writer) while this method is running. This method does not acquire a write lock in each input Directory, so it is up to the caller to enforce this.
This method is transactional in how Exceptions are handled: it does not commit a new segments_N file until all indexes are added. This means if an Exception occurs (for example disk full), then either no indexes will have been added or they all will have been.
Note that this requires temporary free space in the {@link Directory} up to 2X the sum of all input indexes (including the starting index). If readers/searchers are open against the starting index, then the temporary free space required will be higher by the size of the starting index (see {@link #forceMerge(int)} for details).
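The free-space rule above is simple arithmetic, sketched here as a helper. The class and method names are hypothetical and not part of Lucene; the result is only the upper bound stated in this note, not a measurement of actual disk usage.

```java
// Hypothetical helper: encodes the worst-case bound stated in the note above.
class AddIndexesSpaceEstimate {

    // Returns the worst-case temporary free space, in the same unit as the inputs.
    static long upperBoundBytes(long startingIndexBytes, long[] inputIndexBytes,
                                boolean readersOpen) {
        // Sum of all input indexes, including the starting index.
        long total = startingIndexBytes;
        for (long size : inputIndexBytes) {
            total += size;
        }
        // Up to 2X that sum may be needed temporarily.
        long bound = 2 * total;
        // Open readers/searchers hold on to the starting index's files,
        // which raises the bound by the size of the starting index.
        if (readersOpen) {
            bound += startingIndexBytes;
        }
        return bound;
    }
}
```

For a starting index of size 10 and incoming indexes of sizes 4 and 6, the bound is 40 with no readers open and 50 with readers open against the starting index.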
NOTE: this method only copies the segments of the incoming indexes and does not merge them. Therefore deleted documents are not removed and the new segments are not merged with the existing ones. Also, if the merge policy allows compound files, then any segment that is not compound is converted to compound format. However, if the segment is already compound, it is copied as-is even if the merge policy does not allow compound files.
The index being written to must not be among the indexes being added.
NOTE: if this method hits an OutOfMemoryError you should immediately close the writer. See above for details.
@throws CorruptIndexException if the index is corrupt
@throws IOException if there is a low-level IO error