IndexWriter creates and maintains an index. The create argument to the {@link #IndexWriter(Directory,Analyzer,boolean,MaxFieldLength) constructor} determines whether a new index is created, or whether an existing index is opened. Note that you can open an index with create=true even while readers are using the index. The old readers will continue to search the "point in time" snapshot they had opened, and won't see the newly created index until they re-open. There are also {@link #IndexWriter(Directory,Analyzer,MaxFieldLength) constructors} with no create argument, which will create a new index if there is not already an index at the provided path and otherwise open the existing index.
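The following minimal, self-contained Java sketch shows the three open modes and the point-in-time behavior described above. ToyDirectory, ToyWriter and ToyReader are hypothetical in-memory stand-ins, not Lucene classes. A null create flag models the constructors without a create argument. As an assumption of this toy, opening with create=false against an empty directory simply throws.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory stand-in for a Directory; it only remembers the
// latest commit. This is NOT Lucene code.
class ToyDirectory {
    List<String> latestCommit; // null means "no index here yet"
}

class ToyWriter {
    final ToyDirectory dir;
    final List<String> pending;

    // create == TRUE : start a new, empty index (even if one exists)
    // create == FALSE: open the existing index; this toy throws if there is none
    // create == null : models the constructors without a create argument
    ToyWriter(ToyDirectory dir, Boolean create) {
        this.dir = dir;
        boolean exists = dir.latestCommit != null;
        if (Boolean.FALSE.equals(create) && !exists) {
            throw new IllegalStateException("no index found in directory");
        }
        boolean startEmpty = Boolean.TRUE.equals(create) || !exists;
        this.pending = startEmpty ? new ArrayList<String>() : new ArrayList<String>(dir.latestCommit);
    }

    void add(String doc) {
        pending.add(doc);
    }

    // Publishes a new immutable commit; lists already held by readers are untouched.
    void commit() {
        dir.latestCommit = Collections.unmodifiableList(new ArrayList<String>(pending));
    }
}

// A reader captures whatever commit was current when it was opened.
class ToyReader {
    final List<String> snapshot;

    ToyReader(ToyDirectory dir) {
        this.snapshot = dir.latestCommit;
    }
}
```

A reader holds a reference to the commit it opened, so replacing the directory's latest commit does not disturb it. Only a reader opened afterwards sees the new, empty index.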
In either case, documents are added with {@link #addDocument(Document) addDocument} and removed with {@link #deleteDocuments(Term)} or {@link #deleteDocuments(Query)}. A document can be updated with {@link #updateDocument(Term,Document) updateDocument} (which just deletes and then adds the entire document). When finished adding, deleting and updating documents, {@link #close() close} should be called.
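The add/delete/update contract can be modeled with plain collections. In the hypothetical sketch below, ToyDocs is not a Lucene class. A document is a field-to-value map, and a term is a (field, value) pair matched exactly. Here updateDocument is literally a delete followed by an add of the whole document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the add/delete/update calls; not Lucene code.
class ToyDocs {
    final List<Map<String, String>> docs = new ArrayList<>();

    void addDocument(Map<String, String> doc) {
        docs.add(doc);
    }

    // Removes every document whose field exactly equals the given value.
    void deleteDocuments(String field, String value) {
        docs.removeIf(d -> value.equals(d.get(field)));
    }

    // Not an in-place edit: all matching documents are deleted,
    // then the complete new document is added.
    void updateDocument(String field, String value, Map<String, String> doc) {
        deleteDocuments(field, value);
        addDocument(doc);
    }
}
```

Because the whole document is re-added, every field must be supplied again on update, including fields whose values did not change.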
These changes are buffered in memory and periodically flushed to the {@link Directory} (during the above method calls). A flush is triggered when there are enough buffered deletes (see {@link #setMaxBufferedDeleteTerms}) or enough added documents since the last flush, whichever comes first. For the added documents, flushing is triggered either by RAM usage of the documents (see {@link #setRAMBufferSizeMB}) or by the number of added documents. The default is to flush when RAM usage hits 16 MB. For best indexing speed you should flush by RAM usage with a large RAM buffer. Note that flushing just moves the internal buffered state in IndexWriter into the index, but these changes are not visible to IndexReader until either {@link #commit()} or {@link #close} is called. A flush may also trigger one or more segment merges, which by default run in a background thread so as not to block the addDocument calls (see below for changing the {@link MergeScheduler}).
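The sketch below models the flush triggers (whichever limit is reached first) and the difference between flushed and committed state. ToyFlushPolicy and ToyCommitVisibility are hypothetical names. Treating a non-positive limit as "disabled" is an assumption of this sketch. Only the 16 MB default comes from the text above.

```java
// Hypothetical model of the flush triggers; not Lucene code.
class ToyFlushPolicy {
    static final double DEFAULT_RAM_BUFFER_MB = 16.0; // the default quoted above

    final double ramBufferMB;          // <= 0 disables the RAM trigger in this sketch
    final int maxBufferedDocs;         // <= 0 disables the doc-count trigger
    final int maxBufferedDeleteTerms;  // <= 0 disables the delete-terms trigger

    ToyFlushPolicy(double ramBufferMB, int maxBufferedDocs, int maxBufferedDeleteTerms) {
        this.ramBufferMB = ramBufferMB;
        this.maxBufferedDocs = maxBufferedDocs;
        this.maxBufferedDeleteTerms = maxBufferedDeleteTerms;
    }

    // Flush as soon as ANY enabled limit is reached ("whichever comes first").
    boolean shouldFlush(long bufferedBytes, int bufferedDocs, int bufferedDeleteTerms) {
        boolean byRam = ramBufferMB > 0 && bufferedBytes >= (long) (ramBufferMB * 1024 * 1024);
        boolean byDocs = maxBufferedDocs > 0 && bufferedDocs >= maxBufferedDocs;
        boolean byDeletes = maxBufferedDeleteTerms > 0 && bufferedDeleteTerms >= maxBufferedDeleteTerms;
        return byRam || byDocs || byDeletes;
    }
}

// Flushed documents are written out but not yet visible; only commit()
// advances what a newly opened reader would see.
class ToyCommitVisibility {
    int buffered;   // held in RAM inside the writer
    int flushed;    // written to the directory, not yet committed
    int committed;  // visible to readers opened from now on

    void add() {
        buffered++;
    }

    void flush() {
        flushed += buffered;
        buffered = 0;
    }

    void commit() {
        flush();
        committed = flushed;
    }
}
```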
If an index will not have more documents added for a while and optimal search performance is desired, then either the full {@link #optimize() optimize} method or the partial {@link #optimize(int)} method should be called before the index is closed.
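As a rough illustration of what optimize(int) asks for, the hypothetical ToyOptimize below represents each segment by its document count. It merges segments until at most maxNumSegments remain, so optimize() corresponds to maxNumSegments == 1. Merging the two smallest segments first is an assumption of this sketch. The real choice of which segments to merge belongs to the MergePolicy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of optimize(maxNumSegments); not Lucene's merge selection.
class ToyOptimize {
    static List<Integer> optimize(List<Integer> segmentDocCounts, int maxNumSegments) {
        if (maxNumSegments < 1) {
            throw new IllegalArgumentException("maxNumSegments must be >= 1");
        }
        List<Integer> segs = new ArrayList<>(segmentDocCounts);
        while (segs.size() > maxNumSegments) {
            Collections.sort(segs);
            // Merge the two smallest segments into one.
            int merged = segs.remove(0) + segs.remove(0);
            segs.add(merged);
        }
        Collections.sort(segs);
        return segs;
    }
}
```

A partial optimize does less merging work than a full one while still bounding the number of segments a search has to visit.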
Opening an IndexWriter creates a lock file for the directory in use. Trying to open another IndexWriter on the same directory will lead to a {@link LockObtainFailedException}. The {@link LockObtainFailedException} is also thrown if an IndexReader on the same directory is used to delete documents from the index.
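The following sketch shows lock-file based exclusion using only java.nio. The file name write.lock, ToyWriteLock and ToyLockObtainFailedException are hypothetical here. An atomic create-if-absent lets exactly one holder in, and every later attempt, whether from a writer or a deleting reader, fails until the lock is released.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical exception standing in for LockObtainFailedException.
class ToyLockObtainFailedException extends RuntimeException {
    ToyLockObtainFailedException(String msg) {
        super(msg);
    }
}

// Minimal sketch of an exclusive lock held via the existence of a lock file.
class ToyWriteLock implements AutoCloseable {
    private final Path lockFile;

    private ToyWriteLock(Path lockFile) {
        this.lockFile = lockFile;
    }

    static ToyWriteLock obtain(Path indexDir) {
        Path lockFile = indexDir.resolve("write.lock");
        try {
            Files.createFile(lockFile); // atomic: fails if the file already exists
            return new ToyWriteLock(lockFile);
        } catch (FileAlreadyExistsException e) {
            throw new ToyLockObtainFailedException("Lock obtain failed: " + lockFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Releasing the lock deletes the file so the next writer can proceed.
    @Override
    public void close() {
        try {
            Files.deleteIfExists(lockFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A lock that depends only on a file's existence can be left behind if the process dies without releasing it. Real lock implementations therefore need some way to detect or clear stale locks.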
Expert: IndexWriter allows an optional {@link IndexDeletionPolicy} implementation to be specified. You can use this to control when prior commits are deleted from the index. The default policy is {@link KeepOnlyLastCommitDeletionPolicy}, which removes all prior commits as soon as a new commit is done (this matches the behavior before 2.2). Creating your own policy can allow you to explicitly keep previous "point in time" commits alive in the index for some time, to allow readers to refresh to the new commit without having the old commit deleted out from under them. This is necessary on filesystems like NFS that do not support "delete on last close" semantics, which Lucene's "point in time" search normally relies on.
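The sketch below contrasts the keep-only-last default with a policy that retains the newest N commits. ToyCommit, ToyDeletionPolicy and the two implementations are hypothetical stand-ins that only mirror the idea described above. After each commit, the policy sees all commits, oldest first, and marks which ones to delete. They are not Lucene's IndexDeletionPolicy API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical commit point; a policy marks it for deletion.
class ToyCommit {
    final long generation;
    boolean deleted;

    ToyCommit(long generation) {
        this.generation = generation;
    }

    void delete() {
        deleted = true;
    }
}

// Called after every commit with all live commits, oldest first.
interface ToyDeletionPolicy {
    void onCommit(List<ToyCommit> commits);
}

// Models the default: only the newest commit survives.
class ToyKeepOnlyLast implements ToyDeletionPolicy {
    public void onCommit(List<ToyCommit> commits) {
        for (int i = 0; i < commits.size() - 1; i++) {
            commits.get(i).delete();
        }
    }
}

// Keeps the newest n commits so slower readers (e.g. on NFS) can still refresh.
class ToyKeepLastN implements ToyDeletionPolicy {
    private final int n;

    ToyKeepLastN(int n) {
        this.n = n;
    }

    public void onCommit(List<ToyCommit> commits) {
        for (int i = 0; i < commits.size() - n; i++) {
            commits.get(i).delete();
        }
    }
}

// Applies the policy after each new commit and drops the marked ones.
class ToyCommitLog {
    final List<ToyCommit> live = new ArrayList<>();
    final ToyDeletionPolicy policy;
    long nextGen = 1;

    ToyCommitLog(ToyDeletionPolicy policy) {
        this.policy = policy;
    }

    void commit() {
        live.add(new ToyCommit(nextGen++));
        policy.onCommit(live);
        live.removeIf(c -> c.deleted);
    }
}
```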
Expert: IndexWriter allows you to separately change the {@link MergePolicy} and the {@link MergeScheduler}. The {@link MergePolicy} is invoked whenever there are changes to the segments in the index. Its role is to select which merges to do, if any, and return a {@link MergePolicy.MergeSpecification} describing the merges. It also selects merges to do for optimize(). The default is {@link LogByteSizeMergePolicy}. Then, the {@link MergeScheduler} is invoked with the requested merges and it decides when and how to run the merges. The default is {@link ConcurrentMergeScheduler}.
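The division of labor between the two components can be sketched with toy types. The policy returns which groups of segments to merge, and the scheduler decides whether to run them on the calling thread or on a pool. All names are hypothetical, and segments are reduced to integer sizes. The fixed-size grouping is a deliberate simplification, not how LogByteSizeMergePolicy selects merges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Decides WHICH merges to do: each inner list is one group of segments.
interface ToyMergePolicy {
    List<List<Integer>> findMerges(List<Integer> segmentSizes);
}

// Decides WHEN and HOW to run them; appends merged sizes to results.
interface ToyMergeScheduler {
    void merge(List<List<Integer>> merges, List<Integer> results);
}

// Crude policy: merge consecutive groups of mergeFactor segments.
class ToyGroupingPolicy implements ToyMergePolicy {
    private final int mergeFactor;

    ToyGroupingPolicy(int mergeFactor) {
        this.mergeFactor = mergeFactor;
    }

    public List<List<Integer>> findMerges(List<Integer> sizes) {
        List<List<Integer>> merges = new ArrayList<>();
        for (int i = 0; i + mergeFactor <= sizes.size(); i += mergeFactor) {
            merges.add(new ArrayList<>(sizes.subList(i, i + mergeFactor)));
        }
        return merges;
    }
}

// Runs every merge on the calling thread.
class ToySerialScheduler implements ToyMergeScheduler {
    public void merge(List<List<Integer>> merges, List<Integer> results) {
        for (List<Integer> m : merges) {
            results.add(m.stream().mapToInt(Integer::intValue).sum());
        }
    }
}

// Runs merges on a small pool; unlike a real background scheduler,
// this sketch waits for them so the results can be inspected.
class ToyConcurrentScheduler implements ToyMergeScheduler {
    public void merge(List<List<Integer>> merges, List<Integer> results) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (List<Integer> m : merges) {
            pool.submit(() -> {
                int merged = m.stream().mapToInt(Integer::intValue).sum();
                synchronized (results) {
                    results.add(merged);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("merges did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for merges", e);
        }
    }
}

class ToyMergeDriver {
    static List<Integer> run(List<Integer> sizes, ToyMergePolicy policy, ToyMergeScheduler scheduler) {
        List<Integer> results = new ArrayList<>();
        scheduler.merge(policy.findMerges(sizes), results);
        return results;
    }
}
```

Swapping ToySerialScheduler for ToyConcurrentScheduler changes only how merges run, never which merges are chosen. Keeping the two pluggable separately is what makes that swap possible.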
NOTE: if you hit an OutOfMemoryError then IndexWriter will quietly record this fact and block all future segment commits. This is a defensive measure in case any internal state (buffered documents and deletions) was corrupted. Any subsequent calls to {@link #commit()} will throw an IllegalStateException. The only course of action is to call {@link #close()}, which internally will call {@link #rollback()} to undo any changes to the index since the last commit. You can also just call {@link #rollback()} directly.
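The sketch below models the OutOfMemoryError guard. After an OOM, commit() refuses with IllegalStateException and close() falls back to rollback(). ToyOomGuardWriter is hypothetical, and the OOM is simulated with a flag. Rethrowing the error to the caller after recording it is a choice of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the OutOfMemoryError guard; not Lucene code.
class ToyOomGuardWriter {
    private final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();
    private boolean hitOOM;

    // simulateOom lets the sketch trigger an OOM part-way through buffering.
    void addDocument(String doc, boolean simulateOom) {
        try {
            pending.add(doc);
            if (simulateOom) {
                throw new OutOfMemoryError("simulated");
            }
        } catch (OutOfMemoryError oom) {
            hitOOM = true; // buffered state can no longer be trusted
            throw oom;     // this sketch still surfaces the error to the caller
        }
    }

    void commit() {
        if (hitOOM) {
            throw new IllegalStateException("this writer hit an OutOfMemoryError; cannot commit");
        }
        committed.addAll(pending);
        pending.clear();
    }

    // Discards every change since the last commit.
    void rollback() {
        pending.clear();
    }

    // close() commits normally, but falls back to rollback() after an OOM.
    void close() {
        if (hitOOM) {
            rollback();
        } else {
            commit();
        }
    }

    List<String> committedDocs() {
        return Collections.unmodifiableList(committed);
    }
}
```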
NOTE: {@link IndexWriter} instances are completely thread-safe, meaning multiple threads can call any of its methods concurrently. If your application requires external synchronization, you should not synchronize on the IndexWriter instance as this may cause deadlock; use your own (non-Lucene) objects instead.
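The sketch below shows the recommended pattern. When an application needs two operations to happen atomically (here, adding a document and appending to an audit log in the same order), it synchronizes on a private lock object rather than on the writer. ToyDocSink and ToyIndexingService are hypothetical, and the sink stands in for an already thread-safe writer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a writer that is already thread-safe on its own.
interface ToyDocSink {
    void addDocument(String doc);
}

class ToyIndexingService {
    private final ToyDocSink writer;
    private final Object lock = new Object(); // application-owned lock, never the writer
    private final List<String> auditLog = new ArrayList<>();

    ToyIndexingService(ToyDocSink writer) {
        this.writer = writer;
    }

    // "Add + audit entry" must happen as one step, so the application
    // needs external synchronization; it locks its own object for that.
    void index(String doc) {
        synchronized (lock) {
            writer.addDocument(doc);
            auditLog.add(doc);
        }
    }

    List<String> auditSnapshot() {
        synchronized (lock) {
            return new ArrayList<>(auditLog);
        }
    }
}
```

Because the writer's own monitor is never used, the application's locking cannot interfere with whatever internal locking the writer performs.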
NOTE: If you call Thread.interrupt() on a thread that's within IndexWriter, IndexWriter will try to catch this (e.g., if it's in a wait() or Thread.sleep()), and will then throw the unchecked exception {@link ThreadInterruptedException} and clear the interrupt status on the thread.
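The interrupt handling described above follows a common Java idiom: catch the checked InterruptedException from a blocking call and rethrow it as an unchecked exception. ToyThreadInterruptedException and ToyWaits are hypothetical stand-ins. Thread.sleep() already clears the interrupt status when it throws, so the status is clear by the time the unchecked exception reaches the caller.

```java
// Hypothetical stand-in for the unchecked ThreadInterruptedException.
class ToyThreadInterruptedException extends RuntimeException {
    ToyThreadInterruptedException(InterruptedException cause) {
        super(cause);
    }
}

class ToyWaits {
    // Sleeps like an internal wait would, converting the checked
    // InterruptedException into an unchecked one. The interrupt status
    // was already cleared by Thread.sleep when it threw.
    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            throw new ToyThreadInterruptedException(e);
        }
    }
}
```

A caller that wants to keep honoring the interrupt can catch the unchecked exception and call Thread.currentThread().interrupt() to restore the status.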