This class also provides a slight variation on the basic model that differentiates co-occurrences by their position relative to the focus word. In that case, for example, an occurrence of "red" two positions before the focus word is represented by a different dimension than "red" one position before. This is reminiscent of the {@link edu.ucla.sspace.ri.RandomIndexing RandomIndexing} model with permutations. However, unlike Random Indexing, this model does not have a fixed number of dimensions and may use up to {@code numWords * windowSize * 2} dimensions. Such a large number of dimensions can negatively impact further operations on the semantic space's vectors, e.g., finding the most similar vectors for a word.
The dimensions of this space are annotated with a description of what they represent. In the basic model, this will be the co-occurring word. In the model that takes into account word order, the description will include the relative position of the word.
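The difference between the two models can be illustrated with a minimal, standalone sketch. It does not use this class or any S-Space API: the class name {@code OrderedCooccurrenceSketch}, its methods, and the {@code word_offset} description format are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the S-Space library.
public class OrderedCooccurrenceSketch {

    // Maps a dimension key (e.g. "red" or "red_-2") to its dimension index.
    private final Map<String, Integer> keyToDimension = new HashMap<>();

    // Human-readable description of each dimension, indexed by dimension.
    private final List<String> descriptions = new ArrayList<>();

    private final boolean useWordOrder;

    public OrderedCooccurrenceSketch(boolean useWordOrder) {
        this.useWordOrder = useWordOrder;
    }

    // Returns the dimension for a co-occurring word, creating one on demand.
    // With word order enabled, the relative offset becomes part of the key.
    public int dimensionFor(String word, int offset) {
        String key = useWordOrder ? word + "_" + offset : word;
        Integer dim = keyToDimension.get(key);
        if (dim == null) {
            dim = descriptions.size();
            keyToDimension.put(key, dim);
            descriptions.add(key);
        }
        return dim;
    }

    // Returns the description assigned to a dimension.
    public String describe(int dimension) {
        return descriptions.get(dimension);
    }

    // Number of dimensions created so far.
    public int dimensionCount() {
        return descriptions.size();
    }

    // Counts the co-occurrences of tokens[focus] within windowSize positions
    // on either side, keyed by dimension index.
    public Map<Integer, Integer> countWindow(String[] tokens, int focus,
                                             int windowSize) {
        Map<Integer, Integer> counts = new HashMap<>();
        int start = Math.max(0, focus - windowSize);
        int end = Math.min(tokens.length - 1, focus + windowSize);
        for (int i = start; i <= end; i++) {
            if (i == focus) {
                continue;
            }
            int dim = dimensionFor(tokens[i], i - focus);
            counts.merge(dim, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] tokens = {"red", "red", "car"};
        OrderedCooccurrenceSketch basic = new OrderedCooccurrenceSketch(false);
        OrderedCooccurrenceSketch ordered = new OrderedCooccurrenceSketch(true);
        System.out.println(basic.countWindow(tokens, 2, 2));
        System.out.println(ordered.countWindow(tokens, 2, 2));
    }
}
```

With "car" as the focus word, the basic variant folds both occurrences of "red" into a single dimension with a count of two, while the word-order variant creates one dimension per relative position. This is why the ordered model can grow to as many as {@code numWords * windowSize * 2} dimensions.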
This class defines the following configurable properties that may be set using either the System properties or using the {@link GenericWordSpace#GenericWordSpace(Properties)} constructor.
{@value #WINDOW_SIZE_PROPERTY}
{@value #USE_WORD_ORDER_PROPERTY}
This class implements {@link Filterable}, which allows for fine-grained control of which semantics are retained. The {@link #setSemanticFilter(Set)} method can be used to specify which words should have their semantics retained. Note that the words that are filtered out will still be used in computing the semantics of other words. This behavior is intended for use with large corpora where retaining the semantics of all words in memory is infeasible.
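The filtering behavior described above can be sketched with a small standalone class. This is not the implementation used here: {@code SemanticFilterSketch} and its methods are hypothetical, the vectors are plain word-to-count maps, and treating an empty filter as "retain everything" is an assumption of the sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of the S-Space library.
public class SemanticFilterSketch {

    // Co-occurrence counts for each retained focus word.
    private final Map<String, Map<String, Integer>> vectors = new HashMap<>();

    // Words whose semantics are kept. Assumption: empty means "keep all".
    private Set<String> retained = Collections.emptySet();

    public void setSemanticFilter(Set<String> words) {
        retained = new HashSet<>(words);
    }

    public void process(String[] tokens, int windowSize) {
        for (int focus = 0; focus < tokens.length; focus++) {
            String word = tokens[focus];
            // Filtered-out words get no vector of their own...
            if (!retained.isEmpty() && !retained.contains(word)) {
                continue;
            }
            Map<String, Integer> vector =
                vectors.computeIfAbsent(word, k -> new HashMap<>());
            int start = Math.max(0, focus - windowSize);
            int end = Math.min(tokens.length - 1, focus + windowSize);
            for (int i = start; i <= end; i++) {
                // ...but they still count as neighbors of retained words.
                if (i != focus) {
                    vector.merge(tokens[i], 1, Integer::sum);
                }
            }
        }
    }

    // Words that currently have a stored vector.
    public Set<String> getWords() {
        return vectors.keySet();
    }

    // Returns the stored counts for a word, or null if none were retained.
    public Map<String, Integer> getVector(String word) {
        return vectors.get(word);
    }

    public static void main(String[] args) {
        SemanticFilterSketch space = new SemanticFilterSketch();
        space.setSemanticFilter(Collections.singleton("car"));
        space.process(new String[] {"the", "red", "car"}, 2);
        System.out.println(space.getWords());
        System.out.println(space.getVector("car"));
    }
}
```

In this sketch only "car" has a stored vector, yet that vector still records "the" and "red" as co-occurring words, so memory is saved on the number of vectors rather than on the context each retained vector sees.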
This class is thread-safe for concurrent calls of {@link #processDocument(BufferedReader) processDocument}. At any given point in processing, the {@link #getVector(String) getVector} method may be used to access the current semantics of a word. This allows callers to track incremental changes to the semantics as the corpus is processed.
The {@link #processSpace(Properties) processSpace} method does nothing for this class and calls to it will not affect the results of {@code getVector}.
@author David Jurgens