The VSM first processes documents into a word-document matrix in which each unique word is assigned a row and each column represents a document. Each value in this matrix is the number of times the row's word occurs in the column's document. Optionally, after the matrix has been completely built, its values may be transformed. This is frequently done using the {@link edu.ucla.sspace.matrix.TfIdfTransform Tf-Idf Transform}.
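The word-document counting described above can be illustrated with a minimal, stand-alone sketch. It is not this class's implementation: the class name {@code TermDocumentSketch}, its methods, and the lower-casing whitespace tokenizer are all hypothetical and assumed only for illustration, and no transform is applied.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not part of the S-Space API. */
class TermDocumentSketch {

    /** Assumed tokenizer: lower-cases the text and splits on whitespace. */
    static String[] tokenize(String doc) {
        String trimmed = doc.trim().toLowerCase();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Assigns each unique word a row index, in order of first appearance. */
    static Map<String, Integer> buildVocabulary(List<String> documents) {
        Map<String, Integer> rowOf = new LinkedHashMap<>();
        for (String doc : documents) {
            for (String word : tokenize(doc)) {
                rowOf.putIfAbsent(word, rowOf.size());
            }
        }
        return rowOf;
    }

    /** Builds the matrix: counts[row][col] = occurrences of the row's word in document col. */
    static int[][] countMatrix(List<String> documents, Map<String, Integer> rowOf) {
        int[][] counts = new int[rowOf.size()][documents.size()];
        for (int col = 0; col < documents.size(); col++) {
            for (String word : tokenize(documents.get(col))) {
                counts[rowOf.get(word)][col]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("the cat sat", "the dog sat on the cat");
        Map<String, Integer> rowOf = buildVocabulary(docs);
        int[][] counts = countMatrix(docs, rowOf);
        // Print one row per word, one column per document.
        for (Map.Entry<String, Integer> e : rowOf.entrySet()) {
            System.out.println(e.getKey() + " " + Arrays.toString(counts[e.getValue()]));
        }
    }
}
```

In this sketch the row for "the" holds 1 for the first document and 2 for the second. A transform such as Tf-Idf would then rescale these raw counts after the whole matrix has been built.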
This class offers one configurable parameter.
{@value #MATRIX_TRANSFORM_PROPERTY}
This class is thread-safe for concurrent calls of {@link #processDocument(BufferedReader) processDocument}. Once {@link #processSpace(Properties) processSpace} has been called, no further calls to {@code processDocument} should be made. This implementation does not support access to the semantic vectors until after {@code processSpace} has been called.
@see Transform
@author David Jurgens