An implementation of the word sense induction algorithm described by Purandare and Pedersen in the following paper:
- Amruta Purandare and Ted Pedersen. (2004) Word Sense Discrimination by Clustering Contexts in Vector and Similarity Spaces. Proceedings of the Conference on Computational Natural Language Learning (CoNLL), pp. 41-48, May 6-7, 2004, Boston, MA.
This class offers one configurable parameter.
- Property: {@value #MAX_CONTEXT_PER_WORD}
  Default: {@link Integer#MAX_VALUE}
  This property sets the maximum number of contexts that will be clustered for a single word. If a word has fewer contexts than this value, all of its contexts are used. Consider setting this value when using a large corpus, or when the corpus still contains many frequently used words after filtering.
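The cap described above can be illustrated with a small, hypothetical helper. This is a sketch under stated assumptions, not the class's actual code: the name `ContextCap` and the property key `maxContextsPerWord` are invented for illustration (the real key is whatever `MAX_CONTEXT_PER_WORD` holds), and the sketch simply keeps the first N contexts seen for each word. The source does not say whether the real class keeps the first N contexts or samples them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

/**
 * Hypothetical helper that retains at most maxContexts contexts per word.
 * Illustrates the MAX_CONTEXT_PER_WORD behaviour; not the real implementation.
 */
final class ContextCap {

    // Hypothetical property key; the real key is the value of MAX_CONTEXT_PER_WORD.
    static final String MAX_CONTEXTS_KEY = "maxContextsPerWord";

    private final int maxContexts;
    private final Map<String, List<String>> contexts = new HashMap<>();

    ContextCap(int maxContexts) {
        if (maxContexts <= 0)
            throw new IllegalArgumentException("maxContexts must be positive");
        this.maxContexts = maxContexts;
    }

    /** Reads the cap from props; a missing key means no cap (Integer.MAX_VALUE). */
    static ContextCap fromProperties(Properties props) {
        String value = props.getProperty(MAX_CONTEXTS_KEY);
        int max = (value == null) ? Integer.MAX_VALUE : Integer.parseInt(value);
        return new ContextCap(max);
    }

    /** Records a context for word; returns false (and drops it) once the cap is reached. */
    boolean add(String word, String context) {
        List<String> list = contexts.computeIfAbsent(word, w -> new ArrayList<>());
        if (list.size() >= maxContexts)
            return false;
        list.add(context);
        return true;
    }

    /** Returns the contexts retained for word, i.e. the ones that would be clustered. */
    List<String> contextsFor(String word) {
        return Collections.unmodifiableList(
            contexts.getOrDefault(word, Collections.emptyList()));
    }
}

class ContextCapDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(ContextCap.MAX_CONTEXTS_KEY, "2");
        ContextCap cap = ContextCap.fromProperties(props);
        cap.add("bank", "the river bank flooded");
        cap.add("bank", "the bank raised interest rates");
        cap.add("bank", "we sat on the bank");  // dropped: cap of 2 already reached
        System.out.println(cap.contextsFor("bank").size());  // prints 2
    }
}
```

Keeping the first N contexts is the simplest policy and is deterministic; a random sample (for example, reservoir sampling) would avoid biasing the clusters toward the beginning of the corpus at the cost of reproducibility.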
@author David Jurgens