A co-occurrence based approach to statistical semantics that uses dependency parse trees and approximates a full co-occurrence matrix by using a randomized projection. This implementation is an extension of {@link edu.ucla.sspace.ri.RandomIndexing}, which is based on three papers:
- M. Sahlgren, "Vector-based semantic analysis: Representing word meanings based on random labels," in Proceedings of the ESSLLI 2001 Workshop on Semantic Knowledge Acquisition and Categorisation, Helsinki, Finland, 2001.
- M. Sahlgren, "An introduction to random indexing," in Proceedings of the Methods and Applicatons of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering, 2005.
- M. Sahlgren, A. Holst, and P. Kanerva, "Permutations as a means to encode order in word space," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society (CogSci’08), 2008.
The technique for incorporating dependency parse trees is based on the following paper:
- S. Pado and M. Lapata, "Dependency-Based Construction of Semantic Space Models," Computational Linguistics, 2007.

Dependency Random Indexing (DRI) extends Random Indexing by restricting a word's context to the set of words with which it has a syntactic relationship. Full word co-occurrence models have shown that this restricted interpretation of a context can improve the semantic representations. DRI uses the same approximation technique as Random Indexing to project this full co-occurrence space into a significantly smaller dimensional space. This projection is done through the use of index vectors, each of which is sparse and nearly orthogonal to all other index vectors. A word's semantic vector is the summation of the index vectors of the words that occur in its contexts.
While Random Indexing uses permutations of these index vectors to encode lexical position, a shallow form of syntactic structure, DRI extends the notion of permutations to allow for the encoding of dependency relationships. Through this modification, the set of relationships between any two co-occurring words in a sentence can be encoded, as can the distance between the two words. Under this model, each possible dependency relationship could have its own permutation function, as could each possible distance between co-occurring words.
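To make the mechanism concrete, the following self-contained sketch illustrates the general idea in miniature: each word receives a sparse random index vector, a permutation keyed on the dependency relation and path length reorders a co-occurring word's index vector, and the permuted vector is summed into the focus word's semantic vector. All class, method, and variable names below (DriSketch, observe, permute, etc.) are illustrative assumptions, not the API of this class or of {@link edu.ucla.sspace.ri.RandomIndexing}; a simple rotation stands in for an arbitrary relation-specific permutation function.

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class DriSketch {

    private static final int DIMENSIONS = 10;   // tiny, for readability
    private final Map<String, int[]> indexVectors = new HashMap<>();
    private final Map<String, int[]> semanticVectors = new HashMap<>();
    private final Random rand = new Random(42);

    // Returns (creating if needed) a sparse ternary index vector for a word.
    private int[] indexVectorFor(String word) {
        return indexVectors.computeIfAbsent(word, w -> {
            int[] v = new int[DIMENSIONS];
            // Two random +1/-1 entries; the rest stay 0, so index vectors are
            // sparse and nearly orthogonal to one another.
            for (int i = 0; i < 2; i++)
                v[rand.nextInt(DIMENSIONS)] = rand.nextBoolean() ? 1 : -1;
            return v;
        });
    }

    // A stand-in relation-specific permutation: rotate the vector by an amount
    // derived from the relation name and the path length.
    private int[] permute(int[] v, String relation, int pathLength) {
        int shift = Math.floorMod(relation.hashCode() + pathLength, DIMENSIONS);
        int[] out = new int[DIMENSIONS];
        for (int i = 0; i < DIMENSIONS; i++)
            out[(i + shift) % DIMENSIONS] = v[i];
        return out;
    }

    // Records that head co-occurs with dep via the given dependency relation.
    public void observe(String head, String dep, String relation, int pathLength) {
        int[] permuted = permute(indexVectorFor(dep), relation, pathLength);
        int[] semantics = semanticVectors.computeIfAbsent(
                head, w -> new int[DIMENSIONS]);
        for (int i = 0; i < DIMENSIONS; i++)
            semantics[i] += permuted[i];   // sum of permuted index vectors
    }

    public static void main(String[] args) {
        DriSketch sketch = new DriSketch();
        // "dog" occurs as the subject of "barks", one dependency edge away.
        sketch.observe("barks", "dog", "nsubj", 1);
        System.out.println(java.util.Arrays.toString(
                sketch.semanticVectors.get("barks")));
    }
}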
This class defines the following configurable properties that may be set using either the System properties or the {@link DependencyRandomIndexing#DependencyRandomIndexing(DependencyExtractor,DependencyPermutationFunction,Properties)} constructor.
- Property:
{@value #DEPENDENCY_ACCEPTOR_PROPERTY}
Default: {@link UniversalRelationAcceptor} - This property sets the {@link DependencyRelationAcceptor} to use for validating dependency paths. If a path is rejected it will not influence either the lemma vector or the selectional preference vectors.
- Property:
{@value #DEPENDENCY_PATH_LENGTH_PROPERTY}
Default: {@value #DEFAULT_DEPENDENCY_PATH_LENGTH} - This property sets the maximum length a dependency path may have in order to be accepted. Paths beyond this length will not contribute towards either the lemma vectors or selectional preference vectors.
- Property:
{@value #VECTOR_LENGTH_PROPERTY}
Default: {@value #DEFAULT_VECTOR_LENGTH} - This property sets the number of dimensions in the word space.
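As a hedged configuration sketch, the snippet below sets two of these properties programmatically and passes them to the constructor named above. The concrete {@code DependencyExtractor} and {@code DependencyPermutationFunction} implementations are taken as parameters because they depend on the corpus format; the method name {@code configure}, the chosen values, and the omitted imports are illustrative assumptions.

static DependencyRandomIndexing configure(DependencyExtractor extractor,
                                          DependencyPermutationFunction permFunc) {
    Properties props = new Properties(System.getProperties());
    // Use a 4,000-dimensional word space rather than the default length.
    props.setProperty(DependencyRandomIndexing.VECTOR_LENGTH_PROPERTY, "4000");
    // Reject any dependency path longer than two relations.
    props.setProperty(
        DependencyRandomIndexing.DEPENDENCY_PATH_LENGTH_PROPERTY, "2");
    return new DependencyRandomIndexing(extractor, permFunc, props);
}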
This class implements {@link Filterable}, which allows for fine-grained control of which semantics are retained. The {@link #setSemanticFilter(Set)} method can be used to specify which words should have their semantics retained. Note that words that are filtered out will still be used in computing the semantics of other words. This behavior is intended for use with large corpora where retaining the semantics of all words in memory is infeasible.

This class is thread-safe for concurrent calls of {@link #processDocument(BufferedReader) processDocument}. At any given point in processing, the {@link #getVectorFor(String) getVector} method may be used to access the current semantics of a word. This allows callers to track incremental changes to the semantics as the corpus is processed.

The {@link #processSpace(Properties) processSpace} method does nothing for this class and calls to it will not affect the results of {@code getVectorFor}.
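A brief usage sketch of the filtering and processing methods described above follows. The file name, the tracked words, and the enclosing method are illustrative; the standard {@code java.io} and {@code java.util} imports are omitted, and it is assumed that {@code processDocument} may throw an {@code IOException} and that the input file is already in the dependency-parsed format expected by the configured {@code DependencyExtractor}.

static void processCorpus(DependencyRandomIndexing dri) throws IOException {
    // Retain in-memory semantics for only a small set of words of interest;
    // filtered-out words still contribute context to these words' vectors.
    Set<String> wordsToTrack = new HashSet<>(Arrays.asList("dog", "cat"));
    dri.setSemanticFilter(wordsToTrack);

    // processDocument may be called concurrently from multiple threads.
    try (BufferedReader parsedDoc =
             new BufferedReader(new FileReader("parsed-corpus.conll"))) {
        dri.processDocument(parsedDoc);
    }

    // The current semantics may be inspected at any point during processing;
    // processSpace is a no-op for this class.
    System.out.println(dri.getVectorFor("dog"));
}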
@see RandomIndexing
@see DependencyPermutationFunction
@author Keith Stevens