Package edu.ucla.sspace.hal

Examples of edu.ucla.sspace.hal.HyperspaceAnalogueToLanguage

See here for additional papers that use HAL.

HAL is based on recording the co-occurrence of words in a sparse matrix. HAL also incorporates word order information by treating the co-occurrence of two words x y as different from y x. Each word is assigned a unique index in the co-occurrence matrix. For some word x, when another word y co-occurs before it, matrix entry x,y is updated. Similarly, when y co-occurs after it, matrix entry y,x is updated. Therefore the full semantic vector for any word is its row vector concatenated with its column vector.
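The order-sensitive counting described above can be illustrated with a small, self-contained sketch. It is not the S-Space implementation: the class name {@code HalCooccurrenceSketch} and its methods are hypothetical, it uses a dense array instead of a sparse matrix, and it adds a plain count of 1 per co-occurrence rather than a distance-based weight. Only the words preceding each focus word are scanned, which already visits every ordered pair once.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: names and structure are assumptions, not S-Space code.
final class HalCooccurrenceSketch {

    /** Assigns each distinct token a unique index, in order of first appearance. */
    static Map<String, Integer> indexWords(List<String> tokens) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String token : tokens) {
            index.putIfAbsent(token, index.size());
        }
        return index;
    }

    /**
     * Builds an N x N matrix in which entry [x][y] counts how often word y
     * occurs before word x within windowSize tokens.
     */
    static double[][] count(List<String> tokens, Map<String, Integer> index,
                            int windowSize) {
        int n = index.size();
        double[][] matrix = new double[n][n];
        for (int i = 0; i < tokens.size(); i++) {
            int x = index.get(tokens.get(i));
            for (int d = 1; d <= windowSize && i - d >= 0; d++) {
                int y = index.get(tokens.get(i - d));
                matrix[x][y] += 1.0; // y precedes x, so update entry x,y
            }
        }
        return matrix;
    }

    /** Returns the row of word w followed by the column of word w (2*N values). */
    static double[] fullVector(double[][] matrix, int w) {
        int n = matrix.length;
        double[] vector = new double[2 * n];
        for (int j = 0; j < n; j++) {
            vector[j] = matrix[w][j];     // words seen before w
            vector[n + j] = matrix[j][w]; // words seen after w
        }
        return vector;
    }
}
```

For the token sequence "the cat sat" with a window of 2, the entry for (cat, the) is incremented but the entry for (the, cat) stays at zero, which is how the matrix keeps the two orderings apart.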

Typically, the full vectors are used (for an N x N matrix, these are 2*N in length). However, HAL also offers two possibilities for dimensionality reduction. Not all columns provide an equal amount of information for distinguishing the meanings of words. Specifically, the information-theoretic entropy of each column can be calculated and used to order the columns by importance. Using this ranking, either a fixed number of the highest-entropy columns may be retained, or a threshold may be set to filter out low-entropy columns.
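As a rough illustration of the two reduction strategies, the sketch below computes a Shannon entropy for each column and then either keeps a fixed number of columns or keeps those above a threshold. The class {@code ColumnEntropySketch} is hypothetical, and the choice to normalize each column into a probability distribution and use base-2 logarithms is an assumption; the text does not say exactly how the library computes the entropy.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: the entropy definition used here is an assumption.
final class ColumnEntropySketch {

    /** Shannon entropy (in bits) of one column, treated as a distribution. */
    static double columnEntropy(double[][] matrix, int col) {
        double sum = 0;
        for (double[] row : matrix) {
            sum += row[col];
        }
        if (sum == 0) {
            return 0; // an empty column carries no information
        }
        double entropy = 0;
        for (double[] row : matrix) {
            double p = row[col] / sum;
            if (p > 0) {
                entropy -= p * (Math.log(p) / Math.log(2));
            }
        }
        return entropy;
    }

    /** Indices of the {@code retain} columns with the highest entropy. */
    static List<Integer> topColumns(double[][] matrix, int retain) {
        int cols = matrix.length == 0 ? 0 : matrix[0].length;
        double[] entropy = new double[cols];
        List<Integer> order = new ArrayList<>();
        for (int c = 0; c < cols; c++) {
            entropy[c] = columnEntropy(matrix, c);
            order.add(c);
        }
        // Sort column indices by descending entropy (stable for ties).
        order.sort((a, b) -> Double.compare(entropy[b], entropy[a]));
        return new ArrayList<>(order.subList(0, Math.min(retain, cols)));
    }

    /** Indices of all columns whose entropy is strictly above the threshold. */
    static List<Integer> columnsAbove(double[][] matrix, double threshold) {
        int cols = matrix.length == 0 ? 0 : matrix[0].length;
        List<Integer> kept = new ArrayList<>();
        for (int c = 0; c < cols; c++) {
            if (columnEntropy(matrix, c) > threshold) {
                kept.add(c);
            }
        }
        return kept;
    }
}
```

A column whose counts are spread evenly over many rows scores high under this definition, while a column concentrated in a single row scores zero and is the first to be dropped.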

A {@link HyperspaceAnalogueToLanguage} model is defined by four parameters. The default constructor uses reasonable values that match those mentioned in the original publication. For alternate models, appropriate values must be passed in through the full constructor. The four parameters are:

Parameter: {@code windowSize}
Default: 5
This parameter sets the size of the sliding co-occurrence window such that the {@code windowSize} words before and the {@code windowSize} words after the focus word will be used to count co-occurrences. This model always uses symmetric windows.
Parameter: {@code weighting}
Default: {@link LinearWeighting}
This parameter sets the {@link WeightingFunction} used to weight co-occurrences between two words based on the number of intervening words, i.e. the distance between the two words in the sliding window. HAL traditionally uses a ramped, linear weighting in which the closest words receive the most weight, with a linear decrease based on distance.
Parameter: {@code retainColumns}
Default: -1
If set to a positive value, this parameter enables dimensionality reduction by retaining only {@code retainColumns} columns. Columns will be ordered according to their entropy, and the {@code retainColumns} columns with the highest entropy will be retained. This parameter cannot be set in conjunction with {@code columnThreshold}.
Parameter: {@code columnThreshold}
Default: -1
If set to a positive value, this parameter enables dimensionality reduction by retaining only those columns whose entropy is above the specified threshold. This parameter may not be set concurrently with {@code retainColumns}.
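The parameter rules above can be summarized in a short sketch. {@code HalParamsSketch} is a hypothetical holder class, not part of the library. The ramp formula {@code windowSize - distance + 1} is an assumption that matches the description of a linear decrease with distance; the exact formula used by {@link LinearWeighting} is not given in this text.

```java
// Illustrative only: a hypothetical parameter holder, not an S-Space class.
final class HalParamsSketch {
    final int windowSize;
    final int retainColumns;      // -1 (or any non-positive value) disables it
    final double columnThreshold; // -1 (or any non-positive value) disables it

    HalParamsSketch(int windowSize, int retainColumns, double columnThreshold) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        // The two reduction options are mutually exclusive.
        if (retainColumns > 0 && columnThreshold > 0) {
            throw new IllegalArgumentException(
                    "retainColumns and columnThreshold cannot both be set");
        }
        this.windowSize = windowSize;
        this.retainColumns = retainColumns;
        this.columnThreshold = columnThreshold;
    }

    /**
     * Assumed linear ramp: an adjacent word (distance 1) gets weight
     * windowSize, the farthest word in the window gets weight 1, and
     * anything outside the window gets 0.
     */
    double linearWeight(int distance) {
        if (distance < 1 || distance > windowSize) {
            return 0;
        }
        return windowSize - distance + 1;
    }
}
```

With the default window of 5, this ramp gives weights 5, 4, 3, 2, 1 for distances 1 through 5.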

For models that require a non-symmetric window, a special {@link WeightingFunction} can be used which assigns a weight of {@code 0} to co-occurrences that match the non-symmetric window size.

@author Alex Nau
@author David Jurgens
@see SemanticSpace
@see WeightingFunction
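One way to picture such a weighting function is sketched below. The interface {@code OffsetWeighting} is a hypothetical stand-in defined only for this example and should not be read as the actual {@link WeightingFunction} signature. The sketch assumes that a negative offset means the context word precedes the focus word, and it reuses an assumed linear ramp for positions that are kept.

```java
// Hypothetical stand-in for a weighting interface; not the S-Space signature.
interface OffsetWeighting {
    /** offset < 0: context word precedes the focus word; offset > 0: follows. */
    double weight(int offset, int windowSize);
}

// Illustrative only: zeroes out positions beyond a per-side limit.
final class AsymmetricWeightingSketch implements OffsetWeighting {
    private final int before; // number of preceding words to keep
    private final int after;  // number of following words to keep

    AsymmetricWeightingSketch(int before, int after) {
        if (before < 0 || after < 0) {
            throw new IllegalArgumentException("window sizes must be non-negative");
        }
        this.before = before;
        this.after = after;
    }

    @Override
    public double weight(int offset, int windowSize) {
        int distance = Math.abs(offset);
        if (distance == 0 || distance > windowSize) {
            return 0; // the focus word itself, or outside the symmetric window
        }
        int limit = offset < 0 ? before : after;
        if (distance > limit) {
            return 0; // inside the symmetric window, but excluded on this side
        }
        return windowSize - distance + 1; // assumed linear ramp
    }
}
```

Under this approach the model's symmetric {@code windowSize} would be set to the larger of the two per-side limits, and the weighting function silences the extra positions on the shorter side.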


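        // (Earlier lines of the enclosing method are not part of this excerpt.)
        // Use HAL's traditional linear ramp weighting.
        weighting = new LinearWeighting();
        // Read the window size ('s'), entropy threshold ('h'), and number of
        // retained columns ('r') options, falling back to the model defaults.
        int windowSize = argOptions.getIntOption('s', 5);
        double threshold = argOptions.getDoubleOption('h', -1d);
        int retain = argOptions.getIntOption('r', -1);

        return new HyperspaceAnalogueToLanguage(
                new StringBasisMapping(), windowSize, weighting,
                threshold, retain);
    }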

