Token.text to a list of Strings (space delimited). Add each string as a feature to the token. If realValued is true, treat the position in the list as the feature name and the string as a double value. Otherwise, the feature name is the string itself and the value is 1.0. The parser also allows feature names and values to be specified together, e.g. featureName1=featureValue1 featureName2=featureValue2 ... The name/value separator (here '=') is configurable.
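The three parsing modes can be sketched without MALLET as one static method that returns a plain map instead of setting features on a Token. This is a hypothetical, dependency-free illustration, not the class's actual code. The names FeatureStringSketch and parse are made up. Two further assumptions: positional feature names are the decimal indices "0", "1", "2", ..., and the name/value mode only applies when realValued is true. The second assumption is consistent with the (true, true, "=") constructor call used for feature/value pairs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of the parsing rules; NOT MALLET's implementation.
public class FeatureStringSketch {

    static Map<String, Double> parse(String text, boolean realValued,
                                     boolean specifyFeatureNames, String separator) {
        Map<String, Double> features = new LinkedHashMap<>(); // keeps input order
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return features;
        }
        String[] parts = trimmed.split("\\s+"); // space-delimited strings
        for (int i = 0; i < parts.length; i++) {
            if (!realValued) {
                // Binary indicator: the string is the feature name, value 1.0.
                features.put(parts[i], 1.0);
            } else if (specifyFeatureNames) {
                // name<separator>value; Pattern.quote treats the separator literally.
                String[] nameValue = parts[i].split(Pattern.quote(separator), 2);
                if (nameValue.length != 2) {
                    throw new IllegalArgumentException(
                        "Expected name" + separator + "value but got: " + parts[i]);
                }
                features.put(nameValue[0], Double.parseDouble(nameValue[1]));
            } else {
                // Positional: the index is the feature name (assumed "0", "1", ...).
                features.put(Integer.toString(i), Double.parseDouble(parts[i]));
            }
        }
        return features;
    }

    public static void main(String[] args) {
        System.out.println(parse("height=10.7 width=3.6 length=1.7", true, true, "="));
        // {height=10.7, width=3.6, length=1.7}
        System.out.println(parse("10.7 3.6 1.7", true, false, "="));
        // {0=10.7, 1=3.6, 2=1.7}
        System.out.println(parse("yellow quacks has_webbed_feet", false, false, "="));
        // {yellow=1.0, quacks=1.0, has_webbed_feet=1.0}
    }
}
```

Splitting on the first separator only (limit 2) means a value may itself contain the separator. A feature name may not contain it.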
If your data consists of feature/value pairs (e.g. height=10.7 width=3.6 length=1.7), use new TokenSequenceParseFeatureString(true, true, "="). This format is typically used for sparse data, in which most features are 0 in any given instance.
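The name/value format suits sparse data because zero-valued features can simply be left out of the line. The helper below writes a dense vector in that format and skips zeros. It is a hypothetical illustration, not part of MALLET. The class and method names are assumptions.

```java
import java.util.StringJoiner;

// Hypothetical helper (not MALLET code): writes a dense feature vector in the
// name/value format, omitting features whose value is 0.
public class SparseFeatureWriter {

    static String toNameValueString(String[] names, double[] values, String separator) {
        if (names.length != values.length) {
            throw new IllegalArgumentException("names and values must have the same length");
        }
        StringJoiner line = new StringJoiner(" ");
        for (int i = 0; i < values.length; i++) {
            if (values[i] != 0.0) { // zero-valued features are simply not written
                line.add(names[i] + separator + values[i]);
            }
        }
        return line.toString();
    }

    public static void main(String[] args) {
        String[] names = {"height", "width", "length", "weight"};
        double[] values = {10.7, 0.0, 1.7, 0.0};
        System.out.println(toNameValueString(names, values, "="));
        // height=10.7 length=1.7
    }
}
```

Feature names written this way must not contain whitespace or the separator, or the line cannot be split back into the same pairs.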
If your data consists only of values, and the position determines which feature each value belongs to (e.g. 10.7 3.6 1.7), use new TokenSequenceParseFeatureString(true). This format is typically used for data with a small number of features that are all non-zero most of the time.
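In the positional format a feature is identified only by its position, so every value has to be written, including zeros. Dropping one would shift every later value onto the wrong feature. The round trip below illustrates this point. It is a hypothetical sketch, not MALLET code, and the class and method names are assumptions.

```java
import java.util.Arrays;
import java.util.StringJoiner;

// Hypothetical sketch (not MALLET code) of the positional format.
public class PositionalFeatureFormat {

    static String write(double[] values) {
        StringJoiner line = new StringJoiner(" ");
        for (double v : values) {
            line.add(Double.toString(v)); // zeros are kept so positions stay aligned
        }
        return line.toString();
    }

    static double[] read(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new double[0];
        }
        // The i-th value belongs to the i-th feature.
        return Arrays.stream(trimmed.split("\\s+"))
                     .mapToDouble(Double::parseDouble)
                     .toArray();
    }

    public static void main(String[] args) {
        double[] values = {10.7, 0.0, 1.7};
        String line = write(values);
        System.out.println(line);                           // 10.7 0.0 1.7
        System.out.println(Arrays.equals(values, read(line))); // true
    }
}
```

If the 0.0 were omitted, the line "10.7 1.7" would assign 1.7 to the second feature instead of the third. This is why the format fits dense rather than sparse data.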
If your data is in the form of named binary indicator variables (e.g. yellow quacks has_webbed_feet), use the constructor new TokenSequenceParseFeatureString(false). Each space-delimited string is interpreted as the name of a feature whose value is 1.0.
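With binary indicators, only attributes that are true are written. Each written name is read back as a feature with value 1.0, and an absent name is simply not a feature of the token. The round trip below is a hypothetical sketch, not MALLET code. The class name, the method names and the Map&lt;String, Boolean&gt; input are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch (not MALLET code) of the binary indicator format.
public class IndicatorFeatureFormat {

    static String write(Map<String, Boolean> attributes) {
        StringJoiner line = new StringJoiner(" ");
        for (Map.Entry<String, Boolean> e : attributes.entrySet()) {
            if (Boolean.TRUE.equals(e.getValue())) { // false/null attributes are omitted
                line.add(e.getKey());
            }
        }
        return line.toString();
    }

    static Map<String, Double> read(String line) {
        Map<String, Double> features = new LinkedHashMap<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return features;
        }
        for (String name : trimmed.split("\\s+")) {
            features.put(name, 1.0); // every listed name gets value 1.0
        }
        return features;
    }

    public static void main(String[] args) {
        Map<String, Boolean> duck = new LinkedHashMap<>();
        duck.put("yellow", true);
        duck.put("quacks", true);
        duck.put("barks", false);
        duck.put("has_webbed_feet", true);
        String line = write(duck);
        System.out.println(line);       // yellow quacks has_webbed_feet
        System.out.println(read(line)); // {yellow=1.0, quacks=1.0, has_webbed_feet=1.0}
    }
}
```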
@author Aron Culotta culotta@cs.umass.edu