This class represents a table mapping tuples of strings to integer values. It is used by {@link LayeredTokenPattern} for matching patterns against {@link LayeredSequence} objects.
The core of this class is a mapping from string tuples of length {@code n} to integers {@code 0 <= i <} {@link Encoder#MAX_SIZE}. The mapping is defined by a list of {@code n} sets of String symbols {@code S_1, ..., S_n} and a special symbol {@link Encoder#UNK}. The mapping assigns an integer value to each tuple {@code (x_1, ..., x_n)}, where {@code x_i} is either in {@code S_i} or is the symbol {@code UNK}. For example, if {@code n = 2} and {@code S_1 = S_2 = {0, 1}}, then a possible mapping would be {@code (0,0) => 0, (0,1) => 1, (0,UNK) => 2, (1,0) => 3, (1,1) => 4, (1,UNK) => 5, (UNK,0) => 6, (UNK,1) => 7, (UNK,UNK) => 8}.
A String tuple {@code (x_1, ..., x_n)} is mapped to an integer value as follows. First, it is mapped to an intermediate tuple {@code (y_1, ..., y_n)}, where {@code y_i = x_i} if {@code x_i} is in {@code S_i}, and {@code y_i = UNK} otherwise. Then the value of {@code (y_1, ..., y_n)} according to the mapping is returned. This procedure is implemented in the method {@link Encoder#encode(String[])}, which represents tuples as String arrays.
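The two-step procedure above can be sketched as follows. This is a minimal illustration, not the actual {@link Encoder} implementation: the class name {@code TupleEncoderSketch}, the literal {@code "UNK"} standing in for {@link Encoder#UNK}, and the mixed-radix numbering are all assumptions made for the example. The numbering happens to reproduce the sample mapping given earlier, but the real class makes no promise about the concrete integer values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the tuple-to-integer mapping; not the real Encoder. */
class TupleEncoderSketch {
    // Stand-in for Encoder.UNK (assumption: the real constant's value is not shown in the text).
    static final String UNK = "UNK";

    // For each layer i, maps every symbol in S_i to its position 0 .. |S_i|-1.
    private final List<Map<String, Integer>> symbolIndex = new ArrayList<>();

    TupleEncoderSketch(List<List<String>> symbolSets) {
        for (List<String> set : symbolSets) {
            Map<String, Integer> index = new HashMap<>();
            for (String symbol : set) {
                index.putIfAbsent(symbol, index.size());
            }
            symbolIndex.add(index);
        }
    }

    /** Replaces unknown symbols by UNK, then reads the tuple as a mixed-radix number. */
    int encode(String[] tuple) {
        if (tuple.length != symbolIndex.size()) {
            throw new IllegalArgumentException("expected a tuple of length " + symbolIndex.size());
        }
        int code = 0;
        for (int i = 0; i < tuple.length; i++) {
            Map<String, Integer> index = symbolIndex.get(i);
            int radix = index.size() + 1; // |S_i| symbols plus UNK
            // Step 1: anything outside S_i (including UNK itself) takes the UNK slot, index |S_i|.
            int digit = index.getOrDefault(tuple[i], index.size());
            // Step 2: accumulate the digit into the integer value.
            code = code * radix + digit;
        }
        return code;
    }

    public static void main(String[] args) {
        List<String> bits = Arrays.asList("0", "1");
        TupleEncoderSketch encoder = new TupleEncoderSketch(Arrays.asList(bits, bits));
        System.out.println(encoder.encode(new String[] {"1", "0"}));
        // "7" is not in S_2, so this tuple is encoded like ("0", UNK).
        System.out.println(encoder.encode(new String[] {"0", "7"}));
    }
}
```

Giving UNK the last slot in every layer keeps the lookup to one map access per position, and a symbol outside {@code S_i} needs no special case.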
There is no guarantee on the actual integer values assigned to each tuple. The mapping cannot contain more than 2^16 entries. This means that the product {@code (|S_1| + 1) * (|S_2| + 1) * ... * (|S_n| + 1)} must be less than or equal to 2^16.
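The size constraint can be checked before a table is built. The sketch below is illustrative only: {@code TableSizeSketch}, {@code tableSize} and {@code fits} are hypothetical names, and the limit is hard-coded as 2^16 from the text rather than read from {@link Encoder#MAX_SIZE}.

```java
/** Hypothetical helper that checks the size constraint; not part of the real Encoder. */
class TableSizeSketch {
    // 2^16, as stated in the text (assumption: hard-coded here instead of using Encoder.MAX_SIZE).
    static final long LIMIT = 1L << 16;

    /** Returns (|S_1| + 1) * ... * (|S_n| + 1) for the given set sizes. */
    static long tableSize(int[] setSizes) {
        long product = 1;
        for (int size : setSizes) {
            // The +1 accounts for UNK; multiplyExact throws on overflow instead of wrapping.
            product = Math.multiplyExact(product, size + 1L);
        }
        return product;
    }

    /** True if a table over sets of these sizes stays within the limit. */
    static boolean fits(int[] setSizes) {
        return tableSize(setSizes) <= LIMIT;
    }

    public static void main(String[] args) {
        System.out.println(tableSize(new int[] {2, 2}));  // the n = 2, S_1 = S_2 = {0, 1} example
        System.out.println(fits(new int[] {255, 255}));   // 256 * 256 is exactly 2^16
        System.out.println(fits(new int[] {255, 256}));   // 256 * 257 exceeds 2^16
    }
}
```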
@author afader