isWordStart
method, i.e., where the word states appear at the end of the word in the linguist. Therefore, lattices should only be created from a Result produced by the {@link edu.cmu.sphinx.linguist.lextree.LexTreeLinguist} and the {@link edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager}.

Lattices can also be created from a collapsed {@link edu.cmu.sphinx.decoder.search.Token} tree and its AlternativeHypothesisManager. This is what 'collapsed' means. Normally, between two word tokens there is a series of tokens for other types of states, such as unit or HMM states. Using 'W' for word tokens, 'U' for unit tokens, and 'H' for HMM tokens, a token chain can look like:

    W - U - H - H - H - H - U - H - H - H - H - W

Usually, HMM tokens contain acoustic scores, and word tokens contain language scores. If we want to know the total acoustic and language scores between any two words, it is unnecessary to keep the unit and HMM tokens around. Therefore, all their acoustic and language scores are 'collapsed' into one token, so that it will look like:
    W - P - W

where 'P' is a token that represents the path between the two words, and P contains the acoustic and language scores between the two words. It is this type of collapsed token tree that the Lattice class is expecting. Normally, the task of collapsing the token tree is done by the {@link edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager}. A collapsed token tree can look like:

                            "cat" - P - </s>
                           /
                          P
                         /
    <s> - P - "a" - P - "big"
                         \
                          P
                           \
                            "dog" - P - </s>

When a Lattice is constructed from a Result, the above collapsed token tree, together with the alternate hypothesis of "all" instead of "a", is converted into a Lattice that looks like the following:

           "a"           "cat"
         /     \        /     \
     <s>          "big"         - </s>
         \     /        \     /
          "all"          "dog"

Initially, a lattice can have redundant nodes, i.e., nodes that refer to the same word and originate from the same parent node. These nodes can be collapsed using the {@link LatticeOptimizer}.
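The idea of merging redundant nodes described above can be illustrated with a minimal, self-contained sketch. It does not use or reproduce the LatticeOptimizer; `SketchNode` and `RedundantNodeSketch` are hypothetical names introduced only for this example, and the sketch assumes a tree-shaped structure in which two nodes are redundant when they share a parent and carry the same word.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a lattice node; not the Sphinx-4 Node class. */
class SketchNode {
    final String word;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String word) {
        this.word = word;
    }

    /** Creates a child carrying the given word, attaches it, and returns it. */
    SketchNode add(String childWord) {
        SketchNode child = new SketchNode(childWord);
        children.add(child);
        return child;
    }
}

/** Hypothetical helper; an illustration only, not the LatticeOptimizer algorithm. */
class RedundantNodeSketch {

    /**
     * Merges children of the same parent that carry the same word,
     * then repeats the merge one level down. Assumes an acyclic structure.
     */
    static void mergeRedundantChildren(SketchNode parent) {
        // The first node seen for each word is kept; insertion order is preserved.
        Map<String, SketchNode> keptByWord = new LinkedHashMap<>();
        for (SketchNode child : parent.children) {
            SketchNode kept = keptByWord.get(child.word);
            if (kept == null) {
                keptByWord.put(child.word, child);
            } else {
                // Redundant node: move its successors onto the kept node.
                kept.children.addAll(child.children);
            }
        }
        parent.children.clear();
        parent.children.addAll(keptByWord.values());
        for (SketchNode child : parent.children) {
            mergeRedundantChildren(child);
        }
    }
}
```

In this sketch, a start node with two "a" children that each lead to "big" ends up with a single "a" node followed by a single "big" node. A real optimizer also has to reconcile the scores on the merged edges, which this example leaves out.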
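The collapsing of a token chain into W - P - W form, described earlier, can be sketched in the same spirit. This is a sketch under stated assumptions, not the code used by the WordPruningBreadthFirstSearchManager: `SketchToken` and `TokenCollapseSketch` are hypothetical names, scores are assumed to be log-domain values that combine by addition, and word tokens are assumed to keep their own scores while only the unit and HMM tokens between them are folded into 'P'.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical token type; the real class is edu.cmu.sphinx.decoder.search.Token. */
class SketchToken {
    final char kind;            // 'W' word, 'U' unit, 'H' HMM, 'P' collapsed path
    final double acousticScore; // assumed log-domain
    final double languageScore; // assumed log-domain

    SketchToken(char kind, double acousticScore, double languageScore) {
        this.kind = kind;
        this.acousticScore = acousticScore;
        this.languageScore = languageScore;
    }
}

/** Hypothetical helper that collapses a linear token chain. */
class TokenCollapseSketch {

    /**
     * Replaces every run of non-word tokens with a single 'P' token whose
     * scores are the sums of the scores in that run. Word tokens pass through.
     */
    static List<SketchToken> collapse(List<SketchToken> chain) {
        List<SketchToken> collapsed = new ArrayList<>();
        double acoustic = 0.0;
        double language = 0.0;
        boolean runOpen = false;
        for (SketchToken token : chain) {
            if (token.kind == 'W') {
                if (runOpen) {
                    // Close the run that precedes this word token.
                    collapsed.add(new SketchToken('P', acoustic, language));
                    acoustic = 0.0;
                    language = 0.0;
                    runOpen = false;
                }
                collapsed.add(token);
            } else {
                acoustic += token.acousticScore;
                language += token.languageScore;
                runOpen = true;
            }
        }
        if (runOpen) {
            // A trailing run with no closing word token is still kept as one 'P'.
            collapsed.add(new SketchToken('P', acoustic, language));
        }
        return collapsed;
    }
}
```

Applied to the chain W - U - H - H - H - H - U - H - H - H - H - W, the sketch returns three tokens, W - P - W, where 'P' carries the summed scores of the ten tokens it replaces.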