A class for tree normalization. The default implementation does no normalization. Other tree normalizers may change node labels (for example, by deleting functional tags) or the whole tree geometry (for example, by deleting empty elements). Another operation that a TreeNormalizer may wish to perform is interning the Strings passed to it. Can be reused as a Singleton. Designed to be extended.
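The interning and Singleton remarks can be illustrated with a minimal sketch. The class name InterningNormalizerSketch and its INSTANCE field are hypothetical and are not part of the library; only the method names normalizeTerminal and normalizeNonterminal are taken from this class's contract. The sketch assumes the normalizer holds no state, which is what makes a single shared instance safe.

```java
// Hypothetical illustration only; not the library's TreeNormalizer.
class InterningNormalizerSketch {

    // One shared instance is enough because the class keeps no state.
    static final InterningNormalizerSketch INSTANCE = new InterningNormalizerSketch();

    // Returns the canonical (interned) copy of a leaf string.
    String normalizeTerminal(String leaf) {
        return leaf.intern();
    }

    // Returns the canonical (interned) copy of a category label, so that
    // equal labels read from different trees share one String object.
    String normalizeNonterminal(String category) {
        return category.intern();
    }
}
```

Interning trades a lookup at read time for lower memory use when the same labels recur across many trees.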
The TreeNormalizer methods are in two groups. The contract for this class is that normalizeTerminal or normalizeNonterminal will first be called on each String that will be put into a Tree, when Trees are read from files or otherwise created. Then normalizeWholeTree will be called on the Tree. It normally walks the Tree, making whatever modifications it wishes to. A TreeNormalizer need not make a deep copy of a Tree. It is assumed to be able to work destructively, because afterwards only the normalized Tree will be used.
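The two-phase contract described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: SketchTree is a hypothetical stand-in for Tree, TwoPhaseNormalizerSketch is a hypothetical normalizer, and the method signatures are simplified (the real normalizeWholeTree may take additional arguments). The sketch also assumes that functional tags follow the first hyphen in a category and that empty elements are labeled -NONE-.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tree node; not the library's Tree class.
class SketchTree {
    final String label;
    final List<SketchTree> children = new ArrayList<>();

    SketchTree(String label, SketchTree... kids) {
        this.label = label;
        for (SketchTree k : kids) {
            children.add(k);
        }
    }

    boolean isLeaf() {
        return children.isEmpty();
    }

    @Override
    public String toString() {
        if (isLeaf()) {
            return label;
        }
        StringBuilder sb = new StringBuilder("(").append(label);
        for (SketchTree c : children) {
            sb.append(' ').append(c);
        }
        return sb.append(')').toString();
    }
}

// Hypothetical normalizer following the two-phase contract.
class TwoPhaseNormalizerSketch {

    // Phase 1: called on each leaf String before it is put into a tree.
    String normalizeTerminal(String leaf) {
        return leaf; // no change, like the default normalizer
    }

    // Phase 1: called on each category String; strips a functional tag,
    // e.g. "NP-SBJ" becomes "NP". Labels that start with '-' are kept.
    String normalizeNonterminal(String category) {
        int dash = category.indexOf('-');
        return dash > 0 ? category.substring(0, dash) : category;
    }

    // Phase 2: called once on the finished tree. Works destructively:
    // the argument is modified in place and returned (no deep copy).
    // Assumes the root itself is not an empty element.
    SketchTree normalizeWholeTree(SketchTree tree) {
        isEmptyAfterPruning(tree);
        return tree;
    }

    // Removes empty elements below node; returns true if node itself
    // is empty or has lost all of its children.
    private boolean isEmptyAfterPruning(SketchTree node) {
        if ("-NONE-".equals(node.label)) {
            return true;
        }
        if (node.isLeaf()) {
            return false;
        }
        node.children.removeIf(this::isEmptyAfterPruning);
        return node.children.isEmpty();
    }
}
```

With this sketch, a tree built from the bracketing (S (NP-SBJ (-NONE- *)) (VP (VBD left))) would have its labels cleaned as they are read and would come out of normalizeWholeTree as (S (VP (VBD left))), with the caller's original object modified in place.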
Implementation note: This is a very old legacy class used in conjunction with PennTreeReader. It now seems that it would be better to move the String normalization into the tokenizer, which would leave only a (possibly destructive) TreeTransformer.
@author Christopher Manning