This is the generic abstract class for a Lexicon. In simplenlg V4, a
Lexicon is a collection of {@link simplenlg.framework.WordElement} objects; it does not do any morphological processing (unlike the lexicon in simplenlg V3). Information about
WordElement objects can be obtained from a database ({@link simplenlg.lexicon.NIHDBLexicon}) or from an XML file ({@link simplenlg.lexicon.XMLLexicon}). Simplenlg V4 comes with a default (XML) lexicon, which is retrieved by the
getDefaultLexicon method. There are several ways of retrieving words. If in doubt, use
lookupWord. More control is available from the
getXXXX methods, which allow words to be retrieved in several ways:
- baseform and {@link simplenlg.framework.LexicalCategory}; for example, "university" and Noun
- just baseform; for example, "university"
- ID string (if this is supported by the underlying DB or XML file); for example, "E0063257" is the ID for "university" in the NIH Specialist lexicon
- variant; this looks for a word given a form of the word, which may be inflected (e.g., "universities") or a spelling variant (e.g., "color" for "colour"). Acronyms are not considered to be variants (e.g., "UK" and "United Kingdom" are regarded as different words). Note: variant lookup is not guaranteed; this is a feature that will hopefully develop over time.
- variant and {@link simplenlg.framework.LexicalCategory}; for example, "universities" and Noun
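The lookup keys listed above can be illustrated with a small, self-contained sketch. It is not simplenlg code: ToyCategory, ToyEntry, KeyedLexiconSketch and all of their methods are hypothetical names invented for this illustration, and the ID "TOY-COLOUR" is made up. Only "E0063257" for "university" comes from the text above. The sketch also assumes that the base form itself counts as a form of the word for variant lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-ins; these are NOT simplenlg classes.
enum ToyCategory { NOUN, VERB }

final class ToyEntry {
    final String id;
    final String baseForm;
    final ToyCategory category;
    final List<String> variants; // inflected forms and spelling variants

    ToyEntry(String id, String baseForm, ToyCategory category, String... variants) {
        this.id = id;
        this.baseForm = baseForm;
        this.category = category;
        this.variants = Arrays.asList(variants);
    }
}

final class KeyedLexiconSketch {
    private final List<ToyEntry> entries = new ArrayList<>();

    void add(ToyEntry entry) {
        entries.add(entry);
    }

    // Shared helper: collect every entry that satisfies the given condition.
    private List<ToyEntry> select(Predicate<ToyEntry> condition) {
        List<ToyEntry> matches = new ArrayList<>();
        for (ToyEntry entry : entries) {
            if (condition.test(entry)) {
                matches.add(entry);
            }
        }
        return matches;
    }

    // Lookup by base form only.
    List<ToyEntry> byBaseForm(String baseForm) {
        return select(e -> e.baseForm.equals(baseForm));
    }

    // Lookup by base form and category.
    List<ToyEntry> byBaseForm(String baseForm, ToyCategory category) {
        return select(e -> e.baseForm.equals(baseForm) && e.category == category);
    }

    // Lookup by ID string.
    List<ToyEntry> byId(String id) {
        return select(e -> e.id.equals(id));
    }

    // Lookup by variant. Sketch choice: the base form also counts as a form.
    List<ToyEntry> byVariant(String form) {
        return select(e -> e.baseForm.equals(form) || e.variants.contains(form));
    }

    // Lookup by variant and category.
    List<ToyEntry> byVariant(String form, ToyCategory category) {
        return select(e -> e.category == category
                && (e.baseForm.equals(form) || e.variants.contains(form)));
    }
}

final class KeyedLexiconDemo {
    public static void main(String[] args) {
        KeyedLexiconSketch lexicon = new KeyedLexiconSketch();
        lexicon.add(new ToyEntry("E0063257", "university", ToyCategory.NOUN, "universities"));
        // "TOY-COLOUR" is a made-up ID for illustration only.
        lexicon.add(new ToyEntry("TOY-COLOUR", "colour", ToyCategory.NOUN, "color", "colours"));

        System.out.println(lexicon.byId("E0063257").get(0).baseForm);
        System.out.println(lexicon.byVariant("color").get(0).baseForm);
        System.out.println(lexicon.byVariant("universities", ToyCategory.VERB).size());
    }
}
```

Adding a category to a lookup only narrows the result: in this sketch "universities" matches the noun entry but yields nothing when a verb is requested.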
For each type of lookup, there are three methods:
- getWords: get all matching {@link simplenlg.framework.WordElement} objects in the Lexicon. For example, getWords("dog") would return a List of two WordElement objects, one for the noun "dog" and one for the verb "dog". If there are no matching entries in the lexicon, this method returns an empty collection.
- getWord: get a single matching {@link simplenlg.framework.WordElement} in the Lexicon. For example, getWord("dog") would return a WordElement for either the noun "dog" or the verb "dog" (which one is unpredictable). If there are no matching entries in the lexicon, this method will create a default WordElement based on the information specified.
- hasWord: returns true if the Lexicon contains at least one matching WordElement.
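The contract of the three methods can be sketched with a toy in-memory lexicon. This is an illustration under stated assumptions, not simplenlg's implementation: ToyWord and ToyLexicon are hypothetical classes, categories are plain strings, and the default entry created by getWord is not stored back into the lexicon. The toy getWord happens to return the first stored match; callers of a real lexicon should not rely on any particular choice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a lexicon entry; NOT simplenlg.framework.WordElement.
final class ToyWord {
    final String baseForm;
    final String category;     // "NOUN", "VERB", or "ANY" for a default entry
    final boolean fromLexicon; // false when created on the fly by getWord

    ToyWord(String baseForm, String category, boolean fromLexicon) {
        this.baseForm = baseForm;
        this.category = category;
        this.fromLexicon = fromLexicon;
    }
}

// Hypothetical lexicon that mimics the getWords / getWord / hasWord contract.
final class ToyLexicon {
    private final Map<String, List<ToyWord>> wordsByBaseForm = new HashMap<>();

    void add(String baseForm, String category) {
        wordsByBaseForm.computeIfAbsent(baseForm, key -> new ArrayList<>())
                .add(new ToyWord(baseForm, category, true));
    }

    // All matching entries; an empty list (never null) when nothing matches.
    List<ToyWord> getWords(String baseForm) {
        List<ToyWord> matches = wordsByBaseForm.get(baseForm);
        if (matches == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(matches);
    }

    // A single entry; falls back to a newly created default entry.
    ToyWord getWord(String baseForm) {
        List<ToyWord> matches = getWords(baseForm);
        if (matches.isEmpty()) {
            return new ToyWord(baseForm, "ANY", false);
        }
        return matches.get(0);
    }

    // True if at least one entry matches.
    boolean hasWord(String baseForm) {
        return !getWords(baseForm).isEmpty();
    }
}

final class ToyLexiconDemo {
    public static void main(String[] args) {
        ToyLexicon lexicon = new ToyLexicon();
        lexicon.add("dog", "NOUN");
        lexicon.add("dog", "VERB");

        System.out.println(lexicon.getWords("dog").size());
        System.out.println(lexicon.getWord("unicorn").fromLexicon);
        System.out.println(lexicon.hasWord("unicorn"));
    }
}
```

Because getWord never fails, hasWord is the method to call when the caller needs to know whether an entry really exists.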
@author Albert Gatt (simplenlg v3 lexicon)
@author Ehud Reiter (simplenlg v4 lexicon)