This is the generic abstract class for a Lexicon. In simplenlg V4, a Lexicon is a collection of {@link simplenlg.framework.WordElement} objects; unlike in simplenlg V3, it does not do any morphological processing. Information about a WordElement can be obtained from a database ({@link simplenlg.lexicon.NIHDBLexicon}) or from an XML file ({@link simplenlg.lexicon.XMLLexicon}). Simplenlg V4 comes with a default (XML) lexicon, which is retrieved by the getDefaultLexicon method.

There are several ways of retrieving words. If in doubt, use lookupWord. More control is available from the getXXXX methods, which allow words to be retrieved in several ways:
- baseform and {@link simplenlg.framework.LexicalCategory}; for example, "university" and Noun
- just baseform; for example, "university"
- ID string (if this is supported by the underlying DB or XML file); for example, "E0063257" is the ID for "university" in the NIH Specialist lexicon
- variant; this looks for a word given a form of the word which may be inflected (e.g., "universities") or a spelling variant (e.g., "color" for "colour"). Acronyms are not considered to be variants (e.g., "UK" and "United Kingdom" are regarded as different words). Note: variant lookup is not guaranteed; this is a feature which will hopefully develop over time.
- variant and {@link simplenlg.framework.LexicalCategory}; for example, "universities" and Noun
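
As a rough illustration of these lookup keys, the sketch below indexes entries by base form, ID, and variant, with an optional category filter. ToyEntry and ToyKeyedLexicon are hypothetical names invented for this example; they are not part of simplenlg, and the real Lexicon subclasses may organise their data quite differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, NOT the simplenlg API.
class ToyEntry {
    final String id;             // e.g. "E0063257"
    final String baseForm;       // e.g. "university"
    final String category;       // e.g. "NOUN"
    final List<String> variants; // inflected forms and spelling variants

    ToyEntry(String id, String baseForm, String category, List<String> variants) {
        this.id = id;
        this.baseForm = baseForm;
        this.category = category;
        this.variants = variants;
    }
}

// Hypothetical toy lexicon with one index per kind of lookup key.
class ToyKeyedLexicon {
    private final Map<String, List<ToyEntry>> baseIndex = new HashMap<>();
    private final Map<String, ToyEntry> idIndex = new HashMap<>();
    private final Map<String, List<ToyEntry>> variantIndex = new HashMap<>();

    void add(ToyEntry e) {
        idIndex.put(e.id, e);
        baseIndex.computeIfAbsent(e.baseForm, k -> new ArrayList<>()).add(e);
        // Assumption of this sketch: the base form counts as a variant of itself.
        variantIndex.computeIfAbsent(e.baseForm, k -> new ArrayList<>()).add(e);
        for (String v : e.variants) {
            if (!v.equals(e.baseForm)) {
                variantIndex.computeIfAbsent(v, k -> new ArrayList<>()).add(e);
            }
        }
    }

    // Lookup by base form only.
    List<ToyEntry> byBaseForm(String base) {
        return baseIndex.getOrDefault(base, Collections.emptyList());
    }

    // Lookup by base form and category.
    List<ToyEntry> byBaseForm(String base, String category) {
        return filter(byBaseForm(base), category);
    }

    // Lookup by ID string; returns null when the ID is unknown.
    ToyEntry byId(String id) {
        return idIndex.get(id);
    }

    // Lookup by variant (inflected form or spelling variant).
    List<ToyEntry> byVariant(String form) {
        return variantIndex.getOrDefault(form, Collections.emptyList());
    }

    // Lookup by variant and category.
    List<ToyEntry> byVariant(String form, String category) {
        return filter(byVariant(form), category);
    }

    private static List<ToyEntry> filter(List<ToyEntry> in, String category) {
        List<ToyEntry> out = new ArrayList<>();
        for (ToyEntry e : in) {
            if (e.category.equals(category)) {
                out.add(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ToyKeyedLexicon lex = new ToyKeyedLexicon();
        lex.add(new ToyEntry("E0063257", "university", "NOUN",
                Arrays.asList("universities")));
        System.out.println(lex.byVariant("universities", "NOUN").get(0).baseForm);
    }
}
```

Keeping a separate index per key keeps each lookup simple, at the cost of storing every entry several times.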
For each type of lookup, there are three methods:
- getWords: get all matching {@link simplenlg.framework.WordElement} objects in the Lexicon. For example, getWords("dog") would return a List of two WordElement objects, one for the noun "dog" and one for the verb "dog". If there are no matching entries in the lexicon, this method returns an empty collection.
- getWord: get a single matching {@link simplenlg.framework.WordElement} in the Lexicon. For example, getWord("dog") would return a WordElement for either the noun "dog" or the verb "dog" (which of the two is unpredictable). If there are no matching entries in the lexicon, this method will create a default WordElement based on the information specified.
- hasWord: returns true if the Lexicon contains at least one matching WordElement.
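
The contract of these three methods can be sketched with a small in-memory model. SketchWord and SketchLexicon are hypothetical names made up for this illustration, not simplenlg classes; the "ANY" category and the fromLexicon flag are assumptions of the sketch, and returning the first match merely stands in for the unpredictable choice described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a word entry; NOT simplenlg's WordElement.
class SketchWord {
    final String baseForm;
    final String category;     // "NOUN", "VERB", or "ANY" for a default word
    final boolean fromLexicon; // false when the word was created as a default

    SketchWord(String baseForm, String category, boolean fromLexicon) {
        this.baseForm = baseForm;
        this.category = category;
        this.fromLexicon = fromLexicon;
    }
}

// Hypothetical toy lexicon showing the getWords / getWord / hasWord contract.
class SketchLexicon {
    private final List<SketchWord> entries = new ArrayList<>();

    void add(String baseForm, String category) {
        entries.add(new SketchWord(baseForm, category, true));
    }

    // All matching entries; an empty list when nothing matches.
    List<SketchWord> getWords(String baseForm) {
        List<SketchWord> matches = new ArrayList<>();
        for (SketchWord w : entries) {
            if (w.baseForm.equals(baseForm)) {
                matches.add(w);
            }
        }
        return matches;
    }

    // A single matching entry, or a newly created default word when none exists.
    SketchWord getWord(String baseForm) {
        List<SketchWord> matches = getWords(baseForm);
        if (matches.isEmpty()) {
            return new SketchWord(baseForm, "ANY", false);
        }
        // This sketch returns the first match; callers should not rely on which one.
        return matches.get(0);
    }

    // True when at least one entry matches.
    boolean hasWord(String baseForm) {
        return !getWords(baseForm).isEmpty();
    }

    public static void main(String[] args) {
        SketchLexicon lex = new SketchLexicon();
        lex.add("dog", "NOUN");
        lex.add("dog", "VERB");
        System.out.println(lex.getWords("dog").size());
        System.out.println(lex.hasWord("cat"));
        System.out.println(lex.getWord("cat").fromLexicon);
    }
}
```

Because getWord never fails in this model, a caller that needs to know whether the word is really in the lexicon would check hasWord first.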
@author Albert Gatt (simplenlg v3 lexicon)
@author Ehud Reiter (simplenlg v4 lexicon)