net.javlov.Option
An {@code Option} represents a temporally extended action. Options are possibly hierarchical, i.e., an option can consist of other options. Primitive actions are represented by the {@code Action} interface, which extends this interface; {@code Action}s are therefore just a special case of {@code Option}s, namely one-step options. For further details on the ideas and algorithms behind options, see: Sutton, Precup, and Singh (1999). Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence 112, pp. 181--211.

Since options, like agents, have a policy for picking actions and can learn in order to improve that policy, options can be considered special cases of agents, and are implemented as such in Javlov. This design gives rise to a natural decomposition of higher-level agents into lower-level agents, and also allows lower-level agents/options to execute in a different environment from the higher-level agent(s), possibly with a different state representation and reward function.

In Javlov, the only actions that can be executed on {@link Environment}s are primitive actions. At each timestep, the primitive action selected by the {@code Option}, or by one of its lower-level {@code Option}s, can be obtained through the {@link #doStep(State, double)} method (when continuing the {@code Option}, i.e., when the option was also executing in the previous timestep) or the {@link #firstStep(State)} method (when starting to execute the option); both are methods of the {@code Agent} interface.
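The hierarchy and lifecycle described above can be sketched as follows. This is a minimal, hypothetical sketch: the interface and method names loosely mirror those mentioned here ({@code Agent}, {@code Option}, {@code Action}, {@code firstStep}, {@code doStep}), but the actual Javlov signatures may differ.

```java
// Hypothetical sketch; actual Javlov interfaces may differ.
interface State {}

// An agent picks actions: firstStep when it starts acting,
// doStep on every subsequent timestep (receiving a reward).
interface Agent {
    Action firstStep(State s);
    Action doStep(State s, double reward);
}

// An Option is a temporally extended action, itself an Agent.
interface Option extends Agent {}

// A primitive Action is a one-step Option.
interface Action extends Option {}

public class OptionSketch {
    // Trivial primitive action: always selects itself.
    static class Noop implements Action {
        public Action firstStep(State s) { return this; }
        public Action doStep(State s, double reward) { return this; }
        public String toString() { return "noop"; }
    }

    public static void main(String[] args) {
        Option o = new Noop();
        State s = new State() {};
        Action first = o.firstStep(s);     // starting the option
        Action next = o.doStep(s, 1.0);    // continuing the option
        System.out.println(first + " " + next);
    }
}
```

Because a primitive {@code Action} satisfies the same contract as an {@code Option}, a higher-level option's policy can treat both uniformly and simply query whichever it selected for the primitive action to execute on the environment.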
@author Matthijs Snel