This code is part of the Stanford NLP Toolkit.
An implementation of L-BFGS for quasi-Newton unconstrained minimization.
The general outline of the algorithm is taken from:
Numerical Optimization (second edition) 2006 Jorge Nocedal and Stephen J. Wright
A variety of different options are available.
LINESEARCHES
BACKTRACKING: This routine simply starts with a step-size guess of 1. If the step doesn't produce a sufficient decrease in the function value, the step size is updated via step = 0.1*step. This method is certainly simpler, but it never increases the step size and isn't well suited to quasi-Newton methods.
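As a rough sketch (not the toolkit's actual routine), a backtracking search of this kind could look like the following; the sufficient-decrease constant c1 and the 1e-12 floor are assumed typical values, and valueAt stands in for the function being minimized:

// Illustrative backtracking linesearch: start at step = 1 and shrink by 0.1
// until the Armijo sufficient-decrease test passes.
static double backtrack(java.util.function.Function<double[], Double> valueAt,
                        double[] x, double fx, double[] grad, double[] dir) {
  double c1 = 1e-4;                 // assumed sufficient-decrease constant
  double slope = 0.0;
  for (int i = 0; i < x.length; i++) {
    slope += grad[i] * dir[i];      // directional derivative g . dir
  }
  double step = 1.0;                // initial guess of 1, as described above
  while (step > 1e-12) {            // assumed lower bound to avoid looping forever
    double[] trial = new double[x.length];
    for (int i = 0; i < x.length; i++) {
      trial[i] = x[i] + step * dir[i];
    }
    if (valueAt.apply(trial) <= fx + c1 * step * slope) {
      return step;                  // sufficient decrease achieved
    }
    step = 0.1 * step;              // step is only ever shrunk, never increased
  }
  return step;
}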
MINPACK: This routine is based on the implementation used in MINPACK. It finds a point satisfying the strong Wolfe conditions, which require a sufficiently smaller function value and a sufficiently smaller magnitude of the directional derivative. This is enough to guarantee superlinear convergence in theory. To find such a point, the linesearch first brackets an interval that must contain a satisfying point, and then progressively shrinks that interval using cubic or quadratic interpolation.
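For reference, the strong Wolfe test applied at a trial step alpha amounts to something like the following sketch; the constants c1 and c2 are assumed typical values, not necessarily the ones used by this routine:

// Sketch of the strong Wolfe conditions at trial step alpha.
// f0, g0: value and directional derivative at the current point;
// fAlpha, gAlpha: value and directional derivative at the trial point.
static boolean wolfeSatisfied(double f0, double g0,
                              double fAlpha, double gAlpha, double alpha) {
  double c1 = 1e-4;  // sufficient-decrease (Armijo) constant
  double c2 = 0.9;   // curvature constant
  boolean sufficientDecrease = fAlpha <= f0 + c1 * alpha * g0;
  boolean curvature = Math.abs(gAlpha) <= c2 * Math.abs(g0);
  return sufficientDecrease && curvature;
}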
SCALING: L-BFGS allows the initial guess at the Hessian to be updated at each step. Standard BFGS does this by approximating the Hessian as a scaled identity matrix; to use this method, set scaleOpt to SCALAR. A better way to approximate the Hessian is with a diagonal scaling matrix, whose diagonal can then be updated as more information comes in. This method can be used by setting scaleOpt to DIAGONAL.
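For the SCALAR option, the usual scaled-identity choice from Nocedal and Wright is gamma = (s.y)/(y.y), where s and y are the most recent position and gradient differences; a minimal sketch with illustrative names:

// Illustrative computation of the scaling factor gamma = (s.y)/(y.y),
// where s = x_{k+1} - x_k and y = grad_{k+1} - grad_k are the most recent
// update pair; the identity initial guess is then multiplied by this scalar.
static double scalarScale(double[] s, double[] y) {
  double sDotY = 0.0;
  double yDotY = 0.0;
  for (int i = 0; i < s.length; i++) {
    sDotY += s[i] * y[i];
    yDotY += y[i] * y[i];
  }
  return sDotY / yDotY;
}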
CONVERGENCE: Previously, convergence was gauged by taking the average decrease per step, dividing it by the current value, and terminating when that ratio became smaller than TOL. This method fails when the function value approaches zero, so two other convergence criteria are used. The first stores the initial gradient norm |g0| and terminates when the new gradient norm |g| is sufficiently smaller, i.e. |g| < eps*|g0|. The second checks whether |g| < eps*max(1, |x|), which is essentially a test of whether the gradient is numerically zero.
Each of these convergence criteria can be turned on or off by setting the flags:
private boolean useAveImprovement = true;
private boolean useRelativeNorm = true;
private boolean useNumericalZero = true;
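Taken together, the gradient-based tests amount to checks like this sketch (the method name and eps are illustrative):

// Illustrative convergence checks corresponding to the criteria above.
static boolean converged(double gradNorm, double initialGradNorm,
                         double xNorm, double eps) {
  boolean relativeNorm = gradNorm < eps * initialGradNorm;        // |g| < eps*|g0|
  boolean numericalZero = gradNorm < eps * Math.max(1.0, xNorm);  // |g| < eps*max(1, |x|)
  return relativeNorm || numericalZero;
}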
To use the QNMinimizer, first construct it using:
QNMinimizer qn = new QNMinimizer(mem, true);
mem - the number of previous estimate vector pairs to store; 15 is generally plenty.
true - tells the QNMinimizer to use the MINPACK linesearch with DIAGONAL scaling; false leads to the criteria used in the old QNMinimizer class.
Then call:
qn.minimize(dfunction,convergenceTolerance,initialGuess,maxFunctionEvaluations);
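As a minimal end-to-end sketch, assuming the toolkit's DiffFunction interface (domainDimension, valueAt, derivativeAt) and a purely illustrative quadratic objective:

import edu.stanford.nlp.optimization.DiffFunction;
import edu.stanford.nlp.optimization.QNMinimizer;

public class QNMinimizerExample {
  public static void main(String[] args) {
    // Minimize f(x) = sum_i (x_i - 1)^2, whose gradient is 2*(x_i - 1).
    DiffFunction f = new DiffFunction() {
      public int domainDimension() { return 2; }
      public double valueAt(double[] x) {
        double v = 0.0;
        for (double xi : x) { v += (xi - 1.0) * (xi - 1.0); }
        return v;
      }
      public double[] derivativeAt(double[] x) {
        double[] g = new double[x.length];
        for (int i = 0; i < x.length; i++) { g[i] = 2.0 * (x[i] - 1.0); }
        return g;
      }
    };

    QNMinimizer qn = new QNMinimizer(15, true);  // 15 update pairs, robust options
    double[] solution = qn.minimize(f, 1e-6, new double[] {0.0, 0.0}, 1000);
    System.out.println(java.util.Arrays.toString(solution));  // approximately [1.0, 1.0]
  }
}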
@author akleeman