The most relevant description of this algorithm can be found in the paper “A subspace, interior and conjugate gradient method for large-scale bound-constrained minimization problems” by Coleman and Li; some insights on its implementation can be found in the MATLAB documentation. The difficulty was that the algorithm incorporates several ideas, but it was not very clear how to combine them in actual code. I will describe each idea separately and then outline the algorithm as a whole. I will consider the algorithm applied to a problem with a known Hessian $H$; in least squares we replace it by $J^T J$, where $J$ is the Jacobian of the residuals. I won’t give an explanation or motivation for some things; if you are really interested, try digging into the original papers.
Interior Trust-Region Approach and Scaling Matrix
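The minimization problem is stated as follows:
$$\min_x f(x), \quad l \le x \le u.$$
Some of the components of $l$ and $u$ can be infinite, meaning no bound in this direction. Let’s use the notation $g(x) = \nabla f(x)$ and $H(x) = \nabla^2 f(x)$. The first-order necessary conditions for $x^*$ to be a local minimum:
$$g(x^*)_i = 0 \text{ if } l_i < x^*_i < u_i, \qquad g(x^*)_i \le 0 \text{ if } x^*_i = u_i, \qquad g(x^*)_i \ge 0 \text{ if } x^*_i = l_i.$$
Define a vector $v(x)$ with the following components:
$$v(x)_i = \begin{cases} u_i - x_i & \text{if } g_i < 0 \text{ and } u_i < \infty, \\ x_i - l_i & \text{if } g_i > 0 \text{ and } l_i > -\infty, \\ 1 & \text{otherwise.} \end{cases}$$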
Its components are the distances to the bounds at which the anti-gradient points (if this distance is finite). Define a matrix $D(x) = \operatorname{diag}(v(x)^{1/2})$; the first-order optimality condition can then be stated as $D(x^*)^2 g(x^*) = 0$. Now we can think of our optimization problem as a diagonal system of nonlinear equations (I would say it is the main idea of this part):
$$D(x)^2 g(x) = 0.$$
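The Jacobian of the left-hand side exists whenever $v(x)_i \ne 0$ for all $i$, which is true when $l < x < u$ (not on the bound). Assume that this holds; then the Newton step $p$ for this system satisfies:
$$\left(D^2 H + \operatorname{diag}(g) J^v\right) p = -D^2 g.$$
Here $J^v$ is the diagonal Jacobian matrix of $v(x)$, whose elements take values $\pm 1$ or $0$; note that all elements of the matrix $C = \operatorname{diag}(g) J^v$ are non-negative. Now introduce the change of variables $x = D \hat{x}$. In the new variables the Newton step satisfies
$$\hat{B} \hat{p} = -\hat{g},$$
where $\hat{B} = D H D + C$ and $\hat{g} = D g$ (note that $\hat{g}$ is a proper gradient of $f$ with respect to the “hat” variables). Looking at this Newton step we formulate the corresponding trust-region problem:
$$\min_{\hat{p}} \frac{1}{2} \hat{p}^T \hat{B} \hat{p} + \hat{g}^T \hat{p}, \quad \text{s.t. } \|\hat{p}\| \le \Delta.$$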
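In the original space we have:
$$B = H + D^{-1} C D^{-1},$$
and the equivalent trust-region problem
$$\min_p \frac{1}{2} p^T B p + g^T p, \quad \text{s.t. } \|D^{-1} p\| \le \Delta.$$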
From my experience the better approach is to solve the trust-region problem in “hat” space, so we don’t need to compute $D^{-1}$, which can become arbitrarily large when the optimum is on the boundary and the algorithm approaches it.
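To make the “hat” space quantities concrete, here is a minimal NumPy sketch for the least-squares case, where the gradient is $J^T r$ and the Hessian is replaced by $J^T J$. The function names `scaling_vector` and `hat_space_model` are made up for this illustration, and the dense matrix is formed only for readability (a large-scale code would work with matrix-vector products instead).

```python
import numpy as np


def scaling_vector(x, g, lb, ub):
    """Return v(x) and the diagonal of its Jacobian (hypothetical helper)."""
    v = np.ones_like(x)
    dv = np.zeros_like(x)

    # Negative gradient component: the anti-gradient points to the upper bound.
    mask = (g < 0) & np.isfinite(ub)
    v[mask] = ub[mask] - x[mask]
    dv[mask] = -1.0

    # Positive gradient component: the anti-gradient points to the lower bound.
    mask = (g > 0) & np.isfinite(lb)
    v[mask] = x[mask] - lb[mask]
    dv[mask] = 1.0

    return v, dv


def hat_space_model(J, r, x, lb, ub):
    """Build d = diag(D), c = diag(C), g_hat and B_hat for f = 0.5 * ||r||^2."""
    g = J.T @ r                # gradient in the original variables
    v, dv = scaling_vector(x, g, lb, ub)
    d = np.sqrt(v)             # D = diag(v^(1/2))
    c = g * dv                 # C = diag(g) J^v, elementwise non-negative
    g_hat = d * g              # g_hat = D g
    J_hat = J * d              # J D (scales the columns of J)
    B_hat = J_hat.T @ J_hat + np.diag(c)   # D J^T J D + C
    return d, c, g_hat, B_hat
```

Note that only $D$ itself appears here, never $D^{-1}$; a step found in “hat” space is mapped back with `p = d * p_hat`.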
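A modified improvement ratio of our trust-region solution is computed as follows:
$$\rho = \frac{f(x + p) - f(x) + \frac{1}{2} \hat{p}^T C \hat{p}}{\frac{1}{2} \hat{p}^T \hat{B} \hat{p} + \hat{g}^T \hat{p}}.$$
Based on $\rho$ we adjust the radius of the trust region using some reasonable strategy.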
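As a rough illustration, the modified ratio and one possible radius update could look like the sketch below. The thresholds 0.25 and 0.75 and the shrink/grow factors are common textbook choices that I am assuming here, not values taken from the papers; the function names are hypothetical, and the code assumes a non-zero step with a non-zero model change.

```python
import numpy as np


def improvement_ratio(f_old, f_new, p_hat, B_hat, g_hat, c):
    """Modified ratio of actual to predicted change; c is the diagonal of C."""
    model_change = 0.5 * p_hat @ B_hat @ p_hat + g_hat @ p_hat
    correction = 0.5 * p_hat @ (c * p_hat)
    return (f_new - f_old + correction) / model_change


def update_radius(delta, rho, step_norm):
    """One simple strategy (assumed constants): shrink on poor agreement,
    grow when the model is good and the step reached the boundary."""
    if rho < 0.25:
        return 0.25 * step_norm
    if rho > 0.75 and step_norm > 0.95 * delta:
        return 2.0 * delta
    return delta
```

For a function that is exactly quadratic and a step with $C = 0$, the ratio equals 1, which is the usual sanity check for this kind of formula.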
Now a summary and conclusion for this section. Motivated by the first-order optimality condition, we introduced a matrix $D$ and reformulated our problem as a system of nonlinear equations. Then, motivated by the Newton process for this system, we formulated the corresponding trust-region problem. The purpose of the matrix $D$ is to prevent steps directly into the bounds, so that other variables can also be explored during the step. It absolutely doesn’t mean that after introducing such a matrix we can ignore the bounds; specifically, our estimates $x_k$ must remain strictly feasible. The full algorithm will be described below.
Reflective Transformation
This idea comes from another paper, “On the convergence of reflective Newton methods for large-scale nonlinear minimization subject to bounds”, by the same authors. Conceptually we apply a special transformation $x = R(y)$, such that $y$ is an unbounded variable, and try to solve the unconstrained problem $\min_y f(R(y))$. The authors suggest a reflective transformation: a piecewise linear function, equal to the identity when $y$ satisfies the initial bound constraints, otherwise reflected from the bounds like a beam of light (I hope you got the idea). I implemented it as follows (although I don’t use this code anywhere):
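```python
import numpy as np


def reflective_transformation(y, l, u):
    # Missing bounds are treated as infinite.
    if l is None:
        l = np.full_like(y, -np.inf)
    if u is None:
        u = np.full_like(y, np.inf)

    l_fin = np.isfinite(l)
    u_fin = np.isfinite(u)

    x = y.copy()

    # Only a lower bound: reflect values below l back above it.
    m = l_fin & ~u_fin
    x[m] = np.maximum(y[m], 2 * l[m] - y[m])

    # Only an upper bound: reflect values above u back below it.
    m = ~l_fin & u_fin
    x[m] = np.minimum(y[m], 2 * u[m] - y[m])

    # Both bounds: a periodic "zigzag" with period 2 * (u - l).
    m = l_fin & u_fin
    d = u - l
    t = np.remainder(y[m] - l[m], 2 * d[m])
    x[m] = l[m] + np.minimum(t, 2 * d[m] - t)

    return x
```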
This transformation is simple and doesn’t significantly increase the complexity of the function to minimize. But it is not differentiable when $x$ is on the bounds, thus we again use strictly feasible iterates. The general idea of the reflective Newton method is to do a line search along the reflective path in $x$ space (or, equivalently, a traditional straight line in $y$ space). According to the authors this method has cool properties, but it is used very modestly in the final large-scale Trust Region Reflective.
Large Scale Trust-Region Problem
In the previous post I conceptually described how to accurately solve trust-region subproblems arising in least-squares minimization. Here I again focus on the least-squares setting and briefly describe how the subproblem can be solved approximately in the large-scale case.
- Steihaug Conjugate Gradient. Apply the conjugate gradient method to the normal equation until the current approximate solution falls outside the trust region (or a zero-curvature direction is found, which can happen if $J$ is rank deficient). This might actually be the best approach for least squares, as we don’t have negative curvature directions in $J^T J$, and the only criticism of Steihaug-CG I have read is that it can terminate before finding a negative curvature direction. I would assume that this is not very important in the positive semidefinite case.
- Two-dimensional subspace minimization. We form a basis consisting of two “good” vectors, then solve the two-dimensional trust-region problem with the exact method. The first vector is the gradient, the second is an approximate solution of the linear least-squares problem with the current $J$ (computed by LSQR or LSMR). When the Jacobian is rank deficient the situation is somewhat problematic: as I noticed, in this case a least-norm solution is useless for approximating a trust-region solution. In this case we need to add a (not too big) diagonal regularization term to $J^T J$. A recipe for this situation is given in “Approximate solution of the trust region problem by minimization over two-dimensional subspaces”.
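For reference, the first option can be sketched in a few lines. This is the textbook Steihaug-CG iteration for the model $\frac{1}{2} p^T B p + g^T p$ with $\|p\| \le \Delta$, written with a dense symmetric `B` for brevity (in least squares `B` would be $J^T J$, applied through products with $J$ and $J^T$ rather than formed explicitly). The names and the tolerance are my own choices for this illustration.

```python
import numpy as np


def _to_boundary(p, d, delta):
    """Positive tau such that ||p + tau * d|| = delta, assuming ||p|| < delta."""
    a = d @ d
    b = 2.0 * (p @ d)
    c = p @ p - delta**2
    return (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


def steihaug_cg(B, g, delta, tol=1e-10, max_iter=None):
    """Approximately minimize 0.5 * p^T B p + g^T p subject to ||p|| <= delta."""
    n = g.size
    max_iter = n if max_iter is None else max_iter
    p = np.zeros(n)
    r = g.copy()              # gradient of the model at the current p
    if np.linalg.norm(r) < tol:
        return p
    d = -r
    for _ in range(max_iter):
        Bd = B @ d
        curvature = d @ Bd
        if curvature <= 0:
            # Non-positive curvature: follow d to the trust-region boundary.
            return p + _to_boundary(p, d, delta) * d
        alpha = (r @ r) / curvature
        p_next = p + alpha * d
        if np.linalg.norm(p_next) >= delta:
            # The CG iterate left the trust region: stop on the boundary.
            return p + _to_boundary(p, d, delta) * d
        r_next = r + alpha * Bd
        if np.linalg.norm(r_next) < tol:
            return p_next
        beta = (r_next @ r_next) / (r @ r)
        d = -r_next + beta * d
        p, r = p_next, r_next
    return p
```

With a large radius this reduces to plain CG on $B p = -g$; with a small radius the first iteration already exits along the anti-gradient, which is the Cauchy-like behaviour one expects.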
Outline of Trust Region Reflective
Here is the high level description.
1. Consider the trust-region problem in “hat” space as described in the first section.
2. Find its solution $\hat{p}$ by whatever method is appropriate (exact for small problems, approximate for large scale). Compute the corresponding solution in the original space, $p = D \hat{p}$.
3. Restrict this trust-region step to lie within the bounds if necessary. Step back from the bounds by a small factor $\theta$ times the step length. Do this for all types of steps below.
4. Consider a single reflection of the trust-region step if a bound was encountered in step 3. Use 1-d minimization of the quadratic model to find the minimum along the reflected direction (this is trivial).
5. Find the minimum of the quadratic model along the anti-gradient direction $-\hat{g}$. (Rarely, it can be better than the trust-region step because of the bounds.)
6. Choose the best step among those from steps 3, 4 and 5. Compute the corresponding step in the original space as in step 2, and update $x$.
7. Update the trust-region radius by computing $\rho$ as described in the first section.
8. Check for convergence and go to step 1 if the algorithm has not converged.
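The bound handling in the outline above (restricting the step, stepping back, and reflecting it) can be sketched as follows. This is a simplified illustration in the original variables, assuming a strictly feasible `x`; the helper names are hypothetical and the default `theta=0.005` is just a placeholder value that I picked for the example.

```python
import numpy as np


def step_to_bounds(x, p, lb, ub):
    """Largest t with lb <= x + t * p <= ub and the index of the bound hit."""
    t = np.full_like(x, np.inf)
    up = p > 0
    down = p < 0
    t[up] = (ub[up] - x[up]) / p[up]
    t[down] = (lb[down] - x[down]) / p[down]
    hit = np.argmin(t)
    return t[hit], hit


def restrict_step(x, p, lb, ub, theta=0.005):
    """Shorten p so that x + p stays strictly inside the bounds.

    Returns the (possibly shortened) step and the index of the variable
    whose bound was encountered, or None if the full step is feasible.
    """
    t, hit = step_to_bounds(x, p, lb, ub)
    if t > 1.0:
        return p, None
    # Go to the bound, then step back by a fraction theta of that length.
    return (1.0 - theta) * t * p, hit


def reflect(p, hit):
    """Direction after a single reflection from the bound of variable `hit`."""
    reflected = p.copy()
    reflected[hit] = -reflected[hit]
    return reflected
```

The reflected direction is then searched with the same one-dimensional minimization of the quadratic model, again restricted to stay strictly inside the bounds.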
In the next two posts I will describe another type of algorithm which we call “dogbox” and provide comparison benchmark results.