- Introduction
- Multivariable Search Methods Overview
- Unconstrained Multivariable Search Methods
- Constrained Multivariable Search Methods
- Successive Linear Programming
- Successive Quadratic Programming
- Generalized Reduced Gradient Method
- Penalty, Barrier and Augmented Lagrangian Functions
- Other Multivariable Constrained Search Methods
- Comparison of Constrained Multivariable Search Methods
- Stochastic Approximation Procedures
- Closure
- FORTRAN Program for BFGS Search of an Unconstrained Function
- References
- Problems
In this section on unconstrained multivariable search methods, several of the most effective and widely used methods are described. First, the quasi-Newton methods are given, which have proved to be the most effective, though more elaborate, of the procedures. Then conjugate gradient and conjugate direction methods are illustrated with two examples. Finally, the popular function comparison procedure, pattern search, is presented, and these methods are assessed as they relate to problems with constraints. Before
discussing the specifics of the methods, it is necessary to describe the
desirable features of an algorithm. As mentioned previously, the algorithm
should generate a sequence of points that converges to the optimum. As will
be seen, the method of steepest ascent (gradient search) is not an effective
method even though it will converge to an optimum eventually. The algorithm
tends to zigzag, and the rate of convergence is significantly slower than
other algorithms. Consequently, the rate, or order, of convergence of an algorithm is another important theoretical property; the rate of convergence of a sequence measures how rapidly the iterates approach the limit point. Another criterion often used to compare algorithms is their ability to locate the optimum of quadratic functions in a finite number of steps; this is called quadratic termination.

There are several caveats about relying on theoretical results in judging algorithms. One is that the existence of convergence and rate of convergence results for an algorithm does not guarantee good performance in practice, according to Fletcher(4). One reason is that these theoretical results do not account for computer round-off error, which may be crucial. Both numerical experimentation with a variety of test functions and convergence and rate of convergence proofs are required to give a reliable indication of good performance. Also, as discussed by Gill et al.(6), the conditions for achieving the theoretical rate of convergence may be rare, because an infinite sequence cannot be generated on a computer. Moreover, the absence of a theorem on the rate of convergence of an algorithm may be as much a measure of the difficulty of the proof as of the inadequacy of the algorithm, according to Gill et al.(6).
These methods begin the search along a gradient line and use gradient information to build a quadratic fit to the economic model (profit function). Consequently, to understand these methods it is helpful to discuss the gradient search algorithm and Newton's method as background for the extension to the quasi-Newton algorithms. All of the algorithms involve a line search given by the following equation.
For gradient search, the search direction is along the gradient of the profit function. Gradient search,
or the method of steepest ascent, was presented in Chapter 2 as an example
of the application of the method of Lagrange Multipliers. However, let
us consider briefly another approach to obtain this result, which should
give added insight to the method. First, the profit function is expanded
around a point using a first-order Taylor series. In matrix notation, this expansion has the following form.
Then, to obtain the largest change in y(x) for a step of fixed length, each component of the step must be proportional to the corresponding component of the gradient of y(x), where a is the proportionality constant and is also the parameter of the gradient line. Therefore, the gradient line, equation (6-6), can be written as:

x = x0 ± a∇y(x0)     (6-7)
The plus sign in the equation indicates the direction of steepest ascent, and using a negative sign would give the direction of steepest descent. However, these directions are actually steep ascent (descent) rather than steepest ascent (descent). Only if the optimization problem is scaled such that a unit change in each of the independent variables produces the same change in the profit function will the gradient move in the direction of steepest ascent. The procedures for scaling have been described in detail by Wilde(10) and Wilde and Beightler(12), and scaling is a problem encountered with all search methods. The following short example illustrates the gradient method for a simple function with ellipsoidal contours. The zig-zag behavior is observed as the algorithm moves from the starting point at (2,-2,1) to the minimum at (0,0,0) of a function which is a sum of squares.
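The gradient-line iteration can be sketched in code. In the following illustrative Python sketch, the assumed sum-of-squares function y = x1² + 2x2² + 3x3² stands in for the example's function (whose coefficients are not reproduced here); for a quadratic y = x'Qx, the exact line-search step has the closed form a = g'g / (2 g'Qg).

```python
# Steepest-descent (gradient) search with exact line searches on an
# assumed diagonal quadratic y(x) = x1**2 + 2*x2**2 + 3*x3**2.

Q = [1.0, 2.0, 3.0]                      # diagonal of Q (an assumption)

def y(x):
    return sum(q * xi * xi for q, xi in zip(Q, x))

def grad(x):
    return [2.0 * q * xi for q, xi in zip(Q, x)]

def steepest_descent(x, tol=1e-8, max_iter=200):
    for _ in range(max_iter):
        g = grad(x)
        gg = sum(gi * gi for gi in g)
        if gg < tol * tol:               # stop when the gradient is small
            break
        gQg = sum(q * gi * gi for q, gi in zip(Q, g))
        a = gg / (2.0 * gQg)             # exact line-search step length
        x = [xi - a * gi for xi, gi in zip(x, g)]
    return x

x_min = steepest_descent([2.0, -2.0, 1.0])
print(x_min)   # zig-zags toward the minimum at (0, 0, 0)
```

Printing the iterates would show the zig-zag path toward the origin that is typical of gradient search.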
Search for the minimum of the following
function using gradient search starting at point
The gradient line, equation (6-7),
for point
and the three components of this equation are: Evaluating the partial derivatives gives: The gradient line is:
Converting y(x
Computing point
Continuing, the partial derivatives
are evaluated at The gradient line at
The value of a
which minimizes y(a)
along the gradient line from
A stopping criterion - having the changes in the independent variables be less than or equal to a small tolerance - terminates the search. Notice that the value of the parameter of the gradient line is always negative. This indicates the algorithm is moving in the direction of steepest descent. As the above results show, gradient search tends to take a zig-zag path to the minimum of the function. This is typical of the performance of this algorithm. In the
development of Newton's method, the Taylor series expansion of y(x) about the current point is carried through the second-order terms. A more convenient way to write this expansion is in matrix form. The algorithm is developed by locating the stationary point of equation (6-8) or (6-9) by setting the first partial derivatives with respect to x equal to zero, which, when written in terms of the Hessian matrix, is:
Then solving for
Comparing this equation to equation (6-2), it is seen that the step length is a = 1 and the search direction is the Newton direction, the inverse of the Hessian matrix multiplied by the gradient.
Search for the minimum of the function
from example 6-1 using Newton's method starting at point
From the previous example the gradient is:
The Hessian matrix is formed from the second partial derivatives evaluated at the starting point. The algorithm is given by equation (6-12), and applying it once locates the minimum of the quadratic function.

Newton's method has the property of quadratic termination, as demonstrated by the example above: it arrives at the optimum of a quadratic function in a finite number of steps, here one. However, for general nonlinear functions Newton's method moves methodically toward the optimum, but the computational effort required to compute the inverse of the Hessian matrix at each iteration usually is excessive compared to other methods. Consequently, it is considered an inefficient middle-game procedure for most problems. To overcome
these difficulties, quasi-Newton methods were developed which use the
algorithm given by equation (6-2). They begin with a search along the
gradient line, and only gradient measurements are required. Davidon developed the concept in 1959, and Fletcher and Powell extended the methodology in 1963. As discussed by Fletcher(4), there have been a number of other contributors to this area as well. The DFP (Davidon-Fletcher-Powell) algorithm has become the best known of the quasi-Newton (variable metric or large-step gradient) algorithms. Among its properties are a superlinear rate of convergence on general functions, and quadratic termination using exact line searches on quadratic functions(4). A number
of variations of the functional form of the update matrix have been proposed. The DFP
algorithm has the following form of equation (6-2) for minimizing the
function y(
where a is the optimal value of the line-search parameter. The update matrices are built up from the changes in the independent variables and in the gradient at each iteration, as given by equations (6-14), (6-15), and (6-16). The algorithm
begins with a search along the gradient line from the starting point
where the initial update matrix is taken as the identity matrix. The matrices are then built up as the search proceeds, and for a quadratic function with exact line searches the sum of the update terms approaches the inverse of the Hessian matrix. The development
of the algorithm and the proofs for the rate of convergence and quadratic
termination are given by Fletcher(4). Also, the procedure is applicable
to and effective on nonlinear functions. According to Fletcher(4), for
general functions it preserves positive definite update matrices, so each search direction is a descent direction. The following example illustrates the use of the DFP algorithm for a quadratic function with three independent variables. Consequently, the optimum is reached with three applications of the algorithm.
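The DFP update loop can be sketched as follows. This is an illustrative Python sketch on an assumed quadratic y(x) = 0.5 x'Ax - b'x (not the example below); the matrix A, the vector b, and the variable names are assumptions. The inverse-Hessian estimate H starts as the identity, so the first move is along the negative gradient, and H is then updated from the change in x (delta) and the change in the gradient (gamma).

```python
import numpy as np

# Assumed symmetric positive definite quadratic: y = 0.5 x'Ax - b'x
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])

def grad(x):
    return A @ x - b

def dfp(x, n_iter=3):
    H = np.eye(len(x))                    # initial inverse-Hessian estimate
    g_old = grad(x)
    for _ in range(n_iter):
        s = -H @ g_old                    # search direction
        a = -(s @ g_old) / (s @ A @ s)    # exact line search for a quadratic
        delta = a * s
        x = x + delta
        g_new = grad(x)
        gamma = g_new - g_old
        Hg = H @ gamma
        # DFP update: H <- H + dd'/(d'g) - (Hg)(Hg)'/(g'Hg)
        H = H + np.outer(delta, delta) / (delta @ gamma) \
              - np.outer(Hg, Hg) / (gamma @ Hg)
        g_old = g_new
    return x

x_star = dfp(np.zeros(3))
print(x_star)                     # agrees (to round-off) with solve(A, b)
print(np.linalg.solve(A, b))
```

With exact line searches, three applications of the algorithm reach the minimum of this three-variable quadratic, illustrating quadratic termination.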
Determine the minimum of the following
function using the DFP algorithm starting at
Performing the appropriate partial
differentiation, the gradient vector ∇y( Using equation (6-17) to start the algorithm gives: The optimal value of a
The value of
The algorithm continues using equations (6-13) and (6-14) for k=1.
or where
and A The optimal value of
a
The value of
The computation of
and where
Setting dy(a)/da = 0 locates the optimal step length. In the preceding example, exact line searches were used to have the DFP algorithm proceed to the optimum. However, in optimization problems encountered in industrial practice, exact line searches are not possible, and numerical single-variable search methods, such as golden section search or the quadratic method, must be used. However, the previously mentioned BFGS method will converge to the optimum of a convex function even when inexact line searches are used. This global convergence property has not been demonstrated for other algorithms such as the DFP algorithm, according to Fletcher(4). Consequently, this may be part of the reason that the BFGS algorithm has demonstrated generally more satisfactory performance than other methods in numerical experiments, even though it is a more elaborate formula. The BFGS matrix update formula, comparable to equations (6-14), (6-15), and (6-16), as given by Fletcher(4), is:
where d is the change in the independent variables between iterations. This equation
is used in place of equation (6-14) in the algorithm given by equation
(6-13). The procedure is the same in that a search along the gradient
line from starting point
Determine the minimum of the following
function using the BFGS algorithm starting at
The first application of the algorithm
is the same as Example 6-3, which is a search along the gradient line
through
The algorithm continues using equations (6-13) and (6-20) for k=1.
where
The optimal value of
a
The value for
The computation of
or where
The optimal value of a is found by an exact line search. A program for the BFGS method is given in Table 6-4 at the end of this chapter. It employs the Fibonacci search program described in Chapter 5 for the line searches. This method and the program are applicable to functions that are not quadratic as well. However, the property of quadratic termination, reaching the optimum in a predetermined number of steps, applies to quadratic functions only, so a stopping criterion has to be specified for general nonlinear functions. In this program the function to be minimized and the stopping criterion, EPS, are supplied by the user, and the program terminates when the magnitude of the difference between successive values of the profit function is less than the value of the stopping criterion. The solution to the problem of Example 6-4 is given to illustrate the use of the program.
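In the same spirit as the Table 6-4 program, the BFGS algorithm with an inexact line search can be sketched in Python. This is an illustrative sketch only: a backtracking (Armijo) line search stands in for the Fibonacci search, the well-known Rosenbrock function stands in for the user-supplied function, and eps plays the same role as the program's EPS stopping criterion.

```python
import numpy as np

def rosenbrock(x):
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

def rosen_grad(x):
    return np.array([
        -400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
        200.0 * (x[1] - x[0] ** 2),
    ])

def bfgs(f, grad, x, eps=1e-12, max_iter=1000):
    n = len(x)
    H = np.eye(n)                         # inverse-Hessian estimate
    g, fx = grad(x), f(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < 1e-12:     # gradient essentially zero
            break
        s = -H @ g                        # quasi-Newton search direction
        a = 1.0                           # inexact backtracking (Armijo) search
        while f(x + a * s) > fx + 1e-4 * a * (g @ s):
            a *= 0.5
        x_new = x + a * s
        f_new, g_new = f(x_new), grad(x_new)
        d, gam = x_new - x, g_new - g
        if d @ gam > 1e-12:               # curvature condition keeps H pos. def.
            rho = 1.0 / (d @ gam)
            I = np.eye(n)
            H = (I - rho * np.outer(d, gam)) @ H @ (I - rho * np.outer(gam, d)) \
                + rho * np.outer(d, d)
        done = abs(f_new - fx) < eps      # EPS-style stopping criterion
        x, g, fx = x_new, g_new, f_new
        if done:
            break
    return x

x_opt = bfgs(rosenbrock, rosen_grad, np.array([-1.2, 1.0]))
print(x_opt)    # approaches the minimum at (1, 1)
```

The curvature test that skips an update when d'γ is not positive is what preserves a positive definite H with an inexact line search, so each direction remains a descent direction.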
The distinguishing feature of these methods is the quadratic termination property. The conjugate direction methods do not require derivative measurements, and the conjugate gradient methods require only gradient measurements. These procedures have been effective on a number of optimization problems, and they have been summarized by Fletcher(4) and others(6, 7, 8, 9, 15). The conjugate gradient and direction algorithms can locate the optimum of a quadratic function by searching only once along each conjugate direction if exact line searches are used (quadratic termination), and all of the methods rely on the theorem given below. They differ in the way the conjugate directions are generated, and the objective has been to develop efficient methods for general functions(4). Two methods that have consistently performed better than the others will be described: Powell's method for conjugate directions and gradient partan for conjugate gradients. The idea
for these methods is based on the fact that the optimum of a function
that is separable can be found by optimizing separately each component.
A quadratic function can be converted into a separable function, a sum
of perfect squares(15), using a linear transformation; and the optimum
can be found by a single-variable search on each of the transformed variables. A quadratic function to be optimized can have the following form.
Then, using the properties of a quadratic function, conjugate directions can be defined: two search directions are conjugate with respect to the matrix of the quadratic form if the product of the first direction, the matrix, and the second direction is zero.
Then, using this property, sets of conjugate search directions can be constructed which minimize the quadratic function, equation (6-21), as illustrated by Himmelblau(8). The theorem on which these methods rely, as given by Fletcher(4), is:
The proof uses the stationary-point necessary conditions, equation (6-22), and the fact that mutually conjugate vectors are linearly independent(4, 9, 26, 57). However, the proof does not give insight into the means of constructing conjugate directions(4). The notion of conjugate directions is a generalization of orthogonal directions, which are conjugate with respect to the identity matrix. Searching along conjugate directions can be represented as a sum of steps, one along each conjugate direction, where the step lengths are found by single-variable searches; the optimum of the quadratic function is located when each direction has been searched once. The two methods most frequently
associated with conjugate directions are illustrated in
Figure 6-1. These are Powell's method (57) and steep ascent partan(12).
In Powell's method, the conjugate directions are the orthogonal coordinate
axes initially, and in steep ascent partan the conjugate directions are
the gradient lines. Also, both procedures employ an acceleration step.
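The conjugacy property underlying both methods can be checked numerically. In the following illustrative sketch, the quadratic y(x) = 0.5 x'Ax - b'x, the matrix A, and the vector b are assumptions (not the text's); a pair of A-conjugate directions is built from the coordinate axes, and one exact line search along each reaches the minimum.

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # assumed SPD matrix of the quadratic
b = np.array([1.0, 2.0])                  # y(x) = 0.5 x'Ax - b'x

# Build A-conjugate directions from the coordinate axes (Gram-Schmidt
# with the inner product u'Av):
d0 = np.array([1.0, 0.0])
e1 = np.array([0.0, 1.0])
d1 = e1 - (d0 @ A @ e1) / (d0 @ A @ d0) * d0
assert abs(d0 @ A @ d1) < 1e-12           # the two directions are conjugate

# One exact line search along each conjugate direction reaches the minimum:
x = np.zeros(2)
for d in (d0, d1):
    g = A @ x - b                         # gradient at the current point
    a = -(d @ g) / (d @ A @ d)            # exact step length along d
    x = x + a * d

print(x, np.linalg.solve(A, b))           # the two agree
```

This is the content of the theorem: after one exact search along each member of a conjugate set, the optimum of the quadratic is located.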
In the following paragraphs these two methods are discussed in more detail.

In Powell's algorithm(9) the procedure begins at a starting point and initially searches along the coordinate axes. The basic procedure for an iteration, as given by Powell(57), is as follows for a function of n independent variables.
For a quadratic function the method
will arrive at the minimum on completing Step 3. For a general function
Steps 1-3 are repeated until a stopping criterion is satisfied. Step 0
is required to start the method by having an initial set of linearly independent search directions, which are taken as the coordinate axes.
Determine the minimum of the following
function using Powell's method starting at point
As shown in Figure
6-2, the procedure begins at point Step 0.
n = 2
Using an exact line search,
a Step 1.
Using an exact line search, a
Using an exact line search, a Step 2. Step 3. Choose a
Using an exact line search, a
If the function in the above example had not been quadratic, the procedure would have continued with repeated iterations of Steps 1-3. Powell(57) has pointed out that this procedure requires modification if the acceleration directions become close to being linearly dependent. He reported that this possibility was found to be serious if the function depended on more than five variables. Powell developed a test that determines whether the new conjugate direction should replace one of the existing directions or whether the iterative cycle, Steps 1-3, should be repeated with the existing set of linearly independent directions. Readers planning to use this procedure should examine Powell's paper(57) for the details of this test, which was said to be essential for minimizing a function of twenty independent variables. This method has been called one of the more efficient and reliable of the direct search methods(15), owing to its relative simplicity and quadratic termination property. Sectioning, a method which does not employ the acceleration step but simply searches the coordinate axes one at a time, is not as effective and can be confounded by resolution ridges. The conjugate
gradient method, gradient partan, has proved to be as effective as Powell's
method. It is an extension of gradient search and has the ability to locate
the optimum of a function with ellipsoidal contours (quadratic termination)
in a finite number of steps. The term partan comes from parallel tangents. For two variables the procedure employs two gradient searches followed by an acceleration step, as shown in Figure 6-1, for a function with elliptical contours. The acceleration line passes through the optimum. The equations for the gradient and acceleration lines for this method are:
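In a form consistent with the earlier gradient line, these can be taken as x = x_k ± a∇y(x_k) for the gradient searches, and a line through two alternate earlier points for the acceleration step (symbols assumed, since the text's equations are not reproduced here). A minimal Python sketch of the two-variable cycle on an assumed quadratic y = x'Qx:

```python
import numpy as np

Q = np.diag([1.0, 4.0])                 # y(x) = x'Qx, elliptical contours (assumed)

def grad(x):
    return 2.0 * Q @ x

def exact_search(x, d):
    """Minimize y(x + a d) exactly for the quadratic; return the new point."""
    a = -(d @ grad(x)) / (2.0 * (d @ Q @ d))
    return x + a * d

x0 = np.array([4.0, 1.0])
x1 = exact_search(x0, -grad(x0))        # first gradient search
x2 = exact_search(x1, -grad(x1))        # second gradient search
x3 = exact_search(x0, x2 - x0)          # acceleration line through x0 and x2
print(x3)                               # lands on the minimum at (0, 0)
```

The acceleration step length works out to be greater than one, so the new point lies beyond x2, in agreement with the behavior noted later in this section.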
For more than two variables, a tabulation, "Gradient Partan Algorithm for a Function with Ellipsoidal Contours," gives the sequence of gradient searches and acceleration steps required for each number of independent variables. To have the recursion relation shown in that tabulation, it is necessary to omit a point in the numbering sequence.
Determine the minimum of the following
function using gradient partan starting at the point
Beginning with a gradient search
from point
Performing an exact line search
along the gradient from
Setting dy/da = 0 to locate the minimum of y along the gradient line gives: Solving for the optimum value of
gives a
Performing an exact line search along the gradient gives:
Setting dy/da
= 0 and solving gives a
Performing a search along the acceleration line gives:
Setting dy/da
= 0 and solving gives the optimal step length. The procedure is continued with a gradient search from the new point, followed by the next acceleration step. The parameter of the gradient line is negative, showing that the procedure is moving in the direction of steep descent. The parameter of the acceleration line is greater than one, showing that the new point lies beyond the last point. This procedure has been used successfully on numerous problems. However, it has been referred to as a "rich man's optimizer" by Wilde(10). The method tends to oscillate on problems with sharply curving ridges, and numerical computation of the gradient requires more computer time and storage than some other methods. The two equations used, the gradient and acceleration lines, are simple and easy to program; and the method finds better values at each step toward the optimum. For those interested in a detailed discussion of conjugate gradient and direction methods, the books by Fletcher(4), Gill et al.(6), Avriel(9), Himmelblau(8), Reklaitis et al.(15), and Wilde and Beightler(12) are recommended. Now, we will examine another class of methods that rely on logical algorithms to move rapidly from the starting point to one near an optimum.
These procedures use algorithms
based on logical concepts to find a sequence of improved values of the
economic model leading to an optimum. They begin with local exploration,
and then attempt to accelerate in the direction of success. Then if a
failure occurs in that direction, the method repeats local exploration
to find another direction of improved values of the economic model. If
this fails, the algorithm's logic may then try other strategies including
a quadratic fit of the economic model (end game) to look for better values.
Typically, these procedures do not require derivative measurements, and
the algorithm compares the computed values of the economic model. Thus,
they are sometimes called function comparison methods. Two of
the better known methods are pattern search(12) and the polytope or simplicial
method(6). Both have been used successfully on a number of problems. Pattern
search is probably the more widely used of the two procedures, and it
will be discussed in more detail. The polytope method performs local explorations
at the vertices of a polytope in the n-dimensional space of the independent variables.

The logical algorithm of pattern search is illustrated in Figure 6-3, and it begins with short excursions from the starting point to establish a pattern of improved values of the economic model. Based on these function comparisons, it accelerates in the direction established from the local explorations. If successful, the acceleration is continued. Then when a failure is encountered, i.e., a value of the economic model is less than the previous one, the pattern is said to be destroyed; and local explorations are performed to establish a new pattern of improved values of the economic model. Again, acceleration is performed in the new direction until a failure is encountered. The procedure continues in this fashion until an apparent optimum is reached. Then the step size of the local exploration is reduced, attempting to find another direction of improvement in the economic model. If this is successful, the procedure continues until another optimum is found. Reducing the step size is repeated; and if this is unsuccessful in finding a new direction, the current point is declared a local optimum. However, a quadratic fit at the point is needed to confirm that it is an optimum rather than a saddle point. The algorithm
has two parts. One is the local exploration procedure, and the other is
the acceleration step. The local explorations are performed about a base
point by perturbing one variable at a time. Each time a variable is perturbed
and a better value of the economic model is found, this point is used
when the next variable is changed rather than returning to the original
point; these intermediate points serve as temporary base points. When these perturbations and evaluations
are performed for each of the coordinate axes, a final point is reached, and it becomes the new base point. An acceleration step is then made along the direction from the old base point through this new one, and the search grows with repeated success. At this point the two parts of the algorithm have been described in a general form. The local exploration step and the acceleration step can be readily implemented in a computer program, and one is given by Kuester and Mize(16). In addition, the following example illustrates the method on the contour diagram of a function of two independent variables shown in Figure 6-3. It shows the local exploration, acceleration, pattern destroyed and reestablished, and location of the optimum.
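The two parts of the algorithm can be sketched as a minimal Hooke-Jeeves-style pattern search. This is an illustrative Python sketch only: it is written as minimization (whereas the text maximizes a profit function), and the test function, step sizes, and tolerance are assumptions.

```python
def explore(f, x, step):
    """Local exploration: perturb one variable at a time, keeping improvements."""
    x = list(x)
    for i in range(len(x)):
        for delta in (step, -step):
            trial = list(x)
            trial[i] += delta
            if f(trial) < f(x):          # success: keep the perturbed point
                x = trial
                break
    return x

def pattern_search(f, x, step=0.5, tol=1e-6, max_iter=500):
    base = list(x)
    for _ in range(max_iter):
        new = explore(f, base, step)
        if f(new) < f(base):
            # acceleration moves in the direction of success
            while True:
                accel = [2 * n - b for n, b in zip(new, base)]
                trial = explore(f, accel, step)
                if f(trial) < f(new):
                    base, new = new, trial
                else:
                    break                # pattern destroyed
            base = new
        elif step > tol:
            step *= 0.5                  # reduce step size, look for a new pattern
        else:
            break                        # step below tolerance: declare an optimum
    return base

# Illustrative use on a smooth two-variable function (not the figure's):
def func(x):
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2

xy = pattern_search(func, [4.0, 3.0])
print(xy)   # approaches the minimum at (1, -0.5)
```

Note that only function comparisons are made; no derivatives are required, which is the defining feature of this class of methods.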
Locate the maximum of the function
shown in Figure 6-3 using pattern
search starting at the point indicated in the figure. To begin,
local explorations are performed by moving in the positive coordinate
axis direction first (open circles indicate failures, and solid circles indicate successes). Pattern search has been used successfully on a number of types of problems, and it has been found to be most effective on problems with a relatively small number of independent variables, e.g., 10 or fewer. It has the advantage of adjusting to the terrain of a function and will follow a curving ridge. However, it can be confounded by resolution ridges(12), and a quadratic fit is appropriate to avoid this weakness. There are a number of other methods based on logical algorithms. These are discussed in some detail in the texts by Himmelblau(8), Gill et al.(6), and Reklaitis et al.(15). However, none of these is superior to the ones discussed here. Now, we will turn our attention to methods used for constrained multivariable search problems and see that the DFP and BFGS procedures are an integral part of some of these methods.