Chapter 6

MULTIVARIABLE OPTIMIZATION PROCEDURES

Unconstrained Multivariable Search Methods

In this section, several of the most effective and widely used unconstrained multivariable search methods are described. First, the quasi-Newton methods are presented; these have proved to be the most effective, although the most elaborate, of the procedures. Then conjugate gradient and conjugate direction methods are illustrated with two examples. Finally, the popular function-comparison procedure, pattern search, is presented, and the methods are assessed as they relate to problems with constraints.

     Before discussing the specifics of the methods, it is necessary to describe the desirable features of an algorithm. As mentioned previously, the algorithm should generate a sequence of values xk that moves rapidly from the starting point x0 to the neighborhood of the optimum x*. Then the iterates xk should converge to x* and terminate when a convergence test is satisfied. Therefore, an important theoretical result for an algorithm is a theorem proving that the sequence of values xk generated by the algorithm converges to a local optimum. For example, the following theorem from Walsh(26) provides sufficient conditions for convergence of the method of steepest ascent (gradient search).

If the limit of the sequence {xk} generated by xk+1 = xk + a∇y(xk) is x* for every starting point x0 in a suitable neighborhood of x*, then y(x) has a local minimum at x = x*.

The proof of this theorem is by contradiction.

     As will be seen, the method of steepest ascent (gradient search) is not an effective method, even though it will eventually converge to an optimum. The algorithm tends to zigzag, and its rate of convergence is significantly slower than that of other algorithms. Consequently, the rate, or order, of convergence of an algorithm is another important theoretical property. The rate of convergence of a sequence xk for an algorithm, as described by Fletcher(4), is in terms of the norm of the difference between a point in the sequence and the optimum x*, i.e., ||xk - x*||. If ||xk+1 - x*|| / ||xk - x*|| → a, then the rate of convergence is said to be linear, or first order, if a > 0; it is said to be superlinear if a = 0. For an algorithm, it is desirable to have the value of a as small as possible. For some algorithms it is possible to show that ||xk+1 - x*|| / ||xk - x*||² → a, and in this case the rate of convergence is said to be quadratic, or second order. For the method of steepest ascent, Fletcher(4) states that the rate of convergence is linear and slow, with a rate constant that depends on the largest and smallest eigenvalues of the Hessian matrix.
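     These definitions can be checked numerically. The following minimal sketch (Python with NumPy; it assumes the quadratic y = 2x1² + x2² + 3x3² of Example 6-1 below, whose minimum is the origin) prints ||xk+1 - x*||/||xk - x*|| for steepest descent with exact line searches; the ratio approaches a constant a with 0 < a < 1, illustrating the linear rate.

import numpy as np

H = np.diag([4.0, 2.0, 6.0])           # Hessian of y = 2x1^2 + x2^2 + 3x3^2
x = np.array([2.0, -2.0, 1.0])         # starting point; x* = (0, 0, 0)
prev_err = np.linalg.norm(x)
for k in range(1, 11):
    g = H @ x                          # gradient of the quadratic
    a = -(g @ g) / (g @ H @ g)         # exact line search along the gradient line
    x = x + a * g
    err = np.linalg.norm(x)            # ||xk - x*||
    print(f"k={k}  error={err:.3e}  ratio={err/prev_err:.4f}")
    prev_err = err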

     Another criterion often used to compare algorithms is their ability to locate the optimum of quadratic functions. This is called quadratic termination. The justification for using this criterion for comparison is that near an optimum the function can be "adequately approximated by a quadratic form," according to Bazaraa and Shetty(56). They claim that an algorithm that does not perform well in minimizing a quadratic function probably will not do well for a general nonlinear function, especially near the optimum.

     There are several caveats about relying on theoretical results in judging algorithms. One is that the existence of convergence and rate-of-convergence results for an algorithm does not guarantee good performance in practice, according to Fletcher(4). One reason is that these theoretical results do not account for computer round-off error, which may be crucial. Both numerical experimentation with a variety of test functions and convergence and rate-of-convergence proofs are required to give a reliable indication of good performance. Also, as discussed by Gill et al.(6), the conditions for achieving the theoretical rate of convergence may be rare, because an infinite sequence does not exist on a computer. Moreover, the absence of a theorem on the rate of convergence of an algorithm may be as much a measure of the difficulty of the proof as of the inadequacy of the algorithm, according to Gill et al.(6).

Quasi-Newton Methods

These methods begin the search along a gradient line and use gradient information to build a quadratic fit to the economic model (profit function). Consequently, to understand these methods it is helpful to discuss the gradient search algorithm and Newton's method as background for the extension to the quasi-Newton algorithms. All of the algorithms involve a line search given by the following equation.

xk+1 = xk - aHk∇y(xk)                               (6-2)

For gradient search, Hk is I, the unit matrix, and a is the parameter of the gradient line. For Newton's method, Hk is the inverse of the Hessian matrix, H-1, and a is one. For quasi-Newton methods, Hk is a sequence of matrices beginning with the unit matrix I and ending with the inverse of the Hessian matrix, H-1. The quasi-Newton algorithm that employs the BFGS (Broyden, Fletcher, Goldfarb, Shanno) formula for updating the Hessian matrix is considered to be the most effective of the unconstrained multivariable search techniques, according to Fletcher(5). This formula is an extension of the DFP (Davidon, Fletcher, Powell) formula.

    Gradient search, or the method of steepest ascent, was presented in Chapter 2 as an example of the application of the method of Lagrange Multipliers. However, let us consider briefly another approach to obtain this result, which should give added insight to the method. First, the profit function is expanded around point xk in a Taylor series with only first-order terms as:

y(x) = y(xk) + Σj=1,n [∂y(xk)/∂xj](xj - xj,k)                               (6-3)

In matrix notation, the above equation has the following form.

Maximize: y(x) = y(xk) + ∇y(xk)T(x - xk)                             (6-4)

     Then, to maximize y(x), the largest value of ∇y(xk)T(x - xk) is sought. The result should be in the form of an equation that gives the way to change the individual xj's to move in the direction of steepest ascent. This term can be written in vector notation as the dot product of two vectors.

∇y(xk)T(x - xk) = |∇y(xk)| |x - xk| cos θ                              (6-5)

The magnitude of the gradient at point xk, |∇y(xk)|, is known or can be measured at xk, and the vector (x - xk) is to be determined to maximize the dot product of the two vectors. Examining equation (6-5), the largest value of the dot product is obtained with θ = 0, where cos(0) = 1. Consequently, the two vectors ∇y(xk) and (x - xk) are collinear and proportional. This is given by the following equation:

(x - xk) = a∇y(xk)                                                             (6-6)

where a is the proportionality constant and is also the parameter of the gradient line. Therefore, the gradient line, equation (6-6), can be written as:

x = xk + a∇y(xk)                                                              (6-7)

The plus sign in the equation indicates the direction of steepest ascent, and using a negative sign would give the direction of steepest descent. However, these directions are actually steep ascent (descent) rather than steepest ascent (descent). Only if the optimization problem is scaled such that a unit change in each of the independent variables produces the same change in the profit function will the gradient move in the direction of steepest ascent. The procedure for scaling has been described in detail by Wilde(10) and Wilde and Beightler(12), and scaling is a problem encountered with all search methods.

     The following short example illustrates the gradient method for a simple function with ellipsoidal contours. The zig-zag behavior is observed as the algorithm moves from the starting point at (2,-2,1) to the minimum at (0,0,0) of a function which is the sum of squares.

Example 6-1

Search for the minimum of the following function using gradient search starting at point x0 = (2,-2,1).

y = 2x1² + x2² + 3x3²

The gradient line, equation (6-7), for point x0 is:

x = x0 + a∇y(x0)

and the three components of this equation are:

x1 = 2 + a ∂y(x0)/∂x1

x2 = -2 + a ∂y(x0)/∂x2

x3 = 1 + a ∂y(x0)/∂x3

Evaluating the partial derivatives gives:

∂y/∂x1 = 4x1 = 4(2) = 8

∂y/∂x2 = 2x2 = 2(-2) = -4

∂y/∂x3 = 6x3 = 6(1) = 6

The gradient line is:

x1 = 2 + 8a

x2 = -2 - 4a

x3 = 1 + 6a

Converting y(x1,x2,x3) into y(a) for an exact line search gives:

y = 2(2 + 8a)² + (-2 - 4a)² + 3(1 + 6a)²

dy/da = 504a + 116 = 0          a* = -0.23016

Computing point x1 using a* = -0.23016 gives:

x1 = 2 + 8(-0.23016) = 0.15872

x2 = -2 - 4(-0.23016) = -1.0794

x3 = 1 + 6(-0.23016) = -0.38096

Continuing, the partial derivatives are evaluated at x1 to give:

∂y/∂x1 = 4(0.15872) = 0.63488

∂y/∂x2 = 2(-1.0794) = -2.1588

∂y/∂x3 = 6(-0.38096) = -2.2858

The gradient line at x1 is:

x1 = 0.15872 + 0.63488a

x2 = -1.0794 - 2.1588a

x3 = -0.38096 - 2.2858a

The value of a which minimizes y(a) along the gradient line from x1 is computed as was done previously, and the result is a* = -0.2433. Using this value of a, the point x2 is computed as (0.004524, -0.5542, 0.1752). Then, the search is continued along the gradient line from x2 to x3. These results and those from subsequent applications of the algorithm are tabulated below, along with the previous results.

 

[Table: summary of the gradient search iterations of Example 6-1 - the point xk, the line-search parameter a*, and y(xk) at each iteration until the stopping criterion is met]

A stopping criterion - requiring the absolute value of each independent variable to be less than or equal to 1×10-3 - was used. Alternatively, a criterion on the value of y(x) could have been used.

     Notice that the value of the parameter of the gradient line is always negative. This indicates the algorithm is moving in the direction of steepest descent. As the results above show, gradient search tends to take a zigzag path to the minimum of the function. This is typical of the performance of this algorithm.
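     The calculations above are easily mechanized. The following is a sketch of the gradient search algorithm (Python with NumPy assumed), using a golden section line search in place of the exact calculus-based search and the stopping criterion of the example; the hard-coded search interval [-2, 2] is an assumption that happens to bracket every a* for this function.

import numpy as np

def y(x):
    return 2*x[0]**2 + x[1]**2 + 3*x[2]**2

def grad(x):
    return np.array([4*x[0], 2*x[1], 6*x[2]])

def golden(f, lo=-2.0, hi=2.0, tol=1e-10):
    """Golden-section search for the minimizer of f on [lo, hi]."""
    r = (np.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - r*(hi - lo), lo + r*(hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi, d = d, c
            c = hi - r*(hi - lo)
        else:
            lo, c = c, d
            d = lo + r*(hi - lo)
    return (lo + hi) / 2.0

x, k = np.array([2.0, -2.0, 1.0]), 0
while np.max(np.abs(x)) > 1e-3:             # stopping criterion of Example 6-1
    g = grad(x)
    a = golden(lambda t: y(x + t*g))        # line search along the gradient line (6-7)
    x = x + a*g                             # a comes out negative: steepest descent
    k += 1
    print(k, np.round(x, 6), y(x))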

     In the development of Newton's method, the Taylor series expansion of y(x) about xk includes the second order terms as shown below.

y(x) = y(xk) + Σj=1,n [∂y(xk)/∂xj](xj - xj,k) + (1/2)Σi=1,nΣj=1,n [∂²y(xk)/∂xi∂xj](xi - xi,k)(xj - xj,k)                 (6-8)

A more convenient way to write this equation is:

y(x) = y(xk) + ∇y(xk)T(x - xk) + (1/2)(x - xk)TH(x - xk)                            (6-9)

where H is the Hessian matrix, the matrix of second partial derivatives evaluated at the point xk, and (x - xk)T is the row vector, which is the transpose of the column vector of the difference between the vector of independent variables x and the point xk used for the Taylor Series expansion.

     The algorithm is developed by locating the stationary point of equation (6-8) or (6-9) by setting the first partial derivatives with respect to x1, x2, ..., xn equal to zero. For equation (6-8) the result is:

∂y(xk)/∂xi + Σj=1,n [∂²y(xk)/∂xi∂xj](xj - xj,k) = 0,        i = 1, 2, ..., n                                  (6-10)

which, when written in terms of the Hessian matrix, is:

∇y(xk) + H(x - xk) = 0                                  (6-11)

Then solving for x, the optimum of the quadratic approximation, the following equation is obtained which is the Newton's method algorithm.

x = xk - H-1∇y(xk)                                      (6-12)

Comparing this equation with equation (6-2), it is seen that a = 1 and Hk = H-1, the inverse of the Hessian matrix. Also, a line search is not required for this method, because a = 1. However, more computational effort is required for one iteration of this algorithm than for one iteration of gradient search, because the inverse of the Hessian matrix has to be evaluated in addition to the gradient vector. The same quadratic function of the gradient search example is used to illustrate Newton's method in the following example, and it shows the additional computations required.

Example 6-2

Search for the minimum of the function from example 6-1 using Newton's method starting at point x0 = (2,-2,1).

y = 2x1² + x2² + 3x3²

From the previous example the gradient is:

∇y(x0)T = (8, -4, 6)

The Hessian matrix formed from the second partial derivatives evaluated at x0 and its inverse are:

    | 4  0  0 |              | 1/4   0    0  |
H = | 0  2  0 |       H-1 =  |  0   1/2   0  |
    | 0  0  6 |              |  0    0   1/6 |

The algorithm is given by equation (6-12), and for this example is:

| x1 |   |  2 |   | 1/4   0    0  | |  8 |   |  2 - 2 |   | 0 |
| x2 | = | -2 | - |  0   1/2   0  | | -4 | = | -2 + 2 | = | 0 |
| x3 |   |  1 |   |  0    0   1/6 | |  6 |   |  1 - 1 |   | 0 |

The minimum of the quadratic function is located with one application of the algorithm.
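     A sketch of this single Newton step in code (Python/NumPy assumed) follows; note that in practice the linear system (6-11) is solved rather than the inverse of H being formed explicitly.

import numpy as np

x0 = np.array([2.0, -2.0, 1.0])
g = np.array([4*x0[0], 2*x0[1], 6*x0[2]])   # gradient of y = 2x1^2 + x2^2 + 3x3^2 at x0
H = np.diag([4.0, 2.0, 6.0])                # Hessian (constant for this quadratic)
x1 = x0 - np.linalg.solve(H, g)             # equation (6-12): x = x0 - H^-1 grad y(x0)
print(x1)                                   # -> [0. 0. 0.], the minimum in one step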

     In Newton's method, if xk is not close to x*, it may happen that H-1 is not positive definite, and the method may then fail to converge(26). However, if the starting point x0 is sufficiently close to a local optimum x*, the rate of convergence is second order, as given by the following theorem from Fletcher(4).

If xk is sufficiently close to x* for some k, and if H* is positive definite, then Newton's method is well defined for all k and converges at a second order rate.

The proof of the theorem has xk in the neighborhood of x* and uses induction.

     Newton's method has the property of quadratic termination, as demonstrated by the example above. It arrives at the optimum of a quadratic function in a finite number of steps, one.

     However, for nonlinear functions generally, Newton's method moves methodically toward the optimum, but the computational effort required to compute the inverse of the Hessian matrix at each iteration usually is excessive compared to other methods. Consequently, it is considered to be an inefficient middle game procedure for most problems.

     To overcome these difficulties, quasi-Newton methods were developed which use the algorithm given by equation (6-2). They begin with a search along the gradient line, and only gradient measurements are required for Hk in subsequent applications of the algorithm given by equation (6-2). As the algorithm proceeds, a quadratic approximation to the profit function is developed only from the gradient measurements, and for a quadratic function of n independent variables, the optimum is reached after n applications of the algorithm.

     Davidon developed the concept in 1959, and Fletcher and Powell extended the methodology in 1963. As discussed by Fletcher(4), there have been a number of other contributors to this area as well. The DFP (Davidon, Fletcher, Powell) algorithm has become the best known of the quasi-Newton (variable metric, or large-step gradient) algorithms. Its properties include a superlinear rate of convergence on general functions and quadratic termination on quadratic functions when exact line searches are used(4).

     A number of variations of the functional form of the matrix Hk of equation (6-2) with the properties described above have been developed, and some of these have been tabulated by Himmelblau(8). However, as previously stated, the BFGS algorithm which was developed in 1970 is preferable to the others, and this is currently well accepted, according to Fletcher(4). The following paragraphs will describe the DFP and BFGS algorithms and illustrate each with an example. Convergence proofs and related information are given by Fletcher(4) and others (6, 7, 8, 9).

     The DFP algorithm has the following form of equation (6-2) for minimizing the function y(x).

xk+1 = xk - ak+1Hk∇y(xk)                   (6-13)

where ak+1 is the parameter of the line through xk, and Hk is given by the following equation(12).

Hk = Hk-1 + Ak + Bk                           (6-14)

The matrices Ak and Bk are given by the following equations.

Ak = ΔxkΔxkT/(ΔxkTΔgk)                           (6-15)

Bk = -(Hk-1Δgk)(Hk-1Δgk)T/(ΔgkTHk-1Δgk)                           (6-16)

where Δxk = xk - xk-1 and Δgk = ∇y(xk) - ∇y(xk-1).

     The algorithm begins with a search along the gradient line from the starting point x0 as given by the following equation obtained from equation (6-13), with k = 0.

x1 = x0 - a1H0∇y(x0)                               (6-17)

where H0 = I is the unit matrix. This equation is the same as equation (6-7) for the gradient line. The algorithm continues using equation (6-13) until a stopping criterion is met. However, for a quadratic function with n independent variables the method converges to the optimum after n iterations (quadratic termination) if exact line searches are used.

     The matrices Ak and Bk have been constructed so their sums would have the specific properties shown below(12).

Σk=1,n Ak = H-1                               (6-18)

Σk=1,n Bk = -I                                (6-19)

     The sum of the n matrices Ak generates the inverse of the Hessian matrix, H-1, so that equation (6-13) becomes the same as Newton's method, equation (6-12), at the end of n iterations. The sum of the matrices Bk generates the negative of the unit matrix I at the end of n iterations, canceling the unit matrix used for H0 in the first step of the algorithm, equation (6-17).

     The development of the algorithm and the proofs for the rate of convergence and quadratic termination are given by Fletcher(4). Also, the procedure is applicable to and effective on nonlinear functions. According to Fletcher(4), for general functions it preserves positive definite Hk matrices, and thus the descent property holds. Also, it has a superlinear rate of convergence, and it converges to the global minimum of strictly convex functions if exact line searches are used.
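     As a concrete sketch, the DFP update of equations (6-14) through (6-16) can be written in a few lines of code (Python with NumPy is assumed here):

import numpy as np

def dfp_update(H, dx, dg):
    """DFP update, equations (6-14)-(6-16): H_new = H + A + B,
    with dx = x(k) - x(k-1) and dg = grad y(x(k)) - grad y(x(k-1))."""
    A = np.outer(dx, dx) / (dx @ dg)       # the A_k term (6-15); the sums build H^-1
    Hdg = H @ dg
    B = -np.outer(Hdg, Hdg) / (dg @ Hdg)   # the B_k term (6-16); the sums cancel H0 = I
    return H + A + B

Starting from H0 = I and applying this update with the Δx and Δg values of Example 6-3 below reproduces the H1 and H2 computed there.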

     The following example illustrates the use of the DFP algorithm for a quadratic function with three independent variables. Consequently, the optimum is reached with three applications of the algorithm.

Example 6-3(14)

Determine the minimum of the following function using the DFP algorithm starting at x0T = (0,0,0).

Minimize: 5x1² + 2x2² + 2x3² + 2x1x2 + 2x2x3 - 2x1x3 - 6x3

Performing the appropriate partial differentiation, the gradient vector y(x) and the Hessian matrix are:

∇y(x)T = (10x1 + 2x2 - 2x3,  2x1 + 4x2 + 2x3,  -2x1 + 2x2 + 4x3 - 6)

    | 10   2  -2 |
H = |  2   4   2 |
    | -2   2   4 |

Using equation (6-17) to start the algorithm gives:

x1 = x0 - a1H0∇y(x0) = (0, 0, 0)T - a1 I (0, 0, -6)T = (0, 0, 6a1)T

The optimal value of a1 is determined by an exact line search using x1 = 0, x2 = 0, x3 = 6a1 as follows:

y(a1) = 2(6a1)² - 6(6a1) = 72a1² - 36a1

dy/da1 = 144a1 - 36 = 0          a1 = 1/4

The value of x1 is computed by substituting for a1 in the previous equation.

x1T = (0, 0, 3/2)         ∇y(x1)T = (-3, 3, 0)          ∇y(x0)T = (0, 0, -6)

The algorithm continues using equations (6-13) and (6-14) for k=1.

x2 = x1 - a2H1∇y(x1)

or

(x1, x2, x3)T = (0, 0, 3/2)T - a2H1(-3, 3, 0)T

where

H1 = H0 + A1 + B1

and A1 and B1 are given by equations (6-15) and (6-16).

Δx1T = (0, 0, 3/2)        Δg1T = (-3, 3, 6)        Δx1TΔg1 = 9        Δg1TH0Δg1 = 54

           | 0  0   0  |                     |   9   -9  -18 |
A1 = (1/9) | 0  0   0  |       B1 = -(1/54)  |  -9    9   18 |
           | 0  0  9/4 |                     | -18   18   36 |

                    |  5/6   1/6   1/3  |
H1 = H0 + A1 + B1 = |  1/6   5/6  -1/3  |
                    |  1/3  -1/3   7/12 |

so that H1∇y(x1) = (-2, 2, -2)T, and x1 = 2a2, x2 = -2a2, x3 = 3/2 + 2a2.

The optimal value of a2 is determined by an exact line search as follows.

y(a2) = 12a2² - 12a2 - 9/2

dy/da2 = 24a2 - 12 = 0          a2 = 1/2

The value of x2 is computed by substituting for a2 in the previous equation.

x2T = (1, -1, 5/2)         ∇y(x2)T = (3, 3, 0)

The computation of x3 uses equations (6-13) and (6-14) as follows:

x3 = x2 - a3H2∇y(x2)

and

(x1, x2, x3)T = (1, -1, 5/2)T - a3H2(3, 3, 0)T

where

Δx2T = (1, -1, 1)        Δg2T = (6, 0, 0)        Δx2TΔg2 = 6        Δg2TH1Δg2 = 30

           |  1  -1   1 |                     | 25   5  10 |
A2 = (1/6) | -1   1  -1 |       B2 = -(1/30)  |  5   1   2 |
           |  1  -1   1 |                     | 10   2   4 |

                    |  1/6    -1/6     1/6  |
H2 = H1 + A2 + B2 = | -1/6   29/30  -17/30  |
                    |  1/6  -17/30   37/60  |

so that H2∇y(x2) = (0, 12/5, -6/5)T, and x1 = 1, x2 = -1 - 12a3/5, x3 = 5/2 + 6a3/5.


The optimal value of a3 is determined by an exact line search as follows:

y(a3) = 5 + 2(1 + 12a3/5)² + 2(5/2 + 6a3/5)² - 2(1 + 12a3/5)

      - 2(1 + 12a3/5)(5/2 + 6a3/5) - 2(5/2 + 6a3/5) - 6(5/2 + 6a3/5)

Setting dy(a3)/da3 = 0 and solving for a3 gives a3 = 5/12 and x3T = (1, -2, 3), which is the location of the minimum.

     In the preceding example, exact line searches were used to have the DFP algorithm proceed to the optimum. However, in optimization problems encountered in industrial practice, exact line searches are not possible, and numerical single-variable search methods, such as golden section search or the quadratic method, must be used. The BFGS method will converge to the optimum of a convex function even when inexact line searches are used; this global convergence property has not been demonstrated for other algorithms such as the DFP algorithm, according to Fletcher(4). This may be part of the reason that the BFGS algorithm has demonstrated generally more satisfactory performance than other methods in numerical experiments, even though its update formula is more elaborate. The BFGS matrix update formula, comparable to equations (6-14), (6-15), and (6-16), as given by Fletcher(4), is:

Hk+1 = Hk + [1 + (gkTHkgk)/(dkTgk)](dkdkT)/(dkTgk) - (dkgkTHk + HkgkdkT)/(dkTgk)                    (6-20)

where dk = xk+1 - xk and gk = ∇y(xk+1) - ∇y(xk).
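A minimal sketch of equation (6-20) in code (Python/NumPy assumed) follows directly from the formula:

import numpy as np

def bfgs_update(H, d, g):
    """BFGS update of the matrix Hk per equation (6-20),
    with d = x(k+1) - x(k) and g = grad y(x(k+1)) - grad y(x(k))."""
    dg = d @ g
    Hg = H @ g
    return (H + (1.0 + (g @ Hg) / dg) * np.outer(d, d) / dg
              - (np.outer(d, Hg) + np.outer(Hg, d)) / dg)

With H0 = I, d0T = (0, 0, 3/2), and g0T = (-3, 3, 6) from Example 6-4 below, this reproduces the H1 obtained there.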

     This equation is used in place of equation (6-14) in the algorithm given by equation (6-13). The procedure is the same in that a search along the gradient line from starting point x0 is conducted initially according to equation (6-17). Then the Hessian matrix is updated using equation (6-20), and for quadratic functions the method arrives at the minimum after n iterations. The following example illustrates the procedure for the BFGS algorithm using the function of Example 6-3.

Example 6-4(14)

Determine the minimum of the following function using the BFGS algorithm starting at x0 = (0,0,0).

Minimize: 5x1² + 2x2² + 2x3² + 2x1x2 + 2x2x3 - 2x1x3 - 6x3

The first application of the algorithm is the same as Example 6-3, which is a search along the gradient line through x0 = (0,0,0). These results were:

x1T = (0, 0, 3/2)             ∇y(x1)T = (-3, 3, 0)

x0T = (0, 0, 0)                ∇y(x0)T = (0, 0, -6)

The algorithm continues using equations (6-13) and (6-20) for k=1.

x2 = x1 - a2H1∇y(x1)

or

(x1, x2, x3)T = (0, 0, 3/2)T - a2H1(-3, 3, 0)T

where H1 is computed from equation (6-20) with H0 = I and

d0T = (0, 0, 3/2)                g0T = (-3, 3, 6)

d0Tg0 = 9                g0TH0g0 = 54

     | 1     0    1/2 |
H1 = | 0     1   -1/2 |
     | 1/2  -1/2  3/4 |

so that H1∇y(x1) = (-3, 3, -3)T, and x1 = 3a2, x2 = -3a2, x3 = 3/2 + 3a2.

 

The optimal value of a2 is determined by an exact line search using x1 = 3a2, x2 = -3a2, x3 = 3/2 + 3a2 in the function being minimized to give:

y = 27a2² - 18a2 - 9/2

dy/da2 = 54a2 - 18 = 0          a2 = 1/3

The value for x2 is computed by substituting for a2 in the previous equation.

x2T = (1, -1, 5/2)             ∇y(x2)T = (3, 3, 0)

The computation of x3 repeats the application of the algorithm as follows:

x3 = x2 - a3H2∇y(x2)

or

(x1, x2, x3)T = (1, -1, 5/2)T - a3H2(3, 3, 0)T

where

d1T = (1, -1, 1)            g1T = (6, 0, 0)               d1Tg1 = 6                g1TH1g1 = 36

     |  1/6  -1/6    1/6  |
H2 = | -1/6  13/6   -7/6  |
     |  1/6  -7/6   11/12 |

so that H2∇y(x2) = (0, 6, -3)T.

The optimal value of a3 is determined by an exact line search using x1 = 1, x2 = -1 - 6a3, x3 = 5/2 + 3a3 in the function being minimized to give y(a3). The value a3 = 1/6 is determined as previously by setting dy(a3)/da3 = 0, and x3T = (1, -2, 3) is computed, which is the location of the minimum.

A program for the BFGS method is given in Table 6-4 at the end of this chapter. It employs the Fibonacci search program described in Chapter 5 for the line searches. The method and the program are also applicable to functions that are not quadratic. However, the property of quadratic termination at the optimum in a predetermined number of steps applies to quadratic functions only, and a stopping criterion has to be specified for general nonlinear functions. In this program the function to be minimized and the stopping criterion, EPS, are supplied by the user, and the program terminates when the magnitude of the difference between successive values of the profit function is less than the value of the stopping criterion. The solution to the problem of Example 6-4 is given to illustrate the use of the program.
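     As an illustration of the overall procedure, here is a compact sketch (Python with NumPy assumed) that substitutes a golden section line search for the Fibonacci routine of the program in Table 6-4; EPS plays the same role as the program's stopping criterion, and the bracketing interval [-5, 5] is an assumption suitable for this example.

import numpy as np

def golden(f, lo=-5.0, hi=5.0, tol=1e-9):
    """Golden-section line search (standing in for the Fibonacci routine)."""
    r = (np.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - r*(hi - lo), lo + r*(hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi, d = d, c
            c = hi - r*(hi - lo)
        else:
            lo, c = c, d
            d = lo + r*(hi - lo)
    return (lo + hi) / 2.0

def bfgs(y, grad, x, eps=1e-8, kmax=100):
    H = np.eye(len(x))                        # H0 = I, so the first step is a gradient search
    for _ in range(kmax):
        s = H @ grad(x)
        a = golden(lambda t: y(x - t*s))      # line search along -H grad y, equation (6-13)
        x_new = x - a*s
        if abs(y(x_new) - y(x)) < eps:        # stopping criterion, like EPS in the program
            return x_new
        d, g = x_new - x, grad(x_new) - grad(x)
        dg, Hg = d @ g, H @ g
        H = (H + (1.0 + (g @ Hg)/dg) * np.outer(d, d)/dg
               - (np.outer(d, Hg) + np.outer(Hg, d))/dg)   # BFGS update (6-20)
        x = x_new
    return x

y = lambda x: (5*x[0]**2 + 2*x[1]**2 + 2*x[2]**2 + 2*x[0]*x[1]
               + 2*x[1]*x[2] - 2*x[0]*x[2] - 6*x[2])
grad = lambda x: np.array([10*x[0] + 2*x[1] - 2*x[2],
                           2*x[0] + 4*x[1] + 2*x[2],
                           -2*x[0] + 2*x[1] + 4*x[2] - 6])
print(bfgs(y, grad, np.zeros(3)))             # -> approximately [ 1. -2.  3.]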

Conjugate Gradient and Direction Methods

The distinguishing feature of these methods is that they have the quadratic termination property. The conjugate direction methods do not require derivative measurements, and the conjugate gradient methods require only gradient measurements. These procedures have been effective on a number of optimization problems, and they have been summarized by Fletcher(4) and others(6, 7, 8, 9, 15). The conjugate gradient and direction algorithms can locate the optimum of a quadratic function by searching once along each of n conjugate directions if exact line searches are used (quadratic termination), and all of the methods rely on the theorem given below. They differ in the way the conjugate directions are generated, and the objective has been to develop efficient methods for general functions(4). Two methods that have consistently performed better than the others will be described: Powell's method for conjugate directions, and gradient partan for conjugate gradients.

     The idea for these methods is based on the fact that the optimum of a separable function can be found by optimizing each component separately. A quadratic function can be converted into a separable function, a sum of perfect squares(15), using a linear transformation; and the optimum can then be found by a single-variable search on each of the n transformed independent variables. The directions given by the transformation are called conjugate directions.

     A quadratic function to be optimized can have the following form.

y(x) = a + bTx + (1/2)xTHx                 (6-21)

Then, using the properties of a quadratic function, e.g., that H is a positive definite, symmetric matrix, it can be shown that a set of linearly independent vectors s1, s2, ..., sn is mutually conjugate with respect to H if:

siTHsj = 0        for i ≠ j                                   (6-22)

Then, using this property, sets of conjugate search directions can be constructed which minimize the quadratic function, equation (6-21), as illustrated by Himmelblau(8). The theorem on which these methods rely, as given by Fletcher(4), is:

A conjugate direction method terminates for a quadratic function in at most n exact line searches, and each xi+1 is the minimizer of y(x) in the subspace generated by x1 and the directions s1, s2, ..., si (that is, the set of points:

x = x1 + Σj=1,i ajsj        for all values of the aj)

The proof uses the stationary point necessary conditions, equation (6-22) and the fact that mutually conjugate vectors are linearly independent (4, 9, 26, 57). However, the proof does not give insight into the means of constructing conjugate directions(4).

     The notion of conjugate directions is a generalization of orthogonal directions where H = I, according to Avriel(9); and algorithms, such as Powell's method, initially search along orthogonal coordinate axes. Also, the DFP and the BFGS methods are conjugate direction methods when exact line searches are used(7).

     Searching along conjugate directions can be represented by the following equation.

x = a1x1 + a2x2 + ... + anxn                 (6-23)

where ai is the parameter of the line in the conjugate directions (the orthogonal coordinate axes initially in Powell's method), and xi is the vector that gives the conjugate directions (a coordinate axis, e.g., xi = (0, ..., 0, xi, 0, ...0) in Powell's method). For a given direction of search, xi, the value of ai is located to give the optimum of y(xi) along the line of search. The function to be optimized can be written as:

y(x) = y(a1x1 + a2x2 + ... + anxn) = y(a1, a2, ..., an)                 (6-24)

Then, to locate the optimum x*, an exact line search is conducted on each of the ai's individually; that is, the optimum of y(x) is determined by exact line searches in each of the conjugate directions. Further details about the theory of these methods are given by Fletcher(4), Avriel(9), and Powell(57).
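A small numerical sketch can make the conjugacy condition concrete (Python/NumPy assumed; the quadratic of Example 6-3 is used, written as y = bTx + (1/2)xTHx). The eigenvectors of H are one convenient set of mutually conjugate directions, and one exact line search along each of them reaches the minimum:

import numpy as np

H = np.array([[10.0, 2.0, -2.0],
              [ 2.0, 4.0,  2.0],
              [-2.0, 2.0,  4.0]])
b = np.array([0.0, 0.0, -6.0])          # linear part of the quadratic

_, S = np.linalg.eigh(H)                # columns of S: mutually conjugate directions
print(S[:, 0] @ H @ S[:, 1])            # conjugacy check (6-22): essentially zero

x = np.zeros(3)
for i in range(3):                      # one exact line search per direction
    s = S[:, i]
    a = -((H @ x + b) @ s) / (s @ H @ s)
    x = x + a * s
print(x)                                # -> approximately [ 1. -2.  3.]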

The two methods most frequently associated with conjugate directions are illustrated in Figure 6-1. These are Powell's method(57) and steep ascent partan(12). In Powell's method the conjugate directions are initially the orthogonal coordinate axes, and in steep ascent partan the conjugate directions are generated from the gradient lines. Also, both procedures employ an acceleration step. In the following paragraphs these two methods are discussed in more detail for n independent variables, and each is illustrated with an example.

In Powell's algorithm(9) the procedure begins at a starting point x0, and each application of the algorithm consists of (n +2) successive exact line searches. The first (n +1) are along the n coordinate axes. The (n+2)nd line search goes from the point obtained from the first line search through the best point (obtained at the end of the (n +1) line searches). If the function is quadratic, this will locate the optimum. If it is not, then the search is continued with one of the first n directions replaced by the (n + 1)th direction, and the procedure is repeated until a stopping criterion is met.

The basic procedure for an iteration, as given by Powell(57), is as follows for a function of n independent variables, starting at point xI with the conjugate directions s1, s2, ..., sn chosen as the coordinate axes.


Powell's Method for a General Function(57)

0. Calculate aI so that y(xI + aIsn) is a minimum, and define xo = xI + aIsn.

1. For j = 1, 2, ..., n:

    Calculate aj so that y(xj-1 + aj sj) is a minimum.

    Define xj = xj-1 + aj sj.

    Replace sj with sj+1 (for j = 1, 2, ..., n - 1).

2. Replace sn with xn - xo.

3. Choose a so that y[xn + a(xn - xo)] is a minimum, and replace xo with xn + a(xn - xo).

4. Repeat steps 1-3 until a stopping criterion is met.

For a quadratic function the method will arrive at the minimum on completing Step 3. For a general function Steps 1-3 are repeated until a stopping criterion is satisfied. Step 0 is required to start the method by having xo, the point beginning the iteration steps 1-3, be a minimum point on the contour tangent line sn. The following example illustrates the above procedure for a quadratic function with two independent variables.
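     Before the example, here is a sketch of this iteration in code (Python assumed, with scipy.optimize.minimize_scalar standing in for the exact line searches). It is a bare-bones rendering of steps 0-3 without the linear-dependence safeguard that Powell recommends for larger problems.

import numpy as np
from scipy.optimize import minimize_scalar

def powell(y, x, tol=1e-8, itmax=50):
    n = len(x)
    S = [np.eye(n)[:, j] for j in range(n)]        # coordinate axes s1..sn
    line = lambda p, s: p + minimize_scalar(lambda a: y(p + a*s)).x * s
    x = line(x, S[-1])                             # step 0: search along sn
    for _ in range(itmax):
        x0 = x
        for j in range(n):                         # step 1: search s1..sn in turn
            x = line(x, S[j])
        S = S[1:] + [x - x0]                       # step 2: shift directions, append xn - x0
        x = line(x, S[-1])                         # step 3: accelerate along xn - x0
        if np.linalg.norm(x - x0) < tol:
            break
    return x

y = lambda x: 2*x[0]**2 + x[1]**2 - x[0]*x[1]      # the function of Example 6-5
print(powell(y, np.array([2.0, 2.0])))             # -> approximately [0. 0.]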

Example 6-5(8)

Determine the minimum of the following function using Powell's method starting at point xI = (2,2).

Minimize: y = 2x1² + x2² - x1x2

As shown in Figure 6-2, the procedure begins at point xI = (2,2), and step 0 locates the minimum on the contour tangent line sn, xo, by a single variable search along coordinate axis n (= 2) as follows:

Step 0.          n = 2       s1T = (1,0)        s2T = (0,1)          xIT = (2,2)

xo = xI + aIs2

or

(x1, x2)T = (2, 2)T + aI(0, 1)T

y(aI) = 2(2)² + (2 + aI)² - (2)(2 + aI)

Using an exact line search, aI = -1 and xoT = (2,1).

Step 1.          s1T = (1,0)         s2T = (0,1)         xoT = (2,1)

j = 1:          x1 = xo + a1s1

or

(x1, x2)T = (2, 1)T + a1(1, 0)T

y(a1) = 2(2 + a1)² + (1)² - (2 + a1)(1)

Using an exact line search, a1 = -7/4 and x1T = (1/4, 1). Replace s1 with s2.

j = 2:          x2 = x1 + a2s2

or

(x1, x2)T = (1/4, 1)T + a2(0, 1)T

y(a2) = 2(1/4)² + (1 + a2)² - (1/4)(1 + a2)

Using an exact line search, a2 = - 7/8 and x2T = (1/4, 1/8)

Step 2. s2 is replaced with x2 - xo = (1/4 - 2, 1/8 - 1)T = (-7/4, -7/8)T

Step 3. Choose a3 so that y[x2 + a3(x2 - xo)] is a minimum. Let:

(x1, x2)T = (1/4, 1/8)T + a3(-7/4, -7/8)T

y(a3) = 2(1/4 - 7a3/4)² + (1/8 - 7a3/8)² - (1/4 - 7a3/4)(1/8 - 7a3/8)

Using an exact line search, a3 = 1/7 and x3T = (0,0). x3 is the minimum of the quadratic function, and the procedure ends.

 

     If the function in the above example had not been quadratic, the procedure would have continued using s1T = (0, 1) and s2T = (-7/4, -7/8), i.e., the direction (x2 - xo), for the second cycle. In the third cycle, s1 would be replaced by s2 and s2 would be replaced by the new acceleration direction. The cycles are repeated until a stopping criterion is met.

     Powell(57) has pointed out that this procedure requires modification if the acceleration directions become close to being linearly dependent. He reported that this possibility becomes serious if the function depends on more than five variables. Powell developed a test to determine whether the new conjugate direction should replace one of the existing directions or whether the iterative cycle, steps 1-3, should be repeated with the existing set of linearly independent directions. If the reader plans to use this procedure, Powell's paper(57) should be examined for the details of this test, which was said to be essential for minimizing a function of twenty independent variables.

     This method has been called one of the more efficient and reliable of the direct search methods(15), owing to its relative simplicity and its quadratic termination property. Sectioning, a related method that does not employ the acceleration step but simply searches along the coordinate axes one at a time, is not as effective and can be confounded by resolution ridges.

     The conjugate gradient method, gradient partan, has proved to be as effective as Powell's method. It is an extension of gradient search and has the ability to locate the optimum of a function with ellipsoidal contours (quadratic termination) in a finite number of steps. The term partan comes from a class of search techniques that employ parallel tangents(12). These methods move in conjugate directions; or in the case of gradient partan, they move in the direction of conjugate gradients. The procedure is diagrammed in Figure 6-1, and this shows that the gradient line is perpendicular to the contour tangent. Thus, the method can begin directly from the starting point as described below.

     For two variables the procedure employs two gradient searches followed by an acceleration step, as shown in Figure 6-1, for a function with elliptical contours. The acceleration line passes through the optimum. The equations for the gradient and acceleration lines for this method are:

Gradient line:   xk+1 = xk + a∇y(xk)                                      (6-25)

Acceleration line:   xk+1 = xk-3 + a(xk - xk-3)                               (6-26)

For more than two variables the following diagram shows the sequence of gradient searches and acceleration steps required for a function with ellipsoidal contours.

_____________________________________________________________________________

Gradient Partan Algorithm for a Function with Ellipsoidal Contours

                              Number of Independent Variables
                        2              3              4       ...       n
__________________________________________________

Start:          x0
Gradient:       x0 → x2
Gradient:       x2 → x3      x4 → x5      x6 → x7      ...      x2n-2 → x2n-1
Accelerate:  x0, x3 → x4  x2, x5 → x6  x4, x7 → x8    ...    x2n-4, x2n-1 → x2n
_____________________________________________________________________________

To preserve the recursion relation shown above, no point is numbered x1.


     As shown in the above diagram for a function of n independent variables with ellipsoidal contours, a total of n gradient measurements and (2n-1) exact line searches are required to arrive at the optimum point x2n. The search begins at x0, and a search along the gradient line locates point x2. This is followed by another search along the gradient line to arrive at point x3. Then an acceleration step is performed from point x0 through x3 to arrive at point x4, the optimum of a function with elliptical contours. For n independent variables the procedure continues by repeating gradient searches and accelerations to arrive at point x2n, the optimum of a function of n independent variables having ellipsoidal contours. This procedure is illustrated in the following example for a function with three independent variables. In this case the optimum will be reached with three gradient measurements and five line searches.
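     The diagram above translates into a short program. This sketch (Python/NumPy assumed) uses the closed-form exact line search available for a quadratic and follows the gradient/accelerate sequence for n = 3, the problem of Example 6-6 below.

import numpy as np

H = np.diag([4.0, 2.0, 6.0])                        # Hessian of y = 2x1^2 + x2^2 + 3x3^2
grad = lambda x: H @ x

def line_min(x, s):
    a = -(s @ grad(x)) / (s @ H @ s)                # exact line search along direction s
    return x + a * s

n = 3
pts = {0: np.array([2.0, -2.0, 1.0])}
pts[2] = line_min(pts[0], grad(pts[0]))             # gradient search: x0 -> x2
for k in range(3, 2*n, 2):
    pts[k] = line_min(pts[k-1], grad(pts[k-1]))     # gradient search: x(k-1) -> x(k)
    pts[k+1] = line_min(pts[k], pts[k] - pts[k-3])  # accelerate through x(k-3) and x(k)
print(pts[2*n])                                     # -> approximately [0. 0. 0.]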

Example 6-6(10)

Determine the minimum of the following function using gradient partan starting at the point x0 = (2, -2, 1)

y = 2x1² + x2² + 3x3²

Beginning with a gradient search from point x0 to point x2, equation (6-7) is used.

x = x0 + a∇y(x0)

or

x1 = 2 + 8a

x2 = -2 - 4a

x3 = 1 + 6a

where:

∇y(x0)T = (4x1, 2x2, 6x3) = (8, -4, 6)

Performing an exact line search along the gradient from x0 gives:

y = 2(2 + 8a)² + (-2 - 4a)² + 3(1 + 6a)²

Setting dy/da = 0 to locate the minimum of y along the gradient line gives:

dy/da = 504a + 116 = 0

Solving for the optimum value of a gives a* = -0.2302. Using a* to compute x2 gives (0.1584, -1.079, -0.3810)T, and the gradient line at x2 is:

x1 = 0.1584 + 0.6336a

x2 = -1.079 - 2.158a

x3 = -0.3810 - 2.287a

Performing an exact line search along the gradient gives:

y = 2(0.1584 + 0.6336a)² + (-1.079 - 2.158a)² + 3(-0.3810 - 2.287a)²

Setting dy/da = 0 and solving gives a* = -0.2432. Computing x3 gives (0.0043, -0.5543, 0.1750)T. Accelerating from x0 through x3 to locate x4 gives:

x = x0 + a(x3 - x0)

or

x1 = 2 - 1.996a

x2 = -2 + 1.446a

x3 = 1 - 0.8250a

Performing a search along the acceleration line gives:

y = 2(2 - 1.996a)² + (-2 + 1.446a)² + 3(1 - 0.8250a)²

Setting dy/da = 0 and solving gives a* = 1.1034. Computing x4 gives (-0.2021, -0.4048, 0.0897)T.

     The procedure is continued with a gradient search from x4 to x5 and an acceleration from x2 through x5 to x6, the optimum. The following tabulates the results of these calculations and the previous ones.

[Table: summary of the gradient partan calculations of Example 6-6 - the points x0, x2, x3, x4, x5, x6, the line parameters, and the corresponding values of y(x)]

The parameter of the gradient line is negative, showing that the procedure is moving in the direction of steep descent. The parameter of the acceleration line is greater than one, showing that the new point lies beyond the previous one.

     This procedure has been used successfully on numerous problems. However, it has been referred to as a "rich man's optimizer" by Wilde(10): the method tends to oscillate on problems with sharply curving ridges, and numerical computation of the gradient requires more computer time and storage than some other methods. Nevertheless, the two equations used, the gradient and acceleration lines, are simple and easy to program, and the method finds better values at each step toward the optimum.

     For those interested in a detailed discussion of conjugate gradient and direction methods, the books by Fletcher(4), Gill, et al.(6), Avriel(9), Himmelblau(8), Reklaitis et al.(15) and Wilde and Beightler(12) are recommended. Now, we will examine another class of methods that rely on logical algorithms to move rapidly from the starting point to one near an optimum.

Logical Methods

These procedures use algorithms based on logical concepts to find a sequence of improved values of the economic model leading to an optimum. They begin with local exploration, and then attempt to accelerate in the direction of success. Then if a failure occurs in that direction, the method repeats local exploration to find another direction of improved values of the economic model. If this fails, the algorithm's logic may then try other strategies including a quadratic fit of the economic model (end game) to look for better values. Typically, these procedures do not require derivative measurements, and the algorithm compares the computed values of the economic model. Thus, they are sometimes called function comparison methods.

     Two of the better known methods are pattern search(12) and the polytope or simplicial method(6). Both have been used successfully on a number of problems. Pattern search is probably the more widely used of the two procedures, and it will be discussed in more detail. The polytope method performs local explorations at the vertices of an n-dimensional generalization of an equilateral triangle and can employ an acceleration step based on these results. The details of this method and extensions are given by Gill, et al.(6).

     The logical algorithm of pattern search is illustrated in Figure 6-3, and it begins with short excursions from the starting point to establish a pattern of improved values of the economic model. Based on these function comparisons, it accelerates in the direction established from the local explorations. If successful, the acceleration is continued. Then when a failure is encountered, i.e., a value of the economic model is less than the previous one, the pattern is said to be destroyed; and local explorations are performed to establish a new pattern of improved values of the economic model. Again, acceleration is performed in the new direction until a failure is encountered. The procedure continues in this fashion until an apparent optimum is reached. Then the step size of the local exploration is reduced, attempting to find another direction of improvement in the economic model. If this is successful, the procedure continues until another optimum is found. Reducing the step size is repeated; and if this is unsuccessful in finding a new direction, the current point is declared a local optimum. However, a quadratic fit at the point is needed to confirm that it is an optimum rather than a saddle point.

     The algorithm has two parts. One is the local exploration procedure, and the other is the acceleration step. The local explorations are performed about a base point by perturbing one variable at a time. Each time a variable is perturbed and a better value of the economic model is found, this point is used when the next variable is changed, rather than returning to the original point. These intermediate points are called temporary heads, and the first one, t11, is computed by the following expression.

      | b1 + d1        if y(b1 + d1) > y(b1)
t11 = | b1 - d1        if y(b1 - d1) > y(b1) ≥ y(b1 + d1)                    (6-27)
      | b1             otherwise

where b1 is the starting point, d1 = (d1, 0, ..., 0) is the perturbation vector, and the first subscript on t11 refers to the pattern number while the second refers to the coordinate axis of the variable being perturbed. For coordinate axis x2 the perturbations are conducted around point t11 to locate point t12, and the equation corresponding to (6-27) for coordinate axis xj is:

      | t1,j-1 + dj    if y(t1,j-1 + dj) > y(t1,j-1)
t1j = | t1,j-1 - dj    if y(t1,j-1 - dj) > y(t1,j-1) ≥ y(t1,j-1 + dj)        (6-28)
      | t1,j-1         otherwise

where dj is the vector with the step dj in the jth position and zeros elsewhere.

When these perturbations and evaluations are performed for each of the coordinate axes, a final point t1,n is located. This point is designated b2, and an acceleration move is made in the direction established by the local exploration. This is given by the following equation and locates point t20.

t20 = b1 + 2(b2 - b1) = b2 + (b2 - b1)                                             (6-29)

Now, point t20 is used as the starting point for local exploration following the same procedure, using equations (6-27) and (6-28) to locate point b3. Then the acceleration step is repeated, using the same form as equation (6-29), to locate t30.

t30 = b2 + 2(b3 - b2) = b3 + (b3 - b2)                                             (6-30)

The search grows with repeated success.

     At this point the two parts of the algorithm have been described in a general form. The local exploration step and the acceleration step can be readily implemented in a computer program, and one is given by Kuester and Mize(16). In addition, the following example illustrates the method on the contour diagram of a function of two independent variables shown in Figure 6-3. It shows the local exploration, acceleration, pattern destroyed and reestablished, and location of the optimum.
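     A minimal sketch of this logic in code follows (Python/NumPy assumed, written for maximization as in the example below; the initial step size, its halving, and the termination tolerance are assumptions, not part of the published algorithm).

import numpy as np

def explore(y, base, d):
    """Perturb one variable at a time, keeping each improvement (temporary heads)."""
    t = base.copy()
    for j in range(len(t)):
        for step in (+d, -d):                 # try the positive direction first
            trial = t.copy()
            trial[j] += step
            if y(trial) > y(t):
                t = trial
                break
    return t

def pattern_search(y, b, d=0.5, dmin=1e-4):
    while d > dmin:
        t = explore(y, b, d)
        while y(t) > y(b):                    # pattern established: accelerate
            b, t = t, explore(y, 2*t - b, d)  # t_k0 = b_k + (b_k - b_(k-1)), eq. (6-29)
        d /= 2                                # pattern destroyed: reduce the step size
    return b

y = lambda x: -(2*x[0]**2 + x[1]**2 - x[0]*x[1])   # maximize the negative of Example 6-5's y
print(pattern_search(y, np.array([2.0, 2.0])))     # -> approximately [0. 0.]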

Example 6-7

Locate the maximum of the function shown in Figure 6-3 using pattern search starting at the point indicated as b1.

     To begin, local explorations are performed by moving in the positive coordinate axis direction first (open circles indicate failures, and solid circles indicate successes). On the x1 axis the largest value of y(x1, x2) is at t11. Then perturbing on the x2 axis locates the largest value of y at t12 = b2. Effort is not wasted by evaluating y in the negative direction on the x2 axis. Next, an acceleration is performed using equation (6-29) to locate point t20. Then local exploration determines point b3, and acceleration using equation (6-30) locates point t30. Local exploration locates point b4, and the acceleration step lengthens and changes direction as a result of the outcomes of the local exploration at t30 to reach point t40. Local exploration determines point b5, and acceleration gives point t50. However, y(t50) < y(b5), and the pattern is said to be destroyed. Local exploration is repeated, and b6 is located. This sequence of local exploration and acceleration is repeated, determining points t60, b7, t70, b8, t80, b9, and t90. However, y(t90) < y(b9), and the pattern is destroyed again. Local exploration locates b10, and acceleration moves to t10,0. Local exploration around t10,0 shows that this point has the largest value of y, so t10,0 = b11. The procedure would then reduce the step size to attempt to find a direction of improvement. Although this is not shown in Figure 6-3, the outcome would be that y(b11) is still the largest value. Point b11 would be declared a local maximum, and a quadratic fit to the function could be performed to confirm the maximum. The pattern search steps are summarized on Figure 6-3.

     Pattern search has been used successfully on a number of types of problems, and it has been found to be most effective on problems with a relatively small number of independent variables, e.g., 10 or fewer. It has the advantage of adjusting to the terrain of a function and will follow a curving ridge. However, it can be confounded by resolution ridges (12), and a quadratic fit is appropriate to avoid this weakness.

     There are a number of other methods based on logical algorithms. These are discussed in some detail in the texts by Himmelblau(8), Gill et al.(6), and Reklaitis et al.(15). However, none of these are superior to the ones discussed here. Now, we will turn our attention to methods used for constrained multivariable search problems and see that the DFP and BFGS procedures are an integral part of some of these methods.
