MULTI VARIABLE OPTIMIZATION PROCEDURES
There are essentially six types of procedures to solve constrained nonlinear optimization problems. The three considered most successful are successive linear programming, successive quadratic programming and the generalized reduced-gradient method. The other three have not proved as useful, especially on problems with a large number of variables (more than twenty). These are penalty and barrier function methods, augmented Lagrangian functions and the methods of feasible directions (or projections) which are sometimes called methods of restricted movement. Of these methods only successive linear programming does not require an unconstrained single or multivariable search algorithm. Also, penalty and augmented function methods have been used with successive quadratic programming. Each of these methods will be discussed in the order that they were mentioned. This will be followed by a review of studies which have evaluated the performance of the various methods.
This procedure was called the method of approximate programming (MAP) by Griffith and Stewart(18) of Shell Oil Company who originally proposed and tested the procedure on petroleum refinery optimization. As the name implies, the method uses linear programming as a search technique. A starting point is selected, and the nonlinear economic model and constraints are linearized about this point to give a linear problem which can be solved by the Simplex Method or its extensions. The point from the linear programming solution can be used as a new point to linearize the nonlinear problem, and this can be continued until a stopping criterion is met. As shown by Reklaitis et al.(15), this procedure works without safeguards for functions that are mildly nonlinear. However, it is necessary to bound the steps taken in the iterations to insure that the economic model improves, the values of the independent variables remain in the feasible region and the procedure converges to the optimum. These safeguards are bounds on the independent variables specified in advance of solving the linear programming problem. The net result is that the bounds are additional constraint equations. If the bounds are set too small, the procedure will move slowly toward the optimum. If they are set too large, infeasible solutions will be generated. Consequently, logic is incorporated into computer programs to expand the bounds when they hamper rapid progress and shrink them so that the procedure may converge to a stationary point solution(1). For successive linear programming, the general nonlinear optimization problem can be written as:
where upper and lower limits are shown specifically on the independent variables. Now the economic model y(x) and the constraints fi(x) can be linearized around a feasible point xk to give: The problem is in a linear programming format in the form of equation (6-32). However, the values of Dxj can take on either positive or negative values depending on the location of the optimum. Negative values for Dxj are not acceptable with the Simplex Algorithm so a change of variables was made by Griffith and Stewart(18) as follows.
Substituting equation (6-33) into equation (6-32), now the linear programming problem has the form: The bounds on the upper and lower limits on the variables are specified by (uj - xjk) and (lj - xjk). The value of the next point for linearizing is given by xjk+1 = xjk + Dxj+ - Dxj¯, and the procedure is started by specifying a starting point xo(k=0). The above equations are now a linear programming problem where the independent variables are Dxj+ and Dxj¯. The value of the bound uj and lj may affect the rate of convergence of the algorithm. The use of bounds is illustrated in the following example given by Griffith and Stewart(18). Example 6-8(18) Locate the maximum of the following constrained nonlinear optimization problem by the method of succesive linear programming starting at xo(1, 1), and using the bounds (uj - xjk) = (xjk - lj) = 1.
The two constraint equations are shown in Figure 6-4 where they intersect at the maximum of the economic model, point x*(4,3). For this problem the successive linear programming approximation is obtained using equation (6-34).
There are four variables in the above equations Dx1+, Dx2+, Dx1¯, and Dx2¯. Starting at point xo (1,1) the above equations become:
Solving by the Simplex Method gives:
x1 is computed as follows:
and
Linearizing around x1(2,2) gives:
Solving by the Simplex Method gives:
x2 is computed as follows:
and
Linearizing around x2(3,3) gives:
Solving by the Simplex Method gives:
x3 is computed as follows:
and
Linearizing around x3(4,3 1/6) gives:
Solving by the Simplex Method gives:
x is computed as follows:
and
Linearizing around x4(4.0,3.0044) gives:
Solving by the Simplex Method gives:
x5 is computed as follows:
and
This is the optimal solution and is the same as given by Griffith and Stewart (18). It should be noted that point x3(4,3 1/6) is an infeasible point and does not satisfy the first constraint equation. However, this point is sufficiently close to the optimum that the method converges to the optimum after linearizing around this point. Convergence to the optimum will not take place if bounds are not used, however. The problem was solved without the constraints bounding the variables. Starting at point x0(1,1) the point x1(8.5,5.0) was found. Linearizing around x1(8.5,5.0) and solving by the Simplex Method gave the point x2(0,12.23). Then linearizing around x2(0,12.23) gave a set of constraint equations that had an unbounded solution. Consequently, bounds were required on this problem to ensure convergence to a solution. Computer programs will reduce the bounds when an infeasible solution is located, and resolve the problem. This was done for the problem starting at point x2(3,3) since point x3(4,3 1/6) was infeasible, and the bounds were reduced by one-half each time an infeasible point was obtained. Following this procedure, the next two iterations for this problem were (3.563, 3.492) and (3.595, 3.475). Further examination showed the method had difficulty following the first constraint to the optimum. As Himmelblau(8) points out, when constraints become active then successive linear programming's "progress becomes quite slow." Consequently, logic is incorporated in some programs to allow the procedure to continue from an infeasible point, as was done by Griffith and Stewart in this example. For those interested in having a successive linear programming code, Lasdon(19) reports that the most widely used and best known one, POP (Process Optimization Procedure) is available from the SHARE library (COSMIC, Bartow Hall, University of Georgia; Athens GA 30601). Other listings of sources of optimization codes are given by Sandgren(20) and Lasdon and Waren (22). Large linear programming codes have been used with large simulation models in an iterative fashion to approximate the nonlinearities in these models. This approach of using linear programming successively has been successful in large plants. In most cases, this procedure has been used by companies that have many man-years of effort in the development and use of a large linear programming code for plant optimization and a corresponding amount of effort in large simulations of key process units for prediction of performance and yields. An example of this is in petroleum refining where linear programming is used for refinery optimization. In addition, elaborate simulations and correlations have been developed for processes such as catalytic cracking, reforming and distillation. As discussed in Chapter 4, the results of a linear programming optimization are as accurate as the parameters in the economic model and constraint equations, c, A and b. As shown in Figure 6-5 iterative procedures have been developed that use these programs together. The large simulation codes are used to compute the parameters used in the large linear programming code. Then the linear programming code is used to generate an optimal solution in terms of the independent variables, x, which are the process variables required by the simulation codes. This iteration procedure is continued until a stopping criterion is met. Both the linear programming code and the process simulators are very large programs, and no attempt is made to have them run at the same time. Typically, the output from the simulators is edited by a separate program to produce a data set in the form required by the linear programming code. Also, another program can be used to manipulate the output from the linear programming code into a data set for use by the simulation programs. Further descriptions of these procedures are given by Pollack and Lieder(31) for petroleum refinery optimization and by O'Neil, et al.(32) for the allocation of natural gas in large pipeline networks.
Like successive linear programming, a quadratic programming problem is formed from the nonlinear programming problem and is solved iteratively until an optimum is reached. However, the iterative procedure differs from that of successive linear programming. As described by Lasdon and Waren(22), the quadratic programming solution is not accepted immediately as the next point to continue the search, but a single variable search is performed between the old and new points to have a better and feasible point. In quadratic programming the economic model is a quadratic function, and the constraints are all linear equations. To solve this problem the Lagrangian function is formed, and the Kuhn-Tucker conditions are applied to the Lagrangian function (23,24,25) to obtain a set of linear equations. This set of linear equations can then be solved by the Simplex Method for the optimum. It turns out that artificial variables are required for part of the constraints and the slack variables can be used for the other constraints to have an initially feasible basis. Also, finding an initial basic feasible solution may be the only feasible solution(25), so the linear programming computational effort is minimal. At this point it is important to understand the solution of a quadratic programming problem, and this procedure will be described next and illustrated with an example. Then the successive quadratic programming algorithm will be described and illustrated with an example. Also, modifications of the procedure will be discussed that reduce the computational effort in numerically evaluating the Hessian matrix which must be obtained from the nonlinear programming problem. Theoretically, using a quadratic function to approximate the nonlinear economic model of the process can be considered superior to a linear function to represent the economic model. This is part of the motivation for using quadratic programming which can be represented by the following equations: where qjk = qkj would be the second partial derivatives with respect to xj and xk of the nonlinear economic model. They would be computed numerically or analytically from the nonlinear problem given by equation (6-31). Also, cj and aij would be computed as shown by equation (6-32) either numerically or analytically from the nonlinear problem, equation (6-31). The quadratic programming procedure begins by adding slack variables xn+i to the linear constraint equations. It will not be necessary to use xn+i2, since the problem will be solved by linear programming, and all of the variables must be positive or zero. The Lagrangian function is formed as follows: Positive Lagrange multipliers are required, so a negative sign is used on the term with the constraint equations. (See equation 2-49). Setting the first partial derivatives of the Lagrangian function with respect to xj and i equal to zero give the following set of (n+m) linear algebraic equations: The inequality form of the Kuhn - Tucker conditions, equation 6-38, is used to account for xj > 0. (See Hillier and Lieberman(25), and Hadley (59)). Also, the complementary slackness conditions must be satisfied, i.e. product of the slack variables xn+i and the Lagrange multipliers i are zero.
If xn+i = 0, then the constraint is active, an equality; and li ¹ 0. However, if xn+i ¹ 0, then the constraint is inactive, an inequality; and li = 0. For more details refer to the discussion in Chapter 2. The equation set of (6-38) and (6-39) can be converted to a linear programming problem in the following way. Surplus variables are added to equation (6-38) as sj, and slack variables have been added to equation (6-39) as xn+i. The slack variables xn+i can serve as the variables for an initially feasible basis for equations (6-39). However, artificial variables are required to have an initially feasible basis for equation (6-38). Adding artificial variables zj with a coefficient cj to equations (6-38) is a convenient way to start with zj = 1. Also, the objective function will be to minimize the sum of the artificial variables to ensure that they will not be in the final optimal solution. As a result of these modifications, equations (6-38) and (6-39) become: This is now a linear programming problem which can be solved for optimal values of x and l, the solution of the quadratic programming problem. In addition, the solution must satisfy x > 0, l > 0 and lixn+i = 0. Consequently, the Simplex Algorithm has to be modified to avoid having both li and xn+i be basic variables, i.e., nonzero, to satisfy the complimentary slackness conditions(26). This may require choosing the second, best variable to enter the basis in proceeding with the Simplex Algorithm if either li or xn+i are in the basis and the other one is to enter. Franklin(23) has given uniqueness and existence theorems that prove the above procedure is the solution to the quadratic programming problem and is a recommended reference for those details. At this point the method is illustrated with an example. Example 6-9(25) Using quadratic programming determine the maximum of the following function subject to the constraints given.
The quadratic programming problem is constructed using equation (6-41) with c1 = 5, c2 = 1, q11 = 2, q12 = -2, q21 = -2, q22 = 2, a11 = 1, a12 = 1 and b1 = 2. The linear programming problem from equation (6-41) is:
Eliminating z1 and z2 from the objective function gives the following set of equations for the application of the Simplex Method:
this quadratic programming problem is:
The positive Lagrange multiplier is consistent with the Kuhn-Tucker conditions for a maximum, equation (2-49), since a negative sign was used in equation (6-36). Successive quadratic programming iteratively solves a nonlinear programming problem by using a quadratic approximation to the economic model and a linear approximation to the constraint equations. As the series of quadratic programming problems are solved, these intermediate solutions generate a sequence of points that must remain in the feasible region or sufficiently close to this region to converge to the optimum. The logic used with this method is to search along the line between the new and previous point to maintain a feasible or near feasible solution. Also, the computational effort in evaluating the Hessian matrix is significant, and quasi-Newton approximations have been used to reduce this effort. The following example illustrates successive quadratic programming for a simple problem. The discussion that follows describes modifications to the computational procedure to improve the efficiency of the method. Example 6-10 Solve the following problem by successive quadratic programming starting at point x0(0,0).
The contours of the economic model and the constraint equations are shown in Figure 6-6. The nonlinear constraint equation is linearized about the point xk, and it has the following form.
Placing the problem in the form of equation 6-35, gives:
The quadratic programming problem is constructed using equation (6-41) with c1 = 2, c2 = 4, q11 = 2, q12 = q21 = 0, q22 = 2, a11 = (0.208x1k - 0.75), a12 = 1, a21 = 1, a22 = 1, b1 = 0.85 + 0.104x1k, and b2 = 4:
Solving the above linear programming problem by the Simplex Method with xo = (0,0) and ensuring that the complementary slackness conditions are met gives the following result for xo*.
This point is shown on Figure 6-6, and it is outside the feasible region. Consequently, a search along the line between the starting point xo(0,0) and x*(1.192, 1.740) locates feasible point x1(1.039, 1.517) on the first constraint. The quadratic programming problem is formulated around point x1 and solved as was done above. The result is x2(1.209, 1.608) with l1 = 0.784. Repeating the procedure by solving the quadratic programming problem at x2 gives the value of x3(1.199, 1.600) with l1 = 0.800. This is sufficiently close to the optimum of the problem x*(1.2, 1.6) for the purposes of this illustration to say that a converged solution has been obtained. The Wilson-Han-Powell method is an enhancement to successive quadratic programming where the Hessian matrix, [qjk] of equation (6-35), is replaced by a quasi-Newton update formula such as the BFGS algorithm, equation (6-20). Consequently, only first partial derivative information is required, and this is obtained from finite difference approximations of the Lagrangian function, equation (6-36). Also, an exact penalty function is used with the line search to adjust the step from one feasible point to the next feasible point. The theoretical basis for this algorithm is that it has a superlinear convergence rate if an exact penalty function is used with the DFP or BFGS update for the Hessian matrix of the Lagrangian function, and global convergence is obtained to a Kuhn-Tucker point when minimizing an economic model which is bounded below and has convex functions for constraint equations. The details and proofs are given by Han (51,52). The problem in example 6-10 was solved with the Wilson-Han- Powell algorithm. The identity matrix was used for the Hessian matrix at the starting point xo(0,0). The subsequent steps in the solution were x1 (1.3803, 1.6871), x2 (1.203, 1.6038), and x3 (1.1988, 1.603), which was sufficiently close to the optimum to stop. Generally, less computational effort is required with the Wilson-Han-Powell algorithm since second order partial derivatives do not have to be evaluated. The Exxon quadratic programming code(1) uses the Wilson-Han-Powell algorithm described above, and they have added refinements to minimize the computational effort in evaluating the second partial derivatives of the Hessian matrix. This typical large quadratic programming code is described as having the following steps of basic logic. An initial starting point is selected, and the linearized constraints are constructed numerically. Then the matrix of second partial derivatives, the Hessian matrix, is evaluated either numerically or a DFP (Davidon, Fletcher, Powell) approximation can be used. The quadratic programming problem is solved generating a new optimal point. Using this new point and the old point, a single variable search is conducted for an improved, feasible solution to the nonlinear problem. This is followed by termination tests using the Kuhn-Tucker conditions, changes in step and function values and feasibility checks. Some options included in the program include using analytical derivatives when furnished, inputting the Hessian matrix by the user or having it be a specified multiple of the identity matrix with up-dating by the DFP algorithm, and having the user specify whether or not intermediate solutions are required to be feasible. In closing this section, it should be mentioned that the Wilson- Han-Powell (WHP) method has been used successfully on computer-aided process design problems, as described by Jirapongphan, et al.(42), Vanderplaats(24) and Biegler and Cuthrell(53). In some applications, the constraint equations are not converged for each step taken by the optimization algorithm, but an infeasible trajectory is followed where the constraints are not satisfied until the optimum is reached. In the line search to adjust the step from one point to the next, an exact penalty function is used. A step length parameter is employed with the penalty function to force convergence from poor starting conditions. The size of the quadratic programming problem is reduced by substituting the linearized equality constraint equations into the quadratic economic model leaving only the inequalities as constraints. The result can be a significant reduction in the number of the independent variables for highly constrained problems. This technique has been included in comparisons of constrained nonlinear programming methods which are described in a subsequent section. It has been shown to be one of the three better procedures. Now, the equally successful procedure called the generalized reduced gradient method is described.
This procedure is one of a class of techniques called reduced-gradient or gradient projection methods which are based on extending methods for linear constraints to apply to nonlinear constraints(6). They adjust the variables so the active constraints continue to be satisfied as the procedure moves from one point to another. The ideas for these algorithms were devised by Wilde and Beightler(12) using the name of constrained derivatives, by Wolfe(29) using the name of the reduced-gradient method and extended by Abadie and Carpenter(30) using the name generalized reduced gradient. According to Avriel(9) if the economic model and constraints are linear this procedure is the Simplex Method of linear programming, and if no constraints are present it is gradient search. The development of the procedure begins with the nonlinear optimization problem written with equality constraints. The necessary slack and surplus variables have been added as xs or xs2 to any inequality constraints, and the problem is:
Again, there are m constraint equations and n independent variables with n>m. Also, although not specifically written above, the variables can have upper and lower limits; and the procedure as described here will ensure that all variables are positive or zero. The idea of generalized reduced gradient is to convert the constrained problem into an unconstrained one by using direct substitution. If direct substitution were possible it would reduce the number of independent variables to (n-m) and eliminate the constraint equations. However, with nonlinear constraint equations, it is not feasible to solve the m constraint equations for m of the independent variables in terms of the remaining (n-m) variables and then to substitute to these equations into the economic model. Therefore, the procedures of constrained variation and Lagrange multipliers in the classical theory of maxima and minima are required. There, the economic model and constraint equations were expanded in a Taylor series, and only the first order terms were retained. Then with these linear equations, the constraint equations could be used to reduce the number of independent variables. This led to the Jacobian determinants of the method of constrained variation and the definition of the Lagrange multiplier being a ratio of partial derivatives as was shown in Chapter 2. The development of the generalized reduced gradient method follows that of constrained variation. The case of two independent variables and one constraint equation will be used to demonstrate the concept, and then the general case will be described. Consider the following problem.
Expanding the above in a Taylor series about a feasible point xk(x1k,x2k) gives: Substituting equation (6-44b) into equation (6-44a) to eliminate x2 gives, after some rearrangement: In equation (6-45) the first two terms are known constants being evaluated at point xk, and the coefficient of (x1 - x1k) is also a known constant and gives the x1 the direction to move. Thus, to compute the stationary point for this equation, dy/dx1 = 0; and the result is the same as for constrained variation, equation (2-18), which is the term in the brackets of equation (6-45) that is solved together with the constraint equation for the stationary point. However, the term in the bracket also can be viewed as giving the direction to move away from xk to obtain improved values of the economic model and satisfy the constraint equation. The generalized reduced gradient method uses the same approach as described for two independent variables, which is to find an improved direction for the economic model and also to satisfy the constraint equations. This leads to an expression for the reduced gradient from equation (6-42). To develop this method the independent variables are separated into basic and nonbasic ones. There are m basic variables xb, and (n-m) nonbasic variables xnb. In theory the m constraint equations could be solved for the m basic variables in terms of the (n-m) nonbasic variables, i.e.:
Indicating the solution of xb in terms of xnb from equation (6-46) gives:
The names basic and nonbasic variables are from linear programming. In linear programming the basic variables are all positive, and the nonbasic variables are all zero. However in nonlinear programming, the nonbasic variables are used to compute the values of the basic variables and are manipulated to obtain the optimum of the economic model. The economic model can be thought of as a function of the nonbasic variables only, that is if the constraint equations (6-47) are used to eliminate the basic variables, i.e.:
Expanding the above equation in a Taylor series about xk and including only the first order terms gives: In matrix notation equation (6-49) can be written as:
This equation is comparable to equation (6-44a). A Taylor series expansion of the constraint equations (6-46) gives an equation that can be substituted into equation (6-50) to eliminate the basic variables. or in matrix form equation (6-51) is:
The following equation defines Bb as the matrix of the first partial derivatives of fi associated with the basic variables and Bnb as the matrix associated with the nonbasic variables, i.e.:
This is a convenient form of equation (6-52) which can be used to eliminate dxb from equation (6-50). Solving equation (6-53) for dxb gives:
Substituting equation (6-54) into equation (6-50) gives:
Eliminating dxnb from equation (6-55), the equation for the reduced gradient Y(xk) is obtained.
Knowing the values of the first partial derivatives of the economic model and constraint equations at a feasible point, the reduced gradient can be computed by equation (6-57). This will satisfy the economic model and the constraint equations. The generalized reduced gradient method uses the reduced gradient to locate better values of the economic model in the same way unconstrained gradient search was used, i.e.:
where is the parameter of the line along the reduced gradient. A line search on is used to locate the optimum of Y(xnb) along the reduced gradient line from xk. In taking trial steps as a is varied along the generalized reduced gradient line, the matrices Bb and Bnb must be evaluated along with the gradients Ñyb(xb) and Ñynb(xk). This requires knowing both xnb and xb at each step. The values of xnb are obtained from equation (6-58). However, equation (6-46) must be solved for xb; and frequently, this must be done numerically using the Newton - Raphson method. As pointed out by Reklaitis et al.(15) most of the computational effort can be involved in using the Newton - Raphson method to evaluate feasible values of the basic variables once the nonbasic variables have been computed from equation (6-58). The Newton - Raphson algorithm in terms of the nomenclature for this procedure is given by the following equation.
where the values of the roots of the constraint equations (6-46) are being sought for xb, having computed xnb from equation (6-58). Thus, the derivatives computed for the generalized reduced gradient's Bb matrix can be used in the Newton - Raphson root seeking procedure also. The following example illustrates the generalized reduced gradient algorithm. It is a modification and extension of an example given by Reklaitis, et al.(15). Example 6-11(15) Solve the following problem by the generalized reduced gradient method starting at point xo(0,0). The constrained minimum is located at (1.2, 1.6) as shown in Figure 6-7.
Solution: The problem is placed in the generalized reduced gradient format, equation (6-42).
where x3 and x4 have been added as slack variables. To begin x1 and x2 are selected to be basic variables, and x3 and x4 to be nonbasic variables. The equation for the generalized reduced gradient is equation (6-57) and for this problem is: Computing the values of the partial derivative gives: The generalized reduced gradient equation becomes: where The equation for the generalized reduced gradient through xo (0,0) is: The generalized reduced gradient line through starting point xo (0,0,2,4) is given by equation (6-58) and for this example is:
A line search is required. The equations for x1 and x2 are needed in terms of x3 and x4 to be able to evaluate dy/da since y = y(x1,x2). Solving the constraint equations for x1 and x2 in terms of x3 and x4 gives:
Substituting to have x1 and x2 in terms of gives:
Substituting into y gives:
Locating the minimum along the reduced gradient line:
Solving for x1, x2, x3 and x4 gives:
The location of point x1 (1.608, 1.149, 1.311, 1.243) is shown in Figure 6-7. Also, the constraint equations are satisfied. Now, repeating the search starting at x1 gives the following equation for the reduced gradient: The equations for x1, x2, x3 and x4 in terms of the parameter of the reduced gradient line are now computed as:
Using the above equations, the minimum along the reduced gradient line is located by an exact line search.
Setting dy/d equal to zero and solving for gives:
With this value of a, the values for x1, x2, x3 and x4 are:
The point x2(0.781, 1.563, -0.348, 1.657) is an infeasible point as shown in Figure 6-7. The first constraint is violated (x3 = -0.348). This constraint is active, an equality; and the value of a has to be reduced to have the slack variable x3 be equal to zero, i.e.:
Recalculating x1, x2 and x4 for a = -1.347 gives:
and the point to continue the next reduced gradient search is x2 = (0.955, 1.476, 0, 0.157). Now ¶f1/¶x3 = 0 in the reduced gradient equation. The reduced gradient equation at x2 becomes: Solving gives: The reduced gradient line is determined as was done previously:
In this case the reduced gradient line search will be along the first constraint since it is now an equality constraint (x3 = 0). Solving for the optimal value of a gives:
Setting dy/da = 0 gives a = -0.890. Then solving for x1, x2 and x4 gives:
The point x3 (1.20, 1.60, 0, 1.20) from the reduced gradient search is the minimum of the function as shown in Figure 6-7. A summary of the steps is as follows:
The point x3 (1.20, 1.60, 0, 1.20) is the minimum of the function as shown in Figure 6-7.
The texts by Reklaitis et al.(15), Himmelblau(8), and Avriel(9) are recommended for information about additional theoretical and computational details for this method. These include procedures to maintain feasibility, i.e., the GRG, GRGS and GRGC versions, stopping criteria, relation to Lagrange multipliers, treatment of bounds and inequalities, approximate Newton - Raphson computations, and use of numerical derivatives, among others. In the first comprehensive comparison of nonlinear programming codes which was conducted by Colville(21), the generalized reduced gradient method ranked best among 15 codes from industrial firms and universities in this country and Europe. This algorithm has been a consistently successful performer in computer programs implementing it to solve industrial problems. Lasdon(2) reported that he has a GRG code available for distribution (Professor L. S. Lasdon, School of Business Administration, University of Texas, Austin, Texas 78712), and this article lists several other sources of GRG codes.
These methods convert the constrained optimization problem into an unconstrained one. The idea is to modify the economic model by adding the constraints in such a manner to have the optimum be located and the constraints be satisfied. There are several forms for the function of the constraints that can be used. These create a penalty to the economic model if the constraints are not satisfied or form a barrier to force the constraints to be satisfied, as the unconstrained search method moves from the starting point to the optimum. This approach is related to the method of Lagrange multipliers which is a procedure that modifies the economic model with the constraint equations to have an unconstrained problem. Also, the Lagrangian function can be used with an unconstrained search technique to locate the optimum and satisfy the constraints. In addition, the augmented Lagrangian function combines a penalty function with the Lagrangian function to alleviate computational difficulties associated with boundaries formed by equality constraints when the Lagrangian function is used alone. These penalty function type procedures predate the previously discussed methods, and have been supplanted by them. They have proved successful on relatively small problems, but the newer techniques of successive linear and quadratic programming and generalized reduced gradient were required for larger, industrial-scale problems. However, the newer techniques have incorporated these procedures on occasions to ensure a positive definite Hessian matrix and to combine the equality constraints with the profit function which then leaves only the inequalities as constraints. The following paragraphs will review and illustrate these methods since they are used in optimization codes and as additions to the newer methods. More detail is given in the texts by Avriel(9), Reklaitis, et al.(15), and Gill, et al.(6) and in the review by Sargent(33). The penalty function concept combines two ideas. The first one is the conversion of the constrained optimization problem into an unconstrained problem, and the second is to have this unconstrained problem's solution be one that forces the constraints to be satisfied. The constraints are added to the economic model in a way to penalize movement that does not approach the optimum of the economic model and also satisfy the constraint equations. The optimization problem can be written with equality and inequality constraints as:
By combining the economic model and constraint equations, we can form a penalty function as follows:
where the term F[r,f(x)] is a function notation that includes the constraint equations and a penalty function parameter r as variables. Various forms of this function F have been suggested and used with varing degrees of success. Some of these are given in Table 6-1. Referring to the table we see that these functions are of two types, interior and exterior penalty functions. The interior penalty function requires a feasible starting point, and each step toward the optimum is a feasible point. An example of an interior penalty function with an economic model subject to inequality constraints is: Interior penalty functions are applicable only to inequality constraints, and the term in equation (6-62) with the constraints will increase as feasible points approach the boundary with the infeasible region. Consequently, the function P(x,r) will appear to encounter a barrier at the boundary of the feasible region. Therefore, interior penalty functions are called barrier functions also. The other forms of the interior penalty function shown in Table 6-1 can be used equally as well as the one used for illustration in equation (6-22). The parameter r in equation (6-62) and Table 6-1 is used to ensure convergence to the optimum and have the constraint equation be satisfied. Initially, it has a relatively large value when the search is first initiated. Then, the search is repeated with successively smaller values of r to ensure that the penalty term goes to zero, and at the optimum P(x, r®0) = y(x). This procedure will be illustrated subsequently. The value of r can be selected by trial and error, and normally a satisfactory starting value will be between 0.5 and 50 according to Walsh(26). Also, Walsh(26) reported a formula to compute the value of r which involves evaluating the Jacobian matrix of the economic model and the Jacobian and Hessian matrices of the F function at the starting point. Exterior penalty function forms start at a feasible point; and they can continue toward the optimum, even though infeasible points are generated. An example of an exterior penalty function is: In this form infeasible points may be generated as the unconstrained search method moves. However, if convergence is required using the parameter r, a feasible and optimal solution will be obtained. Exterior penalty functions used for equality constraints can be combined with interior penalty functions for inequality constraints to have what is referred to as mixed interior-exterior penalty functions. The one used successfully by Bracken and McCormick(36) has the form: The following example will illustrate that the penalty parameter r must go to zero to arrive at the optimal solution. After this example, the results of Bracken and McCormick(36) will be summarized to illustrate the procedure of using an unconstrained search technique with a penalty function to locate the optimum of the constrained problem. Example 6-12(37) Form the exterior penalty function for the following problem using the penalty parameter r, and use the classical theory of maxima and minima to locate the minimum. The result will include the parameter r. Show that it is necessary to have r go to zero for the optimal solution of the unconstrained problem (penalty function) be equal to the optimal solution of the original constrained problem.
The penalty function is:
Setting the first partial derivative with respect to x1 and x2 equal to zero gives: Solving for x1 and x2 gives: To have the optimal solution of the penalty function be equal to the optimal solution of constrained problem r must be zero, i.e.:
A solution by Lagrange Multipliers will give these results.
When a search technique is used, a value of r must be selected which is sufficiently large to allow movement toward the optimum. As the optimum is approached, successively smaller values of r must be used to have the optimum of the penalty function approach the optimum of the constrained problem. Bracken and McCormick(36) have illustrated this procedure by solving the problem shown in Figure 6-8. For this problem, a mixed penalty function was selected in the form of equation(6-64). For the unconstrained problem to represent the constrained problem and have the same solution at the optimum, i.e., P(x*,r) = y(x*), the following conditions must be satisfied: along with constraints being satisfied. The computational effort required to meet the above requirements is illustrated by the problem given in Figure 6-8. The search technique SUMT began at starting point xo(2,2) and arrived at the apparent optimum (0.7489,0.5485) with a value of r = 1.0. The search technique was started again at point (0.7489,0.5485) using a value for r of 4.0 x 10-2 to arrive at the apparent optimum (0.8177,0.8323) as shown in the table in Figure 6-8. This procedure was repeated continually reducing the value for r until an acceptable result was obtained for x1 and x2. In this case, the values from one optimal solution to the next agreed to within four significant figures. At this point, the value of r had decreased to 4.096 x 10-9, practically zero for the problem. In summary, significant computational effort is required to ensure that the solution of the penalty function problem approaches the solution to the constrained problem. For the illustration, the optimization problem was solved seven times as r went from 1.0 to 4.096 x 10-9 to have a converged solution of the unconstrained problem to the constrained one. This is typical of what is to be expected when penalty functions are used. The conventional penalty function method obtains the optimal solution only at the limit of a series of solutions of unconstrained problems(33). Consequently, exact penalty functions have been proposed that would give the optimal solution in one application of the unconstrained algorithm. Several exact penalty functions have been constructed(33), but their use has been limited since they contain absolute values which are not differentiable. A procedure corresponding to the penalty function method has used the Lagrangian function. The Lagrangian function is formed as indicated below where the slack and surplus variables have been used for the inequality constraints. In this situation an initial estimate is made for the Lagrange multipliers, and the unconstrained problem given by equation (6-66) is solved for an apparent optimum, x. However, this value of x usually does not satisfy the constraints; and the estimated values of the Lagrange multipliers are adjusted to give a new unconstrained problem which is solved again for the apparent optimum. This procedure is repeated until the optimum is located, and the constraints are satisfied. Methods have been developed to estimate the Lagrange multipliers(33) for this procedure. The following simple example illustrates this idea of having to resolve the unconstrained optimization problem with various values of the Lagrange multipliers until the constraints are satisfied. Example 6-13 Form the Lagrangian function for the following constrained problem and solve it by analytical methods for values of the Lagrange multiplier of -1/2, -1.0 and -2.0. Compare these results with the analytical solution of x1 = x2 = Ö2/2 and l = -Ö2/2.
The Lagrangian function is:
Using l = -1 the Lagrangian function becomes:
Solving by analytical methods gives x1 = 1/2, x2 = 1/2; and using these values in the constraint gives:
The other values are determined in a similar fashion, and the following table summarizes the results.
The value of the Lagrange multiplier goes from -1 to -1/2 as the value of f goes from -1/2 to 1 with the value of f = 0 (constraint satisfied) at l = -Ö2/2. Using the Lagrangian function is similar to using the penalty function in converting a constrained problem into an unconstrained one in the sense that the problem has to be resolved until the unconstrained problem has converged to the solution of the constrained one. There appears to be a disadvantage to using the Lagrangian function because a set of Lagrange multipliers (one for each constraint) has to be adjusted while only one penalty parameter is required. However, it turns out that there are difficulties in implementing penalty functions including discontinuities on the boundaries of the feasible region(11), the Hessian matrix of the penalty function can become ill-conditioned(9) and the distortion of contours as r grows smaller(15). Also, it has been found that using the Lagrangian function alone has been relatively unsuccessful especially for large problems(8), except when the constraints are linear(38). Combining penalty functions and Lagrange multipliers has proved more successful, and this technique is called the augmented Lagrangian method or the method of multipliers(6,9,15), and the relation between the penalty parameter and the Lagrange multipliers has been reported(9,28). The augmented Lagrangian function can be written as follows(11). and an algorithm for updating the Lagrange multipliers has been given by Avriel(11).
Avriel(11) has given an example of the use of this procedure for a simple problem. There have been difficulties associated with this method in the choice of the penalty parameter r. As discussed by Gill, et al.(6), too small a value can lead to an unbounded number of unconstrained searches having to be performed, in addition to a possible ill-conditioned Hessian matrix of the Lagrangian function. As will be seen in the section on comparison of techniques, these methods have not performed as well as successive linear and quadratic programming and the generalized reduced gradient method.
Other methods for constrained multivariable problems fall into a class referred to as feasible directions, projection methods or methods of restricted movement. Also, there are random search procedures, cutting plane methods and feasible region elimination techniques. The concept associated with each of these procedures will be described and references given for sources of more information. These techniques have founded limited application, and the reasons for this will be described. The restricted movement methods are described in some detail by Avriel(9,11) and others(6,8,15,27). According to Reklaitis et al.(15) even though there are similarities between these projection methods and reduced gradient techniques, the latter are preferred because sparse-matrix methods can be used but the former methods are said to have "sparsity-destroying matrix products." Consequently, the details of these methods are available in the previously cited references, and only an illustration from McMillan(27) will be given to show some of the concepts involved. A simple problem is shown in Figure 6-9 where the starting point is in the feasible region at point xo(8,2). There are three constraints that bound the feasible region, and the unconstrained maximum lies outside of the region. This gradient-projection method begins by a single variable search along the gradient line to locate a maximum. The maximum along the line will be found where a constraint is encountered at point x1(6.6,3.6). The gradient line at point x1 points into the infeasible region. Therefore to continue to move toward the optimum, the gradient line is projected on the constraint, and the search proceeds in this projected-gradient direction along the constraint. The constraint is linear, and a single variable search for the maximum locates point x2(6,4) which is the intersection with another constraint. The gradient line at x2(6,4) points into the infeasible region so it is projected on the constraint, in this case x2 = 4; and the single variable search for the maxima continues. The search arrives at point x3(5.5,4) which is the constrained maximum. In summary, the procedure began with an unconstrained search method, gradient search, until a constraint was encountered. The unconstrained search line was projected on the constraints to be able to stay in the feasible region, and it moved until the maximum was located. Other unconstrained search methods could have been used rather than gradient search, such as the BFGS method. Also, had the constraints been curved, the search method would not follow the constraint; and a hemstitching pattern would have developed as the search method attempted to follow the active nonlinear constraint. This is illustrated in Figure 6-10 and is one of the problems encountered with this method as discussed by Avriel(9). In the cutting plane method(15,28), the nonlinear optimization problem is formulated as follows. Beginning with the nonlinear optimization problem as:
The problem is converted to the following one:
which gives a linear economic model. Then if a starting point is selected that violates only one of the constraints, this constraint can be linearized; and the resulting problem can be solved by linear programming. At the new point the most violated constraint is added, and linearized to find the third point in the search. The procedure continues adding constraints until the optimum is reached and the constraints are satisfied. However, a number of computational difficulties have been encountered with this procedure according to Avriel(9); but it has been attractive because convergence to the global optimum is guaranteed if the economic model and constraints are convex functions. In random search the feasible region is divided into a grid where each nodal point is considered to be the location of a point to compute the value of the economic model, i.e., an experiment. Then if an exhaustive search is performed by calculating the value of the economic model at each point in the grid, these experiments could be ranked from the one with the maximum value of the economic model to the one with the minimum value. However, a specified number of these experiments could be selected randomly and the value of the economic model evaluated at the points. It would be possible to make a statement about the point with the largest value of the economic model being in a certain best fraction of all of the experiments with a specific probability. For example, if there were 1000 nodal points, and if one experiment was placed randomly in these points, the probability of choosing one in the top 10% would be 100/1000 = 0.1. Also, the probability of not chosing one in the top 10% would be 1 - 0.1 = 0.9. (Probability is defined as the relative frequency of occurrences of an event.) If two experiments were placed randomly in the grid on the feasible region, the probability of not finding one in the top 10% would be (0.9)2 = 0.81, and the probability of one of these two being in the top 10% is 1 - 0.81 = 0.19. Continuing, after n trials the formula is:
For n = 16, the probability of finding one of these 16 experiments to be in the best fraction of 0.1 would be p(0.1) = 0.80. For n = 44, p(0.1) = 0.99 which is almost a certainty. The generalization of this procedure, Wilde(10), is given by the following equation.
In this equation p(f) is the probability of finding at least one nodal point in the best fraction, f, having placed n experiments randomly in the feasible region. Several values of n have been computed by Wilde(10) having specified f and p(f). These are given in Table 6-2. Equation (6-72) was used in the following form for these calculations.
Referring to the table, 16 experiments would be required to have at least one in the top 10% with a probability of 0.80 from a total of 1000 experiments. To have at least one value of the economic model in the top 0.5% with a probability of 0.99, 919 experiments of the total of 1000 would have to be measured, i.e., the economic model would have to be evaluated at almost all of the nodal points. Also, it should be noted that the values for n reported in the table have been rounded off, e.g., 919 is 918.72... computed from equation(6-73). If there had been a total of 100 experiments 90 would have been required for at least one in the top 5.0% with a probability of 0.99. The number of nodal points is somewhat independent of the number of variables in the economic model. Also, the results are independent of the number of local maxima or minima. These two facts are considered to be the important advantages of random search. This has led to adaptive random search where a random search is conducted on part of the feasible region. Then another section of the feasible region is selected which contained the largest value of the economic model to repeat the placing of another set of random measurements. This converts random search into a search technique, and it has been called adaptive random search by Gaddy and co-workers(40,41). In their most recent in a series of papers using this technique Martin and Gaddy(41) have described the optimization of a maleic anhydrate process. They showed again that their adaptive randomly directed search method efficiently optimized the types of problems described as large, heavily constrained and often containing mixed integer variables. The feasible region elimination techniques are an extension of the ideas associated with the interval elimination, single variable search methods. These are described in some detail by Wilde and Beightler(12), and two techniques are contour tangent elimination and multivariable dichotomous elimination. The first method is applicable only to functions that are strongly unimodal; and the second procedure requires that functions be rectangularity unimodal, which is more restrictive than strongly unimodal. A strongly unimodal function has a strictly rising path from any point in the feasible region to the optimum. Consequently, a function with a curving ridge would not be strongly unimodal. An example of a strongly unimodal function is given in Figure 6-11. The line from point A to the maximum illustrates a strictly rising path. The multivariable elimination technique is illustrated in Figure 6-11 for two independent variables. First, a starting point, xo, in the feasible region is selected; and a contour tangent line is determined. The area below the contour tangent can be eliminated since it does not contain the optimum; and the procedure continues by placing another experiment in the area that contains the optimum, e.g., point x1. Measuring the contour tangent at x1, an additional region can be eliminated. In this case it will be above the contour tangent line, and the region that contains the maximum is reduced. Again, another measurement is placed in the remaining area that contains the optimum, e.g., x3, and the contour tangent is determined. Eliminating the area to the left of this contour tangent, now the region that contains the optimum has been reduced to the triangular shaped area bounded by the three contour tangents as shown in Figure 6-11. The procedure continues in this fashion until the region that contains the optimum has been reduced to a satisfactory size. The details of the computational procedure are given by Wilde and Beightler(12), and the method has had limited use because of the restrictive requirement of being applicable only to strongly unimodal functions. There are a number of other methods that could have been mentioned, all of which have had some degree of success in optimizing industrial problems, and these are described in the references at the end of this chapter. Many of these methods are modifications and/or combinations of the procedures that have been discussed. In the next section comparisons will be given of the performance of constrained multivariable procedures.
The evaluation of the effectiveness of constrained multivariable optimization procedures depends on several interrelated things. These are the optimization theory, the algorithm or the combination of algorithms to implement the theory, the computer program and programming language used for computations with the algorithms, the computer to run the program and the optimization problems being solved. In comparing constrained optimization procedures, usually the same optimization problems are solved; and comparisons are based on measurements of computer time or the number of times the economic model and constraints are evaluated to come within a certain fraction of the optimum. If different computers are used to solve the optimization problems, then a timing program such as a matrix inversion is run on each machine to give a point of comparison among the computers. Consequently, if there is a superior optimization algorithm, the other factors that affect performance have made it difficult to detect. There is debate about which algorithms and/or computer codes are the better ones, and Lasdon(3) recommended having available several computer codes that incorporate the more successful methods. A judgement about the ones to have can be obtained from the following reviews of industrial experience reporting the use of optimization procedures on process and plant problems. In an Exxon study by Simon and Azma(1) fifteen industrial optimization problems were solved using four established optimization codes, and their results are summarized in Table 6-3. The fifteen problems had from 5 to 250 variables; and there were a number of active constraints at the optimum, ranging for each size problem from 50 to 95% of the number of variables. Two of the four optimization codes ECO and SLP, were developed by Exxon. The ECO program used successive quadratic programming with many of the features described previously for the Wilson, Han, Powell algorithm including a Davidon, Fletcher, Powell update for the Hessian matrix. The SLP program used successive linear programming as described previously with enhancements to speed convergence and circumvent problems with infeasibilities. The GRG2 program used the generalized reduced gradient method, and this program was developed by Lasdon(3). The MINOS program used a projected augmented Lagrangian algorithm combined with the generalized reduced gradient algorithm, and this program was developed by Murtagh and Saunders(43). The problems were run on Exxon's IBM 3033 computer, and the key results extracted from the article are given in Table 6-3. These include the average and range of the number of function calls and the average CPU time used. The optimization applications were complex simulations, and numerical differentiation was required. Consequently, the number of times that the economic model and constraints were evaluated (function calls) was viewed as the primary indicator of performance. CPU times were said to be a guide to performance and were not available for SLP optimizations. Also, two termination criteria were used in the various convergence tests of the programs to have the value of the economic model and constraints be within the tolerance of 0.001 and the economic model be within 0.1% of the optimum. In reviewing the results in Table 6-3, SLP and MINOS solved all of the test problems. It was reported by Simon and Azma(1) that the performance of SLP was better on more tightly constrained problems. Also, they reported that MINOS had impressively fast run times from the use of sparse matrix computational features. The GRG2 code solved all of the five and twenty variable problems and three of the six one hundred variable problems. This code required the greatest number of function calls compared to the others. The ECO code solved all of the five and twenty variable problems, the two one- hundred variable, linearly constrained problems and one of the one- hundred variable, nonlinearly constrained problems. It was reported that nonlinear constraints caused numerical difficulties for the ECO code, because it did not contain special error checking and matrix inversion features of the other codes. This study has shown that to solve large industrial problems these three optimization algorithms must be supplemented with other features associated with numerical differentiation and sparse matrix manipulations. In the following review of other comparison studies of optimization codes, these three procedures are found to be superior to others, and the relative merits of these methods have been tabulated by Lasdon and Warren(22). Successive linear programming (SLP) is said to be easy to implement, widely used in practice, rapidly convergent when the optimum is at a vertex, and is able to handle very large problem. Furthermore, it does not attempt to satisfy equalities at each iteration, may converge slowly on problems with non-vertex optimum and will violate nonlinear constraints until convergence is reached. Successive quadratic programming (SQP) is said to require the fewest function and gradient evaluations of the three methods; it does not attempt to satisfy equalities at each iteration, will violate nonlinear constraints until convergence is reached, is harder to implement than SLP and requires a good quadratic programming solver. Generalized reduced gradient (GRG) is said to be probably the most robust and versatile, being able to employ existing process simulators using the Newton-Raphson method, but is the hardest to implement and needs to satisfy equality constraints at each step of the algorithm. In a dissertation by Sandgren(20) on the utility of nonlinear programming algorithms, 35 optimization algorithms were collected from university and industry sources of which 29 used penalty functions, four used generalized reduced gradient (GRG) and two used successive linear programming (SLP). Thirty test problems were selected from a variety of applications and sources which had from 2 to 48 variables and 4 to 75 constraints. Computations were performed on Purdue University's CDC 6500 computer in double precision, and all gradients were calculated using a forward difference approximation. Solution times were measured, and a rating procedure was used to rank the programs. The results showed that the four codes using the GRG algorithm and one code using SLP solved 50% of the test problems using 25% or less of the computer time averaged for all of the programs. This study established fairly conclusively that GRG and SLP algorithms are superior to penalty function methods. In a study by Schittkowski reported by Reklaitis, et al.(15) 22 optimization programs and 180 test problems were evaluated. The optimization program included 11 SLP, 3 GRG, 4 SQP and 4 penalty function codes; and nine criteria were used and weighted to rank the programs. The ranking of the algorithm classes were in the order of SQP, GRG, SLP and penalty functions last. Also, it was emphasized that these tests showed that code reliability is more a function of the programming of the algorithm than the algorithm itself. In probably the first comprehensive study of nonlinear constrained optimization procedures, Colville(21) organized participants from fifteen industrial firms and universities and had them test eight industrial problems with their thirty optimization codes. He grouped the methods into five categories and developed a scoring procedure. This involved using a timing program for matrix inversion since the results were obtained from a number of different computers. The highest score was received by the GRG method. Himmelblau(8) extended these results and ran some of Colville's and other problems on the same computer. Again, the GRG code was the best performer. However, Palacios-Gomez et al.(47) have shown that their improved version of SLP based on industrial computational experience was comparable to or better than GRG2 and MINOS on Himmelblau's and other test problems. In a recent article, the successive quadratic programming and generalized reduced gradient algorithms have been used with large computer simulations and flowsheeting programs for optimal design. Biegler and Hughes(44,49) showed that successive quadratic programming was effective for optimization of a propylene chlorination process simulation. Also, the previously mentioned study of Jirapongpham, et al.(42) demonstrated that the WHP algorithm was effective for process flowsheeting optimization. Also, Locke and Westerberg(45,46) used an advanced quadratic programming algorithm with an equation-oriented process flowsheeting program with success. In addition, Chen and Stadtherr(48) reported enhancements of the WHP method that were effective on several chemical process optimization problems. Moreover, Biegler and Cuthrell(53) showed the Armijo line search to be one of several improvements to successive quadratic programming. Finally, Drud(58) has developed a GRG program CONOPT which uses the industry standard MPS input format for large static and dynamic problems at the World Bank. In summary, the three methods of choice for optimization of industrial scale problems are successive linear and quadratic programming and the generalized reduced gradient method. The available programs that use these procedures are elaborate and use a combination of techniques for efficient computer computations. Sources of programs using these methods are given by Waren and Lasdon(2), Reklaitis et al.(15) and Gill, et al.(6). Waren and Lasdon(2) list the desirable features of nonlinear programming software which can be used as a guide for selection of codes. It is advisable to have several proven optimization codes available to solve industrial scale problems. Unfortunately, it is not envisioned that one code will be available in the near future which is effective on almost all large problems such as those for linear programming like MPSX. However, progress is being made in that direction, e.g., Drud's work(59), and the current literature should be the first source in looking for optimization programs because of the improvements in computer and optimization algorithms. |