MULTI VARIABLE OPTIMIZATION PROCEDURES
There are essentially six types of procedures to solve constrained nonlinear optimization problems. The three considered most successful are successive linear programming, successive quadratic programming and the generalized reduced-gradient method. The other three have not proved as useful, especially on problems with a large number of variables (more than twenty). These are penalty and barrier function methods, augmented Lagrangian functions and the methods of feasible directions (or projections) which are sometimes called methods of restricted movement. Of these methods only successive linear programming does not require an unconstrained single or multivariable search algorithm. Also, penalty and augmented function methods have been used with successive quadratic programming. Each of these methods will be discussed in the order that they were mentioned. This will be followed by a review of studies which have evaluated the performance of the various methods.
This procedure was called the method of approximate programming (MAP) by Griffith and Stewart(18) of Shell Oil Company who originally proposed and tested the procedure on petroleum refinery optimization. As the name implies, the method uses linear programming as a search technique. A starting point is selected, and the nonlinear economic model and constraints are linearized about this point to give a linear problem which can be solved by the Simplex Method or its extensions. The point from the linear programming solution can be used as a new point to linearize the nonlinear problem, and this can be continued until a stopping criterion is met. As shown by Reklaitis et al.(15), this procedure works without safeguards for functions that are mildly nonlinear. However, it is necessary to bound the steps taken in the iterations to insure that the economic model improves, the values of the independent variables remain in the feasible region and the procedure converges to the optimum. These safeguards are bounds on the independent variables specified in advance of solving the linear programming problem. The net result is that the bounds are additional constraint equations. If the bounds are set too small, the procedure will move slowly toward the optimum. If they are set too large, infeasible solutions will be generated. Consequently, logic is incorporated into computer programs to expand the bounds when they hamper rapid progress and shrink them so that the procedure may converge to a stationary point solution(1). For successive linear programming, the general nonlinear optimization problem can be written as:
where upper and lower limits are shown specifically on the independent variables. Now the economic model y(x) and the constraints fi(x) can be linearized around a feasible point xk to give: The problem is in a linear programming format in the form of equation (6-32). However, the values of Dxj can take on either positive or negative values depending on the location of the optimum. Negative values for Dxj are not acceptable with the Simplex Algorithm so a change of variables was made by Griffith and Stewart(18) as follows.
Substituting equation (6-33) into equation (6-32), now the linear programming problem has the form: The bounds on the upper and lower limits on the variables are specified by (uj - xjk) and (lj - xjk). The value of the next point for linearizing is given by xjk+1 = xjk + Dxj+ - Dxj¯, and the procedure is started by specifying a starting point xo(k=0). The above equations are now a linear programming problem where the independent variables are Dxj+ and Dxj¯. The value of the bound uj and lj may affect the rate of convergence of the algorithm. The use of bounds is illustrated in the following example given by Griffith and Stewart(18). Example 6-8(18) Locate the maximum of the following constrained nonlinear optimization problem by the method of succesive linear programming starting at xo(1, 1), and using the bounds (uj - xjk) = (xjk - lj) = 1.
The two constraint equations are shown in Figure 6-4 where they intersect at the maximum of the economic model, point x*(4,3). For this problem the successive linear programming approximation is obtained using equation (6-34).
There are four variables in the above equations Dx1+, Dx2+, Dx1¯, and Dx2¯. Starting at point xo (1,1) the above equations become:
Solving by the Simplex Method gives:
x1 is computed as follows:
and
Linearizing around x1(2,2) gives:
Solving by the Simplex Method gives:
x2 is computed as follows:
and
Linearizing around x2(3,3) gives:
Solving by the Simplex Method gives:
x3 is computed as follows:
and
Linearizing around x3(4,3 1/6) gives:
Solving by the Simplex Method gives:
x is computed as follows:
and
Linearizing around x4(4.0,3.0044) gives:
Solving by the Simplex Method gives:
x5 is computed as follows:
and
This is the optimal solution and is the same as given by Griffith and Stewart (18). It should be noted that point x3(4,3 1/6) is an infeasible point and does not satisfy the first constraint equation. However, this point is sufficiently close to the optimum that the method converges to the optimum after linearizing around this point. Convergence to the optimum will not take place if bounds are not used, however. The problem was solved without the constraints bounding the variables. Starting at point x0(1,1) the point x1(8.5,5.0) was found. Linearizing around x1(8.5,5.0) and solving by the Simplex Method gave the point x2(0,12.23). Then linearizing around x2(0,12.23) gave a set of constraint equations that had an unbounded solution. Consequently, bounds were required on this problem to ensure convergence to a solution. Computer programs will reduce the bounds when an infeasible solution is located, and resolve the problem. This was done for the problem starting at point x2(3,3) since point x3(4,3 1/6) was infeasible, and the bounds were reduced by one-half each time an infeasible point was obtained. Following this procedure, the next two iterations for this problem were (3.563, 3.492) and (3.595, 3.475). Further examination showed the method had difficulty following the first constraint to the optimum. As Himmelblau(8) points out, when constraints become active then successive linear programming's "progress becomes quite slow." Consequently, logic is incorporated in some programs to allow the procedure to continue from an infeasible point, as was done by Griffith and Stewart in this example. For those interested in having a successive linear programming code, Lasdon(19) reports that the most widely used and best known one, POP (Process Optimization Procedure) is available from the SHARE library (COSMIC, Bartow Hall, University of Georgia; Athens GA 30601). Other listings of sources of optimization codes are given by Sandgren(20) and Lasdon and Waren (22). Large linear programming codes have been used with large simulation models in an iterative fashion to approximate the nonlinearities in these models. This approach of using linear programming successively has been successful in large plants. In most cases, this procedure has been used by companies that have many man-years of effort in the development and use of a large linear programming code for plant optimization and a corresponding amount of effort in large simulations of key process units for prediction of performance and yields. An example of this is in petroleum refining where linear programming is used for refinery optimization. In addition, elaborate simulations and correlations have been developed for processes such as catalytic cracking, reforming and distillation. As discussed in Chapter 4, the results of a linear programming optimization are as accurate as the parameters in the economic model and constraint equations, c, A and b. As shown in Figure 6-5 iterative procedures have been developed that use these programs together. The large simulation codes are used to compute the parameters used in the large linear programming code. Then the linear programming code is used to generate an optimal solution in terms of the independent variables, x, which are the process variables required by the simulation codes. This iteration procedure is continued until a stopping criterion is met. Both the linear programming code and the process simulators are very large programs, and no attempt is made to have them run at the same time. Typically, the output from the simulators is edited by a separate program to produce a data set in the form required by the linear programming code. Also, another program can be used to manipulate the output from the linear programming code into a data set for use by the simulation programs. Further descriptions of these procedures are given by Pollack and Lieder(31) for petroleum refinery optimization and by O'Neil, et al.(32) for the allocation of natural gas in large pipeline networks.
Like successive linear programming, a quadratic programming problem is formed from the nonlinear programming problem and is solved iteratively until an optimum is reached. However, the iterative procedure differs from that of successive linear programming. As described by Lasdon and Waren(22), the quadratic programming solution is not accepted immediately as the next point to continue the search, but a single variable search is performed between the old and new points to have a better and feasible point. In quadratic programming the economic model is a quadratic function, and the constraints are all linear equations. To solve this problem the Lagrangian function is formed, and the Kuhn-Tucker conditions are applied to the Lagrangian function (23,24,25) to obtain a set of linear equations. This set of linear equations can then be solved by the Simplex Method for the optimum. It turns out that artificial variables are required for part of the constraints and the slack variables can be used for the other constraints to have an initially feasible basis. Also, finding an initial basic feasible solution may be the only feasible solution(25), so the linear programming computational effort is minimal. At this point it is important to understand the solution of a quadratic programming problem, and this procedure will be described next and illustrated with an example. Then the successive quadratic programming algorithm will be described and illustrated with an example. Also, modifications of the procedure will be discussed that reduce the computational effort in numerically evaluating the Hessian matrix which must be obtained from the nonlinear programming problem. Theoretically, using a quadratic function to approximate the nonlinear economic model of the process can be considered superior to a linear function to represent the economic model. This is part of the motivation for using quadratic programming which can be represented by the following equations: where qjk = qkj would be the second partial derivatives with respect to xj and xk of the nonlinear economic model. They would be computed numerically or analytically from the nonlinear problem given by equation (6-31). Also, cj and aij would be computed as shown by equation (6-32) either numerically or analytically from the nonlinear problem, equation (6-31). The quadratic programming procedure begins by adding slack variables xn+i to the linear constraint equations. It will not be necessary to use xn+i2, since the problem will be solved by linear programming, and all of the variables must be positive or zero. The Lagrangian function is formed as follows: Positive Lagrange multipliers are required, so a negative sign is used on the term with the constraint equations. (See equation 2-49). Setting the first partial derivatives of the Lagrangian function with respect to xj and i equal to zero give the following set of (n+m) linear algebraic equations: The inequality form of the Kuhn - Tucker conditions, equation 6-38, is used to account for xj > 0. (See Hillier and Lieberman(25), and Hadley (59)). Also, the complementary slackness conditions must be satisfied, i.e. product of the slack variables xn+i and the Lagrange multipliers i are zero.
If xn+i = 0, then the constraint is active, an equality; and li ¹ 0. However, if xn+i ¹ 0, then the constraint is inactive, an inequality; and li = 0. For more details refer to the discussion in Chapter 2. The equation set of (6-38) and (6-39) can be converted to a linear programming problem in the following way. Surplus variables are added to equation (6-38) as sj, and slack variables have been added to equation (6-39) as xn+i. The slack variables xn+i can serve as the variables for an initially feasible basis for equations (6-39). However, artificial variables are required to have an initially feasible basis for equation (6-38). Adding artificial variables zj with a coefficient cj to equations (6-38) is a convenient way to start with zj = 1. Also, the objective function will be to minimize the sum of the artificial variables to ensure that they will not be in the final optimal solution. As a result of these modifications, equations (6-38) and (6-39) become: This is now a linear programming problem which can be solved for optimal values of x and l, the solution of the quadratic programming problem. In addition, the solution must satisfy x > 0, l > 0 and lixn+i = 0. Consequently, the Simplex Algorithm has to be modified to avoid having both li and xn+i be basic variables, i.e., nonzero, to satisfy the complimentary slackness conditions(26). This may require choosing the second, best variable to enter the basis in proceeding with the Simplex Algorithm if either li or xn+i are in the basis and the other one is to enter. Franklin(23) has given uniqueness and existence theorems that prove the above procedure is the solution to the quadratic programming problem and is a recommended reference for those details. At this point the method is illustrated with an example. Example 6-9(25) Using quadratic programming determine the maximum of the following function subject to the constraints given.
The quadratic programming problem is constructed using equation (6-41) with c1 = 5, c2 = 1, q11 = 2, q12 = -2, q21 = -2, q22 = 2, a11 = 1, a12 = 1 and b1 = 2. The linear programming problem from equation (6-41) is:
Eliminating z1 and z2 from the objective function gives the following set of equations for the application of the Simplex Method:
this quadratic programming problem is:
The positive Lagrange multiplier is consistent with the Kuhn-Tucker conditions for a maximum, equation (2-49), since a negative sign was used in equation (6-36). Successive quadratic programming iteratively solves a nonlinear programming problem by using a quadratic approximation to the economic model and a linear approximation to the constraint equations. As the series of quadratic programming problems are solved, these intermediate solutions generate a sequence of points that must remain in the feasible region or sufficiently close to this region to converge to the optimum. The logic used with this method is to search along the line between the new and previous point to maintain a feasible or near feasible solution. Also, the computational effort in evaluating the Hessian matrix is significant, and quasi-Newton approximations have been used to reduce this effort. The following example illustrates successive quadratic programming for a simple problem. The discussion that follows describes modifications to the computational procedure to improve the efficiency of the method. Example 6-10 Solve the following problem by successive quadratic programming starting at point x0(0,0).
The contours of the economic model and the constraint equations are shown in Figure 6-6. The nonlinear constraint equation is linearized about the point xk, and it has the following form.
Placing the problem in the form of equation 6-35, gives:
The quadratic programming problem is constructed using equation (6-41) with c1 = 2, c2 = 4, q11 = 2, q12 = q21 = 0, q22 = 2, a11 = (0.208x1k - 0.75), a12 = 1, a21 = 1, a22 = 1, b1 = 0.85 + 0.104x1k, and b2 = 4:
Solving the above linear programming problem by the Simplex Method with xo = (0,0) and ensuring that the complementary slackness conditions are met gives the following result for xo*.
This point is shown on Figure 6-6, and it is outside the feasible region. Consequently, a search along the line between the starting point xo(0,0) and x*(1.192, 1.740) locates feasible point x1(1.039, 1.517) on the first constraint. The quadratic programming problem is formulated around point x1 and solved as was done above. The result is x2(1.209, 1.608) with l1 = 0.784. Repeating the procedure by solving the quadratic programming problem at x2 gives the value of x3(1.199, 1.600) with l1 = 0.800. This is sufficiently close to the optimum of the problem x*(1.2, 1.6) for the purposes of this illustration to say that a converged solution has been obtained. The Wilson-Han-Powell method is an enhancement to successive quadratic programming where the Hessian matrix, [qjk] of equation (6-35), is replaced by a quasi-Newton update formula such as the BFGS algorithm, equation (6-20). Consequently, only first partial derivative information is required, and this is obtained from finite difference approximations of the Lagrangian function, equation (6-36). Also, an exact penalty function is used with the line search to adjust the step from one feasible point to the next feasible point. The theoretical basis for this algorithm is that it has a superlinear convergence rate if an exact penalty function is used with the DFP or BFGS update for the Hessian matrix of the Lagrangian function, and global convergence is obtained to a Kuhn-Tucker point when minimizing an economic model which is bounded below and has convex functions for constraint equations. The details and proofs are given by Han (51,52). The problem in example 6-10 was solved with the Wilson-Han- Powell algorithm. The identity matrix was used for the Hessian matrix at the starting point xo(0,0). The subsequent steps in the solution were x1 (1.3803, 1.6871), x2 (1.203, 1.6038), and x3 (1.1988, 1.603), which was sufficiently close to the optimum to stop. Generally, less computational effort is required with the Wilson-Han-Powell algorithm since second order partial derivatives do not have to be evaluated. The Exxon quadratic programming code(1) uses the Wilson-Han-Powell algorithm described above, and they have added refinements to minimize the computational effort in evaluating the second partial derivatives of the Hessian matrix. This typical large quadratic programming code is described as having the following steps of basic logic. An initial starting point is selected, and the linearized constraints are constructed numerically. Then the matrix of second partial derivatives, the Hessian matrix, is evaluated either numerically or a DFP (Davidon, Fletcher, Powell) approximation can be used. The quadratic programming problem is solved generating a new optimal point. Using this new point and the old point, a single variable search is conducted for an improved, feasible solution to the nonlinear problem. This is followed by termination tests using the Kuhn-Tucker conditions, changes in step and function values and feasibility checks. Some options included in the program include using analytical derivatives when furnished, inputting the Hessian matrix by the user or having it be a specified multiple of the identity matrix with up-dating by the DFP algorithm, and having the user specify whether or not intermediate solutions are required to be feasible. In closing this section, it should be mentioned that the Wilson- Han-Powell (WHP) method has been used successfully on computer-aided process design problems, as described by Jirapongphan, et al.(42), Vanderplaats(24) and Biegler and Cuthrell(53). In some applications, the constraint equations are not converged for each step taken by the optimization algorithm, but an infeasible trajectory is followed where the constraints are not satisfied until the optimum is reached. In the line search to adjust the step from one point to the next, an exact penalty function is used. A step length parameter is employed with the penalty function to force convergence from poor starting conditions. The size of the quadratic programming problem is reduced by substituting the linearized equality constraint equations into the quadratic economic model leaving only the inequalities as constraints. The result can be a significant reduction in the number of the independent variables for highly constrained problems. This technique has been included in comparisons of constrained nonlinear programming methods which are described in a subsequent section. It has been shown to be one of the three better procedures. Now, the equally successful procedure called the generalized reduced gradient method is described.
This procedure is one of a class of techniques called reduced-gradient or gradient projection methods which are based on extending methods for linear constraints to apply to nonlinear constraints(6). They adjust the variables so the active constraints continue to be satisfied as the procedure moves from one point to another. The ideas for these algorithms were devised by Wilde and Beightler(12) using the name of constrained derivatives, by Wolfe(29) using the name of the reduced-gradient method and extended by Abadie and Carpenter(30) using the name generalized reduced gradient. According to Avriel(9) if the economic model and constraints are linear this procedure is the Simplex Method of linear programming, and if no constraints are present it is gradient search. The development of the procedure begins with the nonlinear optimization problem written with equality constraints. The necessary slack and surplus variables have been added as xs or xs2 to any inequality constraints, and the problem is:
Again, there are m constraint equations and n independent variables with n>m. Also, although not specifically written above, the variables can have upper and lower limits; and the procedure as described here will ensure that all variables are positive or zero. The idea of generalized reduced gradient is to convert the constrained problem into an unconstrained one by using direct substitution. If direct substitution were possible it would reduce the number of independent variables to (n-m) and eliminate the constraint equations. However, with nonlinear constraint equations, it is not feasible to solve the m constraint equations for m of the independent variables in terms of the remaining (n-m) variables and then to substitute to these equations into the economic model. Therefore, the procedures of constrained variation and Lagrange multipliers in the classical theory of maxima and minima are required. There, the economic model and constraint equations were expanded in a Taylor series, and only the first order terms were retained. Then with these linear equations, the constraint equations could be used to reduce the number of independent variables. This led to the Jacobian determinants of the method of constrained variation and the definition of the Lagrange multiplier being a ratio of partial derivatives as was shown in Chapter 2. The development of the generalized reduced gradient method follows that of constrained variation. The case of two independent variables and one constraint equation will be used to demonstrate the concept, and then the general case will be described. Consider the following problem.
Expanding the above in a Taylor series about a feasible point xk(x1k,x2k) gives: Substituting equation (6-44b) into equation (6-44a) to eliminate x2 gives, after some rearrangement: In equation (6-45) the first two terms are known constants being evaluated at point xk, and the coefficient of (x1 - x1k) is also a known constant and gives the x1 the direction to move. Thus, to compute the stationary point for this equation, dy/dx1 = 0; and the result is the same as for constrained variation, equation (2-18), which is the term in the brackets of equation (6-45) that is solved together with the constraint equation for the stationary point. However, the term in the bracket also can be viewed as giving the direction to move away from xk to obtain improved values of the economic model and satisfy the constraint equation. The generalized reduced gradient method uses the same approach as described for two independent variables, which is to find an improved direction for the economic model and also to satisfy the constraint equations. This leads to an expression for the reduced gradient from equation (6-42). To develop this method the independent variables are separated into basic and nonbasic ones. There are m basic variables xb, and (n-m) nonbasic variables xnb. In theory the m constraint equations could be solved for the m basic variables in terms of the (n-m) nonbasic variables, i.e.:
Indicating the solution of xb in terms of xnb from equation (6-46) gives:
The names basic and nonbasic variables are from linear programming. In linear programming the basic variables are all positive, and the nonbasic variables are all zero. However in nonlinear programming, the nonbasic variables are used to compute the values of the basic variables and are manipulated to obtain the optimum of the economic model. The economic model can be thought of as a function of the nonbasic variables only, that is if the constraint equations (6-47) are used to eliminate the basic variables, i.e.:
Expanding the above equation in a Taylor series about xk and including only the first order terms gives: In matrix notation equation (6-49) can be written as:
This equation is comparable to equation (6-44a). A Taylor series expansion of the constraint equations (6-46) gives an equation that can be substituted into equation (6-50) to eliminate the basic variables. or in matrix form equation (6-51) is:
The following equation defines Bb as the matrix of the first partial derivatives of fi associated with the basic variables and Bnb as the matrix associated with the nonbasic variables, i.e.:
This is a convenient form of equation (6-52) which can be used to eliminate dxb from equation (6-50). Solving equation (6-53) for dxb gives:
Substituting equation (6-54) into equation (6-50) gives:
Eliminating dxnb from equation (6-55), the equation for the reduced gradient Y(xk) is obtained.
Knowing the values of the first partial derivatives of the economic model and constraint equations at a feasible point, the reduced gradient can be computed by equation (6-57). This will satisfy the economic model and the constraint equations. The generalized reduced gradient method uses the reduced gradient to locate better values of the economic model in the same way unconstrained gradient search was used, i.e.:
where is the parameter of the line along the reduced gradient. A line search on is used to locate the optimum of Y(xnb) along the reduced gradient line from xk. In taking trial steps as a is varied along the generalized reduced gradient line, the matrices Bb and Bnb must be evaluated along with the gradients Ñyb(xb) and Ñynb(xk). This requires knowing both xnb and xb at each step. The values of xnb are obtained from equation (6-58). However, equation (6-46) must be solved for xb; and frequently, this must be done numerically using the Newton - Raphson method. As pointed out by Reklaitis et al.(15) most of the computational effort can be involved in using the Newton - Raphson method to evaluate feasible values of the basic variables once the nonbasic variables have been computed from equation (6-58). The Newton - Raphson algorithm in terms of the nomenclature for this procedure is given by the following equation.
where the values of the roots of the constraint equations (6-46) are being sought for xb, having computed xnb from equation |