DYNAMIC PROGRAMMING
This optimization procedure was developed at the same organization where Danzig developed linear programming, the RAND Corporation, a U.S. Air Force sponsored "think tank". The research was in response to the need in the early 1950's, the Sputnik era, for a solution to the optimum missile trajectory problem which required extensions to the calculus of variations. Two parallel efforts, one in this country by Richard Bellman and another in Russia by L. S. Pontryagin, led to similar but different solutions to the problem. The name, dynamic programming, was selected by Richard Bellman for this optimization method that he devised and described in a series of papers and the books Dynamic Programming (1) and Applied Dynamic Programming (2). It is thought that the selection of the name bore no direct relation to the method, which was not the situation for linear and geometric programming. There are continuous and discrete versions of this optimization method. The continuous version is used for solutions to the trajectory problem where a continuous function is required, and the discrete version is used when a problem can be described in a series of stages. Most engineering applications use the discrete version of dynamic programming, and it will be the subject of this chapter. One of the first books on the method that elaborated on engineering applications was by Roberts(3) in 1964. It was a comprehensive treatment of the subject at the time and dealt with numerous topics including optimal allocation problems, optimal process control, the relation of the calculus of variations to continuous dynamic programming and stochastic dynamic programming. It is still a valuable reference today. The efforts of Mitten, Nemhauser, Aris and Wilde led to the extension of dynamic programming for systems that involved loops and branches. Aris(4) had published results of research on the application of dynamic programming to the optimal design of chemical reactors, and Mitten and Nemhauser(5) had described a procedure to apply the method to a chemical process that involved a branched system. Professor Wilde(6) of Stanford University was conducting research on dynamic programming at this time and had the opportunity to discuss the method with both Aris and Nemhauser within a short time. This led to a collaboration which produced a landmark paper(7) extending the theory of dynamic programming from serial processes to ones with loops and branches. In a subsequent publication Wilde (8) developed the concept of functional diagrams to represent the functional equations of dynamic programming and a systematic method of converting a process flow diagram to a dynamic programming functional diagram. These results have become the standard way of analyzing processes for dynamic programming optimization and will serve as the foundation for this chapter. Dynamic programming converts a large, complicated optimization problem into a series of interconnected smaller ones, each containing only a few variables. The result is a series of partial optimizations requiring a reduced effort to find the optimum, even though some of the variables may have to be enumerated throughout their range. Also, the previously discussed single and multivariable search methods are applicable to each of the partial optimization steps. Then, the dynamic programming algorithm can be applied to find the optimum of the entire process by using the connected partial optimizations of the smaller problems. As with the other optimization methods, dynamic programming has a unique nomenclature. To introduce this nomenclature, we will begin to discuss the subject with a simple network problem. This will illustrate the concept of stages and partial optimization at a stage by decision variables. Also, it will show the use of state variables to link the stages and serve as the path for the dynamic programming algorithm to complete the optimization of the entire process. This will be followed by a process example where the network is replaced by graphical representations of the economic model (return function) and constraint equations (transition functions) to illustrate the additional complications introduced by continuous functions. Then the dynamic programming algorithm is given and discussed for N-stage serial systems and extended to ones that involve loops and branches. Following this, Wilde's rules are given and illustrated to convert a process flow diagram to a dynamic programming functional diagram. Also, the optimal allocation of resources and the use of time rather than a process unit as a stage are described and illustrated. The latter is the application of dynamic programming to the optimal equipment replacement problem. It is necessary
to begin with the definition of dynamic programming nomenclature. An individual
process unit or a unit of time can be represented as a stage, the black
box in Figure 5-1 of Chapter
5. A stage is shown diagrammatically in Figure
7-1 and represents the economic and process models of the unit. The
economic model is called a return function Ri(si,di)
and gives the measure of profit or cost for the stage. These economic
and process models depend on the independent variables at the stage. These
are decision and state variables. Decision variables, di, are
ones that can be manipulated independently. State variables, si,
are ones that are inputs to the stage from an adjacent stage. Consequently,
they can not be manipulated independently. The stage will have outputs,
There are transition
functions, Stages can be connected,
and a simple serial process is shown in Figure
7-2 with three stages. The diagram illustrates incident identities,
the equations that relate the outputs from one stage to the inputs of
the subsequent stage, e.g., The diagram in Figure 7-2 represents the economic model or the return function, the constraint equations or transition functions, and the incident identities. These functions can be written as:
There are four independent variables, d1,
d2, d3 and s3. These are to be determined
which optimize the sum of the returns R1, R2 and
R3. Also, any bounds specified on With dynamic programming three partial optimizations are performed, one at each stage; and then this information is used to locate the optimum for the entire process. The following equation gives the dynamic programming algorithm for the first stage in terms of maximizing the profit given by the return function. It is necessary to exhaustively list individual values of s1 and to search on d1 to determine f1(s1). This is illustrated in Figure 7-3 where the values of f1(s1) are along the line of maximum values of R1(s1,d1) as determined by the optimal value of d1 for selected values of s1. The values of f1(s1) are tabulated and stored for future use, and equation (7-4) is represented in Figure 7-4 as the functional diagram for stage 1 of the process. At stage 2 the optimal information at stage 1 is used, and the dynamic programming algorithm at this stage is: Again, it is necessary to exhaustively list
individual values of s2 and to search on d2 to obtain
the maximum of the sum of the return at stage 2 and the optimal return
at stage 1, f1(s1). The appropriate values of f1(s1)
are determined using incident identity and transition function s1
= At the third and final stage, the optimal information f2(s2) from stage 2 is used, and the dynamic programming algorithm at this stage is: At this point either the value of s3 is known or it is an independent variable. If s3 is a known constant value, it is necessary only to determine the value of d3 that maximizes f3(s3) in equation (7-6) for that value of s3. An exhaustive listing of values of s3 is not required. If s3 is an independent variable, it is necessary to conduct a two variable search to determine the maximum value of f3(s3,d3). This determines the maximum profit for the system, and the optimal values of the decision variables are extracted from the tabulated partial optimizations from the previous stages. Before giving a simple network example to illustrate the use of dynamic programming, we will give the dynamic programming algorithm in its general form. Bellman (1) devised this method along with a statement for the algorithm in what he called the "Principle of Optimality" which is:
This principle was stated mathematically as the dynamic programming algorithm to maximize a serial process with i stages as: In the algorithm Ri(si,di) is the return from stage i with inputs si and di and output si-1; fi-1(si-1) is the maximum return for stages 1 through i-1 as a function of input si-1 and fi(si) is the maximum return for stages 1 through i as a function of si. A dynamic programming analysis begins with the last section of a system and ends with the first section. The last section of a serial system has an output that does not affect another unit of the system. Therefore, it is convenient to number the stages beginning with the last section and ending with the first section. The following simple network example illustrates the concepts of stages, state and decision variables, and the applications of the dynamic programming algorithm. Example 7-1 A tank truck of an expensive product manufactured in San Francisco is to be delivered to any major port on the East coast for shipment to Europe. The cost for shipment across the Atlantic is essentially the same from the major ports on the East coast. It is desired to select the optimum route (lowest road mileage) from San Francisco to the East coast. The relative distances between cities along possible routes are shown on the network diagram in Figure 7-5. To solve the problem one essentially works backward. Beginning at cities N1, C1 and S1, one places the minimum distance to the East coast in the circle, and this distance, f1, along with the optimal decision is recorded in the table. For example, if the optimal route led to the central city (C1), the optimal decision, d*1, would be to drive to Boston (L-left, not S-straight or R-right) which is closer than New York or Philadelphia. For each value of the state variable, the optimal decision, d*1, is tabulated at stage one according to equation (7-4). At stage two for each state variable (city N2, C2, S2) the minimum distance from that city to the East coast is determined by the dynamic programming algorithm, equation (7-5). The optimal return is the minimum of the sum of the distance from a city in row two (N2, C2 or S2) to those in row one, R2, and the minimum distance from the cities in row one to the East coast f1. The minimum total distance is placed in the circle and recorded in the table at stage 2 along with the optimal decision (L, S or R). This procedure is repeated for the third stage of the process. At the fourth stage the state variable has only one value, San Francisco; and it is only necessary to determine the optimal decision corresponding to San Francisco which is N3. The optimal return (minimum relative distance) is 16, and the optimal policy is shown as the underlined input-output sequence back through thetable. The optimal policy is: start at San Francisco, go left to N3, straight to N2, straight to N1 and then straight to Boston. However, the route is not unique since from N3 a right to C2, left to N1, and straight to Boston is an equally minimal distance route. There is not a unique optimal set of decision variables. In this network example the state variables had three specific values. However, in most problems the state variables are continuous functions, and it is necessary to subdivide the state variables into a number of discrete values. The number of values selected determines the number of times that the optimum value of the decision variable d*1 has to be determined at the stage. The choice of the number of values of the state variable determines the computational effort required at the stage. Also, this choice determines the grid of values on si available when the dynamic programming algorithm is used to determine the optimal decisions, d*1. Interpolation is required to determine the optimal values of the state and decision variables coming back through the table. The following example illustrates these additional complications for a simplified process with continuous transition and return functions given graphically. Example 7-2 A simplified process for the production of phenol employs crude benzoic acid as a feed and is shown in the flow diagram in Figure 7-6. Separation facilities (absorption and distillation) purify the benzoic acid which is sent to a chemical reactor where it is oxidized to phenol. The impure phenol is sent to separation facilities (evaporation and distillation) where pure (99%) phenol is produced. The economic and process models for each of the three steps in the process are shown graphically in Figures 7-7, 7-8 and 7-9. These give the return functions and transition functions for each dynamic programming stage. The contours shown on the figures for the return function are profits if positive and operating costs if negative. To obtain the optimum by dynamic programming, each step in the process is made a dynamic programming stage. The state and decision variables are given on the diagrams shown in Figures 7-7, 7-8 and 7-9. The tables at each stage can be developed using the information in these figures for the dynamic programming optimization. This is illustrated in Table 7-1, and for stage 1 the optimal decision for the maximum profit is to use the largest value of the reflux ratio, d1 = 5. The set of values for the state variable, s1, were selected to be separated by 50 units. This was an arbitrary decision at this point, but it seemed reasonable to permit linear interpolation between the values of the state variables when the dynamic programming algorithm is applied. This information developed for stage 1 is recorded in Figure 7-10 which gives the dynamic programming functional diagram for the process. As shown here each process step has been made a dynamic programming stage. The dynamic programming
algorithm is given in Table 7-1
for the second stage, and it is necessary to exhaustively list values
of the state variable s2 and to determine the optimum value
of the decision variable. Again a spacing of 50 units is selected for
the values of the state variables beginning at the upper limit of 300.
For this value of s2 = 300 a range of values for d2
are listed in Table 7-1, and the corresponding values of Other values of f2(s2) are computed in the same way, and then results are listed in Figure 7-10 at stage 2. It should be noted at this point that an exhaustive search is being done on d2 to determine the optimal value. This is being done for illustrative purposes and is convenient with the process data in a graphical form. However, in an industrial problem the transition and return function could be considerably more complicated, and the search effort to determine the optimal value of the decision variables could be significantly reduced by using one of the previously described single or multivariable search methods. Using the dynamic programming algorithm at stage 3 shown in Table 7-1, we obtain the optimal results for the state and decision variables tabulated in Figure 7-10. The calculation procedure used to locate the optimal value of d3 for s3 = 400 is shown in Table 7-1, and this involved the same procedure used at stage 2. However, industrial problems would have significantly more complicated transition and return functions, and a two variable search would be used to locate the best values of s3 and d3 which maximized f3(s3,d3). In this case the dynamic programming algorithm would be: At this point the maximum profit would seem
to be f3 = 23.0. However, an additional calculation was performed
for s3 = 180 to refine the grid between 150 and 200 to see
if a larger value might be in this range. It turned out that f3(180)
= 21.2 and f3(200) remained the largest value of the profit
for the process. Consequently, the optimum operating conditions can be
determined as shown in Figure
7-10, and it is necessary to interpolate for the values of d1
and Now we are ready to extend our discussion to a system with N dynamic programming stages. This follows in the next section for serial processes, and these results are extended for the cases with loops and branches. Then we will turn to the problem of relating units of an industrial process to dynamic programming stages.
Although it was adequate in the preceding problem, usually the process flow diagram will have to be modified to obtain the dynamic programming functional diagram because information flow rather than material flow must be evaluated. Functional equations and the corresponding functional diagrams of the stages of the system are developed to describe this information flow using the process flow diagram. It will be necessary to broaden the description of a stage to have more than one state and decision variable. Also, when loops and branches are involved, the outputs from one stage may be the inputs to more than one stage. Figure 7-11, shows a dynamic programming stage that has more than one state and one decision variable. The transition and return functions are given in the figure for this stage, also. As shown there are two output state variables which are determined by the two transition functions. These transition functions have six independent variables: three state and three decision variables. The dynamic programming algorithm is given in the figure and shows that a three variable search is required to determine the optimal values of the decision variables. Also, the three state variables have to be exhaustively listed to develop the tabular information for the partial optimization at the stage. It turns out that three is the limit for the number of state variables at a stage because of the computational effort required to obtain optimal decisions for the exhaustive list of values of the state variables. The incident identities
are shown in this figure. These are the equations that give the relations
among the outputs from a stage and the inputs to adjacent stages. Here
A serial system has the output of one stage as the input to the following stage. This is illustrated in Figure 7-12, with one decision per stage for convenience. The functional equations given in Figure 7-12 include the transition functions, incident identities and return function. The incident identities give the relation between the stages. The return function gives a measure of the profit or cost at a stage, and the maximum of the sum of the profits from each stage is to be found by determining the optimal values of the decision variables. Examples 7-1 and 7-2 were serial systems. Serial system optimization
problems are of four types: initial value, final value, two point boundary
value and cyclic problems. In an initial value problem sN is
a known constant; in a final value problem
The dynamic programming algorithm for the i-th stage of the initial value problem is the same as given in Figure 7-12. The optimal return at stage i, fi, is only a function of si, the state variables at stage i. The incident identity and transition function are used to show this result.
Substituting the above equation into equation (7-7) gives: which shows that fi is a function of si, optimizing out di. This algorithm, equation (7-9), applies from stage 2 to stage N-1. At stage 1 the dynamic programming algorithm is: This is the only algorithm that does not contain the optimal return from a preceding stage. At the last stage, stage N, the dynamic programming algorithm is: If the value of sN is a known constant, the maximum return is fN(sN), and an exhaustive tabulation of sN is not required. The problem is referred to as an N-decision, no-state optimization problem. However, if sN is not a constant and can be manipulated like a decision variable to maximize fN, the dynamic programming algorithm at stage N is: This is a no-state, two-decision partial optimization at stage N. Consequently, it is referred to as an (N+1) decision, no-state optimization problem, and there are N+1 independent variables. The set of values for the decision variables, di, and state variables, sN, that maximize the return function is called the optimal policy. N partial optimizations have been required to obtain an optimal return and optimal policy for the system.
For this situation the output from the first stage, 1, is a known constant. There are two approaches to solve this problem which are called state inversion and decision inversion. State inversion
means to transform the final value problem into an initial value problem
by obtaining the N inverse transition functions, i.e., solve the transition
functions for si in terms of
This results in reversing the arrows in Figure 7-12 as shown in Figure 7-13(a). Renumbering the stages makes the problem into an initial value one. The problem has (N-1) one-state, one-decision and one no-state, one-decision partial optimizations. In some cases inverting the transition functions is not possible, and the technique of decision inversion is employed. Here the roles of d1 and s1 are interchanged. The stage one transition function is:
This equation can be put in form
and d1 is uniquely determined
by specifying s1, for The functional equation for the combined stages one and two is now: which can be combined with equation (7-15) and
to obtain the following equation: These manipulations show that combining stages one and two gives one new stage. This new stage requires a one-state, one-decision partial optimization. The final form of the functional diagram for decision inversion is shown in Figure 7-13(b). After decision inversion is performed, the usual serial problem procedure applies to the rest of the stages in the problem. The overall optimization involves N-1 total stages with N-2 one-decision, one-state partial optimizations, and at stage N there is a two-decision, no-state partial optimization. The following example illustrates the effect on state inversion by the number of state variables at a stage. There are three cases to consider as shown in the example. Example 7-3 In state inversion three cases may occur at a stage. These are: the stage has the same number of input state variables as outputs, the stage has more input state variable than outputs or the stage has fewer input state variables than outputs. Assuming the transition functions can be inverted, obtain the transition functions from state inversion for these cases with three state variables and one decision variable per stage. a. State variable inputs are equal to the outputs.
In this case state inversion has the input and output variables interchanged. b. State variable inputs are more than outputs.
In this case there is only one transition function, and for state inversion it is written as:
The state variables s2i and s3i become decision variables along with di. State inversion for this case has a significant advantage for the dynamic programming optimization by converting state variables to decision variables. c. State variable outputs are more than inputs. In this case there are three transition functions, but only one is inverted as shown below.
The remaining two transition functions become
equations that calculate output
This type of problem arises when both the
initial and final values of the state variables The optimization continues in the usual fashion with the dynamic programming algorithm at stage 3 being: At stage N, the dynamic programming algorithm is: This is a no-state, one-decision partial optimization because sN is a constant. To solve the two-point, boundary value problem, first a decision inversion is performed and followed by the partial optimizations for an initial value problem. This involves N-2 one-state, one-decision and one no-state, one-decision partial optimizations. Two-point, boundary value problems always require decision inversion.
The cyclic system is a special case of the
two-point boundary value problem where sN = Then a single variable
search is performed by varying C until the maximum return fN(C)
is located. A Fibonacci or golden section search can be used effectively
on the cut state values of sN =
Example 7-4 In the dynamic programming optimization of the contact process for sulfuric acid discussed later in the chapter, the final functional diagram consisted of five stages with recycle loop with two state variables as shown below: In this problem it is necessary to cut the state of both variables in the recycle loop. The problem is to show that this will have stages 1 and 2 combined with stage 3 to give a three stage cyclic optimization problem. The dynamic programming algorithm and transition functions at stage one are: Cutting the state on the recycle loop gives:
Performing decision inversion using the
transition functions converts
Consequently both s1 and d1
are computed by these transition functions if
There is no partial optimization at stage
one, and the output from stage two is fixed also, for There is no decision at stage two because
the fixed output
The dynamic programming algorithm and transition function at stage 3 is: This is a one-state, one-decision partial optimization and the functional diagram for the combined stages one, two, and three is shown below: The dynamic programming algorithm and transition function for stage 4 give a one-state one-decision partial optimization by the following: The dynamic programming algorithm and transition functions for stage 5 give a no-state, one-decision partial optimization by the following: To locate the best value of s15
= |