Chapter 7

DYNAMIC PROGRAMMING

Introduction top

This optimization procedure was developed at the same organization where Danzig developed linear programming, the RAND Corporation, a U.S. Air Force sponsored "think tank". The research was in response to the need in the early 1950's, the Sputnik era, for a solution to the optimum missile trajectory problem which required extensions to the calculus of variations. Two parallel efforts, one in this country by Richard Bellman and another in Russia by L. S. Pontryagin, led to similar but different solutions to the problem.

     The name, dynamic programming, was selected by Richard Bellman for this optimization method that he devised and described in a series of papers and the books Dynamic Programming (1) and Applied Dynamic Programming (2). It is thought that the selection of the name bore no direct relation to the method, which was not the situation for linear and geometric programming.

     There are continuous and discrete versions of this optimization method. The continuous version is used for solutions to the trajectory problem where a continuous function is required, and the discrete version is used when a problem can be described in a series of stages. Most engineering applications use the discrete version of dynamic programming, and it will be the subject of this chapter.

     One of the first books on the method that elaborated on engineering applications was by Roberts(3) in 1964. It was a comprehensive treatment of the subject at the time and dealt with numerous topics including optimal allocation problems, optimal process control, the relation of the calculus of variations to continuous dynamic programming and stochastic dynamic programming. It is still a valuable reference today.

     The efforts of Mitten, Nemhauser, Aris and Wilde led to the extension of dynamic programming for systems that involved loops and branches. Aris(4) had published results of research on the application of dynamic programming to the optimal design of chemical reactors, and Mitten and Nemhauser(5) had described a procedure to apply the method to a chemical process that involved a branched system. Professor Wilde(6) of Stanford University was conducting research on dynamic programming at this time and had the opportunity to discuss the method with both Aris and Nemhauser within a short time. This led to a collaboration which produced a landmark paper(7) extending the theory of dynamic programming from serial processes to ones with loops and branches. In a subsequent publication Wilde (8) developed the concept of functional diagrams to represent the functional equations of dynamic programming and a systematic method of converting a process flow diagram to a dynamic programming functional diagram. These results have become the standard way of analyzing processes for dynamic programming optimization and will serve as the foundation for this chapter.

     Dynamic programming converts a large, complicated optimization problem into a series of interconnected smaller ones, each containing only a few variables. The result is a series of partial optimizations requiring a reduced effort to find the optimum, even though some of the variables may have to be enumerated throughout their range. Also, the previously discussed single and multivariable search methods are applicable to each of the partial optimization steps. Then, the dynamic programming algorithm can be applied to find the optimum of the entire process by using the connected partial optimizations of the smaller problems.

     As with the other optimization methods, dynamic programming has a unique nomenclature. To introduce this nomenclature, we will begin to discuss the subject with a simple network problem. This will illustrate the concept of stages and partial optimization at a stage by decision variables. Also, it will show the use of state variables to link the stages and serve as the path for the dynamic programming algorithm to complete the optimization of the entire process. This will be followed by a process example where the network is replaced by graphical representations of the economic model (return function) and constraint equations (transition functions) to illustrate the additional complications introduced by continuous functions. Then the dynamic programming algorithm is given and discussed for N-stage serial systems and extended to ones that involve loops and branches. Following this, Wilde's rules are given and illustrated to convert a process flow diagram to a dynamic programming functional diagram. Also, the optimal allocation of resources and the use of time rather than a process unit as a stage are described and illustrated. The latter is the application of dynamic programming to the optimal equipment replacement problem.

     It is necessary to begin with the definition of dynamic programming nomenclature. An individual process unit or a unit of time can be represented as a stage, the black box in Figure 5-1 of Chapter 5. A stage is shown diagrammatically in Figure 7-1 and represents the economic and process models of the unit. The economic model is called a return function Ri(si,di) and gives the measure of profit or cost for the stage. These economic and process models depend on the independent variables at the stage. These are decision and state variables. Decision variables, di, are ones that can be manipulated independently. State variables, si, are ones that are inputs to the stage from an adjacent stage. Consequently, they can not be manipulated independently. The stage will have outputs, wpe4.jpg (700 bytes)i, that are inputs to adjacent stages. Input state variables to a stage could represent the flow rate of feed from an upstream unit, and output state variables could represent products from the stage that go to a downstream unit for further refining.

     There are transition functions, wpe5.jpg (700 bytes)i = Ti(si,di), at each stage; and these equations could represent material and energy balances at the stage, i.e., the conversion of raw materials to products. Recall in linear programming that transition functions were represented by the volumetric yields. The stage shown in Figure 7-1 represents the transition functions, also.

     Stages can be connected, and a simple serial process is shown in Figure 7-2 with three stages. The diagram illustrates incident identities, the equations that relate the outputs from one stage to the inputs of the subsequent stage, e.g., wpe6.jpg (700 bytes)2 = s1. Also, it is standard procedure in dynamic programming to number stages from right to left rather than left to right . The reason for this is that we usually start at the end of the process and work back to the start in the direction opposite to model flow.

     The diagram in Figure 7-2 represents the economic model or the return function, the constraint equations or transition functions, and the incident identities. These functions can be written as:

Optimize:   R1(s1,d1) + R2(s2,d2) + R3(s3,d3)       (7-1)

Subject to: wpe7.jpg (700 bytes)1 = T1(s1,d1)                                  (7-2a)

              wpe8.jpg (700 bytes)2 = T2(s2,d2)                                  (7-2b)

              wpe9.jpg (700 bytes)3 = T3(s3,d3)                                  (7-2c)

              wpeA.jpg (700 bytes)2 = s1, wpeB.jpg (700 bytes)3 = s2                                (7-3)

There are four independent variables, d1, d2, d3 and s3. These are to be determined which optimize the sum of the returns R1, R2 and R3. Also, any bounds specified on wpeC.jpg (700 bytes)3 = s2, wpeD.jpg (700 bytes)2 = s1 and wpeD.jpg (700 bytes)1 would have to be satisfied.

     With dynamic programming three partial optimizations are performed, one at each stage; and then this information is used to locate the optimum for the entire process. The following equation gives the dynamic programming algorithm for the first stage in terms of maximizing the profit given by the return function.

 Equ7-4.jpg (3657 bytes)

It is necessary to exhaustively list individual values of s1 and to search on d1 to determine f1(s1). This is illustrated in Figure 7-3 where the values of f1(s1) are along the line of maximum values of R1(s1,d1) as determined by the optimal value of d1 for selected values of s1. The values of f1(s1) are tabulated and stored for future use, and equation (7-4) is represented in Figure 7-4 as the functional diagram for stage 1 of the process.

     At stage 2 the optimal information at stage 1 is used, and the dynamic programming algorithm at this stage is:

 Equ7-5.jpg (4788 bytes)

Again, it is necessary to exhaustively list individual values of s2 and to search on d2 to obtain the maximum of the sum of the return at stage 2 and the optimal return at stage 1, f1(s1). The appropriate values of f1(s1) are determined using incident identity and transition function s1 = wpeD.jpg (700 bytes)2 = T2(s2,d2). Thus, the optimal values of f2(s2) can be determined and stored for future use.

     At the third and final stage, the optimal information f2(s2) from stage 2 is used, and the dynamic programming algorithm at this stage is:

Equ7-6.jpg (4810 bytes)

At this point either the value of s3 is known or it is an independent variable. If s3 is a known constant value, it is necessary only to determine the value of d3 that maximizes f3(s3) in equation (7-6) for that value of s3. An exhaustive listing of values of s3 is not required. If s3 is an independent variable, it is necessary to conduct a two variable search to determine the maximum value of f3(s3,d3). This determines the maximum profit for the system, and the optimal values of the decision variables are extracted from the tabulated partial optimizations from the previous stages.

     Before giving a simple network example to illustrate the use of dynamic programming, we will give the dynamic programming algorithm in its general form. Bellman (1) devised this method along with a statement for the algorithm in what he called the "Principle of Optimality" which is:

"An optimal policy has the property that whatever the initial state and initial decision, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision."

This principle was stated mathematically as the dynamic programming algorithm to maximize a serial process with i stages as:

Equ7-7.jpg (4871 bytes)

In the algorithm Ri(si,di) is the return from stage i with inputs si and di and output si-1; fi-1(si-1) is the maximum return for stages 1 through i-1 as a function of input si-1 and fi(si) is the maximum return for stages 1 through i as a function of si.

     A dynamic programming analysis begins with the last section of a system and ends with the first section. The last section of a serial system has an output that does not affect another unit of the system. Therefore, it is convenient to number the stages beginning with the last section and ending with the first section. The following simple network example illustrates the concepts of stages, state and decision variables, and the applications of the dynamic programming algorithm.

Example 7-1

A tank truck of an expensive product manufactured in San Francisco is to be delivered to any major port on the East coast for shipment to Europe. The cost for shipment across the Atlantic is essentially the same from the major ports on the East coast. It is desired to select the optimum route (lowest road mileage) from San Francisco to the East coast. The relative distances between cities along possible routes are shown on the network diagram in Figure 7-5.

     To solve the problem one essentially works backward. Beginning at cities N1, C1 and S1, one places the minimum distance to the East coast in the circle, and this distance, f1, along with the optimal decision is recorded in the table. For example, if the optimal route led to the central city (C1), the optimal decision, d*1, would be to drive to Boston (L-left, not S-straight or R-right) which is closer than New York or Philadelphia. For each value of the state variable, the optimal decision, d*1, is tabulated at stage one according to equation (7-4). At stage two for each state variable (city N2, C2, S2) the minimum distance from that city to the East coast is determined by the dynamic programming algorithm, equation (7-5).

     The optimal return is the minimum of the sum of the distance from a city in row two (N2, C2 or S2) to those in row one, R2, and the minimum distance from the cities in row one to the East coast f1. The minimum total distance is placed in the circle and recorded in the table at stage 2 along with the optimal decision (L, S or R). This procedure is repeated for the third stage of the process. At the fourth stage the state variable has only one value, San Francisco; and it is only necessary to determine the optimal decision corresponding to San Francisco which is N3. The optimal return (minimum relative distance) is 16, and the optimal policy is shown as the underlined input-output sequence back through thetable. The optimal policy is: start at San Francisco, go left to N3, straight to N2, straight to N1 and then straight to Boston. However, the route is not unique since from N3 a right to C2, left to N1, and straight to Boston is an equally minimal distance route. There is not a unique optimal set of decision variables.

     In this network example the state variables had three specific values. However, in most problems the state variables are continuous functions, and it is necessary to subdivide the state variables into a number of discrete values. The number of values selected determines the number of times that the optimum value of the decision variable d*1 has to be determined at the stage. The choice of the number of values of the state variable determines the computational effort required at the stage. Also, this choice determines the grid of values on si available when the dynamic programming algorithm is used to determine the optimal decisions, d*1. Interpolation is required to determine the optimal values of the state and decision variables coming back through the table. The following example illustrates these additional complications for a simplified process with continuous transition and return functions given graphically.

Example 7-2

A simplified process for the production of phenol employs crude benzoic acid as a feed and is shown in the flow diagram in Figure 7-6. Separation facilities (absorption and distillation) purify the benzoic acid which is sent to a chemical reactor where it is oxidized to phenol. The impure phenol is sent to separation facilities (evaporation and distillation) where pure (99%) phenol is produced. The economic and process models for each of the three steps in the process are shown graphically in Figures 7-7, 7-8 and 7-9. These give the return functions and transition functions for each dynamic programming stage. The contours shown on the figures for the return function are profits if positive and operating costs if negative.

     To obtain the optimum by dynamic programming, each step in the process is made a dynamic programming stage. The state and decision variables are given on the diagrams shown in Figures 7-7, 7-8 and 7-9. The tables at each stage can be developed using the information in these figures for the dynamic programming optimization. This is illustrated in Table 7-1, and for stage 1 the optimal decision for the maximum profit is to use the largest value of the reflux ratio, d1 = 5. The set of values for the state variable, s1, were selected to be separated by 50 units. This was an arbitrary decision at this point, but it seemed reasonable to permit linear interpolation between the values of the state variables when the dynamic programming algorithm is applied.

     This information developed for stage 1 is recorded in Figure 7-10 which gives the dynamic programming functional diagram for the process. As shown here each process step has been made a dynamic programming stage.

     The dynamic programming algorithm is given in Table 7-1 for the second stage, and it is necessary to exhaustively list values of the state variable s2 and to determine the optimum value of the decision variable. Again a spacing of 50 units is selected for the values of the state variables beginning at the upper limit of 300. For this value of s2 = 300 a range of values for d2 are listed in Table 7-1, and the corresponding values of wpeD.jpg (700 bytes)2 = s1, R2 and f1 are determined from Figures 7-8 and 7-10. Computing (R2 + f1) the optimal value of the decision d2 is determined that gives the largest value of (R2 + f1). This is shown in Table 7-1 as d2 = 470 and (R2 + f1) = 18, and this determines f2(300) = 18 by the application of the dynamic programming algorithm.

     Other values of f2(s2) are computed in the same way, and then results are listed in Figure 7-10 at stage 2. It should be noted at this point that an exhaustive search is being done on d2 to determine the optimal value. This is being done for illustrative purposes and is convenient with the process data in a graphical form. However, in an industrial problem the transition and return function could be considerably more complicated, and the search effort to determine the optimal value of the decision variables could be significantly reduced by using one of the previously described single or multivariable search methods.

     Using the dynamic programming algorithm at stage 3 shown in Table 7-1, we obtain the optimal results for the state and decision variables tabulated in Figure 7-10. The calculation procedure used to locate the optimal value of d3 for s3 = 400 is shown in Table 7-1, and this involved the same procedure used at stage 2. However, industrial problems would have significantly more complicated transition and return functions, and a two variable search would be used to locate the best values of s3 and d3 which maximized f3(s3,d3). In this case the dynamic programming algorithm would be:

 Ex7-2.jpg (4614 bytes)

At this point the maximum profit would seem to be f3 = 23.0. However, an additional calculation was performed for s3 = 180 to refine the grid between 150 and 200 to see if a larger value might be in this range. It turned out that f3(180) = 21.2 and f3(200) remained the largest value of the profit for the process. Consequently, the optimum operating conditions can be determined as shown in Figure 7-10, and it is necessary to interpolate for the values of d1 and wpeD.jpg (700 bytes)1 at stage 1. However, the grid of values for s1 is satisfactory to permit linear interpolation for d1 and wpeD.jpg (700 bytes)1 within the accuracy known for those variables from the economic and process models.

     Now we are ready to extend our discussion to a system with N dynamic programming stages. This follows in the next section for serial processes, and these results are extended for the cases with loops and branches. Then we will turn to the problem of relating units of an industrial process to dynamic programming stages.

Variables, Transforms and Stages top

Although it was adequate in the preceding problem, usually the process flow diagram will have to be modified to obtain the dynamic programming functional diagram because information flow rather than material flow must be evaluated. Functional equations and the corresponding functional diagrams of the stages of the system are developed to describe this information flow using the process flow diagram. It will be necessary to broaden the description of a stage to have more than one state and decision variable. Also, when loops and branches are involved, the outputs from one stage may be the inputs to more than one stage.

     Figure 7-11, shows a dynamic programming stage that has more than one state and one decision variable. The transition and return functions are given in the figure for this stage, also. As shown there are two output state variables which are determined by the two transition functions. These transition functions have six independent variables: three state and three decision variables. The dynamic programming algorithm is given in the figure and shows that a three variable search is required to determine the optimal values of the decision variables. Also, the three state variables have to be exhaustively listed to develop the tabular information for the partial optimization at the stage. It turns out that three is the limit for the number of state variables at a stage because of the computational effort required to obtain optimal decisions for the exhaustive list of values of the state variables.

     The incident identities are shown in this figure. These are the equations that give the relations among the outputs from a stage and the inputs to adjacent stages. Here wpeD.jpg (700 bytes)1i and wpeD.jpg (700 bytes)2i could both be inputs to stage i-1, or they could be inputs to two stages which would be the case for a diverging branch. The discussion follows for the serial optimization problem where the outputs from one stage are inputs to the following stage. Then this is expanded to include loops and branches.

Serial System Optimization (7) top

A serial system has the output of one stage as the input to the following stage. This is illustrated in Figure 7-12, with one decision per stage for convenience. The functional equations given in Figure 7-12 include the transition functions, incident identities and return function. The incident identities give the relation between the stages. The return function gives a measure of the profit or cost at a stage, and the maximum of the sum of the profits from each stage is to be found by determining the optimal values of the decision variables. Examples 7-1 and 7-2 were serial systems.

     Serial system optimization problems are of four types: initial value, final value, two point boundary value and cyclic problems. In an initial value problem sN is a known constant; in a final value problem wpeD.jpg (700 bytes)1 is a known constant; and in a two point, boundary value problem both sN and wpeD.jpg (700 bytes)1 are known. In a cyclic problem sN = wpeD.jpg (700 bytes)1 and the best value has to be determined that maximizes fN(sN).

Initial Value Problem top

The dynamic programming algorithm for the i-th stage of the initial value problem is the same as given in Figure 7-12.

Equ7-7.jpg (4871 bytes)

     The optimal return at stage i, fi, is only a function of si, the state variables at stage i. The incident identity and transition function are used to show this result.

si-1 = wpeD.jpg (700 bytes)i = Ti (si,di)                        (7-8)

Substituting the above equation into equation (7-7) gives:

Equ7-9.jpg (5202 bytes)

which shows that fi is a function of si, optimizing out di.

     This algorithm, equation (7-9), applies from stage 2 to stage N-1. At stage 1 the dynamic programming algorithm is:

Equ7-10.jpg (3921 bytes)

This is the only algorithm that does not contain the optimal return from a preceding stage.

     At the last stage, stage N, the dynamic programming algorithm is:

Equ7-11.jpg (5734 bytes)

If the value of sN is a known constant, the maximum return is fN(sN), and an exhaustive tabulation of sN is not required. The problem is referred to as an N-decision, no-state optimization problem. However, if sN is not a constant and can be manipulated like a decision variable to maximize fN, the dynamic programming algorithm at stage N is:

Equ7-12.jpg (5204 bytes)

This is a no-state, two-decision partial optimization at stage N. Consequently, it is referred to as an (N+1) decision, no-state optimization problem, and there are N+1 independent variables. The set of values for the decision variables, di, and state variables, sN, that maximize the return function is called the optimal policy. N partial optimizations have been required to obtain an optimal return and optimal policy for the system.

Final Value Problem top

For this situation the output from the first stage, 1, is a known constant. There are two approaches to solve this problem which are called state inversion and decision inversion.

     State inversion means to transform the final value problem into an initial value problem by obtaining the N inverse transition functions, i.e., solve the transition functions for si in terms of wpeD.jpg (700 bytes)i as indicated below.

si = T1.jpg (802 bytes)1 (i,di)         for i = 1,2,...,N                (7-13)

     This results in reversing the arrows in Figure 7-12 as shown in Figure 7-13(a). Renumbering the stages makes the problem into an initial value one. The problem has (N-1) one-state, one-decision and one no-state, one-decision partial optimizations.

     In some cases inverting the transition functions is not possible, and the technique of decision inversion is employed. Here the roles of d1 and s1 are interchanged. The stage one transition function is:

wpeD.jpg (700 bytes)1 = T1.jpg (802 bytes)1(s1,d1) = constant                            (7-14)

This equation can be put in form

d1 = T1.jpg (802 bytes)1(s1, wpeD.jpg (700 bytes)1)                                           (7-15)

and d1 is uniquely determined by specifying s1, for wpeD.jpg (700 bytes)1 is a constant for this case. Stage one is decisionless and is combined with stage two. This is shown diagramatically in Figure 7-13(b) with the arrow reversed on d1 indicating that it is no longer a decision and the arrow crossed on 1 indicating that it is a constant.

     The functional equation for the combined stages one and two is now:

 Equ7-16.jpg (4941 bytes)

which can be combined with equation (7-15) and

s1 = wpeD.jpg (700 bytes)2 = T2(s2,d2)                                                                    (7-17)

to obtain the following equation:

Equ7-18.jpg (11669 bytes)

These manipulations show that combining stages one and two gives one new stage. This new stage requires a one-state, one-decision partial optimization. The final form of the functional diagram for decision inversion is shown in Figure 7-13(b).

     After decision inversion is performed, the usual serial problem procedure applies to the rest of the stages in the problem. The overall optimization involves N-1 total stages with N-2 one-decision, one-state partial optimizations, and at stage N there is a two-decision, no-state partial optimization.

     The following example illustrates the effect on state inversion by the number of state variables at a stage. There are three cases to consider as shown in the example.

Example 7-3

In state inversion three cases may occur at a stage. These are: the stage has the same number of input state variables as outputs, the stage has more input state variable than outputs or the stage has fewer input state variables than outputs. Assuming the transition functions can be inverted, obtain the transition functions from state inversion for these cases with three state variables and one decision variable per stage.

a. State variable inputs are equal to the outputs.

Pg322a.jpg (16200 bytes).

In this case state inversion has the input and output variables interchanged.

b. State variable inputs are more than outputs.

  Pg322b.jpg (14711 bytes).

In this case there is only one transition function, and for state inversion it is written as:

s1i = T2.jpg (855 bytes)i(S.jpg (721 bytes)i, s2i, s3i, di)

The state variables s2i and s3i become decision variables along with di. State inversion for this case has a significant advantage for the dynamic programming optimization by converting state variables to decision variables.

c. State variable outputs are more than inputs.

    Pg322c.jpg (13241 bytes)

In this case there are three transition functions, but only one is inverted as shown below.

wpeD.jpg (700 bytes)1i = T1i(si,di)          si   = T2.jpg (855 bytes)1i(wpeD.jpg (700 bytes)1i,di)

wpeD.jpg (700 bytes)2i = T2i(si,di)          wpeD.jpg (700 bytes)2i = T2.jpg (855 bytes)2i(wpeD.jpg (700 bytes)1i,di)

wpeD.jpg (700 bytes)3i = T3i(si,di)          wpeD.jpg (700 bytes)3i = T2.jpg (855 bytes)3i(wpeD.jpg (700 bytes)1i,di)

The remaining two transition functions become equations that calculate output wpeD.jpg (700 bytes)2i and wpeD.jpg (700 bytes)3i from values of wpeD.jpg (700 bytes)1i and di.

Two-Point Boundary Value Problem top

This type of problem arises when both the initial and final values of the state variables wpeD.jpg (700 bytes)1 and sN are specified. The problem requires decision inversion because state inversion still would give a two-point, boundary value problem. Decision inversion is performed condensing stages one and two as was shown in Figure 7-13(b). Then the partial optimization proceeds as in an initial value problem. The dynamic programming algorithm for the combined stage one and two is the same as equation (7-18). This is a one-state, one-decision optimization at stage 1-2, because wpeD.jpg (700 bytes)1 is a specified constant.

     The optimization continues in the usual fashion with the dynamic programming algorithm at stage 3 being:

Equ7-19.jpg (5445 bytes)

At stage N, the dynamic programming algorithm is:

 

Equ7-20.jpg (5155 bytes)

This is a no-state, one-decision partial optimization because sN is a constant.

     To solve the two-point, boundary value problem, first a decision inversion is performed and followed by the partial optimizations for an initial value problem. This involves N-2 one-state, one-decision and one no-state, one-decision partial optimizations. Two-point, boundary value problems always require decision inversion.

Cyclic System Optimization top

The cyclic system is a special case of the two-point boundary value problem where sN = wpeD.jpg (700 bytes)1, and the functional diagram is shown in Figure 7-14. The method to solve this problem is to select a value of wpeD.jpg (700 bytes)1 = sN = C and proceed to determine the optimum return as a two-point boundary value problem. The dynamic programming algorithm at stage N is:

Equ7-21.jpg (5027 bytes)

     Then a single variable search is performed by varying C until the maximum return fN(C) is located. A Fibonacci or golden section search can be used effectively on the cut state values of sN = wpeD.jpg (700 bytes)1 = C to locate the best value that maximizes fN(C). Fixing the value of a state variable is referred to as cutting the state and is indicated on a functional diagram by two slashes on the arrow of the state variable. The following example illustrates the procedure for cyclic optimization.

 

Example 7-4

In the dynamic programming optimization of the contact process for sulfuric acid discussed later in the chapter, the final functional diagram consisted of five stages with recycle loop with two state variables as shown below:

In this problem it is necessary to cut the state of both variables in the recycle loop. The problem is to show that this will have stages 1 and 2 combined with stage 3 to give a three stage cyclic optimization problem.

     The dynamic programming algorithm and transition functions at stage one are:

Ex7-4a.jpg (5902 bytes)

Cutting the state on the recycle loop gives:

wpeD.jpg (700 bytes)11 = s15 = constant

wpeD.jpg (700 bytes)21 = s25 = constant

Performing decision inversion using the transition functions converts wpeD.jpg (700 bytes)11 and wpeD.jpg (700 bytes)21, from fixed outputs to inputs. Solving the transition function for s1 and d1 in terms of wpeD.jpg (700 bytes)11 and wpeD.jpg (700 bytes)21 gives:

s1 = T2.jpg (855 bytes)11(wpeD.jpg (700 bytes)11,wpeD.jpg (700 bytes)21)                    d1 = T2.jpg (855 bytes)21(wpeD.jpg (700 bytes)11,v21)

Consequently both s1 and d1 are computed by these transition functions if wpeD.jpg (700 bytes)11 and wpeD.jpg (700 bytes)21 are specified. This means that f1(s1) is specified also.

f1(s1) = R1(wpeD.jpg (700 bytes)11,wpeD.jpg (700 bytes)21)

There is no partial optimization at stage one, and the output from stage two is fixed also, for wpeD.jpg (700 bytes)2 = s1. The dynamic programming algorithm and transition function at stage 2 are:

Ex7-4b.jpg (5937 bytes)

There is no decision at stage two because the fixed output wpeD.jpg (700 bytes)2 will convert decision d2 into a computed output by decision inversion, i.e., d2 = T2.jpg (855 bytes)2(wpeD.jpg (700 bytes)2,s2) where wpeD.jpg (700 bytes)2 = s1 = T2.jpg (855 bytes)11(wpeD.jpg (700 bytes)11,wpeD.jpg (700 bytes)21).   There is no partial optimization at stage two as shown by the following equation for R2(s2,d2) = R2(s2,wpeD.jpg (700 bytes)11,wpeD.jpg (700 bytes)21):

f2(s2) = R2(s2,wpeD.jpg (700 bytes)11,wpeD.jpg (700 bytes)21) + R1(wpeD.jpg (700 bytes)11,wpeD.jpg (700 bytes)21)

The dynamic programming algorithm and transition function at stage 3 is:

Ex7-4c.jpg (5877 bytes)

This is a one-state, one-decision partial optimization and the functional diagram for the combined stages one, two, and three is shown below:

Pg326.jpg (14878 bytes)

The dynamic programming algorithm and transition function for stage 4 give a one-state one-decision partial optimization by the following:

Ex7-4d.jpg (5832 bytes)

The dynamic programming algorithm and transition functions for stage 5 give a no-state, one-decision partial optimization by the following:

Ex7-4e.jpg (6876 bytes)

To locate the best value of s15 = wpeD.jpg (700 bytes)11 and s25 = wpeD.jpg (700 bytes)21 a two variable search can be performed which maximizes f5(s15,s25). However, as we will see subsequently, this is a borderline problem for dynamic programming. One should consider optimizing the problem using a multivariable search on the five decision variables directly rather than having to perform the decision inversion and two variable search.

top