25-05-2012, 05:21 PM
Multi-Stage Decision Process
(DP). Based on the Principle of Optimality, this was originally concerned with optimal decisions over time. For continuous time, it addresses problems in variational calculus. For discrete time, each period is sometimes called a stage, and the DP is called a multi-stage decision process. Here is the Fundamental Recurrence Equation for an additive process:
F(t, s) = Opt{r(t, s, x) + aF(t', s'): x in X(t, s) and s'=T(t, s, x)},
where F(t, s) = optimal total return upon reaching time t in state s, and x = decision variable(s); s, s' = state variables; t, t' = time; X(t, s) = decision space (usually depends only on state); r(t, s, x) = immediate return; T(t, s, x) = state transform.
In words, the optimal return upon reaching time t in state s equals the optimal return from an immediate decision plus the optimal return for the remainder of the process upon the transition to state s'. This new state is a function of the current time, state, and decision. For discrete time, t'=t+1 for the forward equation, and t'=t-1 for the backward equation. The multiplier (a) is generally a positive scalar, such as a discount factor to yield the present value of money. (For some problems a=1, in which case the infinite horizon model might be ill posed.) More generally, the process need not be additive, so that '+' could be another composition operation.
A decision rule is a function, x*(t, s), that specifies an optimal value of x upon reaching time t in state s. For a completely deterministic process, backtracking is used to obtain the usual form of an optimal solution from a known initial or final state. The process could be stochastic, in which case r and F are expected values, and the state transform is random with probability distribution, P[T(t,s,x) = s' | s,x]. Thus, F(t',s') is replaced by Sum_s'{F(t',s')P[T(t,s,x)=s'|s,x]}.
DP is a technique that applies to static problems too. For example, consider the following recurrence equation for the knapsack problem:
F(n, s) = Max{c(n)x + F(n-1, s-a(n)x): x in {0, 1, ..., |_s/a(n)_|}}.
In words, this says that the max total return with n objects having total knapsack space s equals the max choice (x) of how much of the n-th object we put into the knapsack, with immediate return c(n), plus the max total return from the remaining n-1 objects with the remaining space (s - a(n)x). The value of x (the stage decision variable) is any non-negative integer up to the space allowed: |_s/a(n)_| is the max number of objects of this type that could fit in the knapsack. There could also be explicit bounds on x, such as the 0-1 knapsack problem, in which case the policy set, X(n,s) simply has only those values.
Variational calculus. An approach to solving a class of optimization problems that seek a functional (y) to make some integral function (J) an extreme. Given F is in C^1, the classical unconstrained problem is to find y in C^1 to minimize (or maximize) the following function:
_x1
|
J(y) = | F(x,y,y') dx.
|
_|x0
An example is a min arc length, where F = sqrt(1+y'^2). Using the Euler-Lagrange equation, the solution is y(x) = ax + b, where a and b are determined by boundary conditions: y(x0) = y0 and y(x1) = y1.
If constraints take the form G(x, y, y') = 0, this is called the problem of Lagrange; other forms are possible.
Euler-Lagrange equation. These are necessary conditions for y to solve the classical problem in variational calculus:
_x
|
dF/dy' = | (dF/dy) dx + c.
|
_|x0
Principle of optimality. The idea that an optimal path, or trajectory, is composed of optimal sub-paths. Equivalently, once we reach a particular state in a sequential decision process, the remaining decisions must be optimal with respect to that state. This is the foundation of dynamic programming.