Skip to content

The SDDP Framework in One Page

This chapter gives a reader new to stochastic dynamic programming the minimum context needed to follow the rest of the book. It covers the SDDP idea in one page. Every claim here is developed in detail in SDDP Algorithm.

Optimal dispatch over a multi-stage horizon cannot be solved by enumerating all scenarios: the number of scenario paths grows exponentially with the number of stages. SDDP avoids enumeration by exploiting the structure of the problem.

At each stage, the future is summarised by a cost-to-go function VtV_t — the minimum expected cost from that stage forward, as a function of the incoming state (reservoir levels, inflow lags). The dispatch problem obeys a stagewise (Bellman) recursion: the cost-to-go at stage tt is the expected cost of the best decision now plus the cost-to-go of the state that decision leaves behind:

Vt(xt1)=Eωt ⁣[minxt  ctxt+Vt+1(xt)]V_t(x_{t-1}) = \mathbb{E}_{\omega_t}\!\left[\, \min_{x_t}\; c_t^\top x_t + V_{t+1}(x_t) \,\right]

with terminal condition VT+1(x)=0V_{T+1}(x) = 0. The minimisation runs over the feasible stage-tt decisions xtx_t given the incoming state xt1x_{t-1} and the realized uncertainty ωt\omega_t. For LP subproblems VtV_t is convex and piecewise linear — the property SDDP exploits.

SDDP never forms Vt+1V_{t+1} explicitly. It approximates it from below with a growing collection of affine functions called Benders cuts, each a valid underestimator everywhere:

Vt+1(x)    αi+πixfor every cut iV_{t+1}(x) \;\ge\; \alpha_i + \pi_i^\top x \qquad \text{for every cut } i

In the stage problem the future cost is carried by an epigraph variable θ\theta bounded by every cut (θαi+πix\theta \ge \alpha_i + \pi_i^\top x), so the approximation is the upper envelope maxi(αi+πix)\max_i (\alpha_i + \pi_i^\top x).

The approximation improves iteratively. Starting with no cuts, the algorithm alternates two passes:

Forward pass: Starting from the initial state, simulate a batch of scenario paths forward through all stages, recording the states visited at each stage.

Backward pass: Starting from the last stage and working back to stage one, solve each subproblem at each visited state and use the resulting dual information to generate a new cut for the previous stage. The cut encodes how the cost-to-go changes as the incoming state changes.

flowchart LR
  A([forward pass]) --> B[sample scenarios]
  B --> C{converged?}
  C -- no --> D[backward: add cuts]
  D --> A
  C -- yes --> E([policy])

One SDDP iteration: the forward pass simulates trajectories under the current policy; the backward pass adds a Benders cut at each visited state; the loop repeats until the bounds meet.

As iterations accumulate, the piecewise-linear approximation covers more of the state space and the lower bound rises toward the true optimum:

The convex cost-to-go V(x)V(x) approximated from below by Benders cuts. Each cut is a tangent at a trial state — its slope is the analytic derivative — and the upper envelope of the cuts tightens toward the true function as cuts are added.

SDDP maintains two bounds on the optimal expected cost:

Lower bound z\underline{z}: The objective value of the stage-one subproblem with the current cut approximation. This value is a rigorous lower bound on the optimal cost; it increases monotonically as cuts are added.

Upper bound z\overline{z}: The sample-average cost of forward simulations under the current policy. Because this average is computed from a finite number of scenarios, it carries statistical uncertainty; Cobre reports both the estimate and its confidence interval.

The optimality gap is the difference between the two:

gap=zz\text{gap} = \overline{z} - \underline{z}

When the gap falls below the stopping criterion, the algorithm terminates. The stopping rule governs how the gap is assessed; see Stopping Rules for the full treatment.

The monotone lower bound and the Monte-Carlo upper bound (with its 95% confidence band) close the gap as iterations proceed; the run stops when the gap is statistically negligible.

Convergence of SDDP to the optimal value function, under the cut selection strategies used in Cobre, is guaranteed with probability 1 as iterations tend to infinity. In practice, production studies converge in tens to hundreds of iterations.

The basic SDDP algorithm minimises expected cost. Cobre supports risk-averse formulations through nested risk measures applied at each stage, based on the conditional value-at-risk (CVaR), which shifts weight toward high-cost scenarios:

CVaRα(Z)=minηR{η+1αE[(Zη)+]}\text{CVaR}_\alpha(Z) = \min_{\eta \in \mathbb{R}} \left\{ \eta + \frac{1}{\alpha} \mathbb{E}\left[(Z - \eta)^+\right] \right\}

where α(0,1]\alpha \in (0, 1] is the confidence level — CVaRα\text{CVaR}_\alpha is the expected cost in the worst α\alpha-fraction of scenarios. Cobre uses a convex combination of expectation and CVaR,

ρλ,α[Z]=(1λ)E[Z]+λCVaRα[Z]\rho^{\lambda, \alpha}[Z] = (1 - \lambda)\, \mathbb{E}[Z] + \lambda \cdot \text{CVaR}_\alpha[Z]

with risk-aversion weight λ[0,1]\lambda \in [0, 1] (λ=0\lambda = 0 recovers the risk-neutral mean, λ=1\lambda = 1 is pure CVaR). The measure is applied inside the backward pass: cut coefficients are aggregated using risk-weighted averages rather than simple expectations. See Risk Measures for the formulation.

Cost distribution with the mean, VaRα\mathrm{VaR}_\alpha and CVaRα\mathrm{CVaR}_\alpha marked and the worst (1α)(1-\alpha) tail shaded; raising λ\lambda moves the policy from the mean toward the tail.

This overview omits every implementation detail: the LP column layout, cut storage, the scenario tree structure, the warm-start strategy, and the distributed-execution design. Those details are the subject of Parts 2 through 7 of this book.

For the full algorithm, with stage LP formulation, cut coefficient derivation, and convergence analysis, see SDDP Algorithm.

For the cut mechanics — how cuts are generated, stored, and pruned — see Cut Management.

  • SDDP Algorithm — complete algorithmic treatment: stage LP formulation, cut generation, convergence proof
  • Cut Management — cut storage, selection strategies, and domination pruning
  • Risk Measures — CVaR and nested risk measure formulations for risk-averse SDDP
  • Stopping Rules — gap-based and iteration-based convergence criteria