Skip to content

SDDP Algorithm

This spec describes the Stochastic Dual Dynamic Programming (SDDP) algorithm as implemented in Cobre: the multistage stochastic formulation, the iterative forward/backward pass structure, convergence monitoring, policy graph topologies, state variable requirements, and the single-cut formulation. It serves as the algorithmic foundation referenced by all other mathematical specs.

For notation conventions (index sets, parameters, decision variables, dual variables), see Notation Conventions.

Cobre solves the hydrothermal dispatch problem: determining optimal generation schedules for hydro and thermal plants over a multi-year planning horizon under inflow uncertainty. Key characteristics:

  • Flexible for short or long horizons: 1 month - 5 years (daily, weekly or monthly stages)
  • Large state space: 160+ hydro reservoirs with AR inflow models \approx 2000 state dimensions
  • Stochastic inflows: PAR(p) autoregressive models with seasonal patterns
  • Customize modeling complexity stagewise: Scenario generation, hydro production and others are configurable stagewise and elementwise

2. Multistage Stochastic Programming Formulation

Section titled “2. Multistage Stochastic Programming Formulation”

The hydrothermal dispatch problem is formulated as a multistage stochastic program:

minx1,,xTE[t=1Tct(ωt)xt(ωt)]\min_{x_1, \ldots, x_T} \mathbb{E}\left[ \sum_{t=1}^{T} c_t(\omega_t)^\top x_t(\omega_{t}) \right]

subject to stage-linking constraints and uncertainty realization. The nested formulation uses value functions:

Vt(xt1)=Eωt[minxt{ctxt+Vt+1(xt):Atxt=btEtxt1,  xtXt}]V_t(x_{t-1}) = \mathbb{E}_{\omega_t}\left[ \min_{x_t} \left\{ c_t^\top x_t + V_{t+1}(x_t) : A_t x_t = b_t - E_t x_{t-1}, \; x_t \in \mathcal{X}_t \right\} \right]

with terminal condition VT+1(x)=0V_{T+1}(x) = 0.

Key insight: The value function Vt(x)V_t(x) is convex and piecewise linear (for LP subproblems), enabling outer approximation via Benders cuts.

Value function approximation via Benders cuts — each cut is a tangent at a trial point (its slope is the analytic derivative Q(v)Q'(v)), tightening the outer approximation toward the true convex cost-to-go function.

SDDP iteratively builds piecewise-linear approximations V^tk\hat{V}_t^k at iteration kk of the true value functions through:

  1. Forward pass: Sample scenarios, make decisions using current approximation
  2. Backward pass: Compute cuts to improve the approximation
  3. Convergence check: Evaluate stopping criteria
flowchart TB
    FWD["<b>1. Forward Pass</b><br/><br/>Sample M scenario trajectories, stages 1…T<br/>Solve stage LPs under current cuts<br/>Record trial points, stage costs → UB<br/><i>MPI: allgatherv trial points + costs</i>"]
    BWD["<b>2. Backward Pass</b><br/><br/>For stages T → 1:<br/>Evaluate all N openings at each trial point<br/>Extract duals → aggregate 1 Benders cut<br/>Add cut to previous stage's LP<br/><i>MPI: allgatherv cuts per stage</i>"]
    CHK["<b>3. Stopping Rule Check</b><br/><br/>LB = ρ₀[Q₀(x₀, ω)]<br/>gap = (UB − LB) / max(1, |UB|)<br/>Check: gap ≤ tol · iter limit · stall"]
    CONV{"converged?"}
    DONE(["stopped"])

    FWD -->|trial points| BWD
    BWD -->|new cuts added| CHK
    CHK --> CONV
    CONV -->|no · k ← k+1| FWD
    CONV -->|yes| DONE

The forward pass simulates the system under the current policy to generate trial points — the visited states that will be used by the backward pass to construct cuts.

For each of MM independent scenario trajectories:

  1. Start from the known initial state x0x_0 (initial storage volumes and inflow history)
  2. At each stage t=1,,Tt = 1, \ldots, T, sample a scenario realization ωt\omega_t from the stage’s scenario set, then solve the stage LP using the incoming state x^t1\hat{x}_{t-1} and the sampled scenario. The LP includes all current Benders cuts as constraints on the future cost variable θ\theta
  3. Record the optimal state x^t\hat{x}_t (end-of-stage storage volumes and updated AR lags) as the trial point for stage tt
  4. Pass x^t\hat{x}_t as the incoming state to stage t+1t+1

The forward pass produces: (a) trial points {x^t}\{\hat{x}_t\} at each stage for each trajectory, and (b) stage costs for upper bound estimation.

Parallelization: Forward trajectories are independent — Cobre distributes MM trajectories across MPI ranks, with threads solving individual stage LPs within each rank.

Warm-starting: The forward pass LP solution at stage tt provides a near-optimal basis for the backward pass solves at the same stage, significantly reducing solve times.

The backward pass improves the value function approximation by generating new Benders cuts, walking stages in reverse order from TT down to 11.

At each stage tt, for each trial point x^t1\hat{x}_{t-1} collected during the forward pass:

  1. Solve the stage tt LP for every scenario ωΩt\omega \in \Omega_t (branching), using the trial state x^t1\hat{x}_{t-1} as incoming state
  2. From each LP solution, extract the optimal objective value Qt(x^t1,ω)Q_t(\hat{x}_{t-1}, \omega) and the reduced costs of the pinned incoming-state columns (incoming storage and AR lags). These reduced costs give the cut coefficients directly — no combination with FPHA or generic constraint duals is needed. See Cut Management
  3. Compute per-scenario cut coefficients (α(ω),π(ω))(\alpha(\omega), \pi(\omega)) from the duals and trial point
  4. Aggregate into a single cut via probability-weighted expectation — see Cut Management
  5. Add the aggregated cut to stage t1t-1‘s cut pool

The backward pass produces one new cut per stage per trial point per iteration.

Scenario tree — the forward pass samples M sparse paths through the branching tree, while the backward pass evaluates all N openings at each trial point and aggregates them into one Benders cut. The sparse-forward / exhaustive-backward asymmetry is why SDDP converges where full-tree evaluation is infeasible.

Forward pass — sparse sampling (M = 2 paths)Backward pass — exhaustive evaluation (N = 3 openings)x₀ roott=2t=2t=2t=3t=3t=3t=3t=3t=3t=3t=3t=3trial point x̂ξ₁ → solve LP → dual π₁ξ₂ → solve LP → dual π₂ξ₃ → solve LP → dual π₃Benders cut: π̄ = E[π], ᾱ = E[α] path A path Bsampledsampled

Lower Bound: The risk-adjusted lower bound is computed by solving the stage-0 LP for every opening in the stage-0 scenario set, collecting the per-opening objectives, and aggregating them through the stage-0 risk measure:

zk=ρ0[Q0k(x0,ω)    ωΩ0]\underline{z}^k = \rho_0 \Big[ Q_0^k(x_0, \omega) \;\Big|\; \omega \in \Omega_0 \Big]

where Q0k(x0,ω)Q_0^k(x_0, \omega) is the optimal objective of the stage-0 LP under opening ω\omega with the current cut approximation at iteration kk, x0x_0 is the known initial state, and ρ0\rho_0 is the stage-0 risk measure (expected value under risk-neutral, or CVaR under risk-averse). The per-opening probabilities are uniform: p(ω)=1/Ω0p(\omega) = 1 / |\Omega_0|.

Under risk-neutral settings (ρ0=E\rho_0 = \mathbb{E}), this reduces to a simple average over all stage-0 opening objectives. Because cuts are added monotonically and never removed within a training run, this bound is non-decreasing across iterations — see Cut Management for the append-only guarantee.

Upper Bound: Estimated via Monte Carlo simulation over the forward pass trajectories (see Upper Bound Evaluation):

zˉk=1Mm=1Mt=1Tctxt(m)\bar{z}^k = \frac{1}{M} \sum_{m=1}^{M} \sum_{t=1}^{T} c_t^\top x_t^{(m)}

Optimality Gap:

gapk=zˉkzkmax(1,zˉk)\text{gap}^k = \frac{\bar{z}^k - \underline{z}^k}{\max(1, |\bar{z}^k|)}

3.4 Execution Model and Performance Considerations

Section titled “3.4 Execution Model and Performance Considerations”

The SDDP iteration structure has specific properties that guide the parallelization strategy and solver lifecycle design. These are summarized here as architectural constraints.

Thread-trajectory affinity: The dominant parallelization strategy assigns each thread ownership of a complete forward trajectory. With NN threads (summed across all ranks), NN forward passes execute in parallel, each thread solving its trajectory’s stage LPs sequentially from t=1t = 1 to TT. The same thread that executed forward pass kk also performs the backward pass for the scenarios sampled by forward pass kk. This affinity pattern preserves cache locality (solver basis, scenario data, LP coefficients remain warm in the thread’s cache lines) and simplifies implementation by eliminating cross-thread data handoff.

Backward pass synchronization: The backward pass enforces a per-stage synchronization barrier: cut construction at stage tt completes before cuts at stage t1t-1 are computed. This serializes the algorithm at stage boundaries and is the dominant scaling factor for parallel speedup. Within a stage, each thread’s branching scenarios are solved sequentially, reusing the warm solver basis saved from the forward pass at that stage.

Forward pass state saving: When the number of forward passes MM exceeds the number of available threads NN, threads must process multiple trajectories in batches. This requires efficiently saving and restoring forward pass state (solver basis, scenario realization, visited states) at stage boundaries — analogous to CPU context switching for threads, but simpler because suspension only occurs at well-defined stage boundaries, not at arbitrary points.

LP rebuild cost: Memory constraints prevent keeping all stage LPs with their full cut sets resident simultaneously. The solver must rebuild LPs and add cut constraints when transitioning between stages, which lies on the critical performance path. The design minimizes this rebuild cost through strategies such as cut preallocation, basis persistence, and incremental constraint updates.

Reduced-cost cut extraction: Each state variable (incoming storage and inflow lag) is pinned by column bounds, and the reduced cost of that pinned column gives the cut coefficient directly — no preprocessing or dual combination is needed. FPHA hyperplane and generic constraint effects are captured automatically by the LP solver through the pinned column’s reduced cost. See Cut Management.

For implementation, see the cobre developer-guide.

The standard SDDP formulation uses an acyclic directed graph:

  • Nodes: Stages t{1,,T}t \in \{1, \ldots, T\}
  • Arcs: Transitions with probabilities (typically deterministic: p=1p = 1)
  • Terminal: VT+1(x)=0V_{T+1}(x) = 0 (no future cost beyond stage TT)

For long-term planning, Cobre supports infinite periodic horizon with cyclic graphs:

  • Cycle: Stage TT transitions back to stage 11 (or a cycle start)
  • Discount: Cycle transitions require a discount factor d<1d < 1 for convergence
  • Cut sharing: Cuts at equivalent cycle positions are shared

See Discount Rate for the discounted Bellman equation and Horizon Modes for the complete cyclic formulation.

5. State Variables and the Markov Property

Section titled “5. State Variables and the Markov Property”

For SDDP to generate valid cuts, the subproblem must satisfy the Markov property: future costs depend only on the current state, not on the path by which the current state was reached.

State variables in Cobre:

ComponentVariableCountDescription
Hydro storagevhv_hNhydroN_{hydro}Reservoir volume at end of stage
AR inflow lagsah,a_{h,\ell}hPh\sum_h P_hLagged inflows for AR(P) models

The PAR(p) inflow model requires past inflows ah,t1,ah,t2,a_{h,t-1}, a_{h,t-2}, \ldots to compute current inflow. To maintain the Markov property, these lags are included as state variables pinned by column bounds to the corresponding incoming state value:

ah,=aˉh,=a^h,hH,  {1,,Ph}\underline{a}_{h,\ell} = \bar{a}_{h,\ell} = \hat{a}_{h,\ell} \quad \forall h \in \mathcal{H}, \; \ell \in \{1, \ldots, P_h\}

where a^h,\hat{a}_{h,\ell} is the lag \ell inflow value passed from the previous stage. The reduced costs of these pinned columns (πh,lag\pi^{lag}_{h,\ell}) contribute to cut coefficients, capturing the marginal value of inflow history — see Cut Management.

See PAR Inflow Model for the complete autoregressive formulation and LP Formulation for how these constraints appear in the stage LP.

One aggregated cut per iteration:

θt1αˉ+πˉxt1\theta_{t-1} \geq \bar{\alpha} + \bar{\pi}^\top x_{t-1}

where αˉ=E[α(ω)]\bar{\alpha} = \mathbb{E}[\alpha(\omega)] and πˉ=E[π(ω)]\bar{\pi} = \mathbb{E}[\pi(\omega)].

  • Pros: Fewer cuts, smaller LP, faster solves
  • Cons: May require more iterations to converge

An alternative multi-cut formulation creates one cut per scenario per iteration, with per-scenario future cost variables θω\theta_\omega. This formulation yields a tighter approximation per iteration at the cost of a larger LP. The trade-off analysis and convergence comparison are in Cut Management.

A terminal boundary cut is a Benders cut on the terminal future-cost function that originates from an external Cobre run rather than from the current run’s backward pass. Terminal boundary cuts couple a downstream run’s policy with an upstream run’s value function at the time-horizon boundary.

What they represent: In a chained study, the upstream run builds a policy over a longer or earlier horizon. The future-cost function approximation at the upstream run’s terminal stage — a set of Benders cuts — captures the expected cost of continuing from any state at that stage boundary. When the downstream run imports these cuts, they serve as the initial approximation of the cost-to-go at the downstream run’s terminal stage TT.

The methodology guarantee: Imported terminal cuts are valid Benders cuts — they satisfy the same lower-bound and convexity conditions as any cut generated internally. Adding them to the initial cut set at stage TT is equivalent to seeding the outer approximation with information from the upstream run. The backward pass then generates additional cuts as usual, tightening the approximation from that informed starting point. The append-only property of the cut pool ensures the lower-bound estimate remains non-decreasing throughout training, whether cuts were imported or generated internally.

The use case: Chaining Cobre runs for studies that span different horizons or temporal resolutions — for example, a long-horizon strategic study feeding a shorter-horizon detailed study — without re-running the upstream model. The downstream run starts with an already-informed terminal cost, which typically reduces the number of iterations needed for the lower bound to reach the convergence threshold.

Trade-off: The imported cuts reflect the upstream run’s stochastic model and state discretization. If the downstream run uses a different inflow model, a different set of reservoirs, or a different penalty structure, the imported cuts may be inconsistent with the downstream subproblems. Inconsistencies do not cause infeasibility (the backward pass will generate corrective cuts) but they may slow convergence or produce a looser initial approximation than a consistent import would provide.

For the full treatment of chained studies and horizon coupling, see Part 5 — Coupling and Boundary Conditions.

  • Notation Conventions — All index sets, parameters, decision variables, and dual variable definitions
  • LP Formulation — Complete stage subproblem LP that the forward/backward passes solve
  • Cut Management — Cut generation, aggregation, selection, validity conditions, and the append-only monotonicity guarantee
  • PAR Inflow Model — Stochastic inflow model driving uncertainty in the forward pass
  • Discount Rate — Discounted Bellman equation, stage-dependent rates, discount factor on θ
  • Horizon Modes — Finite vs cyclic policy graphs; cyclic-mode formal structure (season function, cycle convergence inequality, season-indexed cut pool, fixed-point Bellman operator)
  • Upper Bound Evaluation — Upper bound estimation methods
  • Stopping Rules — Convergence criteria that terminate the iterative process
  • Risk Measures — CVaR and risk-averse extensions to the Bellman recursion
  • Penalty System — Recourse slacks guaranteeing feasibility (relatively complete recourse)
  • Equipment Formulations — Thermal equipment modelling
  • Scenario Generation — Fixed opening tree, sampling scheme abstraction, complete tree mode