SDDP Algorithm
Purpose
Section titled “Purpose”This spec describes the Stochastic Dual Dynamic Programming (SDDP) algorithm as implemented in Cobre: the multistage stochastic formulation, the iterative forward/backward pass structure, convergence monitoring, policy graph topologies, state variable requirements, and the single-cut formulation. It serves as the algorithmic foundation referenced by all other mathematical specs.
For notation conventions (index sets, parameters, decision variables, dual variables), see Notation Conventions.
1. Problem Context
Section titled “1. Problem Context”Cobre solves the hydrothermal dispatch problem: determining optimal generation schedules for hydro and thermal plants over a multi-year planning horizon under inflow uncertainty. Key characteristics:
- Flexible for short or long horizons: 1 month - 5 years (daily, weekly or monthly stages)
- Large state space: 160+ hydro reservoirs with AR inflow models 2000 state dimensions
- Stochastic inflows: PAR(p) autoregressive models with seasonal patterns
- Customize modeling complexity stagewise: Scenario generation, hydro production and others are configurable stagewise and elementwise
2. Multistage Stochastic Programming Formulation
Section titled “2. Multistage Stochastic Programming Formulation”The hydrothermal dispatch problem is formulated as a multistage stochastic program:
subject to stage-linking constraints and uncertainty realization. The nested formulation uses value functions:
with terminal condition .
Key insight: The value function is convex and piecewise linear (for LP subproblems), enabling outer approximation via Benders cuts.
Value function approximation via Benders cuts — each cut is a tangent at a trial point (its slope is the analytic derivative ), tightening the outer approximation toward the true convex cost-to-go function.
3. The SDDP Algorithm
Section titled “3. The SDDP Algorithm”SDDP iteratively builds piecewise-linear approximations at iteration of the true value functions through:
- Forward pass: Sample scenarios, make decisions using current approximation
- Backward pass: Compute cuts to improve the approximation
- Convergence check: Evaluate stopping criteria
flowchart TB
FWD["<b>1. Forward Pass</b><br/><br/>Sample M scenario trajectories, stages 1…T<br/>Solve stage LPs under current cuts<br/>Record trial points, stage costs → UB<br/><i>MPI: allgatherv trial points + costs</i>"]
BWD["<b>2. Backward Pass</b><br/><br/>For stages T → 1:<br/>Evaluate all N openings at each trial point<br/>Extract duals → aggregate 1 Benders cut<br/>Add cut to previous stage's LP<br/><i>MPI: allgatherv cuts per stage</i>"]
CHK["<b>3. Stopping Rule Check</b><br/><br/>LB = ρ₀[Q₀(x₀, ω)]<br/>gap = (UB − LB) / max(1, |UB|)<br/>Check: gap ≤ tol · iter limit · stall"]
CONV{"converged?"}
DONE(["stopped"])
FWD -->|trial points| BWD
BWD -->|new cuts added| CHK
CHK --> CONV
CONV -->|no · k ← k+1| FWD
CONV -->|yes| DONE
3.1 Forward Pass
Section titled “3.1 Forward Pass”The forward pass simulates the system under the current policy to generate trial points — the visited states that will be used by the backward pass to construct cuts.
For each of independent scenario trajectories:
- Start from the known initial state (initial storage volumes and inflow history)
- At each stage , sample a scenario realization from the stage’s scenario set, then solve the stage LP using the incoming state and the sampled scenario. The LP includes all current Benders cuts as constraints on the future cost variable
- Record the optimal state (end-of-stage storage volumes and updated AR lags) as the trial point for stage
- Pass as the incoming state to stage
The forward pass produces: (a) trial points at each stage for each trajectory, and (b) stage costs for upper bound estimation.
Parallelization: Forward trajectories are independent — Cobre distributes trajectories across MPI ranks, with threads solving individual stage LPs within each rank.
Warm-starting: The forward pass LP solution at stage provides a near-optimal basis for the backward pass solves at the same stage, significantly reducing solve times.
3.2 Backward Pass
Section titled “3.2 Backward Pass”The backward pass improves the value function approximation by generating new Benders cuts, walking stages in reverse order from down to .
At each stage , for each trial point collected during the forward pass:
- Solve the stage LP for every scenario (branching), using the trial state as incoming state
- From each LP solution, extract the optimal objective value and the reduced costs of the pinned incoming-state columns (incoming storage and AR lags). These reduced costs give the cut coefficients directly — no combination with FPHA or generic constraint duals is needed. See Cut Management
- Compute per-scenario cut coefficients from the duals and trial point
- Aggregate into a single cut via probability-weighted expectation — see Cut Management
- Add the aggregated cut to stage ‘s cut pool
The backward pass produces one new cut per stage per trial point per iteration.
Scenario tree — the forward pass samples M sparse paths through the branching tree, while the backward pass evaluates all N openings at each trial point and aggregates them into one Benders cut. The sparse-forward / exhaustive-backward asymmetry is why SDDP converges where full-tree evaluation is infeasible.
3.3 Convergence Monitoring
Section titled “3.3 Convergence Monitoring”Lower Bound: The risk-adjusted lower bound is computed by solving the stage-0 LP for every opening in the stage-0 scenario set, collecting the per-opening objectives, and aggregating them through the stage-0 risk measure:
where is the optimal objective of the stage-0 LP under opening with the current cut approximation at iteration , is the known initial state, and is the stage-0 risk measure (expected value under risk-neutral, or CVaR under risk-averse). The per-opening probabilities are uniform: .
Under risk-neutral settings (), this reduces to a simple average over all stage-0 opening objectives. Because cuts are added monotonically and never removed within a training run, this bound is non-decreasing across iterations — see Cut Management for the append-only guarantee.
Upper Bound: Estimated via Monte Carlo simulation over the forward pass trajectories (see Upper Bound Evaluation):
Optimality Gap:
3.4 Execution Model and Performance Considerations
Section titled “3.4 Execution Model and Performance Considerations”The SDDP iteration structure has specific properties that guide the parallelization strategy and solver lifecycle design. These are summarized here as architectural constraints.
Thread-trajectory affinity: The dominant parallelization strategy assigns each thread ownership of a complete forward trajectory. With threads (summed across all ranks), forward passes execute in parallel, each thread solving its trajectory’s stage LPs sequentially from to . The same thread that executed forward pass also performs the backward pass for the scenarios sampled by forward pass . This affinity pattern preserves cache locality (solver basis, scenario data, LP coefficients remain warm in the thread’s cache lines) and simplifies implementation by eliminating cross-thread data handoff.
Backward pass synchronization: The backward pass enforces a per-stage synchronization barrier: cut construction at stage completes before cuts at stage are computed. This serializes the algorithm at stage boundaries and is the dominant scaling factor for parallel speedup. Within a stage, each thread’s branching scenarios are solved sequentially, reusing the warm solver basis saved from the forward pass at that stage.
Forward pass state saving: When the number of forward passes exceeds the number of available threads , threads must process multiple trajectories in batches. This requires efficiently saving and restoring forward pass state (solver basis, scenario realization, visited states) at stage boundaries — analogous to CPU context switching for threads, but simpler because suspension only occurs at well-defined stage boundaries, not at arbitrary points.
LP rebuild cost: Memory constraints prevent keeping all stage LPs with their full cut sets resident simultaneously. The solver must rebuild LPs and add cut constraints when transitioning between stages, which lies on the critical performance path. The design minimizes this rebuild cost through strategies such as cut preallocation, basis persistence, and incremental constraint updates.
Reduced-cost cut extraction: Each state variable (incoming storage and inflow lag) is pinned by column bounds, and the reduced cost of that pinned column gives the cut coefficient directly — no preprocessing or dual combination is needed. FPHA hyperplane and generic constraint effects are captured automatically by the LP solver through the pinned column’s reduced cost. See Cut Management.
For implementation, see the cobre developer-guide.
4. Policy Graph Structure
Section titled “4. Policy Graph Structure”4.1 Finite Horizon (Acyclic Graph)
Section titled “4.1 Finite Horizon (Acyclic Graph)”The standard SDDP formulation uses an acyclic directed graph:
- Nodes: Stages
- Arcs: Transitions with probabilities (typically deterministic: )
- Terminal: (no future cost beyond stage )
4.2 Cyclic Graph (Infinite Horizon)
Section titled “4.2 Cyclic Graph (Infinite Horizon)”For long-term planning, Cobre supports infinite periodic horizon with cyclic graphs:
- Cycle: Stage transitions back to stage (or a cycle start)
- Discount: Cycle transitions require a discount factor for convergence
- Cut sharing: Cuts at equivalent cycle positions are shared
See Discount Rate for the discounted Bellman equation and Horizon Modes for the complete cyclic formulation.
5. State Variables and the Markov Property
Section titled “5. State Variables and the Markov Property”For SDDP to generate valid cuts, the subproblem must satisfy the Markov property: future costs depend only on the current state, not on the path by which the current state was reached.
State variables in Cobre:
| Component | Variable | Count | Description |
|---|---|---|---|
| Hydro storage | Reservoir volume at end of stage | ||
| AR inflow lags | Lagged inflows for AR(P) models |
5.1 AR Lag State Expansion
Section titled “5.1 AR Lag State Expansion”The PAR(p) inflow model requires past inflows to compute current inflow. To maintain the Markov property, these lags are included as state variables pinned by column bounds to the corresponding incoming state value:
where is the lag inflow value passed from the previous stage. The reduced costs of these pinned columns () contribute to cut coefficients, capturing the marginal value of inflow history — see Cut Management.
See PAR Inflow Model for the complete autoregressive formulation and LP Formulation for how these constraints appear in the stage LP.
6. Single-Cut Formulation
Section titled “6. Single-Cut Formulation”One aggregated cut per iteration:
where and .
- Pros: Fewer cuts, smaller LP, faster solves
- Cons: May require more iterations to converge
An alternative multi-cut formulation creates one cut per scenario per iteration, with per-scenario future cost variables . This formulation yields a tighter approximation per iteration at the cost of a larger LP. The trade-off analysis and convergence comparison are in Cut Management.
7. Terminal Boundary Cuts
Section titled “7. Terminal Boundary Cuts”A terminal boundary cut is a Benders cut on the terminal future-cost function that originates from an external Cobre run rather than from the current run’s backward pass. Terminal boundary cuts couple a downstream run’s policy with an upstream run’s value function at the time-horizon boundary.
What they represent: In a chained study, the upstream run builds a policy over a longer or earlier horizon. The future-cost function approximation at the upstream run’s terminal stage — a set of Benders cuts — captures the expected cost of continuing from any state at that stage boundary. When the downstream run imports these cuts, they serve as the initial approximation of the cost-to-go at the downstream run’s terminal stage .
The methodology guarantee: Imported terminal cuts are valid Benders cuts — they satisfy the same lower-bound and convexity conditions as any cut generated internally. Adding them to the initial cut set at stage is equivalent to seeding the outer approximation with information from the upstream run. The backward pass then generates additional cuts as usual, tightening the approximation from that informed starting point. The append-only property of the cut pool ensures the lower-bound estimate remains non-decreasing throughout training, whether cuts were imported or generated internally.
The use case: Chaining Cobre runs for studies that span different horizons or temporal resolutions — for example, a long-horizon strategic study feeding a shorter-horizon detailed study — without re-running the upstream model. The downstream run starts with an already-informed terminal cost, which typically reduces the number of iterations needed for the lower bound to reach the convergence threshold.
Trade-off: The imported cuts reflect the upstream run’s stochastic model and state discretization. If the downstream run uses a different inflow model, a different set of reservoirs, or a different penalty structure, the imported cuts may be inconsistent with the downstream subproblems. Inconsistencies do not cause infeasibility (the backward pass will generate corrective cuts) but they may slow convergence or produce a looser initial approximation than a consistent import would provide.
For the full treatment of chained studies and horizon coupling, see Part 5 — Coupling and Boundary Conditions.
Cross-References
Section titled “Cross-References”- Notation Conventions — All index sets, parameters, decision variables, and dual variable definitions
- LP Formulation — Complete stage subproblem LP that the forward/backward passes solve
- Cut Management — Cut generation, aggregation, selection, validity conditions, and the append-only monotonicity guarantee
- PAR Inflow Model — Stochastic inflow model driving uncertainty in the forward pass
- Discount Rate — Discounted Bellman equation, stage-dependent rates, discount factor on θ
- Horizon Modes — Finite vs cyclic policy graphs; cyclic-mode formal structure (season function, cycle convergence inequality, season-indexed cut pool, fixed-point Bellman operator)
- Upper Bound Evaluation — Upper bound estimation methods
- Stopping Rules — Convergence criteria that terminate the iterative process
- Risk Measures — CVaR and risk-averse extensions to the Bellman recursion
- Penalty System — Recourse slacks guaranteeing feasibility (relatively complete recourse)
- Equipment Formulations — Thermal equipment modelling
- Scenario Generation — Fixed opening tree, sampling scheme abstraction, complete tree mode