
SDDP Algorithm

Purpose

This spec describes the Stochastic Dual Dynamic Programming (SDDP) algorithm as implemented in Cobre: the multistage stochastic formulation, the iterative forward/backward pass structure, convergence monitoring, policy graph topologies, state variable requirements, and the single-cut vs multi-cut trade-off. It serves as the algorithmic foundation referenced by all other mathematical specs.

For notation conventions (index sets, parameters, decision variables, dual variables), see Notation Conventions.

1. Problem Context

Cobre solves the hydrothermal dispatch problem: determining optimal generation schedules for hydro and thermal plants over a multi-year planning horizon under inflow uncertainty. Key characteristics:

  • Flexible horizons: 1 month to 5 years, with daily, weekly, or monthly stages
  • Large state space: 160+ hydro reservoirs with AR inflow models, yielding on the order of 2000 state dimensions
  • Stochastic inflows: PAR(p) autoregressive models with seasonal patterns
  • Stagewise-configurable modeling: scenario generation, hydro production, and other components can be configured per stage and per element
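As a rough illustration of the state-space arithmetic behind the "2000 state dimensions" figure (the AR order used here is hypothetical, not a Cobre default): each reservoir contributes one storage state plus one state per inflow lag.

```python
# Illustrative state-dimension count: one storage state per reservoir plus
# p autoregressive inflow lags per reservoir. The counts are hypothetical.
def state_dimension(n_reservoirs: int, ar_order: int) -> int:
    storage_states = n_reservoirs          # end-of-stage volumes
    lag_states = n_reservoirs * ar_order   # PAR(p) inflow history
    return storage_states + lag_states

# e.g. 160 reservoirs with 11 lags each: 160 + 1760 = 1920 state dimensions
dim = state_dimension(160, 11)
```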

2. Multistage Stochastic Programming Formulation

The hydrothermal dispatch problem is formulated as a multistage stochastic program:

min 𝔼[ Σₜ cₜᵀxₜ ],  t = 1, …, T

subject to stage-linking constraints and uncertainty realization. The nested formulation uses value functions:

Qₜ(xₜ₋₁, ωₜ) = min { cₜᵀxₜ + 𝔼[ Qₜ₊₁(xₜ, ωₜ₊₁) ] }

with terminal condition Q_{T+1}(·) ≡ 0.

Key insight: The value function is convex and piecewise linear (for LP subproblems), enabling outer approximation via Benders cuts.

Value function approximation via Benders cuts — each iteration adds a cut at a new trial point, tightening the outer approximation toward the true cost-to-go function
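The outer-approximation idea can be checked numerically: the pointwise maximum of Benders cuts (affine minorants) never exceeds the true convex value function. A minimal one-dimensional sketch, using x² as a stand-in value function and its tangents as cuts:

```python
# Outer approximation of a convex value function by Benders cuts: the
# approximation at x is the max over cuts of (alpha + beta * x), and it is
# always a lower bound on the true function. Toy 1-D example with f(x) = x^2.
def cut_value(cuts, x):
    """Evaluate the piecewise-linear lower approximation at x."""
    return max(alpha + beta * x for alpha, beta in cuts)

true_vf = lambda x: x * x                 # stand-in convex value function
# Tangent cuts of x^2 at trial points x* = 1 and x* = 3:
# alpha = f(x*) - f'(x*) * x*, beta = f'(x*)
cuts = [(-1.0, 2.0), (-9.0, 6.0)]

# The approximation is tight at the trial points and a lower bound everywhere.
assert all(cut_value(cuts, x) <= true_vf(x) + 1e-9 for x in [0.0, 1.5, 2.0, 4.0])
```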

3. The SDDP Algorithm

SDDP iteratively builds piecewise-linear outer approximations of the true value functions, refining them at each iteration through:

  1. Forward pass: Sample scenarios, make decisions using current approximation
  2. Backward pass: Compute cuts to improve the approximation
  3. Convergence check: Evaluate stopping criteria
```mermaid
flowchart TB
    FWD["<b>1. Forward Pass</b><br/><br/>Sample M scenario trajectories, stages 1…T<br/>Solve stage LPs under current cuts<br/>Record trial points, stage costs → UB<br/><i>MPI: allgatherv trial points + costs</i>"]
    BWD["<b>2. Backward Pass</b><br/><br/>For stages T → 1:<br/>Evaluate all N openings at each trial point<br/>Extract duals → aggregate 1 Benders cut<br/>Add cut to previous stage's LP<br/><i>MPI: allgatherv cuts per stage</i>"]
    CHK["<b>3. Stopping Rule Check</b><br/><br/>LB = ρ₀[Q₀(x₀, ω)]<br/>gap = (UB − LB) / max(1, |UB|)<br/>Check: gap ≤ tol · iter limit · stall"]
    CONV{"converged?"}
    DONE(["stopped"])

    FWD -->|trial points| BWD
    BWD -->|new cuts added| CHK
    CHK --> CONV
    CONV -->|no · k ← k+1| FWD
    CONV -->|yes| DONE
```
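The iteration structure above can be sketched as a plain loop. The function names (`forward_pass`, `backward_pass`, `bounds`) are hypothetical stand-ins for the real LP-solving components; only the control flow and the stopping rule mirror the spec:

```python
# Skeleton of the SDDP iteration loop: forward pass, backward pass, bound
# check. The three callables are stand-ins for the real LP-based components.
def sddp(forward_pass, backward_pass, bounds, tol=1e-3, max_iter=100):
    cuts = []                                       # one pool per stage in reality
    for k in range(1, max_iter + 1):
        trial_points, stage_costs = forward_pass(cuts)   # 1. sample & simulate
        cuts = backward_pass(cuts, trial_points)         # 2. add Benders cuts
        lb, ub = bounds(cuts, stage_costs)               # 3. stopping rule check
        gap = (ub - lb) / max(1.0, abs(ub))
        if gap <= tol:
            return k, lb, ub
    return max_iter, lb, ub

# Toy drivers: the lower bound tightens as cuts accumulate, UB stays at 10.
def _toy_forward(cuts):
    return [1.0], [10.0]

def _toy_backward(cuts, trial_points):
    return cuts + [object()]                        # pretend we added one cut

def _toy_bounds(cuts, stage_costs):
    lb = 10.0 - 10.0 / (len(cuts) + 1)              # monotone increasing LB
    return lb, 10.0

iters, lb, ub = sddp(_toy_forward, _toy_backward, _toy_bounds, tol=0.1)
```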

3.1 Forward Pass

The forward pass simulates the system under the current policy to generate trial points — the visited states that will be used by the backward pass to construct cuts.

For each of M independent scenario trajectories:

  1. Start from the known initial state x₀ (initial storage volumes and inflow history)
  2. At each stage t, sample a scenario realization ωₜ from the stage’s scenario set, then solve the stage LP using the incoming state xₜ₋₁ and the sampled scenario. The LP includes all current Benders cuts as constraints on the future cost variable θ
  3. Record the optimal state xₜ (end-of-stage storage volumes and updated AR lags) as the trial point for stage t
  4. Pass xₜ as the incoming state to stage t+1

Scenario sampling: “Sample a scenario realization ωₜ” is controlled by the forward sampling scheme — a configurable abstraction that determines the forward pass noise source. The default scheme (InSample) draws a random index from the fixed opening tree — a set of pre-generated noise vectors generated once before training begins. Alternative schemes (External, Historical) draw from user-provided data instead. See Scenario Generation §3 for the sampling scheme abstraction and Input Scenarios §2.1 for configuration.

The forward pass produces: (a) trial points at each stage for each trajectory, and (b) stage costs for upper bound estimation.

Parallelization: Forward trajectories are independent — Cobre distributes trajectories across MPI ranks, with OpenMP threads solving individual stage LPs within each rank.

Warm-starting: The forward pass LP solution at stage t provides a near-optimal basis for the backward pass solves at the same stage, significantly reducing solve times.
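The four forward-pass steps can be sketched as follows. Here `solve_stage_lp` is a hypothetical stand-in for the real stage LP under the current cuts; it applies a trivial state update so the control flow (sample, solve, record, pass forward) is visible:

```python
import random

# Sketch of one forward-pass trajectory: at each stage, sample an opening
# (InSample scheme), solve the stage problem, record the trial point, and
# pass the resulting state forward.
def forward_trajectory(x0, openings_per_stage, solve_stage_lp, rng):
    x = x0
    trial_points, stage_costs = [], []
    for stage_openings in openings_per_stage:     # stages 1..T
        omega = rng.choice(stage_openings)        # draw from the opening tree
        x, cost = solve_stage_lp(x, omega)        # solve under current cuts
        trial_points.append(x)                    # trial point for backward pass
        stage_costs.append(cost)                  # contributes to the UB estimate
    return trial_points, stage_costs

def toy_stage_lp(x, omega):
    x_next = 0.5 * x + omega                      # hypothetical state update
    return x_next, abs(x_next)                    # hypothetical stage cost

rng = random.Random(42)
openings = [[1.0, 2.0], [0.5, 1.5], [1.0, 3.0]]   # 3 stages, 2 openings each
points, costs = forward_trajectory(10.0, openings, toy_stage_lp, rng)
```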

3.2 Backward Pass

The backward pass improves the value function approximation by generating new Benders cuts, walking stages in reverse order from T down to 1.

At each stage t, for each trial point x̂ collected during the forward pass:

  1. Solve the stage LP for every scenario ωₜ (branching), using the trial state as incoming state
  2. From each LP solution, extract the optimal objective value and the dual variables of the fixing constraints (storage fixing and AR lag fixing). The fixing constraint duals give the cut coefficients directly — no combination with FPHA or generic constraint duals is needed. See Cut Management §2
  3. Compute per-scenario cut coefficients (αᵢ, βᵢ) from the duals and trial point
  4. Aggregate into a single cut via probability-weighted expectation — see Cut Management §3
  5. Add the aggregated cut to stage t−1’s cut pool

Backward branching: “Every scenario ωₜ” refers to all N noise vectors in the fixed opening tree for stage t. This is the Complete backward sampling scheme — the backward pass evaluates ALL openings (the same set across all iterations), regardless of the forward pass noise source. A deferred MonteCarlo(n) variant would sample openings instead; see Deferred Features §C.14. The aggregation probabilities in Cut Management §3 are uniform over these openings (pᵢ = 1/N). See Scenario Generation §3.4.

The backward pass produces one new cut per stage per trial point per iteration.

Feasibility: Every backward LP solve must be feasible. This is guaranteed by the recourse slack system (Category 1 penalties) — see Penalty System. The relatively complete recourse property ensures valid cuts; see Cut Management §4.

Discount factor: When discount rates are active, the discount factor γ is applied to the θ variable in the stage objective (i.e., the objective carries γθ rather than θ), not to the cut coefficients. The cuts themselves remain unmodified. See Discount Rate.
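The five backward-pass steps, for a single trial point at one stage, can be sketched as below. The toy problem is hypothetical: the "fixing constraint dual" is computed analytically from a quadratic stand-in cost so the aggregation logic can be checked:

```python
# Sketch of backward-pass cut generation at one stage for one trial point:
# solve every opening, read the fixing-constraint dual, form per-opening cut
# coefficients, and aggregate one cut by probability-weighted expectation.
def backward_cut(trial_x, openings, solve_opening):
    p = 1.0 / len(openings)                        # uniform opening probabilities
    alpha_bar = beta_bar = 0.0
    for omega in openings:
        obj, dual = solve_opening(trial_x, omega)  # objective + fixing dual
        alpha = obj - dual * trial_x               # per-opening intercept
        alpha_bar += p * alpha                     # single-cut aggregation
        beta_bar += p * dual
    return alpha_bar, beta_bar                     # cut: theta >= alpha + beta*x

# Toy stage problem with cost (x - omega)^2, so the dual is 2*(x - omega).
def toy_solve(x, omega):
    return (x - omega) ** 2, 2.0 * (x - omega)

# At trial point x = 1 with openings {0, 2} the aggregated cut is theta >= 1,
# which is tangent to the expected cost E[(x - omega)^2] at x = 1.
alpha, beta = backward_cut(1.0, [0.0, 2.0], toy_solve)
```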

Scenario tree — forward pass samples M sparse paths while backward pass evaluates all N openings at each trial point

3.3 Convergence Monitoring

Lower Bound: The risk-adjusted lower bound is computed by solving the stage-0 LP for every opening in the stage-0 scenario set, collecting the per-opening objectives, and aggregating them through the stage-0 risk measure:

LB = ρ₀[ Q₀⁽ⁱ⁾(x₀) ],  i = 1, …, N₀

where Q₀⁽ⁱ⁾(x₀) is the optimal objective of the stage-0 LP under opening i with the current cut approximation, x₀ is the known initial state, and ρ₀ is the stage-0 risk measure (expected value under risk-neutral, or CVaR under risk-averse). The per-opening probabilities are uniform: pᵢ = 1/N₀.

Under risk-neutral settings (ρ₀ = 𝔼), this reduces to a simple average over all stage-0 opening objectives. This bound increases monotonically as cuts are added.

Implementation note: The lower bound is evaluated after each backward pass and cut synchronization. Rank 0 iterates over all stage-0 openings, rebuilding the LP for each (load model, add cuts, patch forward state + noise, solve). The per-opening objectives are collected, aggregated through the risk measure, then broadcast to all ranks. The LP objective is in scaled cost space (divided by COST_SCALE_FACTOR); the result is multiplied by COST_SCALE_FACTOR to recover original units.

Upper Bound: Estimated via Monte Carlo simulation over the M forward pass trajectories (see Upper Bound Evaluation):

UB = (1/M) Σₘ (total cost of trajectory m),  m = 1, …, M

Optimality Gap:

gap = (UB − LB) / max(1, |UB|)
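The bound bookkeeping for the risk-neutral case is a few lines of arithmetic; the numbers below are illustrative, not Cobre output:

```python
# Risk-neutral bound bookkeeping for the stopping check: LB is the mean of
# stage-0 opening objectives, UB is the Monte Carlo mean of trajectory costs,
# and the gap is normalized by max(1, |UB|) as in the flowchart.
def lower_bound(stage0_objectives):
    return sum(stage0_objectives) / len(stage0_objectives)

def upper_bound(trajectory_costs):
    return sum(trajectory_costs) / len(trajectory_costs)

def optimality_gap(lb, ub):
    return (ub - lb) / max(1.0, abs(ub))

lb = lower_bound([95.0, 105.0])           # two stage-0 openings
ub = upper_bound([98.0, 104.0, 110.0])    # three forward trajectories
gap = optimality_gap(lb, ub)
```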

3.4 Execution Model and Performance Considerations

The SDDP iteration structure has specific properties that guide the parallelization strategy and solver lifecycle design. These are summarized here as architectural constraints; detailed design is in the HPC and architecture specs.

Thread-trajectory affinity: The dominant parallelization strategy assigns each thread ownership of a complete forward trajectory. With P threads (summed across all OpenMP threads of all MPI ranks), P forward passes execute in parallel, each thread solving its trajectory’s stage LPs sequentially from stage 1 to T. The same thread that executed forward pass i also performs the backward pass for the scenarios sampled by forward pass i. This affinity pattern preserves cache locality (solver basis, scenario data, LP coefficients remain warm in the thread’s cache lines) and simplifies implementation by eliminating cross-thread data handoff.

Backward pass synchronization: Unlike the forward pass (fully parallel), the backward pass has a hard synchronization barrier at each stage boundary: all threads must complete cut construction at stage t before any thread proceeds to stage t−1. Within a stage, each thread solves its branching scenarios sequentially, reusing the warm solver basis saved from the forward pass at that stage. This sequential branching keeps the solver state hot and avoids redundant LP setup.

Forward pass state saving: When the number of forward passes M exceeds the number of available threads P, threads must process multiple trajectories in batches. This requires efficiently saving and restoring forward pass state (solver basis, scenario realization, visited states) at stage boundaries — analogous to CPU context switching for threads, but simpler because suspension only occurs at well-defined stage boundaries, not at arbitrary points.

LP rebuild cost: Memory constraints prevent keeping all stage LPs with their full cut sets resident simultaneously. The solver must rebuild LPs and add cut constraints when transitioning between stages, which lies on the critical performance path. The design must minimize this rebuild cost through strategies such as cut preallocation, basis persistence, and incremental constraint updates. See Solver Abstraction and Solver Workspaces.

Fixing constraint dual extraction: Each state variable (storage and inflow lag) has a dedicated fixing constraint whose dual gives the cut coefficient directly — no preprocessing or dual combination is needed. FPHA hyperplane and generic constraint effects are captured automatically by the LP solver through the fixing constraint dual. See Cut Management §2 and Cut Management Implementation §5.

4. Policy Graph Structure

4.1 Finite Horizon (Acyclic Graph)

The standard SDDP formulation uses an acyclic directed graph:

  • Nodes: Stages t = 1, …, T
  • Arcs: Transitions t → t+1 with probabilities (typically deterministic: p = 1)
  • Terminal: Q_{T+1} ≡ 0 (no future cost beyond stage T)

4.2 Cyclic Graph (Infinite Horizon)

For long-term planning, Cobre supports infinite periodic horizon with cyclic graphs:

  • Cycle: Stage T transitions back to stage 1 (or a cycle start)
  • Discount: Cycle transitions require a discount factor γ < 1 for convergence
  • Cut sharing: Cuts at equivalent cycle positions are shared

Symbol note: All specs use γ for the discount factor. The deficit variable uses δ (lowercase delta), so there is no conflict.

See Discount Rate for the discounted Bellman equation and Infinite Horizon for the complete cyclic formulation.

5. State Variables and the Markov Property

For SDDP to generate valid cuts, the subproblem must satisfy the Markov property: future costs depend only on the current state, not on how we arrived at that state.

State variables in Cobre:

| Component | Variable | Count | Description |
|---|---|---|---|
| Hydro storage | | | Reservoir volume at end of stage |
| AR inflow lags | | | Lagged inflows for AR(p) models |
| Battery SOC | | | Battery state of charge (DEFERRED) |
| GNL pipeline | | | Committed GNL dispatch (DEFERRED) |

Note on deferred state variables: Battery SOC and GNL pipeline state are planned extensions — see Deferred Features. For GNL thermals specifically, the data model already accepts GNL configurations but validation rejects them until the solver implementation is ready — see Equipment Formulations §1.2.

5.1 AR Lag State Expansion

The PAR(p) inflow model requires past inflows to compute current inflow. To maintain the Markov property, these lags are included as state variables with fixing constraints that bind each lag variable to the corresponding incoming state value:

aₜ₋ℓ = âₜ₋ℓ,  ℓ = 1, …, p

where âₜ₋ℓ is the lag-ℓ inflow value passed from the previous stage. The duals of these fixing constraints (πˡᵃᵍ) contribute to cut coefficients, capturing the marginal value of inflow history — see Cut Management §2.

See PAR Inflow Model for the complete autoregressive formulation and LP Formulation §5 for how these constraints appear in the stage LP. For the full LP column and row layout (including auxiliary z-inflow variables between lags and incoming storage), see LP Formulation §4b.
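The lag-state bookkeeping between stages amounts to a shift: the realized inflow enters as the newest lag and the oldest lag drops off, so the next stage sees exactly p lags of history. A minimal sketch with a hypothetical helper:

```python
# Markov-state bookkeeping for PAR(p) lags: after a stage solves, the realized
# inflow becomes the newest lag and the oldest lag is discarded, keeping the
# state vector at exactly p entries. Helper name is hypothetical.
def shift_lags(lags, realized_inflow):
    """Return the updated lag state [a_t, a_{t-1}, ..., a_{t-p+1}]."""
    return [realized_inflow] + lags[:-1]

lags = [3.0, 2.0, 1.0]            # p = 3 lags entering stage t
new_lags = shift_lags(lags, 4.5)  # realized inflow at stage t pushes in front
```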

6. Single-Cut vs Multi-Cut Formulation

6.1 Single-Cut (Default)

One aggregated cut per iteration:

θ ≥ ᾱ + β̄ᵀx

where ᾱ = Σᵢ pᵢαᵢ and β̄ = Σᵢ pᵢβᵢ.

  • Pros: Fewer cuts, smaller LP, faster solves
  • Cons: May require more iterations to converge

6.2 Multi-Cut (DEFERRED)

One cut per scenario per iteration:

θᵢ ≥ αᵢ + βᵢᵀx,  i = 1, …, N

  • Pros: Tighter approximation, fewer iterations
  • Cons: More cuts, larger LP, memory-intensive

Cobre implements single-cut by default. Multi-cut is planned for future implementation. See Deferred Features for the full trade-off analysis.
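The memory side of the trade-off is simple arithmetic: single-cut adds one LP row per trial point per iteration, multi-cut adds one row per scenario per trial point. The numbers below are illustrative, not Cobre problem sizes:

```python
# Cut-row arithmetic behind the single-cut vs multi-cut trade-off: the LP row
# count grows N times faster under multi-cut. All figures are illustrative.
def cut_rows(iterations, trial_points, scenarios, multi_cut=False):
    per_iteration = trial_points * (scenarios if multi_cut else 1)
    return iterations * per_iteration

single = cut_rows(100, 20, 50)                    # 100 * 20      = 2,000 rows
multi = cut_rows(100, 20, 50, multi_cut=True)     # 100 * 20 * 50 = 100,000 rows
```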

Cross-References

  • Notation Conventions — All index sets, parameters, decision variables, and dual variable definitions
  • LP Formulation — Complete stage subproblem LP that the forward/backward passes solve
  • Cut Management — Cut generation, aggregation, selection, and validity conditions
  • PAR Inflow Model — Stochastic inflow model driving uncertainty in the forward pass
  • Discount Rate — Discounted Bellman equation, stage-dependent rates, discount factor on θ
  • Infinite Horizon — Periodic structure, cycle detection, cut sharing, convergence
  • Upper Bound Evaluation — Upper bound estimation methods
  • Stopping Rules — Convergence criteria that terminate the iterative process
  • Risk Measures — CVaR and risk-averse extensions to the Bellman recursion
  • Penalty System — Recourse slacks guaranteeing feasibility (relatively complete recourse)
  • Equipment Formulations — GNL thermal validation-rejection rule
  • Scenario Generation — Fixed opening tree (§2.3), sampling scheme abstraction (§3), external scenario integration (§4), complete tree mode (§7)
  • Deferred Features — Multi-cut formulation, Markovian policy graphs, batteries, user-supplied noise openings (C.11), complete tree solver integration (C.12)
  • Production Scale Reference — Typical problem sizes and state dimensions