
SDDP Algorithm

Purpose

This spec describes the Stochastic Dual Dynamic Programming (SDDP) algorithm as implemented in Cobre: the multistage stochastic formulation, the iterative forward/backward pass structure, convergence monitoring, policy graph topologies, state variable requirements, and the single-cut vs multi-cut trade-off. It serves as the algorithmic foundation referenced by all other mathematical specs.

For notation conventions (index sets, parameters, decision variables, dual variables), see Notation Conventions.

1. Problem Context

Cobre solves the hydrothermal dispatch problem: determining optimal generation schedules for hydro and thermal plants over a multi-year planning horizon under inflow uncertainty. Key characteristics:

  • Flexible horizons: 1 month to 5 years, with daily, weekly, or monthly stages
  • Large state space: 160+ hydro reservoirs with AR inflow models, yielding on the order of 2000 state dimensions
  • Stochastic inflows: PAR(p) autoregressive models with seasonal patterns
  • Stagewise-configurable modeling: scenario generation, hydro production, and other components can be configured per stage and per element
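As a rough illustration of the state-space arithmetic behind the "2000 state dimensions" figure (the AR order used here is hypothetical, not a Cobre default): each reservoir contributes one storage state plus one state per inflow lag.

```python
# Illustrative state-dimension count: one storage state per reservoir plus
# p autoregressive inflow lags per reservoir. The counts are hypothetical.
def state_dimension(n_reservoirs: int, ar_order: int) -> int:
    storage_states = n_reservoirs          # end-of-stage volumes
    lag_states = n_reservoirs * ar_order   # PAR(p) inflow history
    return storage_states + lag_states

# e.g. 160 reservoirs with 11 lags each: 160 + 1760 = 1920 state dimensions
dim = state_dimension(160, 11)
```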

2. Multistage Stochastic Programming Formulation

The hydrothermal dispatch problem is formulated as a multistage stochastic program:

min 𝔼[ Σₜ cₜᵀxₜ ],  t = 1, …, T

subject to stage-linking constraints and uncertainty realization. The nested formulation uses value functions:

Qₜ(xₜ₋₁, ωₜ) = min { cₜᵀxₜ + 𝔼[ Qₜ₊₁(xₜ, ωₜ₊₁) ] }

with terminal condition Q_{T+1}(·) ≡ 0.

Key insight: The value function is convex and piecewise linear (for LP subproblems), enabling outer approximation via Benders cuts.

Value function approximation via Benders cuts — each iteration adds a cut at a new trial point, tightening the outer approximation toward the true cost-to-go function
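The outer-approximation idea can be checked numerically: the pointwise maximum of Benders cuts (affine minorants) never exceeds the true convex value function. A minimal one-dimensional sketch, using x² as a stand-in value function and its tangents as cuts:

```python
# Outer approximation of a convex value function by Benders cuts: the
# approximation at x is the max over cuts of (alpha + beta * x), and it is
# always a lower bound on the true function. Toy 1-D example with f(x) = x^2.
def cut_value(cuts, x):
    """Evaluate the piecewise-linear lower approximation at x."""
    return max(alpha + beta * x for alpha, beta in cuts)

true_vf = lambda x: x * x                 # stand-in convex value function
# Tangent cuts of x^2 at trial points x* = 1 and x* = 3:
# alpha = f(x*) - f'(x*) * x*, beta = f'(x*)
cuts = [(-1.0, 2.0), (-9.0, 6.0)]

# The approximation is tight at the trial points and a lower bound everywhere.
assert all(cut_value(cuts, x) <= true_vf(x) + 1e-9 for x in [0.0, 1.5, 2.0, 4.0])
```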

3. The SDDP Algorithm

SDDP iteratively builds piecewise-linear outer approximations of the true value functions, refining them at each iteration through:

  1. Forward pass: Sample scenarios, make decisions using current approximation
  2. Backward pass: Compute cuts to improve the approximation
  3. Convergence check: Evaluate stopping criteria
```mermaid
flowchart TB
    FWD["<b>1. Forward Pass</b><br/><br/>Sample M scenario trajectories, stages 1…T<br/>Solve stage LPs under current cuts<br/>Record trial points, stage costs → UB<br/><i>MPI: allgatherv trial points + costs</i>"]
    BWD["<b>2. Backward Pass</b><br/><br/>For stages T → 1:<br/>Evaluate all N openings at each trial point<br/>Extract duals → aggregate 1 Benders cut<br/>Add cut to previous stage's LP<br/><i>MPI: allgatherv cuts per stage</i>"]
    CHK["<b>3. Stopping Rule Check</b><br/><br/>LB = ρ₀[Q₀(x₀, ω)]<br/>gap = (UB − LB) / max(1, |UB|)<br/>Check: gap ≤ tol · iter limit · stall"]
    CONV{"converged?"}
    DONE(["stopped"])

    FWD -->|trial points| BWD
    BWD -->|new cuts added| CHK
    CHK --> CONV
    CONV -->|no · k ← k+1| FWD
    CONV -->|yes| DONE
```
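The iteration structure above can be sketched as a plain loop. The function names (`forward_pass`, `backward_pass`, `bounds`) are hypothetical stand-ins for the real LP-solving components; only the control flow and the stopping rule mirror the spec:

```python
# Skeleton of the SDDP iteration loop: forward pass, backward pass, bound
# check. The three callables are stand-ins for the real LP-based components.
def sddp(forward_pass, backward_pass, bounds, tol=1e-3, max_iter=100):
    cuts = []                                       # one pool per stage in reality
    for k in range(1, max_iter + 1):
        trial_points, stage_costs = forward_pass(cuts)   # 1. sample & simulate
        cuts = backward_pass(cuts, trial_points)         # 2. add Benders cuts
        lb, ub = bounds(cuts, stage_costs)               # 3. stopping rule check
        gap = (ub - lb) / max(1.0, abs(ub))
        if gap <= tol:
            return k, lb, ub
    return max_iter, lb, ub

# Toy drivers: the lower bound tightens as cuts accumulate, UB stays at 10.
def _toy_forward(cuts):
    return [1.0], [10.0]

def _toy_backward(cuts, trial_points):
    return cuts + [object()]                        # pretend we added one cut

def _toy_bounds(cuts, stage_costs):
    lb = 10.0 - 10.0 / (len(cuts) + 1)              # monotone increasing LB
    return lb, 10.0

iters, lb, ub = sddp(_toy_forward, _toy_backward, _toy_bounds, tol=0.1)
```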

3.1 Forward Pass

The forward pass simulates the system under the current policy to generate trial points — the visited states that will be used by the backward pass to construct cuts.

For each of M independent scenario trajectories:

  1. Start from the known initial state x₀ (initial storage volumes and inflow history)
  2. At each stage t, sample a scenario realization ωₜ from the stage’s scenario set, then solve the stage LP using the incoming state xₜ₋₁ and the sampled scenario. The LP includes all current Benders cuts as constraints on the future cost variable θ
  3. Record the optimal state xₜ (end-of-stage storage volumes and updated AR lags) as the trial point for stage t
  4. Pass xₜ as the incoming state to stage t+1

Scenario sampling: “Sample a scenario realization ωₜ” is controlled by the forward sampling scheme — a configurable abstraction that determines the forward pass noise source. The default scheme (InSample) draws a random index from the fixed opening tree — a set of pre-generated noise vectors generated once before training begins. Alternative schemes (External, Historical) draw from user-provided data instead. See Scenario Generation §3 for the sampling scheme abstraction and Input Scenarios §2.1 for configuration.

The forward pass produces: (a) trial points at each stage for each trajectory, and (b) stage costs for upper bound estimation.

Parallelization: Forward trajectories are independent — Cobre distributes trajectories across MPI ranks, with OpenMP threads solving individual stage LPs within each rank.

Warm-starting: The forward pass LP solution at stage t provides a near-optimal basis for the backward pass solves at the same stage, significantly reducing solve times.
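The four forward-pass steps can be sketched as follows. Here `solve_stage_lp` is a hypothetical stand-in for the real stage LP under the current cuts; it applies a trivial state update so the control flow (sample, solve, record, pass forward) is visible:

```python
import random

# Sketch of one forward-pass trajectory: at each stage, sample an opening
# (InSample scheme), solve the stage problem, record the trial point, and
# pass the resulting state forward.
def forward_trajectory(x0, openings_per_stage, solve_stage_lp, rng):
    x = x0
    trial_points, stage_costs = [], []
    for stage_openings in openings_per_stage:     # stages 1..T
        omega = rng.choice(stage_openings)        # draw from the opening tree
        x, cost = solve_stage_lp(x, omega)        # solve under current cuts
        trial_points.append(x)                    # trial point for backward pass
        stage_costs.append(cost)                  # contributes to the UB estimate
    return trial_points, stage_costs

def toy_stage_lp(x, omega):
    x_next = 0.5 * x + omega                      # hypothetical state update
    return x_next, abs(x_next)                    # hypothetical stage cost

rng = random.Random(42)
openings = [[1.0, 2.0], [0.5, 1.5], [1.0, 3.0]]   # 3 stages, 2 openings each
points, costs = forward_trajectory(10.0, openings, toy_stage_lp, rng)
```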

3.2 Backward Pass

The backward pass improves the value function approximation by generating new Benders cuts, walking stages in reverse order from T down to 1.

At each stage t, for each trial point x̂ collected during the forward pass:

  1. Solve the stage LP for every scenario ωₜ (branching), using the trial state as incoming state
  2. From each LP solution, extract the optimal objective value and the dual variables of the fixing constraints (storage fixing and AR lag fixing). The fixing constraint duals give the cut coefficients directly — no combination with FPHA or generic constraint duals is needed. See Cut Management §2
  3. Compute per-scenario cut coefficients (αᵢ, βᵢ) from the duals and trial point
  4. Aggregate into a single cut via probability-weighted expectation — see Cut Management §3
  5. Add the aggregated cut to stage t−1’s cut pool

Backward branching: “Every scenario ωₜ” refers to all N noise vectors in the fixed opening tree for stage t. This is the Complete backward sampling scheme — the backward pass evaluates ALL openings (the same set across all iterations), regardless of the forward pass noise source. A deferred MonteCarlo(n) variant would sample openings instead; see Deferred Features §C.14. The aggregation probabilities in Cut Management §3 are uniform over these openings (pᵢ = 1/N). See Scenario Generation §3.4.

The backward pass produces one new cut per stage per trial point per iteration.

Feasibility: Every backward LP solve must be feasible. This is guaranteed by the recourse slack system (Category 1 penalties) — see Penalty System. The relatively complete recourse property ensures valid cuts; see Cut Management §4.

Discount factor: When discount rates are active, the discount factor γ is applied to the θ variable in the stage objective (i.e., the objective carries γθ rather than θ), not to the cut coefficients. The cuts themselves remain unmodified. See Discount Rate.
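The five backward-pass steps, for a single trial point at one stage, can be sketched as below. The toy problem is hypothetical: the "fixing constraint dual" is computed analytically from a quadratic stand-in cost so the aggregation logic can be checked:

```python
# Sketch of backward-pass cut generation at one stage for one trial point:
# solve every opening, read the fixing-constraint dual, form per-opening cut
# coefficients, and aggregate one cut by probability-weighted expectation.
def backward_cut(trial_x, openings, solve_opening):
    p = 1.0 / len(openings)                        # uniform opening probabilities
    alpha_bar = beta_bar = 0.0
    for omega in openings:
        obj, dual = solve_opening(trial_x, omega)  # objective + fixing dual
        alpha = obj - dual * trial_x               # per-opening intercept
        alpha_bar += p * alpha                     # single-cut aggregation
        beta_bar += p * dual
    return alpha_bar, beta_bar                     # cut: theta >= alpha + beta*x

# Toy stage problem with cost (x - omega)^2, so the dual is 2*(x - omega).
def toy_solve(x, omega):
    return (x - omega) ** 2, 2.0 * (x - omega)

# At trial point x = 1 with openings {0, 2} the aggregated cut is theta >= 1,
# which is tangent to the expected cost E[(x - omega)^2] at x = 1.
alpha, beta = backward_cut(1.0, [0.0, 2.0], toy_solve)
```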

Scenario tree — forward pass samples M sparse paths while backward pass evaluates all N openings at each trial point

3.3 Convergence Monitoring

Lower Bound: The risk-adjusted lower bound is computed by solving the stage-0 LP for every opening in the stage-0 scenario set, collecting the per-opening objectives, and aggregating them through the stage-0 risk measure:

LB = ρ₀[ Q₀⁽ⁱ⁾(x₀) ],  i = 1, …, N₀

where Q₀⁽ⁱ⁾(x₀) is the optimal objective of the stage-0 LP under opening i with the current cut approximation, x₀ is the known initial state, and ρ₀ is the stage-0 risk measure (expected value under risk-neutral, or CVaR under risk-averse). The per-opening probabilities are uniform: pᵢ = 1/N₀.

Under risk-neutral settings (ρ₀ = 𝔼), this reduces to a simple average over all stage-0 opening objectives. This bound increases monotonically as cuts are added.

Implementation note: The lower bound is evaluated after each backward pass and cut synchronization. Rank 0 iterates over all stage-0 openings, rebuilding the LP for each (load model, add cuts, patch forward state + noise, solve). The per-opening objectives are collected, aggregated through the risk measure, then broadcast to all ranks. The LP objective is in scaled cost space (divided by COST_SCALE_FACTOR); the result is multiplied by COST_SCALE_FACTOR to recover original units.

Upper Bound: Estimated via Monte Carlo simulation over the M forward pass trajectories (see Upper Bound Evaluation):

UB = (1/M) Σₘ (total cost of trajectory m),  m = 1, …, M

Optimality Gap:

gap = (UB − LB) / max(1, |UB|)
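The bound bookkeeping for the risk-neutral case is a few lines of arithmetic; the numbers below are illustrative, not Cobre output:

```python
# Risk-neutral bound bookkeeping for the stopping check: LB is the mean of
# stage-0 opening objectives, UB is the Monte Carlo mean of trajectory costs,
# and the gap is normalized by max(1, |UB|) as in the flowchart.
def lower_bound(stage0_objectives):
    return sum(stage0_objectives) / len(stage0_objectives)

def upper_bound(trajectory_costs):
    return sum(trajectory_costs) / len(trajectory_costs)

def optimality_gap(lb, ub):
    return (ub - lb) / max(1.0, abs(ub))

lb = lower_bound([95.0, 105.0])           # two stage-0 openings
ub = upper_bound([98.0, 104.0, 110.0])    # three forward trajectories
gap = optimality_gap(lb, ub)
```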

3.4 Execution Model and Performance Considerations

The SDDP iteration structure has specific properties that guide the parallelization strategy and solver lifecycle design. These are summarized here as architectural constraints; detailed design is in the HPC and architecture specs.

Thread-trajectory affinity: The dominant parallelization strategy assigns each thread ownership of a complete forward trajectory. With P threads (summed across all OpenMP threads of all MPI ranks), P forward passes execute in parallel, each thread solving its trajectory’s stage LPs sequentially from stage 1 to T. The same thread that executed forward pass i also performs the backward pass for the scenarios sampled by forward pass i. This affinity pattern preserves cache locality (solver basis, scenario data, LP coefficients remain warm in the thread’s cache lines) and simplifies implementation by eliminating cross-thread data handoff.

Backward pass synchronization: Unlike the forward pass (fully parallel), the backward pass has a hard synchronization barrier at each stage boundary: all threads must complete cut construction at stage t before any thread proceeds to stage t−1. Within a stage, each thread solves its branching scenarios sequentially, reusing the warm solver basis saved from the forward pass at that stage. This sequential branching keeps the solver state hot and avoids redundant LP setup.

Forward pass state saving: When the number of forward passes M exceeds the number of available threads P, threads must process multiple trajectories in batches. This requires efficiently saving and restoring forward pass state (solver basis, scenario realization, visited states) at stage boundaries — analogous to CPU context switching for threads, but simpler because suspension only occurs at well-defined stage boundaries, not at arbitrary points.

LP rebuild cost: Memory constraints prevent keeping all stage LPs with their full cut sets resident simultaneously. The solver must rebuild LPs and add cut constraints when transitioning between stages, which lies on the critical performance path. The design must minimize this rebuild cost through strategies such as cut preallocation, basis persistence, and incremental constraint updates. See Solver Abstraction and Solver Workspaces.

Fixing constraint dual extraction: Each state variable (storage and inflow lag) has a dedicated fixing constraint whose dual gives the cut coefficient directly — no preprocessing or dual combination is needed. FPHA hyperplane and generic constraint effects are captured automatically by the LP solver through the fixing constraint dual. See Cut Management §2 and Cut Management Implementation §5.

4. Policy Graph Structure

4.1 Finite Horizon (Acyclic Graph)

The standard SDDP formulation uses an acyclic directed graph:

  • Nodes: Stages t = 1, …, T
  • Arcs: Transitions t → t+1 with probabilities (typically deterministic: p = 1)
  • Terminal: Q_{T+1} ≡ 0 (no future cost beyond stage T)

4.2 Cyclic Graph (Infinite Horizon)

For long-term planning, Cobre supports infinite periodic horizon with cyclic graphs:

  • Cycle: Stage T transitions back to stage 1 (or a cycle start)
  • Discount: Cycle transitions require a discount factor γ < 1 for convergence
  • Cut sharing: Cuts at equivalent cycle positions are shared

Symbol note: All specs use γ for the discount factor. The deficit variable uses δ (lowercase delta), so there is no conflict.

See Discount Rate for the discounted Bellman equation and Infinite Horizon for the complete cyclic formulation.

5. State Variables and the Markov Property

For SDDP to generate valid cuts, the subproblem must satisfy the Markov property: future costs depend only on the current state, not on how we arrived at that state.

State variables in Cobre:

| Component | Variable | Count | Description |
|---|---|---|---|
| Hydro storage | | | Reservoir volume at end of stage |
| AR inflow lags | | | Lagged inflows for AR(p) models |
| Battery SOC | | | Battery state of charge (DEFERRED) |
| GNL pipeline | | | Committed GNL dispatch (DEFERRED) |

Note on deferred state variables: Battery SOC and GNL pipeline state are planned extensions — see Deferred Features. For GNL thermals specifically, the data model already accepts GNL configurations but validation rejects them until the solver implementation is ready — see Equipment Formulations §1.2.

5.1 AR Lag State Expansion

The PAR(p) inflow model requires past inflows to compute current inflow. To maintain the Markov property, these lags are included as state variables with fixing constraints that bind each lag variable to the corresponding incoming state value:

aₜ₋ℓ = âₜ₋ℓ,  ℓ = 1, …, p

where âₜ₋ℓ is the lag-ℓ inflow value passed from the previous stage. The duals of these fixing constraints (πˡᵃᵍ) contribute to cut coefficients, capturing the marginal value of inflow history — see Cut Management §2.

See PAR Inflow Model for the complete autoregressive formulation and LP Formulation §5 for how these constraints appear in the stage LP. For the full LP column and row layout (including auxiliary z-inflow variables between lags and incoming storage), see LP Formulation §4b.
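The lag-state bookkeeping between stages amounts to a shift: the realized inflow enters as the newest lag and the oldest lag drops off, so the next stage sees exactly p lags of history. A minimal sketch with a hypothetical helper:

```python
# Markov-state bookkeeping for PAR(p) lags: after a stage solves, the realized
# inflow becomes the newest lag and the oldest lag is discarded, keeping the
# state vector at exactly p entries. Helper name is hypothetical.
def shift_lags(lags, realized_inflow):
    """Return the updated lag state [a_t, a_{t-1}, ..., a_{t-p+1}]."""
    return [realized_inflow] + lags[:-1]

lags = [3.0, 2.0, 1.0]            # p = 3 lags entering stage t
new_lags = shift_lags(lags, 4.5)  # realized inflow at stage t pushes in front
```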

6. Single-Cut vs Multi-Cut Formulation

6.1 Single-Cut (Default)

One aggregated cut per iteration:

θ ≥ ᾱ + β̄ᵀx

where ᾱ = Σᵢ pᵢαᵢ and β̄ = Σᵢ pᵢβᵢ.

  • Pros: Fewer cuts, smaller LP, faster solves
  • Cons: May require more iterations to converge

6.2 Multi-Cut (DEFERRED)

One cut per scenario per iteration:

θᵢ ≥ αᵢ + βᵢᵀx,  i = 1, …, N

  • Pros: Tighter approximation, fewer iterations
  • Cons: More cuts, larger LP, memory-intensive

Cobre implements single-cut by default. Multi-cut is planned for future implementation. See Deferred Features for the full trade-off analysis.
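The memory side of the trade-off is simple arithmetic: single-cut adds one LP row per trial point per iteration, multi-cut adds one row per scenario per trial point. The numbers below are illustrative, not Cobre problem sizes:

```python
# Cut-row arithmetic behind the single-cut vs multi-cut trade-off: the LP row
# count grows N times faster under multi-cut. All figures are illustrative.
def cut_rows(iterations, trial_points, scenarios, multi_cut=False):
    per_iteration = trial_points * (scenarios if multi_cut else 1)
    return iterations * per_iteration

single = cut_rows(100, 20, 50)                    # 100 * 20      = 2,000 rows
multi = cut_rows(100, 20, 50, multi_cut=True)     # 100 * 20 * 50 = 100,000 rows
```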

Cross-References

  • Notation Conventions — All index sets, parameters, decision variables, and dual variable definitions
  • LP Formulation — Complete stage subproblem LP that the forward/backward passes solve
  • Cut Management — Cut generation, aggregation, selection, and validity conditions
  • PAR Inflow Model — Stochastic inflow model driving uncertainty in the forward pass
  • Discount Rate — Discounted Bellman equation, stage-dependent rates, discount factor on θ
  • Infinite Horizon — Periodic structure, cycle detection, cut sharing, convergence
  • Upper Bound Evaluation — Upper bound estimation methods
  • Stopping Rules — Convergence criteria that terminate the iterative process
  • Risk Measures — CVaR and risk-averse extensions to the Bellman recursion
  • Penalty System — Recourse slacks guaranteeing feasibility (relatively complete recourse)
  • Equipment Formulations — GNL thermal validation-rejection rule
  • Scenario Generation — Fixed opening tree (§2.3), sampling scheme abstraction (§3), external scenario integration (§4), complete tree mode (§7)
  • Deferred Features — Multi-cut formulation, Markovian policy graphs, batteries, user-supplied noise openings (C.11), complete tree solver integration (C.12)
  • Production Scale Reference — Typical problem sizes and state dimensions