Skip to content

Toy Single-Reservoir SDDP Walkthrough

This chapter walks through one complete SDDP iteration on a deliberately small system — one reservoir, one thermal unit, one demand block, four stages, and a 0-order (pure seasonal-sampling) inflow model — using numbers chosen so every cut coefficient and every dual variable can be verified by hand. The goal is to make the abstract machinery of forward pass, backward pass, cut construction, and lower-bound update tangible.

Cobre ships an actual reference case at examples/1dtoy/ that a curious reader can run end-to-end (cobre run examples/1dtoy). That case has two thermals, eight monthly stages, and ten openings per stage; the chapter here uses simpler numbers — one thermal, four stages, three openings — chosen for tractability. This walkthrough is a pedagogical caricature, not a reproduction of the shipped case.

The chapter does not explain how the underlying mechanisms work; it shows them working. It does not cover multi-reservoir effects, spatial inflow correlation, the FPHA production model, risk-measure weighting, or autoregressive inflow memory. Those topics belong to the chapters cited in section 9 and to the multi-reservoir companion in Toy Four-Reservoir Walkthrough.


The system has one bus, one reservoir, one thermal unit, and one demand block. The reservoir receives a stochastic inflow drawn from a 0-order seasonal-sampling model; the thermal unit and a deficit slack cover any shortfall that the reservoir cannot serve.

graph LR
    INF["Inflow a_t<br/>(0-order)"]
    RES["Reservoir<br/>v  cap 100"]
    GEN["Hydro q"]
    TH["Thermal g_th<br/>cost 50"]
    DEF["Deficit δ<br/>cost 1000"]
    BUS["Bus<br/>demand D = 40"]

    INF --> RES
    RES --> GEN --> BUS
    TH --> BUS
    DEF --> BUS

Case parameters (chosen for hand-verifiability):

ParameterSymbolValue
StagesTT4
Inflow openings per stageNN3
Initial storagev0v_030 (storage units)
Reservoir capacityVˉ\bar{V}100
Demand per stageDD40
Inflow seasonal meanμ\mu30
Inflow seasonal std deviationσ\sigma10
Thermal marginal costcthc^{th}50
Deficit costcdefc^{def}1000
Discount factordd1.0

Demand is set above the mean inflow (D=40>μ=30D = 40 > \mu = 30) so the reservoir depletes over the four stages, eventually forcing thermal dispatch and producing non-trivial dual variables in the backward pass.


Each stage tt solves the following LP (specialised from the general formulation in LP Formulation to one hydro, one thermal, one demand block, and a 0-order inflow):

Objective:

min50gth+1000δ+θ\min \quad 50\, g_{th} + 1000\, \delta + \theta

Load balance:

q+gth+δ=40q + g_{th} + \delta = 40

Water balance:

v=vin+aqv = v^{in} + a - q

Incoming-storage pinning (binds incoming storage to the trial value by equal column bounds; the pinned column’s reduced cost becomes the cut coefficient):

vin=vˉin=v^t1\underline{v}^{in} = \bar{v}^{in} = \hat{v}_{t-1}

Inflow (treated as data, not a state variable; see section 3):

a=at(ω)(realised from the 0-order sampler)a = a_t(\omega) \quad \text{(realised from the 0-order sampler)}

Bounds:

0v100,q0,gth0,δ0,θ00 \leq v \leq 100, \quad q \geq 0, \quad g_{th} \geq 0, \quad \delta \geq 0, \quad \theta \geq 0

Future cost variable θ\theta: in the terminal stage 4 there are no cuts, and θ\theta is bound to zero. As the backward pass runs, cuts of the form θα+πvv\theta \geq \alpha + \pi^v\, v are added to earlier stages’ LPs. Because the value function V(v)V(v) is decreasing in storage (more water means lower future cost), the cut slope πv\pi^v is negative.

Note the absence of any AR-lag state variable: the 0-order inflow has no memory across stages, so storage is the only state.


The inflow at every stage is sampled independently from a normal distribution with stage-seasonal mean and standard deviation:

at=μ+σεt,εtN(0,1)a_t = \mu + \sigma\, \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, 1)

with μ=30\mu = 30 and σ=10\sigma = 10 for every stage in this walkthrough. The draws across stages are independent. This is the degenerate p=0p = 0 case of the PAR Inflow Model: no lag terms, no AR coefficients, no initial-lag state. The data file inflow_seasonal_stats.parquet carries only μt\mu_t and σt\sigma_t per hydro and per stage, and the inflow_ar_coefficients.parquet file is absent — the order-selection procedure landed at pm=0p_m = 0 for every season (the white-noise case described in section 4 of PAR Inflow Model).

The three openings used in the backward pass correspond to ε{1,0,+1}\varepsilon \in \{-1, 0, +1\} with equal probability p=1/3p = 1/3, giving inflows at{20,30,40}a_t \in \{20, 30, 40\}. Because the openings are identical across stages and there is no AR memory, the same three-point distribution applies at every stage.


The forward pass samples one trajectory using εt=0\varepsilon_t = 0 at every stage. Under zero noise the inflow stays at the seasonal mean throughout: at=30a_t = 30 for t=1,2,3,4t = 1, 2, 3, 4.

At each stage, the LP minimises 50gth+1000δ+θ50\, g_{th} + 1000\, \delta + \theta with θ\theta free at zero (no cuts in iteration 1). Pure hydro dispatch dominates whenever water is available.

Stage 1 — incoming storage v^0=30\hat{v}_0 = 30, inflow a1=30a_1 = 30. Available water: 30+30=604030 + 30 = 60 \geq 40. Optimal: q=40q = 40, gth=0g_{th} = 0, δ=0\delta = 0. End storage v1=30+3040=20v_1 = 30 + 30 - 40 = 20. Stage cost: 00.

Stage 2v^1=20\hat{v}_1 = 20, a2=30a_2 = 30. Available: 504050 \geq 40. q=40q = 40, gth=0g_{th} = 0, v2=10v_2 = 10. Stage cost: 00.

Stage 3v^2=10\hat{v}_2 = 10, a3=30a_3 = 30. Available: 40=4040 = 40. q=40q = 40, gth=0g_{th} = 0, v3=0v_3 = 0. Stage cost: 00.

Stage 4v^3=0\hat{v}_3 = 0, a4=30a_4 = 30. Available: 30<4030 < 40. The LP sets q=30q = 30 (all water turbined), gth=10g_{th} = 10 (thermal fills the gap), δ=0\delta = 0, v4=0v_4 = 0. Stage cost: 50×10=50050 \times 10 = 500.

Forward-pass summary:

Stagev^t1\hat{v}_{t-1}ata_tqqgthg_{th}vtv_tStage cost
13030400200
22030400100
3103040000
403030100500

Upper-bound estimate from this trajectory: 0+0+0+500=5000 + 0 + 0 + 500 = 500. This is one realisation of total cost; the statistical upper bound is the average over many simulated trajectories.


The backward pass walks stages 414 \to 1. At each stage it fixes the incoming storage to the trial point from the forward pass, evaluates all three openings, reads the reduced cost of the pinned incoming-storage column, computes per-opening intercepts, and aggregates into one cut (see Cut Management sections 2–3).

Trial point: v^3=0\hat{v}_3 = 0. Inflows under the three openings: a4(ω){20,30,40}a_4(\omega) \in \{20, 30, 40\}.

Openingε\varepsilona4a_4Available v^ ⁣+a4\hat{v}\!+a_4qqgthg_{th}Q4Q_4
ω1\omega_11-1202020201000
ω2\omega_20030303010500
ω3\omega_3+1+140404000

Storage cut coefficient π4v(ω)=Q4/v^3\pi^v_4(\omega) = \partial Q_4/\partial \hat{v}_3, the reduced cost of the pinned incoming-storage column. By the LP envelope theorem applied to the pinned bound v4in=v^3v^{in}_4 = \hat{v}_3:

  • For ω1\omega_1 and ω2\omega_2 (water-limited, thermal active): one extra unit of v^3\hat{v}_3 enables one extra unit of turbining, displacing one unit of thermal worth 5050. Optimal cost falls by 5050, so π4v(ω)=50\pi^v_4(\omega) = -50.
  • For ω3\omega_3 (demand met exactly by hydro alone): the LP already has q=40=Dq = 40 = D; extra storage flows into terminal v4v_4, which has zero value at the terminal stage. π4v(ω3)=0\pi^v_4(\omega_3) = 0.
Openingπ4v\pi^v_4
ω1\omega_150-50
ω2\omega_250-50
ω3\omega_300

Per-opening intercepts (anchoring each cut at the trial point):

α^4(ω)=Q4(ω)π4v(ω)v^3\hat{\alpha}_4(\omega) = Q_4(\omega) - \pi^v_4(\omega)\,\hat{v}_3

With v^3=0\hat{v}_3 = 0 this simplifies to α^4(ω)=Q4(ω)\hat{\alpha}_4(\omega) = Q_4(\omega): α^4=(1000,500,0)\hat{\alpha}_4 = (1000,\, 500,\, 0).

Single-cut aggregation (uniform probability p=1/3p = 1/3; see Cut Management section 3):

αˉ=13(1000+500+0)=500,πˉv=13(5050+0)=1003\bar{\alpha} = \tfrac{1}{3}(1000 + 500 + 0) = 500, \qquad \bar{\pi}^v = \tfrac{1}{3}(-50 - 50 + 0) = -\tfrac{100}{3}

Cut added to stage 3’s LP:

θ    500    1003v\theta \;\geq\; 500 \;-\; \tfrac{100}{3}\, v

Sanity check. At v=0v = 0 the cut gives θ500\theta \geq 500, matching the expected stage-4 cost from the table above. At v=15v = 15 the cut gives θ0\theta \geq 0, after which the implicit θ0\theta \geq 0 bound takes over.

Trial point: v^2=10\hat{v}_2 = 10.

The stage-3 LP now carries the cut θ500(100/3)v3\theta \geq 500 - (100/3)\, v_3. For each opening a3{20,30,40}a_3 \in \{20, 30, 40\} the optimiser balances spending on thermal now against keeping more water for the future (worth 100/3100/3 per unit via the cut).

Openinga3a_3Availableqqgthg_{th}v3v_3θ\theta^*Q3Q_3
ω1\omega_120303010050050010001000
ω2\omega_230404000500500500500
ω3\omega_3405040010500/3500/3500/3500/3

For ω1\omega_1 the system is water-limited; the LP turbines all 30 units of available water and dispatches 10 MW of thermal, ending with v3=0v_3 = 0 and the cut binding at 500500.

For ω2\omega_2 available water exactly meets demand; no thermal, end v3=0v_3 = 0, cut value 500500.

For ω3\omega_3 the optimiser must choose how much of the surplus to release. Decreasing qq raises gthg_{th} by one (cost 5050) and raises v3v_3 by one (saves 100/3100/3 on the cut); the marginal cost of holding more is +50100/3+16.7+50 - 100/3 \approx +16.7, so the optimiser pushes qq to its load-balance bound at 4040, leaving v3=10v_3 = 10 and the cut value at 500/3167500/3 \approx 167.

Storage cut coefficients π3v(ω)=Q3/v^2\pi^v_3(\omega) = \partial Q_3/\partial \hat{v}_2 (reduced costs of the pinned incoming-storage column):

  • ω1\omega_1 (water-limited): one extra unit of v^2\hat{v}_2 frees one extra turbine unit, saves 5050 of thermal. π3v(ω1)=50\pi^v_3(\omega_1) = -50.
  • ω2\omega_2 (demand exactly met): the LP is at a kink; both πv=50\pi^v = -50 (water-limited regime) and πv=100/3\pi^v = -100/3 (storage-flow regime) are valid subgradients. The walkthrough takes the basis returning π3v(ω2)=50\pi^v_3(\omega_2) = -50.
  • ω3\omega_3 (water surplus, holding storage): one extra unit of v^2\hat{v}_2 raises v3v_3 by one, lowers the cut value by 100/3100/3. π3v(ω3)=100/3\pi^v_3(\omega_3) = -100/3.

Per-opening intercepts α^3(ω)=Q3(ω)π3v(ω)v^2\hat{\alpha}_3(\omega) = Q_3(\omega) - \pi^v_3(\omega)\,\hat{v}_2:

OpeningQ3Q_3π3vv^2\pi^v_3 \cdot \hat{v}_2α^3\hat{\alpha}_3
ω1\omega_110001000500-50015001500
ω2\omega_2500500500-50010001000
ω3\omega_3500/3500/31000/3-1000/3500500

Aggregation (p=1/3p = 1/3):

αˉ=13(1500+1000+500)=1000,πˉv=13 ⁣(50501003)=4009\bar{\alpha} = \tfrac{1}{3}(1500 + 1000 + 500) = 1000, \qquad \bar{\pi}^v = \tfrac{1}{3}\!\left(-50 - 50 - \tfrac{100}{3}\right) = -\tfrac{400}{9}

Cut added to stage 2’s LP:

θ    1000    4009v\theta \;\geq\; 1000 \;-\; \tfrac{400}{9}\, v

Sanity check. At v=v^2=10v = \hat{v}_2 = 10 the cut evaluates to 10004000/9555.61000 - 4000/9 \approx 555.6, matching the probability-weighted expectation Qˉ3=(1/3)(1000+500+500/3)=5000/9555.6\bar{Q}_3 = (1/3)(1000 + 500 + 500/3) = 5000/9 \approx 555.6.

The same procedure repeats at stages 2 and 1. At each stage:

  1. Pin the incoming storage to the trial point from the forward pass (column bounds).
  2. Solve all three opening LPs, including the cut from the next stage.
  3. Read the reduced cost of the pinned incoming-storage column.
  4. Compute per-opening intercepts via α^(ω)=Q(ω)πv(ω)v^t1\hat{\alpha}(\omega) = Q(\omega) - \pi^v(\omega)\,\hat{v}_{t-1}.
  5. Aggregate by probability-weighted averaging.
  6. Add the cut to the previous stage’s LP.

By the end of the backward pass, stages 1 through 3 each carry one cut. Stage 4 has none (it is the terminal stage and has no future to approximate).


After the backward pass installs a cut at stage 1, the lower bound for iteration 1 is computed by solving the stage-1 LP for every opening with the cut active and taking the probability-weighted expectation (see SDDP Algorithm section 3.3):

z1=Eω[Q11(x0,ω)]\underline{z}^1 = \mathbb{E}_{\omega}\bigl[\, Q_1^1(x_0, \omega) \,\bigr]

where Q11Q_1^1 is the stage-1 optimal objective under opening ω\omega with the iteration-1 cut in place, and x0=v0=30x_0 = v_0 = 30 is the fixed initial storage. The lower bound rises from 00 (iteration 0, no cuts) to a positive value once the first cut is installed; the value reflects the policy’s expected cost given the partial information encoded in the single iteration-1 cut at each stage.

The lower bound is non-decreasing across iterations — adding cuts only tightens the outer approximation, never loosening it (see Cut Management section 1).


7. The Future Cost Function Across Iterations

Section titled “7. The Future Cost Function Across Iterations”

After iteration 1 each non-terminal stage holds one Benders cut. As the algorithm continues:

  • Iteration 2 samples a different forward trajectory (different ε\varepsilon draws), visits different trial points, and adds a second cut at each non-terminal stage.
  • Subsequent iterations each add one cut per non-terminal stage. The outer approximation tightens around the true convex future-cost function (FCF) — each new cut is tangent at a new trial point, ruling out overestimates in that region.

For this case the FCF at each stage is a univariate function of storage vv. Each cut is a line in this one-dimensional state space, and the outer approximation is the pointwise maximum over all cuts:

V^tk(v)  =  maxi=1,,k{αˉi+πˉv,iv}.\hat{V}_t^k(v) \;=\; \max_{i = 1, \ldots, k} \bigl\{ \bar{\alpha}^i + \bar{\pi}^{v,i}\, v \bigr\}.

Because VtV_t is decreasing in storage (more water means lower future cost) and convex, the slopes πˉv,i\bar{\pi}^{v,i} are negative and the approximation is a lower envelope of decreasing lines. As kk grows, visited trial points spread across the state space — low-storage trajectories force the algorithm to evaluate cut quality in the scarcity region, adding cuts that are informative for water-stressed scenarios.


As iterations accumulate, the lower bound zk\underline{z}^k rises and the simulation-based upper-bound estimate zˉk\bar{z}^k (the mean over many forward trajectories) converges. The relative gap

gapk=zˉkzkmax(1,zˉk)\text{gap}^k = \frac{\bar{z}^k - \underline{z}^k}{\max(1, |\bar{z}^k|)}

narrows as cuts accumulate. For a problem of this size (one reservoir, four stages, three openings) the gap typically narrows below 1% within tens of iterations and below 0.1% within a few hundred. These are illustrative scales, not guarantees: the actual iteration count depends on the demand-to-inflow ratio, the initial storage, and the noise level — all of which determine how often the reservoir reaches zero and how sensitive the value function is to small state changes.

The stopping rule fires when the gap falls below the configured tolerance or the iteration limit is reached — see Stopping Rules for the available criteria and their combinations. The upper-bound estimator used here is the simulation-based estimator; a deterministic inner-approximation bound is also available — see Upper Bound Evaluation.


This case isolates the core SDDP loop in its simplest form. It cannot illustrate:

  • Multi-reservoir effects: the cut becomes a multivariate hyperplane with one storage coefficient per reservoir; the optimiser must balance releases across plants. See Toy Four-Reservoir Walkthrough.
  • Spatial inflow correlation: when multiple plants share a wet/dry signal, their innovations ε\varepsilon must be drawn from a correlated multivariate normal via the spectral factorisation described in PAR Inflow Model section 8.
  • Autoregressive inflow memory: a PAR(p) model with p1p \geq 1 adds one lag state variable per lag and one cut coefficient per lag; the cut becomes a hyperplane in storage and lag state space. See PAR Inflow Model.
  • FPHA production model: nonlinear head-dependent efficiency approximated by piecewise-linear hyperplanes; one of the planes can bind, contributing to the storage cut coefficient. See Hydro Production Models.
  • Risk-measure effects: CVaR weighting that shifts cut aggregation probabilities away from uniform p=1/Np = 1/N. See Risk Measures.

The Toy Four-Reservoir Walkthrough extends this case to four independent reservoirs at four buses with transmission coupling, illustrating how the cut hyperplane and the per-bus dispatch mechanics scale up.


  • SDDP Algorithm — Forward and backward pass structure, lower-bound computation, convergence monitoring
  • LP Formulation — Complete stage LP: load balance, water balance, column-bound state pinning, objective taxonomy
  • Cut Management — Dual extraction, per-opening intercepts, single-cut aggregation, cut validity, sign convention
  • PAR Inflow Model — Inflow model definition; the p=0p = 0 degenerate case (white noise) used in this walkthrough; stored vs computed quantities
  • Stopping Rules — Iteration limit, gap threshold, bound-stalling criteria
  • Upper Bound Evaluation — Simulation-based and inner-approximation upper bounds
  • Toy Four-Reservoir Walkthrough — Multi-reservoir extension at 4 buses with transmission and independent 0-order inflows