Upper Bound Evaluation

Status: Not Implemented. This spec describes a planned design that has not yet been implemented.

Purpose

This spec defines the upper bound evaluation mechanism in Cobre via inner approximation (SIDP): the vertex-based value function approximation, Lipschitz interpolation, the linearized upper bound LP, gap computation, and configuration. It complements the outer approximation (cuts) described in SDDP Algorithm by providing a convergence certificate through deterministic upper bounds.

For notation conventions (index sets, parameters, decision variables, dual variables), see Notation Conventions.

Symbol convention: This spec uses $d$ for the discount factor. See Discount Rate.

1 Motivation

Standard SDDP provides only a lower bound (outer approximation) through cuts. For convergence verification, we need an upper bound (inner approximation). This is especially important for:

Risk-averse problems: CVaR objectives cannot be reliably estimated via Monte Carlo simulation of the policy
Convergence certificates: Gap $= \bar{z} - \underline{z}$ provides a true optimality measure
Conservative policies: Inner approximation gives “at most $Y$” guarantees

Deterministic vs. simulation-based upper bounds: The simulation-based stopping rules (see Stopping Rules) estimate an upper bound by running the SDDP policy on sampled scenarios and averaging costs. This is a statistical estimate — valid for risk-neutral problems but not for risk-averse (CVaR) objectives. The inner approximation described here provides a deterministic upper bound that is valid regardless of the risk measure.

2 Vertex-Based Inner Approximation

The inner approximation $\bar{V}_t(x)$ is constructed from vertices (visited state-value pairs):

$V_t = \{ (x^{(1)}, \bar{v}^{(1)}), (x^{(2)}, \bar{v}^{(2)}), \dots, (x^{(n)}, \bar{v}^{(n)}) \}$

where each vertex stores:

$x^{(i)}$: State vector visited during forward passes
$\bar{v}^{(i)}$: Upper bound on expected cost-to-go from that state (computed recursively)

3 Lipschitz Interpolation

For a new state $x$ not in $V_t$, the upper bound is computed via Lipschitz interpolation:

$\bar{V}_t(x) = \min_{(x^{(i)}, \bar{v}^{(i)}) \in V_t} \left\{ \bar{v}^{(i)} + L_t \cdot \| x - x^{(i)} \|_1 \right\}$

where $L_t$ is the Lipschitz constant for stage $t$.

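Interpretation: The upper bound at $x$ is the minimum over all vertices of “vertex value plus distance penalty.” Each term is a cone anchored at a vertex, so the result is a piecewise-linear, $L_t$-Lipschitz function that bounds the value function from above. It is the inner counterpart to the outer (convex) cut approximation, although a pointwise minimum of cones is in general neither convex nor concave.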
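The interpolation formula can be sketched in a few lines of Python. This is a minimal illustration, not Cobre code: the function name `lipschitz_upper_bound`, the tuple-based vertex representation, and the numbers are assumptions made for the example.

```python
from typing import Sequence, Tuple

# A vertex is a pair (state x_i, upper-bound value v_i); hypothetical layout.
Vertex = Tuple[Sequence[float], float]


def lipschitz_upper_bound(
    x: Sequence[float], vertices: Sequence[Vertex], lipschitz: float
) -> float:
    """Evaluate min_i { v_i + L * ||x - x_i||_1 } over the stored vertices."""
    if not vertices:
        raise ValueError("at least one vertex is required")
    return min(
        v_i + lipschitz * sum(abs(a - b) for a, b in zip(x, x_i))
        for x_i, v_i in vertices
    )


# Two vertices in a 2-dimensional state space with L_t = 10 (illustrative numbers).
vertices = [((0.0, 0.0), 100.0), ((4.0, 2.0), 50.0)]
at_vertex = lipschitz_upper_bound((4.0, 2.0), vertices, 10.0)  # 50.0
between = lipschitz_upper_bound((3.0, 2.0), vertices, 10.0)    # 50 + 10 * 1 = 60.0
```

Evaluating at a stored state returns that vertex's value whenever no other vertex gives a smaller bound; moving one unit away adds exactly $L_t$ to the bound.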

4 Lipschitz Constant Computation

The Lipschitz constant bounds the maximum rate of change of the value function with respect to the state. For SDDP with penalty-based feasibility (relatively complete recourse):

Backward accumulation:

$L_T = c_{\max}^{\text{penalty}, T}$

$L_t = d_{t \to t+1} \cdot L_{t+1} + c_{\max}^{\text{penalty}, t}$

where:

$c_{\max}^{\text{penalty}, t}$ is the maximum penalty coefficient at stage $t$ (e.g., deficit penalty in \$/MWh)
$d_{t \to t+1}$ is the discount factor for transition $t \to t+1$ (see Discount Rate)

Note: The discount factor $d$ appears in the Lipschitz accumulation because the future cost is discounted. Without discounting ($d = 1$) and with the same maximum penalty at every stage, $L_t$ grows linearly with the remaining horizon.

Example: With a deficit penalty of 1,000 \$/MWh at every stage, 5 stages, and no discounting:

Stage $t$	Lipschitz $L_{t}$
5	1,000
4	2,000
3	3,000
2	4,000
1	5,000
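The backward accumulation can be checked with a short Python sketch that reproduces the table above. The function name `lipschitz_constants` and the list-based inputs are assumptions made for illustration; they are not part of the planned Cobre interface.

```python
from typing import List, Sequence


def lipschitz_constants(
    max_penalty: Sequence[float], discount: Sequence[float]
) -> List[float]:
    """Backward accumulation: L_T = c_T, then L_t = d_{t->t+1} * L_{t+1} + c_t.

    max_penalty[k] is the largest penalty coefficient at stage k + 1;
    discount[k] is the factor for the transition from stage k + 1 to stage k + 2,
    so it has one entry fewer than max_penalty.
    """
    n = len(max_penalty)
    if len(discount) != n - 1:
        raise ValueError("need exactly one discount factor per stage transition")
    constants = [0.0] * n
    constants[-1] = float(max_penalty[-1])
    # Walk from the second-to-last stage back to the first.
    for t in range(n - 2, -1, -1):
        constants[t] = discount[t] * constants[t + 1] + max_penalty[t]
    return constants


# Five stages, penalty 1,000 at every stage.
undiscounted = lipschitz_constants([1000.0] * 5, [1.0] * 4)
discounted = lipschitz_constants([1000.0] * 5, [0.5] * 4)
```

Here `undiscounted` lists stages 1 through 5 as 5,000 down to 1,000, matching the table. With a discount factor of 0.5 the constants stay bounded (1,937.5 at stage 1), which shows how discounting limits the growth of $L_t$.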

5 Vertex Value Computation

During the upper bound evaluation pass (a backward pass variant):

At terminal stage $T$ :

$\bar{v}^{(i)} = \mathbb{E}_{\omega_T} \left[ c_T(x^{(i)}, \omega_T) \right] \quad \text{(expected immediate cost only)}$

At stage $t < T$ :

For each vertex $(x^{(i)}, \cdot) \in V_t$:

For each scenario $\omega_t$, solve the stage subproblem with incoming state $x^{(i)}$ and realization $\omega_t$
Obtain the optimal next-stage state $x_{t+1}^{*}(\omega_t)$
Evaluate the inner approximation at the next stage: $\bar{\theta}(\omega_t) = \bar{V}_{t+1}(x_{t+1}^{*}(\omega_t))$
Set vertex value as the expected discounted cost-to-go:

$\bar{v}^{(i)} = \mathbb{E}_{\omega_t} \left[ c_t(x^{(i)}, \omega_t) + d_{t \to t+1} \cdot \bar{\theta}(\omega_t) \right]$

Expectation: The vertex value is an expectation over scenarios, not a single-scenario value. This parallels the backward pass for cuts, which also computes expected cost-to-go.
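The per-vertex update can be sketched as follows. Everything here is a hypothetical stand-in: `solve_stage` represents the stage subproblem solve and returns an immediate cost and a next-stage state, `upper_next` represents $\bar{V}_{t+1}$, and the toy reservoir model exists only to make the example runnable.

```python
from typing import Callable, Sequence, Tuple

State = Tuple[float, ...]


def vertex_value(
    x: State,
    scenarios: Sequence[float],
    probabilities: Sequence[float],
    solve_stage: Callable[[State, float], Tuple[float, State]],
    upper_next: Callable[[State], float],
    discount: float,
) -> float:
    """Expected immediate cost plus discounted inner approximation at the next state."""
    total = 0.0
    for omega, p in zip(scenarios, probabilities):
        cost, x_next = solve_stage(x, omega)  # hypothetical stage-LP solve
        theta_bar = upper_next(x_next)        # inner approximation at stage t + 1
        total += p * (cost + discount * theta_bar)
    return total


# Toy stand-ins (assumptions, not Cobre code): one reservoir serving a demand
# of 10 from storage plus inflow, paying 1000 per unit of deficit.
def toy_solve(x: State, inflow: float) -> Tuple[float, State]:
    available = x[0] + inflow
    deficit = max(0.0, 10.0 - available)
    return 1000.0 * deficit, (max(0.0, available - 10.0),)


def toy_upper_next(x_next: State) -> float:
    return 500.0 - 10.0 * x_next[0]


value = vertex_value((4.0,), [2.0, 10.0], [0.5, 0.5], toy_solve, toy_upper_next, 0.9)
# Scenario 2.0:  cost 4000, next state 0, bound 500 -> 0.5 * (4000 + 450) = 2225
# Scenario 10.0: cost 0,    next state 4, bound 460 -> 0.5 * (0 + 414)    = 207
```

The terminal-stage case corresponds to passing an `upper_next` that always returns zero, which leaves only the expected immediate cost.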

6 Upper Bound Evaluation LP

For policy evaluation with inner approximation, the stage LP replaces the outer approximation (cut constraints on $\theta$) with inner approximation (vertex constraints on $\bar{\theta}$).

Standard LP (outer approximation — lower bound):

$\min \; c_t^{\top} x_t + d_{t \to t+1} \cdot \theta$

$\text{s.t.} \quad \theta \geq \alpha_k + \pi_k^{\top} x_t \quad \forall k \quad \text{(cuts)}$

Inner approximation LP (upper bound):

$\min \; c_t^{\top} x_t + d_{t \to t+1} \cdot \bar{\theta}$

$\text{s.t.} \quad \bar{\theta} \leq \bar{v}^{(i)} + L_t \sum_j | x_{t,j} - x_j^{(i)} | \quad \forall i \in V_t \quad \text{(vertices)}$

Direction: Cut constraints are lower bounds on $\theta$ ($\theta \geq \dots$). Vertex constraints are upper bounds on $\bar{\theta}$ ($\bar{\theta} \leq \dots$). The cut approximation is convex (piecewise-linear from below); the vertex approximation is piecewise-linear from above, and as a pointwise minimum of cones it is not concave in general.

7 Linearized Upper Bound LP

The absolute value $| x_j - x_j^{(i)} |$ in the vertex constraints is linearized using standard splitting:

$| x_j - x_j^{(i)} | = u_j^{(i)+} + u_j^{(i)-}$

$x_j - x_j^{(i)} = u_j^{(i)+} - u_j^{(i)-}$

$u_j^{(i)+}, u_j^{(i)-} \geq 0$

Additional variables (per vertex $i$, per state component $j$):

Variable	Domain	Description
$u_j^{(i)+}$	$\geq 0$	Positive deviation from vertex $i$ in dimension $j$
$u_j^{(i)-}$	$\geq 0$	Negative deviation from vertex $i$ in dimension $j$
$\bar{\theta}$	free	Upper bound on future cost

Constraints (for each vertex $i \in V_t$):

$\bar{\theta} \leq \bar{v}^{(i)} + L_t \sum_j ( u_j^{(i)+} + u_j^{(i)-} )$

$x_j - x_j^{(i)} = u_j^{(i)+} - u_j^{(i)-} \quad \forall j$
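For a fixed state, the splitting can be illustrated directly. The sketch below computes the complementary split (at most one of $u_j^{(i)+}$, $u_j^{(i)-}$ is nonzero per component), which is the case in which $u_j^{(i)+} + u_j^{(i)-}$ equals the absolute deviation; a non-complementary split satisfies the equality constraint but overstates the distance. The function names are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def split_deviation(
    x: Sequence[float], x_vertex: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Return (u_plus, u_minus) with x - x_vertex = u_plus - u_minus, both >= 0."""
    u_plus = [max(a - b, 0.0) for a, b in zip(x, x_vertex)]
    u_minus = [max(b - a, 0.0) for a, b in zip(x, x_vertex)]
    return u_plus, u_minus


def vertex_constraint_rhs(
    x: Sequence[float], x_vertex: Sequence[float], v_bar: float, lipschitz: float
) -> float:
    """Right-hand side v_bar + L * sum_j (u_plus_j + u_minus_j) for one vertex."""
    u_plus, u_minus = split_deviation(x, x_vertex)
    return v_bar + lipschitz * sum(p + m for p, m in zip(u_plus, u_minus))


# State (3, 5) against a vertex at (4, 2) with value 50 and L_t = 10.
u_plus, u_minus = split_deviation((3.0, 5.0), (4.0, 2.0))
rhs = vertex_constraint_rhs((3.0, 5.0), (4.0, 2.0), 50.0, 10.0)
```

In this example the first component deviates downward by 1 and the second upward by 3, so the L1 distance is 4 and the constraint reads $\bar{\theta} \leq 50 + 10 \cdot 4 = 90$.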

8 Gap Computation

At each iteration $k$ where the upper bound is evaluated:

Lower bound (from cuts at stage 1):

$\underline{z}^k = c_1(\hat{x}_1) + d_{1 \to 2} \cdot \underline{V}_2(\hat{x}_1)$

Upper bound (from vertices at stage 1):

$\bar{z}^k = c_1(\hat{x}_1) + d_{1 \to 2} \cdot \bar{V}_2(\hat{x}_1)$

Relative gap:

$\text{gap}^k = \frac{\bar{z}^k - \underline{z}^k}{\max(1, |\bar{z}^k|)} \times 100\%$

Convergence: As $k \to \infty$, $\text{gap}^k \to 0$ for convex problems with finitely many scenarios.

For stopping rules that use the gap, see Stopping Rules.
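The relative gap formula translates directly into code. The helper name `relative_gap_percent` is an assumption made for this sketch.

```python
def relative_gap_percent(upper: float, lower: float) -> float:
    """Relative gap in percent: (upper - lower) / max(1, |upper|) * 100."""
    return (upper - lower) / max(1.0, abs(upper)) * 100.0


# Bounds of 1050 (upper) and 1000 (lower) give a gap a little under 5 percent.
gap = relative_gap_percent(1050.0, 1000.0)

# When |upper| is below 1 the denominator is clamped to 1, which avoids
# inflating the gap for objective values near zero.
small = relative_gap_percent(0.5, 0.25)
```

The `max(1, ...)` guard means the reported value behaves like an absolute gap (scaled by 100) when the upper bound is small in magnitude.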

9 Computational Considerations

Aspect	Impact
Vertices per stage	Typically $O(\text{iterations} \times \text{forward passes})$
LP size increase	$2 \times n_{\text{state}} \times n_{\text{vertices}}$ additional variables
Evaluation frequency	Trade-off between gap accuracy and runtime
Memory	Vertices stored separately from cuts

Recommendation: Enable upper bound evaluation every 5-10 iterations after an initial burn-in period (10+ iterations) for convergence monitoring without excessive overhead.
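One way to read this recommendation is as a simple schedule. The sketch below is an assumption-laden illustration: the parameter names `burn_in` and `every` are hypothetical and do not correspond to documented configuration keys, and the choice to evaluate exactly at the end of the burn-in is one possible interpretation.

```python
def should_evaluate_upper_bound(iteration: int, burn_in: int = 10, every: int = 5) -> bool:
    """True on iterations burn_in, burn_in + every, burn_in + 2 * every, ... (1-based)."""
    return iteration >= burn_in and (iteration - burn_in) % every == 0


def extra_lp_variables(n_state: int, n_vertices: int) -> int:
    """Deviation variables added by the linearization: two per state component per vertex."""
    return 2 * n_state * n_vertices


# Iterations at which the upper bound would be evaluated during the first 30.
evaluated = [k for k in range(1, 31) if should_evaluate_upper_bound(k)]

# A 50-dimensional state with 200 stored vertices adds 20,000 variables per stage LP.
added = extra_lp_variables(50, 200)
```

The variable count grows with every stored vertex, which is why evaluating only every few iterations keeps the overhead manageable.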

10 Infinite Horizon

For cyclic policy graphs (see Infinite Horizon), the inner approximation operates on the same seasonal cut-pool structure: vertices are organized by season $τ$ , not by absolute stage ID. The Lipschitz constant must account for the cumulative discount around the cycle, which bounds the geometric series of future contributions.

The convergence guarantee still holds: with $d_{\text{cycle}} < 1$, both the outer (cut) and inner (vertex) approximations converge to the true value function at the fixed point.
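This spec does not give a formula for the cyclic Lipschitz constant. As a sketch of how the geometric series could arise, the code below assumes the recursion from §4 is applied around the cycle, $L_s = c_s + d_s \cdot L_{s+1}$ with the last season wrapping to the first, and solves for its fixed point. The function name and the season indexing are assumptions made for illustration.

```python
from typing import List, Sequence


def cyclic_lipschitz_constants(
    max_penalty: Sequence[float], discount: Sequence[float]
) -> List[float]:
    """Fixed point of L_s = c_s + d_s * L_{(s+1) mod S} for a cycle of S seasons.

    discount[s] is the factor for the transition from season s to its successor;
    the product of all entries (the cycle discount) must be below 1.
    """
    n = len(max_penalty)
    if n == 0 or len(discount) != n:
        raise ValueError("need one penalty and one discount factor per season")
    cycle_discount = 1.0
    for d in discount:
        cycle_discount *= d
    if not 0.0 <= cycle_discount < 1.0:
        raise ValueError("cycle discount must lie in [0, 1)")
    # Discounted penalty accumulated over one full lap starting at season 0.
    lap, weight = 0.0, 1.0
    for c, d in zip(max_penalty, discount):
        lap += weight * c
        weight *= d
    constants = [0.0] * n
    # Summing the lap cost over infinitely many laps is a geometric series.
    constants[0] = lap / (1.0 - cycle_discount)
    # The remaining seasons follow from the recursion, walking backward.
    for s in range(n - 1, 0, -1):
        constants[s] = max_penalty[s] + discount[s] * constants[(s + 1) % n]
    return constants


# Two seasons, penalty 1000 each, discount 0.5 per transition (cycle discount 0.25).
cyclic = cyclic_lipschitz_constants([1000.0, 1000.0], [0.5, 0.5])
```

In this example both seasons get a constant of 2,000, and each value satisfies the recursion: $1000 + 0.5 \cdot 2000 = 2000$. As the cycle discount approaches 1 the denominator vanishes and the constants grow without bound, which is consistent with the requirement $d_{\text{cycle}} < 1$.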

11 References

Costa, B.F.P., & Leclere, V. (2023). “Duality of upper bounds in stochastic dynamic programming.” Optimization Online. https://optimization-online.org/?p=23738

Philpott, A.B., de Matos, V.L., & Finardi, E.C. (2013). “On solving multistage stochastic programs with coherent risk measures.” Operations Research, 61(4), 957-970. https://doi.org/10.1287/opre.2013.1175

Cross-References

SDDP Algorithm — Core algorithm providing the outer approximation (lower bound) that this spec complements
Notation Conventions — Standard symbols for state variables, value functions, and cost-to-go
Discount Rate — Discount factor $d$ used in vertex value computation (§5) and Lipschitz accumulation (§4)
Infinite Horizon — Seasonal vertex organization for cyclic policy graphs
Cut Management — Outer approximation cuts that provide the lower bound counterpart
Stopping Rules — Convergence criteria that use the gap between inner and outer approximations
Risk Measures — CVaR objectives where deterministic upper bounds are essential
Binary Formats §3.3 — FlatBuffers Vertex and StageVertices schemas for persistence
Input Directory Structure §2.2 — upper_bound_evaluation configuration in config.json
Configuration Reference — Full configuration schema with defaults
