Upper Bound Evaluation
Status: Not Implemented. This spec describes a planned design that has not yet been implemented.
Purpose
This spec defines the upper bound evaluation mechanism in Cobre via inner approximation (SIDP): the vertex-based value function approximation, Lipschitz interpolation, the linearized upper bound LP, gap computation, and configuration. It complements the outer approximation (cuts) described in SDDP Algorithm by providing a convergence certificate through deterministic upper bounds.
For notation conventions (index sets, parameters, decision variables, dual variables), see Notation Conventions.
Symbol convention: This spec uses for the discount factor. See Discount Rate.
1 Motivation
Standard SDDP provides only a lower bound (outer approximation) through cuts. For convergence verification, we need an upper bound (inner approximation). This is especially important for:
- Risk-averse problems: CVaR objectives cannot be reliably estimated via Monte Carlo simulation of the policy
- Convergence certificates: Gap provides a true optimality measure
- Conservative policies: Inner approximation gives “at most ” guarantees
Deterministic vs. simulation-based upper bounds: The simulation-based stopping rules (see Stopping Rules) estimate an upper bound by running the SDDP policy on sampled scenarios and averaging costs. This is a statistical estimate — valid for risk-neutral problems but not for risk-averse (CVaR) objectives. The inner approximation described here provides a deterministic upper bound that is valid regardless of the risk measure.
2 Vertex-Based Inner Approximation
The inner approximation is constructed from vertices (visited state-value pairs):
where each vertex stores:
- : State vector visited during forward passes
- : Upper bound on expected cost-to-go from that state (computed recursively)
3 Lipschitz Interpolation
For a new state not in , the upper bound is computed via Lipschitz interpolation:
where is the Lipschitz constant for stage .
Interpretation: The upper bound at is the minimum over all vertices of “vertex value plus distance penalty.” This forms a concave piecewise-linear function — the inner (concave) counterpart to the outer (convex) cut approximation.
4 Lipschitz Constant Computation
The Lipschitz constant bounds the maximum rate of change of the value function with respect to the state. For SDDP with penalty-based feasibility (relatively complete recourse):
Backward accumulation:
where:
- is the maximum penalty coefficient at stage (e.g., deficit penalty in $/MWh)
- is the discount factor for transition (see Discount Rate)
Note: The discount factor appears in the Lipschitz accumulation because the future cost is discounted. Without discounting (), grows linearly with the remaining horizon.
Example: With deficit penalty $/MWh over 5 stages, no discounting:
| Stage | Lipschitz |
|---|---|
| 5 | 1,000 |
| 4 | 2,000 |
| 3 | 3,000 |
| 2 | 4,000 |
| 1 | 5,000 |
5 Vertex Value Computation
During the upper bound evaluation pass (a backward pass variant):
At terminal stage :
At stage :
For each vertex :
- For each scenario , solve the stage subproblem with incoming state and realization
- Obtain the optimal next-stage state
- Evaluate the inner approximation at the next stage:
- Set vertex value as the expected discounted cost-to-go:
Expectation: The vertex value is an expectation over scenarios, not a single-scenario value. This parallels the backward pass for cuts, which also computes expected cost-to-go.
6 Upper Bound Evaluation LP
For policy evaluation with inner approximation, the stage LP replaces the outer approximation (cut constraints on ) with inner approximation (vertex constraints on ).
Standard LP (outer approximation — lower bound):
Inner approximation LP (upper bound):
Direction: Cut constraints are lower bounds on (). Vertex constraints are upper bounds on (). The cut approximation is convex (piecewise-linear from below); the vertex approximation is concave (piecewise-linear from above).
7 Linearized Upper Bound LP
The absolute value in the vertex constraints is linearized using standard splitting:
Additional variables (per vertex , per state component ):
| Variable | Domain | Description |
|---|---|---|
| Positive deviation from vertex in dimension | ||
| Negative deviation from vertex in dimension | ||
| free | Upper bound on future cost |
Constraints (for each vertex ):
8 Gap Computation
At each iteration where the upper bound is evaluated:
Lower bound (from cuts at stage 1):
Upper bound (from vertices at stage 1):
Relative gap:
Convergence: As , for convex problems with finitely many scenarios.
For stopping rules that use the gap, see Stopping Rules.
9 Computational Considerations
| Aspect | Impact |
|---|---|
| Vertices per stage | Typically |
| LP size increase | additional variables |
| Evaluation frequency | Trade-off between gap accuracy and runtime |
| Memory | Vertices stored separately from cuts |
Recommendation: Enable upper bound evaluation every 5-10 iterations after an initial burn-in period (10+ iterations) for convergence monitoring without excessive overhead.
10 Infinite Horizon
For cyclic policy graphs (see Infinite Horizon), the inner approximation operates on the same seasonal cut-pool structure: vertices are organized by season , not by absolute stage ID. The Lipschitz constant must account for the cumulative discount around the cycle, which bounds the geometric series of future contributions.
The convergence guarantee still holds: with , both the outer (cut) and inner (vertex) approximations converge to the true value function at the fixed point.
11 References
Costa, B.F.P., & Leclere, V. (2023). “Duality of upper bounds in stochastic dynamic programming.” Optimization Online. https://optimization-online.org/?p=23738
Philpott, A.B., de Matos, V.L., & Finardi, E.C. (2013). “On solving multistage stochastic programs with coherent risk measures.” Operations Research, 61(4), 957-970. https://doi.org/10.1287/opre.2013.1175
Cross-References
- SDDP Algorithm — Core algorithm providing the outer approximation (lower bound) that this spec complements
- Notation Conventions — Standard symbols for state variables, value functions, and cost-to-go
- Discount Rate — Discount factor used in vertex value computation (§5) and Lipschitz accumulation (§4)
- Infinite Horizon — Seasonal vertex organization for cyclic policy graphs
- Cut Management — Outer approximation cuts that provide the lower bound counterpart
- Stopping Rules — Convergence criteria that use the gap between inner and outer approximations
- Risk Measures — CVaR objectives where deterministic upper bounds are essential
- Binary Formats §3.3 — FlatBuffers
VertexandStageVerticesschemas for persistence - Input Directory Structure §2.2 —
upper_bound_evaluationconfiguration inconfig.json - Configuration Reference — Full configuration schema with defaults