Upper Bound Evaluation
Purpose
Section titled “Purpose”This spec defines the upper bound evaluation mechanism in Cobre via inner approximation (SIDP): the vertex-based value function approximation, Lipschitz interpolation, the linearized upper bound LP, gap computation, and configuration. It complements the outer approximation (cuts) described in SDDP Algorithm by providing a convergence certificate through deterministic upper bounds.
For notation conventions (index sets, parameters, decision variables, dual variables), see Notation Conventions.
1 Motivation
Section titled “1 Motivation”Standard SDDP provides only a lower bound (outer approximation) through cuts. Convergence verification requires an upper bound (inner approximation). This is especially important for:
- Risk-averse problems: CVaR objectives cannot be reliably estimated via Monte Carlo simulation of the policy
- Convergence certificates: Gap provides a true optimality measure
- Conservative policies: Inner approximation gives “at most ” guarantees
2 Vertex-Based Inner Approximation
Section titled “2 Vertex-Based Inner Approximation”The inner approximation is constructed from vertices (visited state-value pairs):
where each vertex stores:
- : State vector visited during forward passes
- : Upper bound on expected cost-to-go from that state (computed recursively)
3 Lipschitz Interpolation
Section titled “3 Lipschitz Interpolation”For a new state not in , the upper bound is computed via Lipschitz interpolation:
where is the Lipschitz constant for stage .
Interpretation: The upper bound at is the minimum over all vertices of “vertex value plus distance penalty.” This forms a concave piecewise-linear function — the inner (concave) counterpart to the outer (convex) cut approximation.
4 Lipschitz Constant Computation
Section titled “4 Lipschitz Constant Computation”The Lipschitz constant bounds the maximum rate of change of the value function with respect to the state. For SDDP with penalty-based feasibility (relatively complete recourse):
Backward accumulation:
where:
- is the maximum penalty coefficient at stage (e.g., deficit penalty in $/MWh)
- is the discount factor for transition (see Discount Rate)
Example: With deficit penalty $/MWh over 5 stages, no discounting:
| Stage | Lipschitz |
|---|---|
| 5 | 1,000 |
| 4 | 2,000 |
| 3 | 3,000 |
| 2 | 4,000 |
| 1 | 5,000 |
5 Vertex Value Computation
Section titled “5 Vertex Value Computation”During the upper bound evaluation pass (a backward pass variant):
At terminal stage :
At stage :
For each vertex :
- For each scenario , solve the stage subproblem with incoming state and realization
- Obtain the optimal next-stage state
- Evaluate the inner approximation at the next stage:
- Set vertex value as the expected discounted cost-to-go:
6 Upper Bound Evaluation LP
Section titled “6 Upper Bound Evaluation LP”For policy evaluation with inner approximation, the stage LP replaces the outer approximation (cut constraints on ) with inner approximation (vertex constraints on ).
Standard LP (outer approximation — lower bound):
Inner approximation LP (upper bound):
7 Linearized Upper Bound LP
Section titled “7 Linearized Upper Bound LP”The absolute value in the vertex constraints is linearized using standard splitting:
Additional variables (per vertex , per state component ):
| Variable | Domain | Description |
|---|---|---|
| Positive deviation from vertex in dimension | ||
| Negative deviation from vertex in dimension | ||
| free | Upper bound on future cost |
Constraints (for each vertex ):
8 Gap Computation
Section titled “8 Gap Computation”At each iteration where the upper bound is evaluated:
Lower bound (from cuts at stage 1):
Upper bound (from vertices at stage 1):
Relative gap:
Convergence: As , for convex problems with finitely many scenarios.
For stopping rules that use the gap, see Stopping Rules.
9 Computational Considerations
Section titled “9 Computational Considerations”| Aspect | Impact |
|---|---|
| Vertices per stage | Typically |
| LP size increase | additional variables |
| Evaluation frequency | Trade-off between gap accuracy and runtime |
| Memory | Vertices stored separately from cuts |
Recommendation: Enable upper bound evaluation every 5-10 iterations after an initial burn-in period (10+ iterations) for convergence monitoring without excessive overhead.
10 Out-of-Sample Simulation Procedure
Section titled “10 Out-of-Sample Simulation Procedure”Cobre can also estimate an upper bound on expected total cost by running the trained policy on scenarios drawn independently of the training tree. This procedure produces a statistical upper-bound estimate; for the deterministic upper bound applicable to risk-averse policies, see sections 2–9 of this chapter.
10.1 Independence from Training
Section titled “10.1 Independence from Training”The core methodological guarantee is that the noise used for the simulation forward pass is drawn independently of the noise used during training. Training forward passes sample from the opening tree (see Scenario Generation) to build the scenario tree that drives cut generation; any cost computed by re-running the policy on those same training scenarios would produce a biased estimator — the cuts were shaped to be tight at those states. The out-of-sample simulation avoids this by drawing a fresh set of independent noise realizations from the same opening tree via a separate seed. Because cuts have no dependence on these independent draws, the resulting cost sample is an unbiased estimator of the true expected cost under the current policy.
10.2 Monte Carlo Estimator
Section titled “10.2 Monte Carlo Estimator”The simulation executes a complete forward pass for each of the independent replications, recording the total discounted cost for scenario :
where is the immediate cost at stage of replication , and is the cumulative discount factor from stage 1 to stage (see Discount Rate).
The sample mean is the Monte Carlo estimator of expected total cost:
This estimator is unbiased under independent draws: . The sample standard deviation is:
In addition to mean and standard deviation, the simulation computes the Conditional Value-at-Risk at a specified confidence level from the sorted scenario costs. For the CVaR estimator and its interpretation under risk-averse policies, see Risk Measures.
10.3 Confidence Interval
Section titled “10.3 Confidence Interval”Under the normal approximation, the 95% confidence interval for has half-width:
The approximation is reliable when (central-limit-theorem regime). The reported interval is .
Trade-off: every doubling of narrows the confidence interval by a factor of , but costs proportionally more LP solves — stage subproblems per simulation check. The practical baseline of replications gives a half-width of roughly , which is sufficient to distinguish a converged policy from one still improving.
10.4 Configurable Replication Count
Section titled “10.4 Configurable Replication Count”The sole knob governing the out-of-sample procedure is the number of replications (the replications parameter of the simulation stopping rule). It controls the statistical resolution of the estimator and the compute cost of each stopping check simultaneously.
| Confidence interval half-width | LP solves per check () | |
|---|---|---|
| 20 | 2,400 | |
| 100 | 12,000 | |
| 500 | 60,000 |
The baseline is practical for production runs: at the reference scale of stages and 16 MPI ranks, the simulation check costs approximately 1.7 seconds of wall-clock time per trigger. Raising narrows the interval linearly in at a proportional cost in simulation time.
10.5 Interaction with the Simulation-Based Stopping Rule
Section titled “10.5 Interaction with the Simulation-Based Stopping Rule”The out-of-sample simulation described here is the per-iteration measurement on which the simulation-based stopping rule operates. Each time the stopping rule fires (every period iterations, when the outer-approximation bound is stable), one simulation check is executed, producing , , and for the current iteration. The stopping rule then compares consecutive simulation estimates to test whether the policy has stabilized across iterations. The stopping criterion and the comparison metric are fully specified in Stopping Rules section 5; this chapter owns only the per-iteration estimator (mean, standard deviation, confidence interval) and delegates the convergence decision to that chapter.
11 Cyclic Mode
Section titled “11 Cyclic Mode”For cyclic policy graphs (see Horizon Modes), the inner approximation operates on the same seasonal cut-pool structure: vertices are organized by season , not by absolute stage ID. The Lipschitz constant must account for the cumulative discount around the cycle, which bounds the geometric series of future contributions.
The convergence guarantee still holds: with , both the outer (cut) and inner (vertex) approximations converge to the true value function at the fixed point.
12 References
Section titled “12 References”Costa, B.F.P., & Leclere, V. (2023). “Duality of upper bounds in stochastic dynamic programming.” Optimization Online. https://optimization-online.org/?p=23738
Philpott, A.B., de Matos, V.L., & Finardi, E.C. (2013). “On solving multistage stochastic programs with coherent risk measures.” Operations Research, 61(4), 957-970. https://doi.org/10.1287/opre.2013.1175
Cross-References
Section titled “Cross-References”- SDDP Algorithm — Core algorithm providing the outer approximation (lower bound) that this spec complements
- Notation Conventions — Standard symbols for state variables, value functions, and cost-to-go
- Discount Rate — Discount factor used in vertex value computation (section 5) and Lipschitz accumulation (section 4)
- Horizon Modes — Cyclic policy graphs and the season-indexed pool structure that the inner approximation mirrors
- Cut Management — Outer approximation cuts that provide the lower bound counterpart
- Stopping Rules — Convergence criteria that use the gap between inner and outer approximations; simulation-based stopping rule that consumes the per-iteration out-of-sample estimator (section 10.5)
- Risk Measures — CVaR objectives where deterministic upper bounds are essential; CVaR estimator validity under risk-averse policies (section 10.1)
- Scenario Generation — Opening-tree definition from which out-of-sample replications draw independent noise realizations (section 10.1)