Stopping Rules
Purpose
This spec defines the available stopping rules for the Cobre SDDP solver, their configuration, and how they combine. It covers iteration limits, time limits, bound stalling, and the recommended simulation-based stopping criterion.
1 Available Stopping Rules
SDDP can terminate based on multiple criteria. Each rule is evaluated independently, and `stopping_mode` determines how they combine:
- `"any"`: Stop when any rule triggers (OR logic)
- `"all"`: Stop when all rules trigger (AND logic)
2 Iteration Limit (Mandatory)
Configuration:
{ "type": "iteration_limit", "limit": 50 }
Evaluation:
$$k \geq k_{\max}$$
where $k$ is the current iteration and $k_{\max}$ is the limit.
Purpose: Safety bound to prevent infinite loops. Must always be included.
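Because the iteration limit is mandatory, a configuration loader can reject rule lists that omit it. The following is a minimal sketch, assuming the rules have already been parsed from JSON into Python dicts. The function name `validate_stopping_rules` is hypothetical and not part of Cobre's API, and the requirement that `limit` be a positive integer is an assumption.

```python
def validate_stopping_rules(rules):
    """Raise ValueError unless a well-formed iteration_limit rule is present.

    `rules` is a list of dicts shaped like the JSON rule objects in this spec.
    Hypothetical helper; the positive-integer check is an assumption.
    """
    limits = [r for r in rules if r.get("type") == "iteration_limit"]
    if not limits:
        raise ValueError("stopping_rules must include an iteration_limit rule")
    for rule in limits:
        limit = rule.get("limit")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
            raise ValueError(
                f"iteration_limit.limit must be a positive integer, got {limit!r}"
            )
    return rules
```

Running the check once at load time means a missing safety bound is reported before any training iteration starts.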
3 Time Limit
Configuration:
{ "type": "time_limit", "seconds": 3600 }
Evaluation:
Wall-clock time is checked at the end of each iteration; the rule triggers once the elapsed time reaches `seconds`.
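The end-of-iteration check can be sketched as follows. This is an illustration only: the class name `TimeLimit`, the use of `time.monotonic` (chosen here because it is unaffected by system clock adjustments), and the `>=` comparison are assumptions, not Cobre's implementation.

```python
import time


class TimeLimit:
    """Triggers once wall-clock time since construction reaches `seconds`."""

    def __init__(self, seconds, clock=time.monotonic):
        self.seconds = seconds
        self._clock = clock      # injectable so the rule can be tested
        self._start = clock()    # training is assumed to start here

    def should_stop(self):
        # Intended to be called once, at the end of each iteration.
        return self._clock() - self._start >= self.seconds
```

Since the check runs only at iteration boundaries, a run can exceed the configured limit by up to the duration of one iteration.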
4 Bound Stalling
Configuration:
{ "type": "bound_stalling", "iterations": 10, "tolerance": 0.0001 }
Evaluation:
Track the deterministic lower bound $\underline{z}^k$ over iterations. Compute the relative improvement over a window of $\tau$ iterations (the `iterations` parameter):
$$\Delta^k = \frac{\underline{z}^k - \underline{z}^{k-\tau}}{\max(1, |\underline{z}^k|)}$$
Stopping condition:
$$|\Delta^k| < \text{tolerance}$$
Interpretation: The bound has plateaued. The relative improvement over the last $\tau$ iterations is below the specified tolerance, which indicates diminishing returns from further iterations.
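A minimal sketch of the stalling check, assuming the lower bounds are kept in a list with one entry per completed iteration. The function name `bound_stalled` is hypothetical, and the `max(1, |bound|)` normalization is an assumption used to avoid dividing by a near-zero bound.

```python
def bound_stalled(lower_bounds, iterations, tolerance):
    """Return True when the relative lower-bound improvement over the last
    `iterations` iterations is below `tolerance`.

    lower_bounds: one value per completed iteration, oldest first.
    """
    if len(lower_bounds) <= iterations:
        return False  # not enough history to fill the window yet
    current = lower_bounds[-1]
    reference = lower_bounds[-1 - iterations]
    # Assumed normalization: guard against a bound close to zero.
    improvement = (current - reference) / max(1.0, abs(current))
    return abs(improvement) < tolerance
```

With `iterations` set to 10, the rule cannot trigger before the eleventh iteration, because the window needs a reference value from 10 iterations earlier.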
5 Simulation-Based Stopping (Recommended)
Figure: Bound evolution across iterations. The lower bound rises monotonically (the cut pool is append-only), while the Monte Carlo upper bound descends with a 95% confidence band that tightens as sampling accumulates. The simulation-based rule fires once the gap closes below tolerance and the policy stabilises.
Configuration:
{ "type": "simulation", "replications": 100, "period": 20, "bound_window": 5, "distance_tol": 0.01, "bound_tol": 0.0001 }

| Parameter | Description |
|---|---|
| `replications` | Number of Monte Carlo forward simulations to run |
| `period` | Check the rule every `period` iterations |
| `bound_window` | Number of past iterations over which to measure bound stability |
| `distance_tol` | Threshold for the normalized distance between consecutive simulation results |
| `bound_tol` | Relative tolerance for the bound stability check |
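Algorithm:
- Check bound stability first:
  $$\frac{|\underline{z}^k - \underline{z}^{k-\tau}|}{\max(1, |\underline{z}^k|)} < \text{bound\_tol}$$
  where $\underline{z}^k$ is the lower bound at iteration $k$ and $\tau$ is the `bound_window` parameter.
- If the bound is stable, run `replications` Monte Carlo simulations using the current policy. Compute per-stage total costs $q_t^k$ and compare them to the previous simulation's costs $q_t^{\text{prev}}$ through a normalized distance:
  $$d^k = \frac{\lVert q^k - q^{\text{prev}} \rVert}{\max(1, \lVert q^k \rVert)}$$
  The comparison metric is the mean per-stage cost across replications. Future extensions may compare other quantities (e.g., state variable trajectories or decision variable distributions).
- Stopping condition: the bound is stable and
  $$d^k < \text{distance\_tol}$$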
Interpretation: Both the outer approximation (bound) and the policy (simulated costs) have stabilized.
Why recommended: Combines a theoretical convergence indicator (bound) with practical policy quality (simulation), avoiding premature termination from statistical noise.
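The two-step check can be sketched as below. Several details are assumptions made for illustration: the distance is taken as a Euclidean norm over mean per-stage costs, both ratios are normalized by `max(1, ...)`, and `simulate` is a hypothetical callback that runs the requested number of forward simulations and returns one mean cost per stage. None of these names come from Cobre's API.

```python
import math


def relative_bound_change(lower_bounds, window):
    """Relative change of the lower bound over the last `window` iterations."""
    current, reference = lower_bounds[-1], lower_bounds[-1 - window]
    return abs(current - reference) / max(1.0, abs(current))


def normalized_distance(costs, previous_costs):
    """Assumed metric: Euclidean distance between mean per-stage cost
    vectors, scaled by the norm of the current vector."""
    diff = math.sqrt(sum((c - p) ** 2 for c, p in zip(costs, previous_costs)))
    scale = math.sqrt(sum(c ** 2 for c in costs))
    return diff / max(1.0, scale)


def simulation_rule(iteration, lower_bounds, previous_costs, simulate, cfg):
    """Return (stop, costs_to_remember).

    `simulate(n)` is a hypothetical callback returning mean per-stage costs
    from n Monte Carlo forward simulations under the current policy.
    """
    if iteration % cfg["period"] != 0:
        return False, previous_costs
    if len(lower_bounds) <= cfg["bound_window"]:
        return False, previous_costs
    if relative_bound_change(lower_bounds, cfg["bound_window"]) >= cfg["bound_tol"]:
        # Bound still moving: skip the expensive simulation entirely.
        return False, previous_costs
    costs = simulate(cfg["replications"])
    if previous_costs is None:
        return False, costs  # first simulation: nothing to compare against
    stop = normalized_distance(costs, previous_costs) < cfg["distance_tol"]
    return stop, costs
```

Checking the bound first keeps the cost low: the Monte Carlo simulations run only on iterations where the cheap bound test already passes.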
6 Graceful Shutdown
An external signal interrupts the training loop at the next iteration boundary, terminating cleanly with the latest completed iteration’s policy persisted.
Guarantee: The policy at the moment of termination is usable. The last completed iteration’s cuts and bounds are recorded; partial-iteration work begun after the signal arrives is discarded. The graceful-shutdown guarantee is a special case of the broader provenance commitment described in Reproducibility and Provenance: the output artefacts are always in a consistent state, whether the run reached a configured stopping rule or was interrupted.
Unconditional: Graceful shutdown is not a configurable rule; it is an unconditional
safety property of the training loop. It is not listed in stopping_rules in the case
configuration and is not subject to stopping_mode combination logic.
Trade-off: Graceful shutdown costs at most one partial-iteration’s runtime — the work performed after the signal arrives is discarded. The alternative (immediate termination) would leave the policy state inconsistent.
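One reading of this guarantee is a signal handler that only sets a flag, with the training loop inspecting the flag at each iteration boundary and committing an iteration's results only if no signal arrived while it ran. The sketch below follows that reading; `ShutdownFlag`, `install`, `train`, `run_iteration`, and `commit` are hypothetical names, and the loop is an illustration rather than Cobre's training loop.

```python
import signal


class ShutdownFlag:
    """Records that a termination signal arrived; the loop polls it."""

    def __init__(self):
        self.requested = False

    def handler(self, signum, frame):
        # Only set a flag: the handler never aborts work mid-iteration.
        self.requested = True


def install(flag, signals=(signal.SIGINT, signal.SIGTERM)):
    """Route the given signals to the flag (call from the main thread)."""
    for sig in signals:
        signal.signal(sig, flag.handler)


def train(run_iteration, commit, max_iterations, flag):
    """Hypothetical loop; returns (completed_iterations, stopping_rule)."""
    completed = 0
    while completed < max_iterations:
        result = run_iteration(completed + 1)  # candidate cuts and bounds
        if flag.requested:
            # Signal arrived during this iteration: discard its work and
            # keep the state left by the last committed iteration.
            return completed, "graceful_shutdown"
        commit(result)
        completed += 1
    return completed, "iteration_limit"
```

Because results are committed only at the boundary, the persisted state never contains a half-finished iteration, at the cost of the discarded work.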
7 Combining Rules
Mode: "any" (default):
First rule to trigger causes termination.
Mode: "all":
All rules must trigger simultaneously.
Example (conservative setup):
{
  "stopping_rules": [
    { "type": "iteration_limit", "limit": 500 },
    { "type": "simulation", "replications": 100, "period": 20, "bound_window": 5, "distance_tol": 0.01, "bound_tol": 0.0001 }
  ],
  "stopping_mode": "any"
}
This runs until simulation-based convergence or 500 iterations, whichever comes first.
Graceful shutdown is independent of the stopping_mode combination logic — it terminates
the training loop regardless of whether any or all of the configured rules have triggered.
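The combination logic, including the way graceful shutdown bypasses it, can be sketched in a few lines. The function names and the dict-of-booleans representation are assumptions made for illustration.

```python
def rules_say_stop(triggered, mode="any"):
    """Combine per-rule results for the current iteration.

    triggered: dict mapping rule type to True/False.
    """
    flags = list(triggered.values())
    if mode == "any":
        return any(flags)  # OR logic
    if mode == "all":
        # AND logic; an empty rule list never stops the run.
        return bool(flags) and all(flags)
    raise ValueError(f"unknown stopping_mode: {mode!r}")


def should_terminate(triggered, mode="any", shutdown_requested=False):
    """Graceful shutdown terminates regardless of stopping_mode."""
    if shutdown_requested:
        return True
    return rules_say_stop(triggered, mode)
```

For example, with `{"iteration_limit": False, "simulation": True}` the run stops under `"any"` but continues under `"all"`, unless a shutdown has been requested.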
8 Output on Termination
When any stopping rule triggers, the output includes:
| Field | Description |
|---|---|
| `stopping_rule` | Which rule triggered |
| `final_iteration` | Iteration count at termination |
| `lower_bound` | Final deterministic lower bound |
| `upper_bound` | Final simulated upper bound (if available) |
| `gap` | Optimality gap between `upper_bound` and `lower_bound` |
The stopping_rule field carries the rule that triggered, including "graceful_shutdown"
when an external signal terminated the run.
Cross-References
- Notation Conventions: Symbol definitions for bounds and statistical quantities
- SDDP Algorithm: Main iteration loop that evaluates stopping rules
- Cut Management: Cut generation and selection that affect convergence speed
- Upper Bound Evaluation: Monte Carlo simulation for upper bound estimation, used by simulation-based stopping
- Risk Measures: Risk-averse formulations that affect bound interpretation
- Reproducibility and Provenance: Provenance commitment of which the graceful-shutdown guarantee is a special case