Stopping Rules

This spec defines the available stopping rules for the Cobre SDDP solver, their configuration, and how they combine. It covers iteration limits, time limits, bound stalling, and the recommended simulation-based stopping criterion.

SDDP can terminate based on multiple criteria. Each rule is evaluated independently, and the stopping_mode determines how they combine:

  • "any": Stop when any rule triggers (OR logic)

  • "all": Stop when all rules trigger (AND logic)

Configuration:

{ "type": "iteration_limit", "limit": 50 }

Evaluation:

$$\text{STOP} \iff k \geq k_{max}$$

where $k$ is the current iteration and $k_{max}$ is the limit.

Purpose: Safety bound to prevent infinite loops. Must always be included.

Configuration:

{ "type": "time_limit", "seconds": 3600 }

Evaluation:

$$\text{STOP} \iff t_{elapsed} \geq t_{max}$$

Wall-clock time is checked at the end of each iteration.
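
Both limit rules reduce to a single comparison evaluated at the end of an iteration. The sketch below is a minimal illustration in Python; the function names and the injectable `now` argument are assumptions made for the example, not part of the Cobre API.

```python
import time


def iteration_limit_reached(k, limit):
    """STOP iff k >= k_max, where k is the current iteration."""
    return k >= limit


def time_limit_reached(start, seconds, now=None):
    """STOP iff t_elapsed >= t_max.

    `start` and `now` are readings of the same clock. When `now` is not
    given, time.monotonic() is used so that system clock adjustments
    cannot distort the elapsed time (an assumption of this sketch).
    """
    if now is None:
        now = time.monotonic()
    return (now - start) >= seconds
```

Passing `now` explicitly keeps the check deterministic in tests; in a training loop the caller would record `start = time.monotonic()` once and call `time_limit_reached(start, 3600)` after each iteration.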

Configuration:

{
  "type": "bound_stalling",
  "iterations": 10,
  "tolerance": 0.0001
}

Evaluation:

Track the deterministic lower bound $\underline{z}^k$ over iterations. Compute the relative improvement over a window of $\tau$ iterations (the iterations parameter):

$$\Delta_k = \frac{\underline{z}^k - \underline{z}^{k-\tau}}{\max(1, |\underline{z}^k|)}$$

Stopping condition:

$$\text{STOP} \iff |\Delta_k| < \text{tolerance}$$

Interpretation: The bound has plateaued. The relative improvement over the last $\tau$ iterations is below the specified tolerance, indicating diminishing returns from further iterations.
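
The stalling test can be sketched in a few lines of Python. The function name and the list-based bound history are hypothetical, and returning `False` until the window is full is an assumption of this sketch; the spec does not say how the rule behaves before $\tau$ iterations have elapsed.

```python
def bound_stalled(lower_bounds, iterations, tolerance):
    """Evaluate the bound-stalling rule on a history of lower bounds.

    `lower_bounds` holds one lower bound per completed iteration, oldest
    first. `iterations` is the window length tau.
    """
    if len(lower_bounds) <= iterations:
        # Assumption: the rule cannot fire before the window is full.
        return False
    current = lower_bounds[-1]
    previous = lower_bounds[-1 - iterations]
    # Relative improvement, normalised by max(1, |current bound|).
    delta = (current - previous) / max(1.0, abs(current))
    return abs(delta) < tolerance
```

With `iterations=10` and `tolerance=0.0001`, a bound that moves from 1000.00 to 1000.01 over ten iterations gives a relative improvement of roughly $10^{-5}$, so the rule fires.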

Bound evolution across iterations $k$: the lower bound $\underline{z}^k$ rises monotonically (the cut pool is append-only), while the Monte Carlo upper bound $\bar{z}^k$ descends with a 95% confidence band that tightens as sampling accumulates. The simulation-based rule fires once the gap $(\bar{z}^k - \underline{z}^k)/\max(1,|\bar{z}^k|)$ closes below tolerance and the policy stabilises.

Configuration:

{
  "type": "simulation",
  "replications": 100,
  "period": 20,
  "bound_window": 5,
  "distance_tol": 0.01,
  "bound_tol": 0.0001
}

| Parameter | Description |
|---|---|
| replications | Number of Monte Carlo forward simulations to run |
| period | Number of iterations between checks |
| bound_window | Number of past iterations over which to measure bound stability |
| distance_tol | Threshold for normalized distance between consecutive simulation results |
| bound_tol | Relative tolerance for bound stability check |

Algorithm:

  1. Check bound stability first:

    $$\text{Bound stable} \iff \left| \underline{z}^k - \underline{z}^{k - w} \right| < \text{bound\_tol} \times \max(1, |\underline{z}^k|)$$

    where $w$ is the bound_window parameter.

  2. If the bound is stable, run replications Monte Carlo simulations using the current policy. Compute per-stage total costs $c_t^{new}$ and compare to the previous simulation’s costs $c_t^{old}$:

    $$d = \sqrt{\sum_{t} \left( \frac{c_t^{new} - c_t^{old}}{\max(1, |c_t^{old}|)} \right)^2}$$

    The comparison metric is the mean per-stage cost across replications. Future extensions may compare other quantities (e.g., state variable trajectories or decision variable distributions).

  3. Stopping condition:

    $$\text{STOP} \iff \text{Bound stable} \land d < \text{distance\_tol}$$

Interpretation: Both the outer approximation (bound) and the policy (simulated costs) have stabilized.

Why recommended: Combines a theoretical convergence indicator (bound) with practical policy quality (simulation), avoiding premature termination from statistical noise.
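
The three steps of the simulation-based rule can be sketched as follows. This is a minimal Python illustration under stated assumptions: all function names are hypothetical, the per-stage mean costs are passed in as plain lists (the simulation itself is not shown), the `period` gating is omitted, and the rule is assumed not to fire when no previous simulation exists to compare against.

```python
import math


def bound_stable(lower_bounds, bound_window, bound_tol):
    """Step 1: |z^k - z^(k-w)| < bound_tol * max(1, |z^k|)."""
    if len(lower_bounds) <= bound_window:
        return False  # assumption: not enough history yet
    current = lower_bounds[-1]
    previous = lower_bounds[-1 - bound_window]
    return abs(current - previous) < bound_tol * max(1.0, abs(current))


def policy_distance(new_costs, old_costs):
    """Step 2: normalised Euclidean distance between per-stage mean costs."""
    return math.sqrt(sum(
        ((new - old) / max(1.0, abs(old))) ** 2
        for new, old in zip(new_costs, old_costs)
    ))


def simulation_rule_fires(lower_bounds, new_costs, old_costs,
                          bound_window, distance_tol, bound_tol):
    """Step 3: STOP iff the bound is stable and d < distance_tol."""
    if not bound_stable(lower_bounds, bound_window, bound_tol):
        return False
    if old_costs is None:
        return False  # assumption: first simulation has no reference
    return policy_distance(new_costs, old_costs) < distance_tol
```

For instance, per-stage costs of `[100.5, 200.0]` against a previous `[100.0, 200.0]` give $d = 0.005$, which is below a `distance_tol` of 0.01. Checking the bound first keeps the comparatively expensive simulations from running while the lower bound is still moving.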

An external signal interrupts the training loop at the next iteration boundary, terminating cleanly with the latest completed iteration’s policy persisted.

Guarantee: The policy at the moment of termination is usable. The last completed iteration’s cuts and bounds are recorded; partial-iteration work begun after the signal arrives is discarded. The graceful-shutdown guarantee is a special case of the broader provenance commitment described in Reproducibility and Provenance: the output artefacts are always in a consistent state, whether the run reached a configured stopping rule or was interrupted.

Unconditional: Graceful shutdown is not a configurable rule; it is an unconditional safety property of the training loop. It is not listed in stopping_rules in the case configuration and is not subject to stopping_mode combination logic.

Trade-off: Graceful shutdown costs at most one partial-iteration’s runtime — the work performed after the signal arrives is discarded. The alternative (immediate termination) would leave the policy state inconsistent.
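
One common way to obtain this behaviour is a signal handler that only sets a flag, with the training loop checking the flag at each iteration boundary. The Python sketch below illustrates that pattern; it is not Cobre's implementation. `ShutdownFlag`, `install_handlers`, and `train` are hypothetical names, and the sketch assumes one reading of the guarantee: an iteration that was in flight when the signal arrived is discarded rather than committed.

```python
import signal


class ShutdownFlag:
    """Records that an external signal asked the run to stop."""

    def __init__(self):
        self.requested = False

    def request(self, signum=None, frame=None):
        # Signature matches what signal.signal() passes to a handler.
        self.requested = True


def install_handlers(flag):
    """Route SIGINT and SIGTERM to the flag (call from the main thread)."""
    signal.signal(signal.SIGINT, flag.request)
    signal.signal(signal.SIGTERM, flag.request)


def train(compute_iteration, max_iterations, flag):
    """Run iterations, committing each result only at its boundary."""
    committed = []  # results of fully completed iterations
    for k in range(1, max_iterations + 1):
        staged = compute_iteration(k)  # work for iteration k
        if flag.requested:
            # Signal seen at the boundary: drop the staged work and
            # return the last consistent state.
            return committed, "graceful_shutdown"
        committed.append(staged)
    return committed, "iteration_limit"
```

Because the handler does nothing beyond setting a flag, the loop never stops in the middle of an update, and the committed state is always the result of whole iterations.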

Mode: "any" (default):

$$\text{STOP} \iff \text{Rule}_1 \lor \text{Rule}_2 \lor \ldots$$

First rule to trigger causes termination.

Mode: "all":

$$\text{STOP} \iff \text{Rule}_1 \land \text{Rule}_2 \land \ldots$$

All rules must trigger simultaneously.
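
The two modes map directly onto boolean reductions. A minimal Python sketch, assuming each rule has already been evaluated to a boolean for the current iteration; the function name is hypothetical, and treating an empty rule list as "do not stop" is an assumption made here so that `"all"` mode cannot fire vacuously.

```python
def should_stop(rule_results, stopping_mode="any"):
    """Combine per-rule booleans according to stopping_mode."""
    if not rule_results:
        # Assumption: with no rules configured, never stop on rules.
        return False
    if stopping_mode == "any":
        return any(rule_results)  # OR logic
    if stopping_mode == "all":
        return all(rule_results)  # AND logic
    raise ValueError(f"unknown stopping_mode: {stopping_mode!r}")
```

For example, `should_stop([True, False], "any")` is `True`, while the same results under `"all"` give `False`.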

Example (conservative setup):

{
  "stopping_rules": [
    { "type": "iteration_limit", "limit": 500 },
    {
      "type": "simulation",
      "replications": 100,
      "period": 20,
      "bound_window": 5,
      "distance_tol": 0.01,
      "bound_tol": 0.0001
    }
  ],
  "stopping_mode": "any"
}

This runs until simulation-based convergence OR 500 iterations, whichever comes first.
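
A configuration like this can be checked before training starts. The Python sketch below is illustrative only: `validate_stopping_config` is a hypothetical helper, not a Cobre function. It enforces two points stated in this spec: `stopping_mode` is either `"any"` or `"all"` (defaulting to `"any"`), and an `iteration_limit` rule must always be present.

```python
import json


def validate_stopping_config(config):
    """Return (mode, rule_types) or raise ValueError on a bad config."""
    rules = config.get("stopping_rules", [])
    mode = config.get("stopping_mode", "any")  # "any" is the default
    if mode not in ("any", "all"):
        raise ValueError(f"unknown stopping_mode: {mode!r}")
    rule_types = [rule.get("type") for rule in rules]
    if "iteration_limit" not in rule_types:
        # The iteration limit is the safety bound and must be included.
        raise ValueError("stopping_rules must include an iteration_limit")
    return mode, rule_types


EXAMPLE = json.loads("""
{
  "stopping_rules": [
    { "type": "iteration_limit", "limit": 500 },
    { "type": "simulation", "replications": 100, "period": 20,
      "bound_window": 5, "distance_tol": 0.01, "bound_tol": 0.0001 }
  ],
  "stopping_mode": "any"
}
""")
```

Calling `validate_stopping_config(EXAMPLE)` returns the mode together with the list of configured rule types; a configuration without an `iteration_limit` entry raises `ValueError`.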

Graceful shutdown is independent of the stopping_mode combination logic — it terminates the training loop regardless of whether any or all of the configured rules have triggered.

When the run terminates, whether through a configured stopping rule or a graceful shutdown, the output includes:

| Field | Description |
|---|---|
| stopping_rule | Which rule triggered |
| final_iteration | Iteration count at termination |
| lower_bound | Final deterministic lower bound |
| upper_bound | Final simulated upper bound (if available) |
| gap | Optimality gap: $(\bar{z} - \underline{z}) / \max(1, \lvert\bar{z}\rvert)$ |

The stopping_rule field carries the rule that triggered, including "graceful_shutdown" when an external signal terminated the run.
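
The gap field follows directly from the two bounds. A minimal Python sketch, in which the function name is hypothetical and returning `None` when no upper bound is available is an assumption (the spec only says the upper bound is reported "if available"):

```python
def optimality_gap(lower_bound, upper_bound):
    """Relative gap (z_bar - z_underline) / max(1, |z_bar|)."""
    if upper_bound is None:
        # Assumption: without a simulated upper bound there is no gap.
        return None
    return (upper_bound - lower_bound) / max(1.0, abs(upper_bound))
```

A lower bound of 95 against an upper bound of 100 gives a gap of 0.05. The `max(1, ...)` guard in the denominator switches to an absolute difference when the upper bound is smaller than 1 in magnitude.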