Production Scale Reference
Purpose
This spec defines the production-scale dimensions of the Cobre SDDP solver: system sizes, LP variable and constraint counts, state dimension formulas, and performance expectations by scale. It serves as the reference for capacity planning, memory budgeting, and performance regression detection.
1. Production Scale Dimensions
Based on the target production scenario:
| Dimension | Value | Memory Impact |
|---|---|---|
| Stages | 60 | Graph size |
| Blocks per Stage | 1-24 (varies), typically 3 | LP structure, outputs |
| Hydros | 160 | State dimension |
| Max AR Order | 12 | State dimension, variables/constraints |
| Thermals | 130 | Variables |
| Buses | 6 | Variables/constraints |
| Lines | 10 | Variables/constraints |
| Forward Passes | 192 | Parallelism |
| Openings | 10 | Backward pass LP solves |
| Iterations | 50 | Cut pool size |
| Simulation Scenarios | 2000 | Output size |
Decision DEC-009 (active): 60 stages is the production-scale reference baseline for all capacity planning, memory budgets, and performance targets across the corpus. Some specs reference 120 stages as a theoretical maximum or worst-case bound (e.g., FLOP cost calculations, temporal flattening memory estimates, hypothetical sizing). Such references are explicitly labeled as “worst-case” or “hypothetical maximum” to distinguish from the production baseline.
2. State Dimension Estimates
2.1 State Variables and Dimension
The state dimension determines the size of Benders cuts. State variables include:
| Component | Count | Description | Status |
|---|---|---|---|
| Storage | `N_HYDRO` | End-of-stage reservoir volume for each hydro | Implemented |
| AR Lags | `SUM(AR_ORDER[h])` | Inflow lag values for AR(p) models | Implemented |
| Battery SOC | `N_BATTERY` | Battery state of charge | Deferred (C.2) |
| GNL Committed | `SUM(GNL_LAG[t])` | GNL dispatch pipeline | Deferred (C.1) |
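State Dimension Formula:
$$N_{\text{state}} = N_{\text{hydro}} + \sum_{h} P_h + N_{\text{battery}} + \sum_{t} L_t$$
where $P_h$ is the AR order of hydro $h$ and $L_t$ is the GNL lag of thermal $t$ (`AR_ORDER[h]` and `GNL_LAG[t]` in §3.3).
For the current implementation (batteries and GNL deferred, both zero):
$$N_{\text{state}} = N_{\text{hydro}} + \sum_{h} P_h$$
For production scale (160 hydros, AR order up to 12):
- Storage: 160
- AR lags: $160 \times 12 = 1{,}920$ (all hydros at max order)
- Total: 2,080 (production assumption)
Note: The production assumption is $N_{\text{state}} = 2{,}080$, corresponding to all hydros using AR(12). This is the worst-case ceiling and the value used for all capacity planning and performance budgeting in this spec. If most hydros use AR(6), the dimension reduces to roughly $160 + 160 \times 6 = 1{,}120$, which is the optimistic case.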
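The state dimension arithmetic above can be checked with a minimal sketch. The function name `state_dimension` is hypothetical and not part of the solver; it only mirrors the current-implementation formula (storage plus AR lags, with battery and GNL terms deferred).

```python
from typing import Sequence


def state_dimension(ar_orders: Sequence[int]) -> int:
    """Storage (one state per hydro) plus AR lags (sum of per-hydro AR orders).

    Battery and GNL terms are omitted because they are deferred (zero).
    """
    n_hydro = len(ar_orders)
    return n_hydro + sum(ar_orders)


# Worst case used for capacity planning: all 160 hydros at AR(12).
worst_case = state_dimension([12] * 160)
# Optimistic case: all 160 hydros at AR(6).
optimistic = state_dimension([6] * 160)

print(worst_case, optimistic)  # 2080 1120
```

A mixed system falls between the two values, since each hydro contributes one storage state plus its own AR order.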
3. Variable and Constraint Counts
3.1 Variable Count per Subproblem
| Component | Formula | Typical Count |
|---|---|---|
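| Future cost | 1 | 1 |
| Deficit | `N_BUS × N_BLOCK × AVG_DEF_SEGMENTS` | 6 × 3 × 1 = 18 |
| Excess | `N_BUS × N_BLOCK` | 6 × 3 = 18 |
| Exchange (direct + reverse) | `2 × N_LINE × N_BLOCK` | 2 × 10 × 3 = 60 |
| Hydro storage | `N_HYDRO` | 160 |
| Hydro incremental inflow AR | `N_HYDRO × AR_ORDER` | 160 × 12 = 1,920 |
| Hydro turbined flow | `N_HYDRO × N_BLOCK` | 160 × 3 = 480 |
| Hydro spillage | `N_HYDRO × N_BLOCK` | 160 × 3 = 480 |
| Hydro generation | `N_HYDRO × N_BLOCK` | 160 × 3 = 480 |
| Hydro inflow (per-block) | `N_HYDRO × N_BLOCK` | 160 × 3 = 480 |
| Hydro diversion | `N_HYDRO_DIV × N_BLOCK` | ~10 × 3 = 30 |
| Hydro evaporation | `N_HYDRO_EVAP × N_BLOCK` | 160 × 3 = 480 |
| Hydro withdrawal | `N_HYDRO_WITHDRAWAL × N_BLOCK` | 160 × 3 = 480 |
| Hydro slacks | `N_HYDRO × N_BLOCK × N_SLACK_TYPES` | 160 × 3 × 6 = 2,880 |
| Thermal generation | `N_THERMAL × N_BLOCK × AVG_COST_SEGMENTS` | 130 × 3 × 1 = 390 |
| Contracts | `(N_CONTRACT_IMP + N_CONTRACT_EXP) × N_BLOCK` | 5 × 3 = 15 |
| Pumping (flow + power) | `N_PUMP × N_BLOCK × 2` | 5 × 3 × 2 = 30 |
| Total Variables | | ~8,400 |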
3.2 Constraint Count per Subproblem
| Component | Formula | Typical Count |
|---|---|---|
| Load balance | `N_BUS × N_BLOCK` | 6 × 3 = 18 |
| Hydro water balance | `N_HYDRO` | 160 |
| Incremental inflow AR dynamics | `N_HYDRO` | 160 |
| Lagged incremental inflow fixing | `N_HYDRO × AR_ORDER` | 160 × 12 = 1,920 |
| Hydro generation (constant) | `(N_HYDRO - N_HYDRO_FPHA) × N_BLOCK` | (160 − 50) × 3 = 330 |
| Hydro generation (FPHA) | `N_HYDRO_FPHA × N_BLOCK × AVG_FPHA_PLANES` | 50 × 3 × 125 = 18,750 |
| Outflow definition | `N_HYDRO × N_BLOCK` | 160 × 3 = 480 |
| Outflow bounds (min/max) | `2 × N_HYDRO × N_BLOCK` | 2 × 160 × 3 = 960 |
| Turbined min | `N_HYDRO × N_BLOCK` | 160 × 3 = 480 |
| Generation min | `N_HYDRO × N_BLOCK` | 160 × 3 = 480 |
| Evaporation | `N_HYDRO_EVAP × N_BLOCK` | 160 × 3 = 480 |
| Water withdrawal | `N_HYDRO_WITHDRAWAL × N_BLOCK` | 160 × 3 = 480 |
| Generic constraints | `N_GENERIC` | ~50 |
| Active constraints | | ~24,750 |
| Benders cuts (pre-allocated) | `N_CUT_CAPACITY` | 10,000-15,000 |
| Total Constraints | | ~35,000-40,000 |
Note: Among active constraints, FPHA hyperplane constraints dominate (~18,750 of ~24,750, about 76%). The total constraint count including pre-allocated Benders cut slots reaches ~35,000-40,000, but during early iterations most cut constraints are inactive (bounds set to $(-\infty, +\infty)$). See Solver Abstraction §5 for pre-allocation design.
3.3 Counting Formulas (Exact)
For precise sizing, use the following formulas where parameters come from the system configuration. The production-scale values in §3.1 and §3.2 are obtained by substituting the parameters from §1.
Variables:
N_VAR = 1 # theta
+ N_BUS × N_BLOCK × (AVG_DEF_SEGMENTS + 1) # deficit + excess
+ 2 × N_LINE × N_BLOCK # exchange
+ N_HYDRO # storage
+ N_HYDRO × AR_ORDER # incremental inflow model
+ N_HYDRO × N_BLOCK × 4 # q, s, g, inflow
+ N_HYDRO_DIV × N_BLOCK # diversion
+ N_HYDRO_EVAP × N_BLOCK # evaporation
+ N_HYDRO_WITHDRAWAL × N_BLOCK # withdrawal
+ N_HYDRO × N_BLOCK × N_SLACK_TYPES # slack vars
+ N_THERMAL × N_BLOCK × AVG_COST_SEGMENTS # thermal
+ (N_CONTRACT_IMP + N_CONTRACT_EXP) × N_BLOCK # contracts
+ N_PUMP × N_BLOCK × 2 # pump flow + power
Constraints:
N_CON = N_BUS × N_BLOCK # load balance
+ N_HYDRO # water balance
+ N_HYDRO # inflow AR dynamics
+ N_HYDRO × AR_ORDER # lagged inflow fixing
+ (N_HYDRO - N_HYDRO_FPHA) × N_BLOCK # generation (constant)
+ N_HYDRO_FPHA × N_BLOCK × AVG_FPHA_PLANES # FPHA (additional)
+ N_HYDRO × N_BLOCK # outflow definition
+ N_HYDRO × N_BLOCK × 2 # outflow bounds
+ N_HYDRO × N_BLOCK # turbined min
+ N_HYDRO × N_BLOCK # generation min
+ N_HYDRO_EVAP × N_BLOCK # evaporation
+ N_HYDRO_WITHDRAWAL × N_BLOCK # withdrawal
+ N_GENERIC # generic constraints
+ N_CUT_CAPACITY # Benders cuts
State Dimension:
N_STATE = N_HYDRO # storage
+ SUM(AR_ORDER[h] for h in HYDROS) # AR lags
# + N_BATTERY (deferred C.2)
# + SUM(GNL_LAG[t] for t in GNL_THERMALS) (deferred C.1)
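Production-scale parameter values: N_BLOCK = 3 (typical), AVG_DEF_SEGMENTS = 1, AVG_COST_SEGMENTS = 1, AVG_FPHA_PLANES = 125, N_HYDRO_FPHA = 50, N_HYDRO_DIV ≈ 10, N_HYDRO_EVAP = N_HYDRO = 160, N_HYDRO_WITHDRAWAL = N_HYDRO = 160, N_SLACK_TYPES = 6, N_CONTRACT_IMP + N_CONTRACT_EXP = 5, N_PUMP = 5, N_GENERIC ≈ 50, N_CUT_CAPACITY = 10,000-15,000, AR_ORDER = 12 (worst case). These are the values substituted in §3.1 and §3.2. All other parameters as in §1.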
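As a cross-check of the formulas above, the following sketch evaluates them at production scale. The `SystemSize` class and function names are illustrative assumptions, not part of the solver. The defaults are the values used in the §3.1 and §3.2 tables, with the approximate entries (diversion ~10, generic ~50) taken at face value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemSize:
    """Parameters of the §3.3 counting formulas (defaults: production scale)."""
    n_bus: int = 6
    n_line: int = 10
    n_block: int = 3              # typical blocks per stage
    n_hydro: int = 160
    ar_order: int = 12            # worst case
    n_hydro_div: int = 10         # approximate (~10 in §3.1)
    n_hydro_evap: int = 160
    n_hydro_withdrawal: int = 160
    n_hydro_fpha: int = 50
    n_slack_types: int = 6
    n_thermal: int = 130
    n_contract: int = 5           # import + export contracts combined
    n_pump: int = 5
    n_generic: int = 50           # approximate (~50 in §3.2)
    avg_def_segments: int = 1
    avg_cost_segments: int = 1
    avg_fpha_planes: int = 125


def count_variables(s: SystemSize) -> int:
    b = s.n_block
    return (
        1                                           # theta (future cost)
        + s.n_bus * b * (s.avg_def_segments + 1)    # deficit + excess
        + 2 * s.n_line * b                          # exchange
        + s.n_hydro                                 # storage
        + s.n_hydro * s.ar_order                    # incremental inflow model
        + s.n_hydro * b * 4                         # q, s, g, inflow
        + s.n_hydro_div * b                         # diversion
        + s.n_hydro_evap * b                        # evaporation
        + s.n_hydro_withdrawal * b                  # withdrawal
        + s.n_hydro * b * s.n_slack_types           # slack variables
        + s.n_thermal * b * s.avg_cost_segments     # thermal
        + s.n_contract * b                          # contracts
        + s.n_pump * b * 2                          # pump flow + power
    )


def count_active_constraints(s: SystemSize) -> int:
    b = s.n_block
    return (
        s.n_bus * b                                   # load balance
        + s.n_hydro                                   # water balance
        + s.n_hydro                                   # inflow AR dynamics
        + s.n_hydro * s.ar_order                      # lagged inflow fixing
        + (s.n_hydro - s.n_hydro_fpha) * b            # generation (constant)
        + s.n_hydro_fpha * b * s.avg_fpha_planes      # FPHA hyperplanes
        + s.n_hydro * b                               # outflow definition
        + s.n_hydro * b * 2                           # outflow bounds
        + s.n_hydro * b                               # turbined min
        + s.n_hydro * b                               # generation min
        + s.n_hydro_evap * b                          # evaporation
        + s.n_hydro_withdrawal * b                    # withdrawal
        + s.n_generic                                 # generic constraints
    )


def count_constraints(s: SystemSize, n_cut_capacity: int) -> int:
    """Active constraints plus pre-allocated Benders cut slots."""
    return count_active_constraints(s) + n_cut_capacity


production = SystemSize()
print(count_variables(production))            # 8402
print(count_active_constraints(production))   # 24748
print(count_constraints(production, 15_000))  # 39748
```

The exact results (8,402 variables and 24,748 active constraints) agree with the rounded totals of ~8,400 and ~24,750 quoted in §3.1 and §3.2.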
4. Performance Expectations by Scale
Note: The timing targets in §4.2 are engineering targets for each test system scale, serving as upper bounds for performance regression testing. Section §4.6 provides a separate first-principles wall-clock model for the Production configuration using LP solve time KPIs (§4.3) and parallelism parameters. The complete derivation is in the timing model analysis audit artifact. All estimates assume fully warm-started steady-state iterations and do not account for I/O, checkpointing, or convergence check overhead.
Purpose: This table provides expected timing targets for different problem scales, enabling performance validation and regression detection. Timings are per-iteration unless otherwise noted.
4.1 Hardware Assumptions
| Component | Specification |
|---|---|
| CPU | AMD EPYC 9R14 or equivalent (192 cores, 3.7 GHz base) |
| Memory | DDR5, 384 GB/node |
| Network | InfiniBand HDR (200 Gb/s) or equivalent |
| Storage | NVMe SSD for I/O operations |
4.2 Test Systems
System Definition
| Scale | Stages | Hydros | Thermals | AR Order | Hydro Production | Fwd Passes | Openings |
|---|---|---|---|---|---|---|---|
| Unit Test | 4 | 1 | 2 | 0 | Constant | 2 | 10 |
| Small | 12 | 2 | 4 | 0 | Constant | 4 | 10 |
| Medium | 24 | 4 | 20 | 0 | Constant | 12 | 20 |
| Large | 24 | 4 | 20 | 6 | Constant | 16 | 20 |
| XLarge | 36 | 12 | 130 | 12 | Constant | 32 | 20 |
| 2XLarge | 36 | 12 | 130 | 12 | FPHA (125) | 32 | 20 |
| Production | 60 | 160 | 130 | 12 | FPHA (125) | 192 | 10 |
Hardware Configuration and Performance Targets
| Scale | Ranks | Threads/Rank | Forward Time | Backward Time | Memory/Rank |
|---|---|---|---|---|---|
| Unit Test | 1 | 2 | <0.2s | <1s | — |
| Small | 2 | 2 | <0.3s | <1.5s | — |
| Medium | 2 | 6 | <1s | <10s | — |
| Large | 2 | 8 | <1.2s | <12s | — |
| XLarge | 2 | 8 | <2s | <20s | — |
| 2XLarge | 2 | 8 | <3s | <30s | — |
| Production | 4 | 48 | <10s | <90s | — |
Note: All timing values are engineering targets based on domain experience. For the Production row, the model-derived estimates from §4.6 (1.5 s forward, 14.75 s backward) are substantially lower than the engineering targets (<10 s, <90 s), indicating headroom for I/O, cold-starts, and solver variability. Memory/Rank values are to be calculated using the sizing tool; approximate sizing can be derived from the formulas in §3 and Memory Architecture §2.1.
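Since these targets act as upper bounds for regression testing, a check could look roughly like the sketch below. The table values are copied from above. The function name `check_iteration` and the way measurements are supplied are assumptions for illustration, not an existing test harness.

```python
# Upper bounds from the targets table: (forward seconds, backward seconds).
TARGETS_S = {
    "Unit Test": (0.2, 1.0),
    "Small": (0.3, 1.5),
    "Medium": (1.0, 10.0),
    "Large": (1.2, 12.0),
    "XLarge": (2.0, 20.0),
    "2XLarge": (3.0, 30.0),
    "Production": (10.0, 90.0),
}


def check_iteration(scale: str, forward_s: float, backward_s: float) -> list:
    """Return a list of human-readable violations (empty if within targets).

    Targets are strict upper bounds ("<"), so a measurement equal to the
    bound counts as a violation.
    """
    forward_max, backward_max = TARGETS_S[scale]
    violations = []
    if forward_s >= forward_max:
        violations.append(f"{scale}: forward {forward_s:.3f}s >= {forward_max}s")
    if backward_s >= backward_max:
        violations.append(f"{scale}: backward {backward_s:.3f}s >= {backward_max}s")
    return violations


# The §4.6 model estimates for Production pass with wide margin.
print(check_iteration("Production", 1.5, 14.75))  # []
```

Returning a list of messages rather than a boolean keeps both violations visible when the forward and backward passes regress together.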
4.3 Key Performance Indicators
| Metric | Target | Measurement Point |
|---|---|---|
| LP solve (warm-start) | ≤25 ms | Hot-path, ~8,400-variable subproblem |
| LP solve (cold-start) | ≤250 ms | First solve or basis invalid |
| RHS batch update | <100 µs | ~25,000 constraint updates |
| Solution extraction | <50 µs | Primal + basis to buffers |
| Cut exchange (MPI_Allgatherv) | <5 ms | Per-stage, all ranks exchange cuts |
| Parallel efficiency | >80% | At 4 ranks vs. 1 rank |
| Warm-start hit rate | >70% | Forward pass consecutive stages |
4.4 Scaling Expectations
| Dimension | Scaling Behavior |
|---|---|
| Forward pass | Time ∝ (stages × forward passes × LP solve time) / (ranks × threads). Near-linear speedup. |
| Backward pass | Time ∝ (stages − 1) × openings × LP solve time. Sequential stage and opening dependency (warm-start). Each rank redundantly solves all openings for cut locality. |
| Communication | With 4 ranks on InfiniBand, pure data transfer and synchronization are negligible (<0.1% of iteration time). See §4.6 time budget. |
| Memory per rank | solver workspaces (~36 MB × threads) + StageLpCache via SharedRegion (~22.3 GB node-wide). See Memory Architecture §2.1 |
| Cut pool growth | Logical growth only (pre-allocated slots). Memory stable after initialization. |
Thread utilization at 4 ranks: The forward and backward passes have different utilization characteristics.
- Forward pass: With 192 trajectories distributed across 4 ranks, each rank receives 192 / 4 = 48 trajectories. With 48 threads per rank, each thread handles exactly 1 trajectory, giving 100% thread utilization.
- Backward pass: The 192 trial states (gathered from the forward pass) are distributed across 4 ranks (48 per rank) and then across 48 threads (1 trial state per thread). Each thread evaluates all 10 openings sequentially for its trial state, maintaining warm-start benefit. Per stage: 48 × 10 = 480 LP solves per rank, 10 sequential solves per thread, so thread utilization is 100% with sequential depth 10. All 4 ranks are active (each rank independently generates cuts from its assigned trial states).
The per-iteration compute time is determined by the sequential depth per stage: 59 × 10 × 25 ms = 14.75 s for the backward pass. See the timing model analysis §8 for the complete utilization comparison.
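The distribution arithmetic in this paragraph can be sketched as follows. The function `work_distribution` is a hypothetical helper, and the ceiling-based handling of non-divisible cases is an assumption about how the busiest rank and thread would be loaded, not a description of the actual scheduler.

```python
import math


def work_distribution(forward_passes: int = 192, ranks: int = 4,
                      threads_per_rank: int = 48, openings: int = 10) -> dict:
    """Per-rank and per-thread load for one stage, using the busiest rank/thread."""
    per_rank = math.ceil(forward_passes / ranks)          # trajectories on busiest rank
    per_thread = math.ceil(per_rank / threads_per_rank)   # trajectories on busiest thread
    # Fraction of thread-slots doing useful work while the busiest thread runs.
    utilization = per_rank / (threads_per_rank * per_thread)
    return {
        "trajectories_per_rank": per_rank,
        "trajectories_per_thread": per_thread,
        "thread_utilization": utilization,
        "backward_solves_per_rank_per_stage": per_rank * openings,
        "backward_sequential_depth_per_stage": per_thread * openings,
    }


for key, value in work_distribution().items():
    print(f"{key}: {value}")
```

With the production parameters this gives 48 trajectories per rank, 1 per thread, 480 backward LP solves per rank per stage, and a sequential depth of 10. Changing `threads_per_rank` to 32, for example, would leave some threads idle during the second round of trajectories and lower the utilization figure.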
4.5 Convergence Reference
| Problem Type | Typical Iterations | Optimality Gap |
|---|---|---|
| Simple (few hydros, short horizon) | 10-20 | <0.1% |
| Medium (regional system) | 30-50 | <0.5% |
| Complex (full national grid) | 50-100 | <1.0% |
| With CVaR risk measure | +20-50% iterations | Same gap target |
Note: Iteration counts assume reasonable initial policy (warm-start from previous study). Cold-start may require 2-3x more iterations. The Production system (60 stages, 160 hydros, AR(12)) falls in the “Complex” category.
4.6 Wall-Clock Time Budget
The following per-iteration time budget is derived from a first-principles model at an LP solve time of 25 ms, 4 ranks, 48 threads/rank, a state dimension of 2,080 (worst-case AR(12)), and InfiniBand HDR. The complete derivation with all intermediate steps is in the timing model analysis.
| Component | Per-Iteration | Fraction | Category |
|---|---|---|---|
| Forward pass compute | 1.500 s | 9.07% | Compute |
| Backward pass compute | 14.750 s | 89.15% | Compute |
| Trial point Allgatherv | <0.001 s | <0.01% | Communication |
| Cut exchange (59 stages) | <0.001 s | <0.01% | Communication |
| Convergence Allreduce | <0.001 s | <0.01% | Communication |
| Barrier overhead (59 stages) | 0.295 s | 1.78% | Synchronization |
| Per-iteration total | 16.545 s | 100% | |
Forward pass: 60 × 25 ms = 1,500 ms. Each rank receives 192 / 4 = 48 trajectories, fully utilizing 48 threads.
Backward pass: 59 × 10 × 25 ms = 14,750 ms. Each rank sequentially solves all 10 openings per stage for warm-start benefit.
50-iteration projection:
| Metric | Value | Notes |
|---|---|---|
| 50-iteration total | 827.3 s | |
| Wall-clock budget | 7,200.0 s | 2-hour operational requirement |
| Budget fraction | 11.5% | Headroom: 6,373 s (88.5%) |
| Headroom available for | I/O, checkpointing, cold-starts, cut management | |
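The budget figures in this section can be reproduced with a short sketch. It assumes one trajectory per thread (192 forward passes on 4 × 48 threads) and omits the three communication terms, each of which is below 0.001 s. The function name and parameter names are illustrative only.

```python
def per_iteration_budget(lp_s: float = 0.025, stages: int = 60,
                         openings: int = 10, barrier_fraction: float = 0.02):
    """Return (forward, backward, barrier, total) in seconds for one iteration.

    Assumes each thread owns exactly one trajectory, so compute time equals
    the sequential depth per thread times the LP solve time.
    """
    forward = stages * lp_s                      # one LP per stage
    backward = (stages - 1) * openings * lp_s    # all openings, stages 2..T
    barrier = barrier_fraction * backward        # load-imbalance overhead
    return forward, backward, barrier, forward + backward + barrier


forward, backward, barrier, total = per_iteration_budget()
total_50 = 50 * total
budget_s = 7200.0  # 2-hour operational requirement

print(f"forward   {forward:.3f} s  ({forward / total:.2%})")
print(f"backward  {backward:.3f} s  ({backward / total:.2%})")
print(f"barrier   {barrier:.3f} s  ({barrier / total:.2%})")
print(f"total     {total:.3f} s per iteration")
print(f"50 iterations: {total_50:.1f} s = {total_50 / budget_s:.1%} of budget")
```

At the baseline this yields 1.500 s, 14.750 s, 0.295 s, and 16.545 s per iteration, and about 827.3 s (11.5% of the budget) over 50 iterations, matching the tables above.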
Sensitivity: Critical LP Solve Time
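The model is linear in the LP solve time $t_{\text{LP}}$. LP solves per iteration (sequential, per thread): $60 + 59 \times 10 = 650$. Including barrier overhead (2% of backward compute, $0.02 \times 590 = 11.8$), the LP-equivalent coefficient is $661.8$. The total 50-iteration compute time is:
$$T_{50} = 50 \times 661.8 \times t_{\text{LP}} = 33{,}090 \times t_{\text{LP}}$$
Setting $T_{50} = 7{,}200$ s and solving for the critical LP solve time:
$$t_{\text{LP}}^{\text{crit}} = \frac{7{,}200\ \text{s}}{33{,}090} \approx 0.218\ \text{s} = 218\ \text{ms}$$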
If the effective LP solve time exceeds approximately 218 ms, the 50-iteration budget is violated. With 60 stages, 10 openings, and 4 ranks, the solver has substantial headroom — the critical threshold is nearly 9x the baseline assumption.
| Scenario | LP solve time | 50-iter total | Budget % | Verdict |
|---|---|---|---|---|
| Optimistic warm-start | 10 ms | ~331 s | 4.6% | Well under budget |
| Baseline (spec KPI) | 25 ms | ~827 s | 11.5% | Under budget |
| Moderate degradation | 50 ms | ~1,655 s | 23.0% | Under budget |
| Elevated solve time | 100 ms | ~3,309 s | 46.0% | Under budget |
| Heavy degradation | 150 ms | ~4,964 s | 68.9% | Under budget |
| Critical threshold | ~218 ms | ~7,200 s | 100% | At budget limit |
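The scenario rows above follow from the linear model. A minimal sketch, assuming the LP-equivalent coefficient of 661.8 (650 sequential solves plus 2% barrier overhead on the 590 backward solves), is shown below; the constant and function names are illustrative.

```python
STAGES = 60
OPENINGS = 10
ITERATIONS = 50
BUDGET_S = 7200.0

BACKWARD_SOLVES = (STAGES - 1) * OPENINGS        # 590 sequential solves per thread
SEQ_SOLVES = STAGES + BACKWARD_SOLVES            # 650
COEFF = SEQ_SOLVES + 0.02 * BACKWARD_SOLVES      # 661.8, includes barrier overhead


def total_50_iter_s(lp_ms: float) -> float:
    """Total 50-iteration compute time in seconds for a given LP solve time."""
    return ITERATIONS * COEFF * lp_ms / 1000.0


# LP solve time at which the 50-iteration total equals the 2-hour budget.
critical_ms = BUDGET_S / (ITERATIONS * COEFF) * 1000.0

for lp_ms in (10, 25, 50, 100, 150):
    total = total_50_iter_s(lp_ms)
    print(f"{lp_ms:>4} ms -> {total:8.1f} s ({total / BUDGET_S:.1%} of budget)")
print(f"critical LP solve time: {critical_ms:.1f} ms")
```

The critical value evaluates to roughly 217.6 ms, which the table rounds to ~218 ms.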
Sensitivity: Backward Pass Warm-Start Rate
The backward pass constitutes ~89% of compute. If the backward pass warm-start hit rate drops below 100% (the design target from Work Distribution §2.3), LP solve times blend between warm-start (25 ms) and cold-start (250 ms): the effective LP solve time is hit rate × 25 ms + (1 − hit rate) × 250 ms.
| Warm-start hit rate | Effective LP solve time | 50-iter total | Budget % | Verdict |
|---|---|---|---|---|
| 100% | 25 ms | ~827 s | 11.5% | Under budget |
| 90% | 47.5 ms | ~1,504 s | 20.9% | Under budget |
| 80% | 70 ms | ~2,181 s | 30.3% | Under budget |
| 70% | 92.5 ms | ~2,858 s | 39.7% | Under budget |
With 10 openings and 60 stages, warm-start remains important for performance but is not an existential threat to the 2-hour budget. Even at 70% hit rate, the solver uses ~40% of the budget, leaving 60% headroom for other overhead. The sequential opening evaluation design (Work Distribution §2.3) still targets near-100% warm-start for optimal LP throughput.
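The warm-start sensitivity rows can be reproduced with the sketch below. It rests on an inference from the table rather than a stated rule: the totals match when only the backward pass uses the blended LP solve time and the forward pass stays at the 25 ms warm-start value. Function names are illustrative.

```python
WARM_MS = 25.0
COLD_MS = 250.0


def effective_lp_ms(hit_rate: float) -> float:
    """Blend of warm-start and cold-start LP solve times for a given hit rate."""
    return hit_rate * WARM_MS + (1.0 - hit_rate) * COLD_MS


def total_50_iter_s(backward_hit_rate: float) -> float:
    """50-iteration total in seconds.

    Assumption (inferred from the table values): the forward pass is held at
    the warm-start time; only the backward pass uses the blended time, with
    2% barrier overhead applied to backward compute.
    """
    forward_s = 60 * WARM_MS / 1000.0
    backward_s = 59 * 10 * effective_lp_ms(backward_hit_rate) / 1000.0
    return 50 * (forward_s + 1.02 * backward_s)


for rate in (1.0, 0.9, 0.8, 0.7):
    total = total_50_iter_s(rate)
    print(f"{rate:.0%}: {effective_lp_ms(rate):5.1f} ms -> "
          f"{total:7.1f} s ({total / 7200.0:.1%} of budget)")
```

Each 10-point drop in hit rate adds 22.5 ms to the effective solve time and about 677 s to the 50-iteration total, so the budget fraction grows by roughly 9.4 percentage points per step.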
4.7 Model Assumptions
Every timing model estimate in §4.6 depends on the assumptions below. The table distinguishes spec-defined KPIs (which are targets to be met by the implementation), spec-defined architecture parameters (which are design decisions), and engineering estimates (which need validation).
| Assumption | Value | Source | Category | Sensitivity |
|---|---|---|---|---|
| LP solve time (warm-start) | 25 ms | §4.3 KPI | Spec KPI | High |
| LP solve time (cold-start) | 250 ms | §4.3 KPI | Spec KPI | Medium |
| Forward pass warm-start hit rate | 70% | §4.3 KPI | Spec KPI | Low |
| Backward pass warm-start hit rate | ~100% | Work Distribution §2.3 (sequential openings) | Spec Architecture | Low |
| MPI ranks | 4 | §4.2 | Spec Architecture | Low |
| Threads per rank | 48 | §4.2 | Spec Architecture | Low |
| Stages | 60 | §1 | Spec Architecture | Low |
| Forward passes | 192 | §1 | Spec Architecture | Low |
| Openings | 10 | §1 | Spec Architecture | Low |
| Iterations | 50 | §1 | Spec Architecture | Low |
| Backward stages = stages − 1 | 59 | Training Loop §6.1 | Spec Architecture | Low |
| One LP solve per stage per trajectory | 1 | §3 (blocks within LP, not separate solves) | Spec Architecture | Low |
| Sequential opening evaluation per thread | Sequential | Work Distribution §2.3 | Spec Architecture | Low |
| InfiniBand HDR bandwidth | 25 GB/s | §4.1 | Spec Architecture | Low |
| IB protocol overhead factor | 50% | Communication Patterns §3.2 | Engineering Estimate | Low |
| MPI base latency (InfiniBand) | 2 µs | Conservative estimate for modern IB HCA | Engineering Estimate | Low |
| LP solve time standard deviation | 15% | Work Distribution §4.1 (upper end of 5-15%) | Engineering Estimate | Low |
| Load imbalance barrier overhead | 2% | Derived from statistical model of per-stage max-of-ranks | Engineering Estimate | Low |
High-sensitivity parameters: LP solve time (25 ms baseline) is the primary assumption with material impact on the time budget, though the critical threshold (~218 ms) provides nearly 9x headroom above the baseline. With 10 openings, backward pass warm-start hit rate (design target ~100%) is a moderate concern: at 70% hit rate, the budget fraction reaches ~40%, still well under the limit but with less headroom than the baseline. Early solver benchmarking should prioritize measuring LP solve time under warm-start conditions with production-scale subproblems (~8,400 variables, ~25,000 active constraints).
Cross-References
- Design Principles — Format selection criteria and design goals
- Notation Conventions — Mathematical symbols used in formulas above
- LP Formulation — Complete LP subproblem that these dimensions describe
- SDDP Algorithm — Forward/backward pass structure driving performance targets
- Solver Abstraction §5 — Cut pool pre-allocation and capacity management
- Solver Workspaces §1.2 — Per-thread workspace sizing (~36 MB HiGHS, ~21 MB CLP)
- Solver Abstraction §11.4 — StageLpCache design and sizing (~22.3 GB SharedRegion)
- Memory Architecture §2 — Node memory budget (~27.7 GB), two-tier model
- Hybrid Parallelism — MPI+OpenMP architecture for achieving these scaling targets
- Communication Patterns §3 — Communication volume analysis, pure transfer < 0.1%
- Work Distribution §2 — Forward/backward pass distribution, thread-trajectory affinity
- Training Loop §6 — Backward pass stage loop, cut synchronization per stage
- SLURM Deployment — Job scripts for the test system scales above