Production Scale Reference

Purpose

This spec defines the production-scale dimensions of the Cobre SDDP solver: system sizes, LP variable and constraint counts, state dimension formulas, and performance expectations by scale. It serves as the reference for capacity planning, memory budgeting, and performance regression detection.

1. Production Scale Dimensions

Based on the target production scenario:

| Dimension | Value | Memory Impact |
| --- | --- | --- |
| Stages | 60 | Graph size |
| Blocks per Stage | 1-24 (varies), typically 3 | LP structure, outputs |
| Hydros | 160 | State dimension |
| Max AR Order | 12 | State dimension, variables/constraints |
| Thermals | 130 | Variables |
| Buses | 6 | Variables/constraints |
| Lines | 10 | Variables/constraints |
| Forward Passes | 192 | Parallelism |
| Openings | 10 | Backward pass LP solves |
| Iterations | 50 | Cut pool size |
| Simulation Scenarios | 2000 | Output size |

Decision DEC-009 (active): 60 stages is the production-scale reference baseline for all capacity planning, memory budgets, and performance targets across the corpus. Some specs reference 120 stages as a theoretical maximum or worst-case bound (e.g., FLOP cost calculations, temporal flattening memory estimates, hypothetical sizing). Such references are explicitly labeled as “worst-case” or “hypothetical maximum” to distinguish from the production baseline.

2. State Dimension Estimates

2.1 State Variables and Dimension

The state dimension determines the size of Benders cuts. State variables include:

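| Component | Count | Description | Status |
| --- | --- | --- | --- |
| Storage | N_HYDRO | End-of-stage reservoir volume for each hydro | Implemented |
| AR Lags | SUM(AR_ORDER[h] for h in HYDROS) | Inflow lag values for AR(p) models | Implemented |
| Battery SOC | N_BATTERY | Battery state of charge | Deferred (C.2) |
| GNL Committed | SUM(GNL_LAG[t] for t in GNL_THERMALS) | GNL dispatch pipeline | Deferred (C.1) |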

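State Dimension Formula:

$$N_{\text{STATE}} = N_{\text{HYDRO}} + \sum_{h \in \text{HYDROS}} \text{AR\_ORDER}[h] + N_{\text{BATTERY}} + \sum_{t \in \text{GNL\_THERMALS}} \text{GNL\_LAG}[t]$$

For the current implementation (batteries and GNL deferred, both zero):

$$N_{\text{STATE}} = N_{\text{HYDRO}} + \sum_{h \in \text{HYDROS}} \text{AR\_ORDER}[h]$$

For production scale (160 hydros, AR order up to 12):

  • Storage: 160
  • AR lags: 160 × 12 = 1,920 (all hydros at max order)
  • Total: 2,080 (production assumption)

Note: The production assumption is N_STATE = 2,080, corresponding to all hydros using AR(12). This is the worst-case ceiling and the value used for all capacity planning and performance budgeting in this spec. If most hydros use AR(6), the dimension reduces to roughly 160 + 160 × 6 = 1,120, which is the optimistic case.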
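As a quick check of these figures, the state dimension formula from §3.3 can be evaluated directly. The sketch below is illustrative only: the function name `state_dimension` and its argument layout are hypothetical and not part of the Cobre API, and the deferred battery and GNL terms default to zero.

```python
def state_dimension(ar_orders, n_battery=0, gnl_lags=()):
    """Return N_STATE = N_HYDRO + sum(AR_ORDER[h]) plus the deferred terms.

    ar_orders: one AR order per hydro plant (hypothetical input layout).
    n_battery, gnl_lags: deferred components (C.2, C.1); zero/empty today.
    """
    n_hydro = len(ar_orders)
    return n_hydro + sum(ar_orders) + n_battery + sum(gnl_lags)


worst_case = state_dimension([12] * 160)  # all hydros at AR(12)
all_ar6 = state_dimension([6] * 160)      # every hydro at AR(6)
print(worst_case, all_ar6)  # 2080 1120
```

Passing a per-hydro list rather than a single order keeps the sketch usable for mixed configurations, where only some plants use the maximum order.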

3. Variable and Constraint Counts

3.1 Variable Count per Subproblem

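| Component | Formula | Typical Count |
| --- | --- | --- |
| Future cost | 1 | 1 |
| Deficit | N_BUS × N_BLOCK × AVG_DEF_SEGMENTS | 6 × 3 × 1 = 18 |
| Excess | N_BUS × N_BLOCK | 6 × 3 = 18 |
| Exchange (direct + reverse) | 2 × N_LINE × N_BLOCK | 2 × 10 × 3 = 60 |
| Hydro storage | N_HYDRO | 160 |
| Hydro incremental inflow AR | N_HYDRO × AR_ORDER | 160 × 12 = 1,920 |
| Hydro turbined flow | N_HYDRO × N_BLOCK | 160 × 3 = 480 |
| Hydro spillage | N_HYDRO × N_BLOCK | 160 × 3 = 480 |
| Hydro generation | N_HYDRO × N_BLOCK | 160 × 3 = 480 |
| Hydro inflow (per-block) | N_HYDRO × N_BLOCK | 160 × 3 = 480 |
| Hydro diversion | N_HYDRO_DIV × N_BLOCK | ~10 × 3 = 30 |
| Hydro evaporation | N_HYDRO_EVAP × N_BLOCK | 160 × 3 = 480 |
| Hydro withdrawal | N_HYDRO_WITHDRAWAL × N_BLOCK | 160 × 3 = 480 |
| Hydro slacks | N_HYDRO × N_BLOCK × N_SLACK_TYPES | 160 × 3 × 6 = 2,880 |
| Thermal generation | N_THERMAL × N_BLOCK × AVG_COST_SEGMENTS | 130 × 3 × 1 = 390 |
| Contracts | (N_CONTRACT_IMP + N_CONTRACT_EXP) × N_BLOCK | 5 × 3 = 15 |
| Pumping (flow + power) | N_PUMP × N_BLOCK × 2 | 5 × 3 × 2 = 30 |
| Total Variables | | ~8,400 |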

3.2 Constraint Count per Subproblem

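| Component | Formula | Typical Count |
| --- | --- | --- |
| Load balance | N_BUS × N_BLOCK | 6 × 3 = 18 |
| Hydro water balance | N_HYDRO | 160 |
| Incremental inflow AR dynamics | N_HYDRO | 160 |
| Lagged incremental inflow fixing | N_HYDRO × AR_ORDER | 160 × 12 = 1,920 |
| Hydro generation (constant) | (N_HYDRO − N_HYDRO_FPHA) × N_BLOCK | (160 − 50) × 3 = 330 |
| Hydro generation (FPHA) | N_HYDRO_FPHA × N_BLOCK × AVG_FPHA_PLANES | 50 × 3 × 125 = 18,750 |
| Outflow definition | N_HYDRO × N_BLOCK | 160 × 3 = 480 |
| Outflow bounds (min/max) | 2 × N_HYDRO × N_BLOCK | 2 × 160 × 3 = 960 |
| Turbined min | N_HYDRO × N_BLOCK | 160 × 3 = 480 |
| Generation min | N_HYDRO × N_BLOCK | 160 × 3 = 480 |
| Evaporation | N_HYDRO_EVAP × N_BLOCK | 160 × 3 = 480 |
| Water withdrawal | N_HYDRO_WITHDRAWAL × N_BLOCK | 160 × 3 = 480 |
| Generic constraints | N_GENERIC | ~50 |
| Active constraints | | ~24,750 |
| Benders cuts (pre-allocated) | N_CUT_CAPACITY | 10,000-15,000 |
| Total Constraints | | ~35,000-40,000 |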

Note: Among active constraints, FPHA hyperplane constraints dominate (~18,750 of ~24,750). The total constraint count including pre-allocated Benders cut slots reaches ~35,000-40,000, but during early iterations most cut constraints are inactive (bounds set to $\pm\infty$, so the rows cannot bind). See Solver Abstraction §5 for pre-allocation design.

3.3 Counting Formulas (Exact)

For precise sizing, use the following formulas where parameters come from the system configuration. The production-scale values in §3.1 and §3.2 are obtained by substituting the parameters from §1.

Variables:

N_VAR = 1                                                      # theta
      + N_BUS × N_BLOCK × (AVG_DEF_SEGMENTS + 1)              # deficit + excess
      + 2 × N_LINE × N_BLOCK                                   # exchange
      + N_HYDRO                                                 # storage
      + N_HYDRO × AR_ORDER                                      # incremental inflow model
      + N_HYDRO × N_BLOCK × 4                                   # q, s, g, inflow
      + N_HYDRO_DIV × N_BLOCK                                   # diversion
      + N_HYDRO_EVAP × N_BLOCK                                  # evaporation
      + N_HYDRO_WITHDRAWAL × N_BLOCK                            # withdrawal
      + N_HYDRO × N_BLOCK × N_SLACK_TYPES                       # slack vars
      + N_THERMAL × N_BLOCK × AVG_COST_SEGMENTS                 # thermal
      + (N_CONTRACT_IMP + N_CONTRACT_EXP) × N_BLOCK             # contracts
      + N_PUMP × N_BLOCK × 2                                    # pump flow + power

Constraints:

N_CON = N_BUS × N_BLOCK                                        # load balance
      + N_HYDRO                                                 # water balance
      + N_HYDRO                                                 # inflow AR dynamics
      + N_HYDRO × AR_ORDER                                      # lagged inflow fixing
      + (N_HYDRO - N_HYDRO_FPHA) × N_BLOCK                       # generation (constant)
      + N_HYDRO_FPHA × N_BLOCK × AVG_FPHA_PLANES                # FPHA (additional)
      + N_HYDRO × N_BLOCK                                       # outflow definition
      + N_HYDRO × N_BLOCK × 2                                   # outflow bounds
      + N_HYDRO × N_BLOCK                                       # turbined min
      + N_HYDRO × N_BLOCK                                       # generation min
      + N_HYDRO_EVAP × N_BLOCK                                  # evaporation
      + N_HYDRO_WITHDRAWAL × N_BLOCK                            # withdrawal
      + N_GENERIC                                                # generic constraints
      + N_CUT_CAPACITY                                           # Benders cuts

State Dimension:

N_STATE = N_HYDRO                                               # storage
        + SUM(AR_ORDER[h] for h in HYDROS)                      # AR lags
        # + N_BATTERY  (deferred C.2)
        # + SUM(GNL_LAG[t] for t in GNL_THERMALS)  (deferred C.1)

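Production-scale parameter values: AVG_DEF_SEGMENTS = 1, AVG_COST_SEGMENTS = 1, AVG_FPHA_PLANES = 125, N_HYDRO_EVAP = N_HYDRO = 160, N_HYDRO_WITHDRAWAL = N_HYDRO = 160, AR_ORDER = 12 (worst case). The remaining values used in the §3.1 and §3.2 tables are N_BLOCK = 3 (typical), N_HYDRO_DIV ≈ 10, N_HYDRO_FPHA = 50, N_SLACK_TYPES = 6, N_CONTRACT_IMP + N_CONTRACT_EXP = 5, N_PUMP = 5, N_GENERIC ≈ 50, and N_CUT_CAPACITY = 10,000-15,000. All other parameters as in §1.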
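The counting formulas above translate directly into code, which is a convenient way to confirm that the §3.1 and §3.2 totals follow from the stated parameters. The following is a minimal sketch, not the sizing tool itself: the `SystemSize` class and the `n_var` / `n_con` helpers are hypothetical names, and the default values are the ones read from §1 and the "Typical Count" columns.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SystemSize:
    """Production-scale parameters (hypothetical container, values from §1/§3)."""
    n_bus: int = 6
    n_block: int = 3
    n_line: int = 10
    n_hydro: int = 160
    ar_order: int = 12             # worst case: all hydros at AR(12)
    n_hydro_div: int = 10
    n_hydro_evap: int = 160
    n_hydro_withdrawal: int = 160
    n_hydro_fpha: int = 50
    avg_fpha_planes: int = 125
    n_slack_types: int = 6
    n_thermal: int = 130
    avg_def_segments: int = 1
    avg_cost_segments: int = 1
    n_contract: int = 5            # N_CONTRACT_IMP + N_CONTRACT_EXP
    n_pump: int = 5
    n_generic: int = 50
    n_cut_capacity: int = 0        # 0 means "count active rows only"


def n_var(s: SystemSize) -> int:
    """Variables per subproblem, term by term as in the N_VAR formula."""
    return (
        1                                                    # theta
        + s.n_bus * s.n_block * (s.avg_def_segments + 1)     # deficit + excess
        + 2 * s.n_line * s.n_block                           # exchange
        + s.n_hydro                                          # storage
        + s.n_hydro * s.ar_order                             # inflow model
        + s.n_hydro * s.n_block * 4                          # q, s, g, inflow
        + s.n_hydro_div * s.n_block                          # diversion
        + s.n_hydro_evap * s.n_block                         # evaporation
        + s.n_hydro_withdrawal * s.n_block                   # withdrawal
        + s.n_hydro * s.n_block * s.n_slack_types            # slacks
        + s.n_thermal * s.n_block * s.avg_cost_segments      # thermal
        + s.n_contract * s.n_block                           # contracts
        + s.n_pump * s.n_block * 2                           # pump flow + power
    )


def n_con(s: SystemSize) -> int:
    """Constraints per subproblem, term by term as in the N_CON formula."""
    return (
        s.n_bus * s.n_block                                  # load balance
        + s.n_hydro                                          # water balance
        + s.n_hydro                                          # AR dynamics
        + s.n_hydro * s.ar_order                             # lag fixing
        + (s.n_hydro - s.n_hydro_fpha) * s.n_block           # constant generation
        + s.n_hydro_fpha * s.n_block * s.avg_fpha_planes     # FPHA planes
        + s.n_hydro * s.n_block                              # outflow definition
        + s.n_hydro * s.n_block * 2                          # outflow bounds
        + s.n_hydro * s.n_block                              # turbined min
        + s.n_hydro * s.n_block                              # generation min
        + s.n_hydro_evap * s.n_block                         # evaporation
        + s.n_hydro_withdrawal * s.n_block                   # withdrawal
        + s.n_generic                                        # generic
        + s.n_cut_capacity                                   # Benders cut slots
    )


prod = SystemSize()
print(n_var(prod))                                  # 8402
print(n_con(prod))                                  # 24748
print(n_con(replace(prod, n_cut_capacity=15_000)))  # 39748
```

The exact results (8,402 variables and 24,748 active constraints) round to the ~8,400 and ~24,750 figures quoted in §3.1 and §3.2.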

4. Performance Expectations by Scale

Note: The timing targets in §4.2 are engineering targets for each test system scale and serve as upper bounds for performance regression testing. §4.6 provides a separate first-principles wall-clock model for the Production configuration using LP solve time KPIs (§4.3) and parallelism parameters. The complete derivation is in the timing model analysis audit artifact. All estimates assume fully warm-started steady-state iterations and do not account for I/O, checkpointing, or convergence check overhead.

Purpose: This table provides expected timing targets for different problem scales, enabling performance validation and regression detection. Timings are per-iteration unless otherwise noted.

4.1 Hardware Assumptions

| Component | Specification |
| --- | --- |
| CPU | AMD EPYC 9R14 or equivalent (192 cores, 3.7 GHz base) |
| Memory | DDR5, 384 GB/node |
| Network | InfiniBand HDR (200 Gb/s) or equivalent |
| Storage | NVMe SSD for I/O operations |

4.2 Test Systems

System Definition

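| Scale | Stages | Hydros | Thermals | AR Order | Hydro Production | Fwd Passes | Openings |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Unit Test | 4 | 1 | 2 | 0 | Constant | 2 | 10 |
| Small | 12 | 2 | 4 | 0 | Constant | 4 | 10 |
| Medium | 24 | 4 | 20 | 0 | Constant | 12 | 20 |
| Large | 24 | 4 | 20 | 6 | Constant | 16 | 20 |
| XLarge | 36 | 12 | 130 | 12 | Constant | 32 | 20 |
| 2XLarge | 36 | 12 | 130 | 12 | FPHA (125) | 32 | 20 |
| Production | 60 | 160 | 130 | 12 | FPHA (125) | 192 | 10 |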

Hardware Configuration and Performance Targets

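| Scale | Ranks | Threads/Rank | Forward Time | Backward Time | Memory/Rank |
| --- | --- | --- | --- | --- | --- |
| Unit Test | 1 | 2 | <0.2 s | <1 s | |
| Small | 2 | 2 | <0.3 s | <1.5 s | |
| Medium | 2 | 6 | <1 s | <10 s | |
| Large | 2 | 8 | <1.2 s | <12 s | |
| XLarge | 2 | 8 | <2 s | <20 s | |
| 2XLarge | 2 | 8 | <3 s | <30 s | |
| Production | 4 | 48 | <10 s | <90 s | |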

Note: All timing values are engineering targets based on domain experience. For the Production row, the model-derived estimates in §4.6 (1.5 s forward, 14.75 s backward) are substantially lower than the engineering targets, which leaves headroom for I/O, cold-starts, and solver variability. Memory/Rank values are to be calculated using the sizing tool; approximate sizing can be derived from the formulas in §3 and Memory Architecture §2.1.

4.3 Key Performance Indicators

| Metric | Target | Measurement Point |
| --- | --- | --- |
| LP solve (warm-start) | ≤25 ms | Hot-path, ~8,400-variable subproblem |
| LP solve (cold-start) | ≤250 ms | First solve or basis invalid |
| RHS batch update | <100 µs | ~25,000 constraint updates |
| Solution extraction | <50 µs | Primal + basis to buffers |
| Cut exchange (MPI_Allgatherv) | <5 ms | Per-stage, all ranks exchange cuts |
| Parallel efficiency | >80% | At 4 ranks vs. 1 rank |
| Warm-start hit rate | >70% | Forward pass consecutive stages |

4.4 Scaling Expectations

| Dimension | Scaling Behavior |
| --- | --- |
| Forward pass | Time ∝ (forward passes × stages × LP solve time) / (ranks × threads). Near-linear speedup. |
| Backward pass | Time ∝ (stages − 1) × openings × LP solve time. Sequential stage and opening dependency (warm-start). Each rank redundantly solves all openings for cut locality. |
| Communication | With 4 ranks on InfiniBand, pure data transfer and synchronization are negligible (<0.1% of iteration time). See §4.6 time budget. |
| Memory | Per rank: solver workspaces (~36 MB × threads) + StageLpCache via SharedRegion (~22.3 GB node-wide). See Memory Architecture §2.1. |
| Cut pool growth | Logical growth only (pre-allocated slots). Memory stable after initialization. |

Thread utilization at 4 ranks: The forward and backward passes have different utilization characteristics.

  • Forward pass: With 192 trajectories distributed across 4 ranks, each rank receives 192 / 4 = 48 trajectories. With 48 threads per rank, each thread handles exactly 1 trajectory, giving 100% thread utilization.
  • Backward pass: The 192 trial states (gathered from the forward pass) are distributed across 4 ranks (48 per rank) and then across 48 threads (1 trial state per thread). Each thread evaluates all openings sequentially for its trial state, maintaining warm-start benefit. Per stage: 48 × 10 = 480 LP solves per rank, 10 sequential solves per thread, giving 100% thread utilization with a sequential depth of 10. All 4 ranks are active (each rank independently generates cuts from its assigned trial states).

The per-iteration compute time is determined by the sequential depth per stage: 59 × 10 × 25 ms = 14.75 s for the backward pass. See the timing model analysis §8 for the complete utilization comparison.
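The backward-pass arithmetic above can be reproduced with a few lines of code, which also shows how utilization degrades when the trial-state count does not divide evenly. This is an illustrative sketch: `backward_distribution` is a hypothetical helper, and it assumes an even split of trial states across ranks with each thread processing whole trial states in rounds.

```python
import math


def backward_distribution(n_trial_states, n_ranks, n_threads, n_openings):
    """Per-stage backward-pass work split (assumes an even, rounded-up split)."""
    per_rank = math.ceil(n_trial_states / n_ranks)
    # A thread handles one trial state per round; extra states need extra rounds.
    rounds = math.ceil(per_rank / n_threads)
    return {
        "trial_states_per_rank": per_rank,
        "lp_solves_per_rank_per_stage": per_rank * n_openings,
        "sequential_depth_per_stage": rounds * n_openings,
        "thread_utilization": per_rank / (rounds * n_threads),
    }


prod = backward_distribution(192, n_ranks=4, n_threads=48, n_openings=10)
print(prod)

# Hypothetical mismatch: 200 trial states leave most threads idle in round 2.
uneven = backward_distribution(200, n_ranks=4, n_threads=48, n_openings=10)
print(uneven["sequential_depth_per_stage"], round(uneven["thread_utilization"], 2))
```

In the production configuration the sequential depth is 10 solves per stage at full utilization; the hypothetical 200-state case doubles the depth to 20 while utilization drops to roughly 52%, which is why the forward pass count is chosen as a multiple of ranks × threads.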

4.5 Convergence Reference

| Problem Type | Typical Iterations | Optimality Gap |
| --- | --- | --- |
| Simple (few hydros, short horizon) | 10-20 | <0.1% |
| Medium (regional system) | 30-50 | <0.5% |
| Complex (full national grid) | 50-100 | <1.0% |
| With CVaR risk measure | +20-50% iterations | Same gap target |

Note: Iteration counts assume a reasonable initial policy (warm-start from a previous study). A cold-start may require 2-3x more iterations. The Production system (60 stages, 160 hydros, AR(12)) falls in the “Complex” category.

4.6 Wall-Clock Time Budget

The following per-iteration time budget is derived from a first-principles model at an LP solve time of $t_{LP} = 25$ ms, 4 ranks, 48 threads/rank, a state dimension of 2,080 (worst-case AR(12)), and InfiniBand HDR. The complete derivation with all intermediate steps is in the timing model analysis.

| Component | Per-Iteration | Fraction | Category |
| --- | --- | --- | --- |
| Forward pass compute | 1.500 s | 9.07% | Compute |
| Backward pass compute | 14.750 s | 89.15% | Compute |
| Trial point Allgatherv | <0.001 s | <0.01% | Communication |
| Cut exchange (59 stages) | <0.001 s | <0.01% | Communication |
| Convergence Allreduce | <0.001 s | <0.01% | Communication |
| Barrier overhead (59 stages) | 0.295 s | 1.78% | Synchronization |
| Per-iteration total | 16.545 s | 100% | |

Forward pass: $60 \times 25\ \text{ms} = 1{,}500$ ms. Each rank receives $192 / 4 = 48$ trajectories, fully utilizing 48 threads.

Backward pass: $59 \times 10 \times 25\ \text{ms} = 14{,}750$ ms. Each rank sequentially solves all 10 openings per stage for warm-start benefit.

50-iteration projection:

| Metric | Value | Notes |
| --- | --- | --- |
| 50-iteration total | 827.3 s | |
| Wall-clock budget | 7,200.0 s | 2-hour operational requirement |
| Budget fraction | 11.5% | Headroom: 6,373 s (88.5%) |
| Headroom available for | I/O, checkpointing, cold-starts, cut management | |
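The budget table can be reproduced from the §1 parameters and the 25 ms warm-start KPI. The sketch below is a simplified reading of the model, not the audited derivation: `iteration_budget_s` is a hypothetical helper, the three communication terms are dropped because each is below 0.001 s, and the barrier overhead is taken as 2% of backward compute as listed in §4.7.

```python
import math


def iteration_budget_s(t_lp_ms=25.0, stages=60, openings=10, fwd_passes=192,
                       ranks=4, threads=48, barrier_frac=0.02):
    """Per-iteration compute budget in seconds (communication terms omitted)."""
    # Rounds of trajectories each thread must process (1 at production scale).
    rounds = math.ceil(fwd_passes / (ranks * threads))
    forward = stages * rounds * t_lp_ms / 1000.0
    backward = (stages - 1) * openings * rounds * t_lp_ms / 1000.0
    barrier = barrier_frac * backward
    return {
        "forward": forward,
        "backward": backward,
        "barrier": barrier,
        "total": forward + backward + barrier,
    }


budget = iteration_budget_s()
total_50 = 50 * budget["total"]
fraction = total_50 / 7200.0  # share of the 2-hour wall-clock budget

print(f"per-iteration: {budget['total']:.3f} s")   # 16.545 s
print(f"50 iterations: {total_50:.1f} s")
print(f"budget fraction: {fraction:.1%}")          # 11.5%
```

Keeping the stage, opening, and rank counts as arguments makes it easy to re-run the estimate for the smaller test systems in §4.2, with the caveat that their targets were set from domain experience rather than from this model.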

Sensitivity: Critical LP Solve Time

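The model is linear in the LP solve time $t_{LP}$. Sequential LP solves per thread per iteration: $60 + 59 \times 10 = 650$. Including barrier overhead (2% of backward compute), the LP-equivalent coefficient is $60 + 590 \times 1.02 = 661.8$. The total 50-iteration compute time is:

$$T_{50} = 50 \times 661.8 \times t_{LP} = 33{,}090 \times t_{LP}$$

Setting $T_{50} = 7{,}200$ s and solving for the critical LP solve time:

$$t_{LP}^{\text{crit}} = \frac{7{,}200\ \text{s}}{33{,}090} \approx 0.218\ \text{s} = 218\ \text{ms}$$

If the effective LP solve time exceeds approximately 218 ms, the 50-iteration budget is violated. With 60 stages, 10 openings, and 4 ranks, the solver has substantial headroom: the critical threshold is nearly 9x the baseline assumption.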

| Scenario | LP solve time | 50-iter total | Budget % | Verdict |
| --- | --- | --- | --- | --- |
| Optimistic warm-start | 10 ms | ~331 s | 4.6% | Well under budget |
| Baseline (spec KPI) | 25 ms | ~827 s | 11.5% | Under budget |
| Moderate degradation | 50 ms | ~1,655 s | 23.0% | Under budget |
| Elevated solve time | 100 ms | ~3,309 s | 46.0% | Under budget |
| Heavy degradation | 150 ms | ~4,964 s | 68.9% | Under budget |
| Critical threshold | ~218 ms | ~7,200 s | 100% | At budget limit |
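Because the model is linear in the LP solve time, the scenario rows and the critical threshold follow from a single coefficient. The sketch below recomputes them under the same assumptions as the table (60 stages, 10 openings, 2% barrier overhead on the backward pass, 50 iterations, 7,200 s budget); the constant and function names are hypothetical.

```python
STAGES, OPENINGS, ITERATIONS = 60, 10, 50
BARRIER_FRAC = 0.02   # barrier overhead as a fraction of backward compute
BUDGET_S = 7200.0     # 2-hour operational requirement

# LP-equivalent sequential solves per iteration: forward + inflated backward.
COEFF = STAGES + (STAGES - 1) * OPENINGS * (1 + BARRIER_FRAC)


def total_50_iter_s(t_lp_ms):
    """50-iteration compute time in seconds for a given LP solve time (ms)."""
    return ITERATIONS * COEFF * t_lp_ms / 1000.0


# Solve total_50_iter_s(t) = BUDGET_S for t.
critical_ms = BUDGET_S * 1000.0 / (ITERATIONS * COEFF)

for t_ms in (10, 25, 50, 100, 150):
    total = total_50_iter_s(t_ms)
    print(f"{t_ms:>4} ms -> {total:8.1f} s ({total / BUDGET_S:.1%} of budget)")
print(f"critical LP solve time: {critical_ms:.1f} ms")
```

The computed threshold is slightly below 218 ms, consistent with the rounded "~218 ms" entry in the table.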

Sensitivity: Backward Pass Warm-Start Rate

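The backward pass constitutes ~89% of compute. If the backward pass warm-start hit rate $h$ drops below 100% (the design target from Work Distribution §2.3), backward-pass LP solve times blend between warm-start (25 ms) and cold-start (250 ms), while the forward pass is held at 25 ms:

$$t_{\text{eff}} = h \times 25\ \text{ms} + (1 - h) \times 250\ \text{ms}$$

| Backward hit rate | Effective LP solve time | 50-iter total | Budget % | Verdict |
| --- | --- | --- | --- | --- |
| 100% | 25 ms | ~827 s | 11.5% | Under budget |
| 90% | 47.5 ms | ~1,504 s | 20.9% | Under budget |
| 80% | 70 ms | ~2,181 s | 30.3% | Under budget |
| 70% | 92.5 ms | ~2,858 s | 39.7% | Under budget |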

With 10 openings and 60 stages, warm-start remains important for performance but is not an existential threat to the 2-hour budget. Even at 70% hit rate, the solver uses ~40% of the budget, leaving 60% headroom for other overhead. The sequential opening evaluation design (Work Distribution §2.3) still targets near-100% warm-start for optimal LP throughput.
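The hit-rate sensitivity can be checked with a short calculation. This sketch assumes, as the table totals imply, that only backward-pass solves are blended between the 25 ms warm-start and 250 ms cold-start KPIs while the forward pass stays at 25 ms; the function names are hypothetical.

```python
def effective_lp_ms(hit_rate, warm_ms=25.0, cold_ms=250.0):
    """Expected LP solve time when a fraction `hit_rate` of solves is warm."""
    return hit_rate * warm_ms + (1.0 - hit_rate) * cold_ms


def total_50_iter_s(backward_hit_rate, stages=60, openings=10, iterations=50,
                    barrier_frac=0.02, warm_ms=25.0):
    """50-iteration compute time (s) with a degraded backward hit rate."""
    t_bwd = effective_lp_ms(backward_hit_rate, warm_ms=warm_ms)
    forward_ms = stages * warm_ms  # assumption: forward pass stays warm
    backward_ms = (stages - 1) * openings * (1 + barrier_frac) * t_bwd
    return iterations * (forward_ms + backward_ms) / 1000.0


for h in (1.0, 0.9, 0.8, 0.7):
    total = total_50_iter_s(h)
    print(f"hit rate {h:.0%}: t_eff = {effective_lp_ms(h):.1f} ms, "
          f"total = {total:.0f} s ({total / 7200.0:.1%} of budget)")
```

Because cold-start solves are ten times slower than warm ones, each 10-point drop in hit rate adds about 22.5 ms to the effective solve time, or roughly 677 s over 50 iterations.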

4.7 Model Assumptions

Every timing model estimate in §4.6 depends on the assumptions below. The table distinguishes spec-defined KPIs (which are targets to be met by the implementation), spec-defined architecture parameters (which are design decisions), and engineering estimates (which need validation).

| Assumption | Value | Source | Category | Sensitivity |
| --- | --- | --- | --- | --- |
| LP solve time (warm-start) | 25 ms | §4.3 KPI | Spec KPI | High |
| LP solve time (cold-start) | 250 ms | §4.3 KPI | Spec KPI | Medium |
| Forward pass warm-start hit rate | 70% | §4.3 KPI | Spec KPI | Low |
| Backward pass warm-start hit rate | ~100% | Work Distribution §2.3 (sequential openings) | Spec Architecture | Low |
| MPI ranks | 4 | §4.2 | Spec Architecture | Low |
| Threads per rank | 48 | §4.2 | Spec Architecture | Low |
| Stages | 60 | §1 | Spec Architecture | Low |
| Forward passes | 192 | §1 | Spec Architecture | Low |
| Openings | 10 | §1 | Spec Architecture | Low |
| Iterations | 50 | §1 | Spec Architecture | Low |
| Backward stages | 60 − 1 = 59 | Training Loop §6.1 | Spec Architecture | Low |
| One LP solve per stage per trajectory | 1 | §3 (blocks within LP, not separate solves) | Spec Architecture | Low |
| Sequential opening evaluation per thread | Sequential | Work Distribution §2.3 | Spec Architecture | Low |
| InfiniBand HDR bandwidth | 25 GB/s | §4.1 | Spec Architecture | Low |
| IB protocol overhead factor | 50% | Communication Patterns §3.2 | Engineering Estimate | Low |
| MPI base latency (InfiniBand) | 2 µs | Conservative estimate for modern IB HCA | Engineering Estimate | Low |
| LP solve time standard deviation | 15% | Work Distribution §4.1 (upper end of 5-15%) | Engineering Estimate | Low |
| Load imbalance barrier overhead | 2% | Derived from statistical model of per-stage max-of-ranks | Engineering Estimate | Low |

High-sensitivity parameters: LP solve time (25 ms) is the primary assumption with material impact on the time budget, though the critical threshold (~218 ms) provides nearly 9x headroom above the baseline. With 10 openings, the backward pass warm-start hit rate (design target ~100%) is a moderate concern: at a 70% hit rate, the budget fraction reaches ~40%, still well under the limit but with less headroom than the baseline. Early solver benchmarking should prioritize measuring LP solve time under warm-start conditions with production-scale subproblems (~8,400 variables, ~25,000 active constraints).

Cross-References