
Scenario Generation

Purpose

This spec defines the Cobre scenario generation pipeline. The framework supports pluggable stochastic process models for generating scenarios used by iterative optimization algorithms. The current implementation provides PAR(p) autoregressive inflow models for hydrothermal dispatch, including: PAR model preprocessing, correlated noise generation, the sampling scheme abstraction that governs how scenarios are selected in forward and backward passes, external scenario integration, load scenario generation, and the scenario memory layout optimized for the solution hot-path. For the mathematical definition of the PAR(p) model, see PAR(p) Inflow Model.

1. PAR Model Preprocessing

1.1 Overview

The PAR(p) model generates stochastic inflows during training. Before the training loop begins, the raw PAR parameters — loaded from inflow_seasonal_stats.parquet and inflow_ar_coefficients.parquet — are preprocessed into a contiguous, cache-friendly layout that eliminates per-stage season lookups on the hot path.

For the full PAR(p) model definition, parameter set, and fitting theory, see PAR(p) Inflow Model.

1.2 Preprocessing Workflow

┌─────────────────────────────────────────────────────────────────────────────────┐
│                       PAR Model Preprocessing Pipeline                          │
├─────────────────────────────────────────────────────────────────────────────────┤
│                                                                                 │
│  Input: inflow_seasonal_stats.parquet (μ, s per hydro × stage)                  │
│         inflow_ar_coefficients.parquet (ψ* per hydro × stage × lag, residual_std_ratio) │
│         inflow_history.parquet (optional, for lag initialization)               │
│                                                                                 │
│  Step 1: Load PAR Parameters                                                    │
│  For each (hydro h, stage t) with season m = season(t):                         │
│    μ[h][t] = mean_m3s              (seasonal mean)                              │
│    s[h][t] = std_m3s               (seasonal sample std)                        │
│    ψ*[h][t][ℓ] = coefficient       (AR coefficients, standardized by s)         │
│    r[h][t] = residual_std_ratio    (σ_m/s_m; same for all lags of a group)      │
│    p[h][t] = max(lag) per group    (AR order derived from coefficient count)     │
│                              │                                                  │
│                              ▼                                                  │
│  Step 2: Convert to Original-Unit Coefficients and Derive σ                    │
│  For each (hydro h, stage t) with season m = season(t):                         │
│    ψ[h][t][ℓ] = ψ*[h][t][ℓ] · s[m] / s[m-ℓ]   (runtime conversion; DEC-020)  │
│    σ[h][t] = s[h][t] · r[h][t]                  (σ = s_m · residual_std_ratio)  │
│                              │                                                  │
│                              ▼                                                  │
│  Step 3: Precompute Stage-Specific Deterministic Components                     │
│  For each (hydro h, stage t) with season m = season(t):                         │
│    base[h][t] = μ[h][t] − Σ_ℓ ψ[h][t][ℓ] · μ[h][t−ℓ]                            │
│    coeff[h][t][ℓ] = ψ[h][t][ℓ]  for ℓ = 1..p[h][t]                              │
│    scale[h][t] = σ[h][t]                                                        │
│                              │                                                  │
│                              ▼                                                  │
│  Step 4: Initialize Lag State from History                                      │
│  From inflow_history.parquet (when present):                                    │
│    lag_state[h][ℓ] = historical_inflow[h][t₀ − ℓ]  for ℓ = 1..max_order         │
│                              │                                                  │
│                              ▼                                                  │
│  Output: Precomputed PAR structure (contiguous arrays for hot-path access)      │
│                                                                                 │
└─────────────────────────────────────────────────────────────────────────────────┘

Key distinction DEC-020: The input file stores standardized AR coefficients (ψ*, the direct Yule-Walker output) and residual_std_ratio (r = σ_m/s_m) — not original-unit coefficients ψ and not σ directly. At preprocessing time: ψ[ℓ] = ψ*[ℓ] · s_m / s_{m−ℓ} (runtime conversion using conditioning stats) and σ = s_m · r (no autocorrelations required). This separates swappable seasonal conditioning (μ, s) from fixed model dynamics (ψ*, residual_std_ratio). See PAR(p) Inflow Model SS3 for the full derivation.
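A minimal sketch of the Step 2-3 conversions for one (hydro, stage) pair, assuming the seasonal stats are already gathered per lag (names illustrative, not the Cobre API):

```rust
/// Convert standardized Yule-Walker output to original-unit runtime
/// parameters for one (hydro, stage) pair. `s` and `mu` are indexed so
/// that index 0 is the current season m and index l is season m - l;
/// both must have length p + 1 for AR order p.
fn preprocess_stage(
    psi_star: &[f64],        // psi*[l], index 0 == lag 1
    s: &[f64],               // s[m], s[m-1], ..., s[m-p]
    mu: &[f64],              // mu[m], mu[m-1], ..., mu[m-p]
    residual_std_ratio: f64, // r = sigma_m / s_m
) -> (Vec<f64>, f64, f64) {
    // Step 2: psi[l] = psi*[l] * s_m / s_{m-l}  (DEC-020 conversion)
    let psi: Vec<f64> = psi_star
        .iter()
        .enumerate()
        .map(|(i, &c)| c * s[0] / s[i + 1])
        .collect();
    // Step 2: sigma = s_m * residual_std_ratio
    let sigma = s[0] * residual_std_ratio;
    // Step 3: base = mu_m - sum_l psi[l] * mu_{m-l}
    let base = mu[0]
        - psi.iter().enumerate().map(|(i, &p)| p * mu[i + 1]).sum::<f64>();
    (psi, sigma, base)
}
```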

1.3 Memory Layout for Hot-Path Access

The precomputed PAR data uses struct-of-arrays layout for cache efficiency. All arrays are row-major with stage as the outer dimension, so that sequential stage iteration within a scenario touches contiguous memory.

Arrays:

| Array | Dimensions | Layout | Description |
|---|---|---|---|
| base | T × N_hydro | Row-major (stage outer) | Deterministic component per (stage, hydro) |
| coefficients | T × N_hydro × max_order | Row-major | AR coefficients per (stage, hydro, lag) |
| scales | T × N_hydro | Row-major | Residual std per (stage, hydro) |
| orders | N_hydro | Flat | AR order per hydro |

Inflow computation for a given (stage t, hydro h) with lag state a[h][t−ℓ] and noise value ε:

inflow[h][t] = base[h][t] + Σ_ℓ coefficients[h][t][ℓ] · a[h][t−ℓ] + scales[h][t] · ε

This is the runtime form of the PAR(p) equation from PAR(p) Inflow Model SS1, with all season lookups pre-resolved.
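A sketch of this hot-path evaluation over the flat arrays (struct and method names illustrative, mirroring the table above):

```rust
/// Precomputed PAR data in struct-of-arrays form (illustrative sketch,
/// not the exact Cobre type). All arrays are row-major, stage outer.
struct ParPrecomputed {
    base: Vec<f64>,         // T * n_hydro
    coefficients: Vec<f64>, // T * n_hydro * max_order
    scales: Vec<f64>,       // T * n_hydro
    orders: Vec<usize>,     // n_hydro
    n_hydro: usize,
    max_order: usize,
}

impl ParPrecomputed {
    /// inflow = base + sum_l coeff[l] * lag[l] + scale * eps,
    /// with lags[0] holding a[t-1], lags[1] holding a[t-2], ...
    fn inflow(&self, stage: usize, hydro: usize, lags: &[f64], eps: f64) -> f64 {
        let ij = stage * self.n_hydro + hydro;
        let c0 = ij * self.max_order;
        let p = self.orders[hydro];
        let ar: f64 = (0..p)
            .map(|l| self.coefficients[c0 + l] * lags[l])
            .sum();
        self.base[ij] + ar + self.scales[ij] * eps
    }
}
```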

1.4 PAR Model Fitting from Historical Data

When AR coefficients are not provided in inflow_ar_coefficients.parquet, Cobre fits PAR models from historical inflow data using the Yule-Walker method with PACF-based order selection. For the mathematical derivation of the fitting procedure, see PAR(p) Inflow Model — Fitting Procedure.

The fitting process:

  1. Season extraction — Group historical observations by season (as defined in season_definitions)
  2. Seasonal statistics — Compute mean (μ_m) and sample standard deviation (s_m) per season
  3. Standardization — Transform observations to zero-mean, unit-variance per season
  4. Order selection — For each season, compute the partial autocorrelation function (PACF) up to max_order and select the order where the PACF coefficient becomes insignificant
  5. Yule-Walker solution — Solve the Yule-Walker equations via LU factorization of the Toeplitz autocorrelation matrix
  6. Store direct output — Store the standardized coefficients ψ* (direct Yule-Walker output) and the computed residual_std_ratio in inflow_ar_coefficients.parquet; no conversion to original units is performed (DEC-020)

The fitted model output includes: seasonal means and standard deviations (μ_m, s_m) stored in inflow_seasonal_stats.parquet, and standardized AR coefficients (ψ*) plus residual_std_ratio stored in inflow_ar_coefficients.parquet. AR order is implicit from the count of coefficient rows per (hydro, stage) group — it is not stored as a separate field.
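For illustration, a minimal solve of the order-p Yule-Walker system R·ψ* = ρ (step 5), with Gaussian elimination standing in for the LU factorization the spec names; the autocorrelations ρ[0..=p] are assumed precomputed:

```rust
/// Solve R * psi_star = rho, where R[i][j] = rho[|i - j|] is the
/// Toeplitz autocorrelation matrix and rho[1..=p] is the right-hand
/// side. `rho` must have length >= p + 1. Illustrative sketch only.
fn yule_walker(rho: &[f64], p: usize) -> Vec<f64> {
    // Build the dense augmented system [R | rhs].
    let mut a = vec![vec![0.0; p + 1]; p];
    for i in 0..p {
        for j in 0..p {
            a[i][j] = rho[(i as isize - j as isize).unsigned_abs()];
        }
        a[i][p] = rho[i + 1];
    }
    // Forward elimination with partial pivoting.
    for k in 0..p {
        let piv = (k..p)
            .max_by(|&x, &y| a[x][k].abs().total_cmp(&a[y][k].abs()))
            .unwrap();
        a.swap(k, piv);
        for i in k + 1..p {
            let f = a[i][k] / a[k][k];
            for j in k..=p {
                a[i][j] -= f * a[k][j];
            }
        }
    }
    // Back substitution.
    let mut psi = vec![0.0; p];
    for i in (0..p).rev() {
        let s: f64 = (i + 1..p).map(|j| a[i][j] * psi[j]).sum();
        psi[i] = (a[i][p] - s) / a[i][i];
    }
    psi
}
```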

Fitted model validation follows the same invariants defined in PAR(p) Inflow Model SS6:

| Check | Severity | Description |
|---|---|---|
| Positive residual variance | Error | σ_m² > 0 for all seasons |
| PAR seasonal stability | Warning | Per-season AR polynomial roots outside unit circle |
| Correlation matrix symmetry | Warning | C symmetric (spectral decomposition clips negative eigenvalues) |
| No systematic bias | Warning | Residuals mean near zero |
| AR order consistency | Error | Lags are contiguous {1, 2, …, p} per (hydro, stage) group |

2. Noise Generation Infrastructure

2.1 Correlated Noise Generation

Hydros within the same correlation group share spatially correlated noise. The correlation structure is defined in correlation.json (see Input Scenarios SS5).

The generation process:

  1. Independent sampling — Generate independent standard normal samples, one per entity in the correlation group
  2. Spectral transform — Pre-decompose the correlation matrix C = Q·Λ·Qᵀ via eigendecomposition during preprocessing. The spectral factor L = Q·Λ^{1/2} transforms independent noise ε into correlated noise: η = L·ε. Negative eigenvalues are clipped to zero, yielding the nearest positive-semidefinite approximation. The method field in correlation.json defaults to "spectral"; "cholesky" is accepted for backward compatibility
  3. Entity assignment — Each entity in the group receives its own correlated noise value from the transformed vector

Entities in different correlation groups are independent of each other. Entities not assigned to any group receive independent noise.
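A sketch of the spectral factor construction and application; nalgebra is an assumed dependency here (the spec does not name a linear algebra crate):

```rust
use nalgebra::{DMatrix, DVector};

/// Build the spectral factor L = Q * Lambda^{1/2} from a correlation
/// matrix C = Q * Lambda * Q^T, clipping negative eigenvalues to zero
/// (the nearest positive-semidefinite approximation).
fn spectral_factor(corr: DMatrix<f64>) -> DMatrix<f64> {
    let eig = nalgebra::SymmetricEigen::new(corr);
    let mut l = eig.eigenvectors;
    for i in 0..eig.eigenvalues.len() {
        // Scale eigenvector column i by sqrt(max(lambda_i, 0)).
        l.column_mut(i).scale_mut(eig.eigenvalues[i].max(0.0).sqrt());
    }
    l
}

/// Correlate a vector of independent standard normals: eta = L * eps.
fn correlate(l: &DMatrix<f64>, eps: &DVector<f64>) -> DVector<f64> {
    l * eps
}
```

The factor is computed once per correlation profile during preprocessing (SS2.4) and reused read-only by all threads.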

2.2 Reproducible Sampling

Noise generation must produce identical results regardless of MPI rank assignment, thread scheduling, or restart. This is achieved through deterministic seed derivation:

  • A base seed is specified in config.json
  • Each (iteration, scenario, stage) tuple maps to a unique derived seed via a deterministic hash function
  • The RNG state is initialized from this derived seed before generating the noise vector for that tuple

This design ensures:

  • Cross-rank reproducibility — The same scenario/stage produces the same noise regardless of which MPI rank processes it
  • Restart reproducibility — Resuming from a checkpoint produces identical noise for subsequent iterations
  • Order independence — Results are identical regardless of the order in which scenarios or stages are processed

Architectural note: Because every rank can independently derive the same noise for any (iteration, scenario, stage) tuple by hashing identical inputs with the same RNG, no MPI broadcast of noise data is required. This is a key architectural property: rank 0 does not generate all noise and broadcast it to worker ranks. Instead, each rank generates only the noise it needs, exactly when it needs it. This property eliminates the need for a dedicated scenario innovation noise broadcast format — the seed-based architecture already resolves the distribution problem.

2.2a Seed Derivation Function

The deterministic hash function referenced in SS2.2 is SipHash-1-3, provided by the siphasher crate (version 1.x). The siphasher crate guarantees output stability across crate versions, making it suitable for cross-platform and cross-build reproducibility. The hash does not need to be cryptographic — SipHash-1-3 is chosen for speed and sufficient collision resistance over the small input domain, not for security properties.

Convention: The standard library’s std::collections::hash_map::DefaultHasher MUST NOT be used. Its output is explicitly not guaranteed to be stable across Rust compiler versions or platforms.

Input encoding (forward pass). The hash input is a fixed-width, little-endian byte sequence constructed by concatenating four integers:

input = base_seed.to_le_bytes()       // u64, 8 bytes
     ++ iteration.to_le_bytes()        // u32, 4 bytes
     ++ scenario.to_le_bytes()         // u32, 4 bytes
     ++ stage.to_le_bytes()            // u32, 4 bytes

Total input: 20 bytes. The ++ operator denotes byte-level concatenation (no separators, no length prefixes).

Input encoding (opening tree). For opening tree generation (SS2.3), the input replaces iteration and scenario with a single opening_index:

input = base_seed.to_le_bytes()       // u64, 8 bytes
     ++ opening_index.to_le_bytes()    // u32, 4 bytes
     ++ stage.to_le_bytes()            // u32, 4 bytes

Total input: 16 bytes.

Output. The derived seed is the full 64-bit SipHash-1-3 output. This 64-bit value is used to initialize a Pcg64 (or equivalent) pseudo-random number generator, which then produces the noise vector for the corresponding tuple.

Endianness requirement. Little-endian encoding is mandatory. Native byte encoding is not acceptable because it would produce different hash inputs on big-endian and little-endian architectures, breaking cross-platform reproducibility. All integer-to-bytes conversions use explicit .to_le_bytes() calls.

Crate version requirement. The siphasher crate version must be pinned to 1.x (i.e., siphasher = "1" in Cargo.toml). The 1.x series provides the SipHasher13 type and guarantees output stability within the major version. Any future major version upgrade requires verification that hash outputs remain identical for the same inputs, or a migration path for checkpoint compatibility.
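A sketch of the derivation and RNG initialization using the siphasher and rand_pcg crates. The spec feeds base_seed through the hash input, so this sketch fixes the SipHash keys to zero; that key choice is an assumption, not something the spec states:

```rust
use std::hash::Hasher;

use rand::SeedableRng;
use rand_pcg::Pcg64;
use siphasher::sip::SipHasher13;

/// Derive the forward pass seed for (iteration, scenario, stage) per
/// SS2.2a: 20-byte little-endian input, u64 ++ u32 ++ u32 ++ u32,
/// no separators. Zero SipHash keys are this sketch's assumption.
fn derive_seed(base_seed: u64, iteration: u32, scenario: u32, stage: u32) -> u64 {
    let mut h = SipHasher13::new_with_keys(0, 0);
    h.write(&base_seed.to_le_bytes());
    h.write(&iteration.to_le_bytes());
    h.write(&scenario.to_le_bytes());
    h.write(&stage.to_le_bytes());
    h.finish()
}

/// Initialize the per-tuple RNG from the full 64-bit derived seed.
fn tuple_rng(base_seed: u64, iteration: u32, scenario: u32, stage: u32) -> Pcg64 {
    Pcg64::seed_from_u64(derive_seed(base_seed, iteration, scenario, stage))
}
```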

2.2b Work Distribution for Noise Generation

Decision DEC-017 (active): Communication-free parallel noise generation – every rank and thread independently derives identical noise via deterministic SipHash-1-3 seed derivation, eliminating MPI broadcast or gather for scenario noise.

This subsection documents the parallel calling convention for forward pass noise generation, analogous to the cut selection work distribution in Cut Selection Strategy Trait SS2.2a.

Why no communication is needed. Deterministic seed derivation (SS2.2a) eliminates the need for MPI communication of noise data. Every rank independently derives identical noise for any (iteration, scenario_index, stage_id) tuple by hashing identical inputs with the same base seed and RNG algorithm. This is fundamentally different from cut selection, which requires allgatherv to gather DeactivationSet results because cut selection decisions are distributed across ranks — each rank selects for a subset of stages and must share the results with all other ranks. Noise generation has no such dependency: each rank generates only the noise it needs for its assigned scenarios, and no other rank needs that noise.

Scenario distribution formula. Forward scenarios are distributed across ranks using contiguous block assignment per Work Distribution §3.1:

| Parameter | Formula |
|---|---|
| Total forward scenarios S | Configured via forward_passes in config.json |
| Scenarios per rank | ⌈S/R⌉ or ⌊S/R⌋ (contiguous block assignment per Work Distribution §3.1) |
| Rank r's first scenario | Per Work Distribution §3.1 |
| Rank r's last scenario | Per Work Distribution §3.1 |
| Inter-rank communication for noise | None — deterministic seed derivation (SS2.2a) |

Within-rank threading model. Within each rank, Rayon worker threads generate noise independently for their assigned forward trajectories. Each thread uses the deterministic seed seed(iteration, scenario_index, stage_id) to initialize its RNG before generating the noise vector for each (scenario, stage) pair. Because the seed depends on (iteration, scenario_index, stage_id) – not on thread ID, rank, or rank count – any thread on any rank generates identical noise for the same tuple. No thread synchronization is needed for noise generation.

Comparison with cut selection work distribution.

| Aspect | Cut Selection (SS2.2a in cut-selection-trait.md) | Noise Generation (this subsection) |
|---|---|---|
| Work unit | Stage | (Scenario, Stage) pair |
| Distribution axis | Stages across ranks | Scenarios across ranks (stages sequential per scenario) |
| Inter-rank communication | allgatherv for DeactivationSets | None (deterministic seeds) |
| Within-rank threading | Rayon work-stealing across assigned stages | Rayon work-stealing across assigned scenarios |
| Data dependency | Requires synchronized cut pool metadata | Independent — only needs base seed and PAR parameters |

Backward pass note. Backward pass noise generation also requires no inter-rank communication. The opening tree is generated once before training and shared via SharedRegion<T> within a node (Shared Memory Aggregation §1.4). During the backward pass, each thread reads the opening tree noise vectors directly from shared memory – no noise generation occurs at runtime.

2.3 Opening Tree

The backward pass in SDDP evaluates an aggregated cut by solving all N_t branchings at each stage t. These branchings must be identical across all iterations — the backward pass always “sees the same tree.” The opening tree is therefore generated once before training begins and remains fixed throughout.

The per-stage branching factor N_t comes from ScenarioSourceConfig.branching_factor (Internal Structures SS12.5). Uniform branching (N_t = N for all t) is the common case in standard SDDP, but per-stage variable branching is supported — this is required for complete tree mode (SS7.2), where the DECOMP special case uses N_t = 1 for t < T and N_T > 1 at the final stage.

Tree generation:

  1. Before the first SDDP iteration, generate N_t noise vectors per stage t, producing a fixed opening tree with Σ_t N_t noise vectors (Σ_t N_t · dim f64 elements) in total
  2. Each noise vector is generated from the base seed using deterministic seed derivation per (opening_index, stage) — ensuring reproducibility across restarts and MPI configurations (see SS2.2)
  3. Correlation is applied per the active profile for each stage (see SS2.5)

Backward pass usage: At each stage t, the backward pass iterates over all N_t noise vectors, solving one subproblem per opening, then aggregates the resulting cuts. Because the tree is fixed, every iteration produces cuts that refine the same set of future cost scenarios. Note that N_t may vary per stage.

Forward pass usage (InSample only): When the InSample sampling scheme is active (see SS3.2), the forward pass samples a random index j ∈ {0, …, N_t − 1} at each stage t and uses the corresponding noise vector to generate the stage realization. The sampling is a single random integer per stage — no noise generation occurs during the forward pass. Other sampling schemes (External, Historical) use entirely separate data sources and do not access the opening tree; the forward pass noise path is governed by the SamplingScheme abstraction (SS3), not by OpeningTree directly.

Memory layout: The opening tree uses stage-major ordering for backward pass locality. At each backward pass stage, all noise vectors are accessed sequentially; stage-major layout places these vectors contiguously in memory, enabling linear prefetch and avoiding large strides across openings:

[stage_0_opening_0_entity_0, ..., stage_0_opening_0_entity_E,   ← N_0 openings
 stage_0_opening_1_entity_0, ..., stage_0_opening_1_entity_E,      for stage 0
 ...
 stage_1_opening_0_entity_0, ...,                                ← N_1 openings
 ...                                                                for stage 1
 stage_T_opening_0_entity_0, ..., stage_T_opening_Nt_entity_E]  ← N_T openings

Offset-based access: The noise vector for (stage, opening_idx) occupies the contiguous slice:

data[offset .. offset + dim]   where offset = stage_offsets[stage] + opening_idx * dim

where dim is the number of random variables per stage and stage_offsets[t] = Σ_{s<t} N_s · dim is the cumulative offset for stage t. For uniform branching, offset reduces to stage * n_openings * dim + opening_idx * dim.

Total size: Σ_t N_t · dim · 8 bytes. For a hypothetical maximum configuration (120 stages × 200 openings × 160 hydros × 8 bytes ≈ 30 MB), the entire tree fits in L3 cache. For the production reference configuration in Memory Architecture §2.1 (60 stages × 10 openings × 160 entities × 8 bytes ≈ 0.8 MB), the tree fits in L2 cache.

2.3a OpeningTree Rust Type

The OpeningTree struct is the concrete Rust type that holds the pre-generated noise realizations described in SS2.3. It is defined in cobre-stochastic and consumed read-only by cobre-sddp during the training loop’s backward pass (Training Loop SS6.2).

Type Definition

```rust
pub struct OpeningTree {
    /// Flat contiguous storage for all noise values.
    /// Layout: stage-major with variable branching per stage.
    /// Length: sum(openings_per_stage[t] * dim for t in 0..n_stages).
    data: Box<[f64]>,

    /// Cumulative offset array for stage-based indexing.
    /// Length: n_stages + 1.
    /// stage_offsets[t] = sum(openings_per_stage[s] * dim for s in 0..t).
    /// stage_offsets[n_stages] = data.len().
    stage_offsets: Box<[usize]>,

    /// Number of openings (branchings) per stage.
    /// Length: n_stages.
    /// For uniform branching: all entries are equal.
    openings_per_stage: Box<[usize]>,

    /// Number of stages in the planning horizon (T).
    n_stages: usize,

    /// Number of random variables per noise vector.
    dim: usize,
}
```

Field descriptions:

| Field | Type | Description |
|---|---|---|
| data | Box<[f64]> | Flat contiguous storage. Length = Σ_t N_t · dim. Heap-allocated, non-resizable after construction. |
| stage_offsets | Box<[usize]> | Cumulative offset array. stage_offsets[t] = Σ_{s<t} N_s · dim. Length = n_stages + 1. The sentinel stage_offsets[n_stages] = data.len(). |
| openings_per_stage | Box<[usize]> | Per-stage branching factors N_t. Length = n_stages. Sourced from ScenarioSourceConfig.branching_factor (Internal Structures SS12.5). |
| n_stages | usize | Number of stages in the planning horizon. |
| dim | usize | Number of random variables (entities) per noise vector. |

Why Box<[f64]> and not Vec<f64>: The opening tree is allocated once and never resized. Box<[f64]> communicates this invariant at the type level — there is no capacity field, no growth path, and no accidental reallocation. The box is created from a Vec<f64> via into_boxed_slice() after generation completes. The same reasoning applies to stage_offsets and openings_per_stage.

Why this layout:

  • stage_offsets enables O(1) access to any (stage, opening) pair without iterating
  • All openings for a given stage are contiguous — same cache-friendly backward pass pattern as described in SS2.3
  • Uniform branching is a degenerate case where all openings_per_stage entries are equal and stage_offsets is a simple arithmetic progression
  • No wasted space: total data length = exactly Σ_t N_t · dim
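A construction sketch showing how the offsets and boxed slices could be assembled (hypothetical constructor, not necessarily the real API; noise generation elided):

```rust
impl OpeningTree {
    /// Build an OpeningTree from per-stage branching factors and a
    /// pre-filled data vector (illustrative constructor).
    pub fn from_parts(openings_per_stage: Vec<usize>, dim: usize, data: Vec<f64>) -> Self {
        let n_stages = openings_per_stage.len();
        // stage_offsets[t] = sum over s < t of openings_per_stage[s] * dim
        let mut stage_offsets = Vec::with_capacity(n_stages + 1);
        let mut acc = 0usize;
        stage_offsets.push(0);
        for &n in &openings_per_stage {
            acc += n * dim;
            stage_offsets.push(acc);
        }
        assert_eq!(acc, data.len(), "data length must equal sum(N_t * dim)");
        Self {
            data: data.into_boxed_slice(),
            stage_offsets: stage_offsets.into_boxed_slice(),
            openings_per_stage: openings_per_stage.into_boxed_slice(),
            n_stages,
            dim,
        }
    }
}
```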

Access Method

```rust
impl OpeningTree {
    /// Return the noise vector for the given (stage, opening_idx) pair
    /// as a contiguous `f64` slice of length `self.dim`.
    ///
    /// # Panics
    ///
    /// Panics if `stage >= self.n_stages` or
    /// `opening_idx >= self.openings_per_stage[stage]`.
    pub fn opening(&self, stage: usize, opening_idx: usize) -> &[f64] {
        assert!(stage < self.n_stages);
        assert!(opening_idx < self.openings_per_stage[stage]);
        let offset = self.stage_offsets[stage] + opening_idx * self.dim;
        &self.data[offset..offset + self.dim]
    }

    /// Return the number of openings at the given stage.
    pub fn n_openings(&self, stage: usize) -> usize {
        self.openings_per_stage[stage]
    }

    /// Return the number of stages.
    pub fn n_stages(&self) -> usize { self.n_stages }

    /// Return the dimension (entities per noise vector).
    pub fn dim(&self) -> usize { self.dim }

    /// Return the total number of f64 elements in the backing array.
    pub fn len(&self) -> usize { self.data.len() }

    /// Return the total size in bytes of the backing array.
    pub fn size_bytes(&self) -> usize { self.data.len() * std::mem::size_of::<f64>() }
}
```

Note: n_openings() takes a stage parameter because the branching factor can vary per stage. For uniform branching, the caller can use n_openings(0) or check any stage.

Backward pass usage pattern (Training Loop SS6.2):

```rust
// At stage t, evaluate all openings for trial point x_hat
for j in 0..opening_tree.n_openings(t) {
    let noise: &[f64] = opening_tree.opening(t, j);
    // noise[h] is the correlated noise value for entity h at opening j
    // Compute realized inflows via PAR model, patch LP, solve, extract duals
}
```

The inner loop accesses n_openings(t) contiguous blocks of dim f64 values, each separated by exactly dim * 8 bytes. This is the optimal access pattern for hardware prefetchers.

Forward/Backward Separation

The OpeningTree is the backward pass’s primary noise data structure. At every backward pass stage, all openings are evaluated to generate Benders cuts. The forward pass does NOT always use the opening tree — only the InSample sampling scheme draws from it (via random index selection into the stage’s openings). The External and Historical schemes use entirely separate data sources (external scenarios, historical records). The forward pass noise path is governed by the SamplingScheme abstraction (SS3), not by OpeningTree directly.

Borrowed View Type

```rust
/// Read-only view over opening tree data, providing offset-based access.
/// Borrows the underlying storage. Used by the training loop; does not
/// own the data.
pub struct OpeningTreeView<'a> {
    data: &'a [f64],
    stage_offsets: &'a [usize],
    openings_per_stage: &'a [usize],
    n_stages: usize,
    dim: usize,
}
```

The OpeningTreeView provides the same access API as OpeningTree (opening(), n_openings(), n_stages(), dim()) but borrows the data instead of owning it. The dual-type design (OpeningTree for owned generation, OpeningTreeView for borrowed consumption) follows the dual-nature principle established in Internal Structures SS1.1: cobre-stochastic owns the generation and the OpeningTree struct, while cobre-sddp consumes it through OpeningTreeView<'a> without taking ownership.
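A plausible bridge between the owned and borrowed types (sketch only; the spec defines both types but does not pin down the conversion method):

```rust
impl OpeningTree {
    /// Borrow the tree as a read-only view for the training loop
    /// (hypothetical method name).
    pub fn view(&self) -> OpeningTreeView<'_> {
        OpeningTreeView {
            data: &self.data,
            stage_offsets: &self.stage_offsets,
            openings_per_stage: &self.openings_per_stage,
            n_stages: self.n_stages,
            dim: self.dim,
        }
    }
}
```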

2.3b Sampling Method and Opening Tree Generation

The sampling_method field on each stage in stages.json (see Input Scenarios SS1.8) controls the algorithm used to generate the noise vectors for that stage’s opening tree entries (SS2.3). This is orthogonal to the SamplingScheme abstraction (SS3): sampling_method governs how the opening tree is populated with noise vectors; SamplingScheme governs which noise source the forward pass uses.

SAA — the Phase 5 implementation. For the minimal viable solver, the only implemented sampling_method is "saa" (Sample Average Approximation): uniform Monte Carlo random sampling from the RNG seeded by the deterministic seed function (SS2.2a). Each noise vector component is drawn as an independent standard normal ε ~ N(0, 1), then transformed by the spectral correlation factor L (SS2.1) to produce the correlated noise vector η = L·ε. SAA is the default when sampling_method is "saa" or omitted from the stage definition.

Summary of sampling methods:

| Method | Description | Phase 5 Status |
|---|---|---|
| saa | Sample Average Approximation — uniform Monte Carlo from seeded RNG | Implemented |
| lhs | Latin Hypercube Sampling — stratified, uniform marginal coverage | Implemented |
| qmc_sobol | Quasi-Monte Carlo (Sobol sequences) — low-discrepancy sampling | Implemented |
| qmc_halton | Quasi-Monte Carlo (Halton sequences) — alternative low-discrepancy method | Implemented |
| selective | Selective/Representative Sampling — clustering on historical data | Deferred |

The selective method is listed in Deferred Features and is not yet implemented.

Per-stage variation. The sampling_method field can vary per stage (as permitted by Input Scenarios SS1.8), enabling mixed strategies in a single run. The minimal viable solver uses uniform SAA across all stages; per-stage method variation is a deferred capability.

2.3c Parallel Opening Tree Generation

This subsection documents how opening tree generation is parallelized across MPI ranks and threads. The SharedRegion<T> allocation and fence protocol is defined in Shared Memory Aggregation §1.4; this subsection provides the partitioning details that protocol references but does not specify (step 3: “contiguous block assignment”).

Multi-node model. Because opening tree seeds depend only on (base_seed, opening_index, stage) (SS2.2a), every node can generate its opening tree independently without cross-node communication. On multi-node deployments, each node generates the complete tree using the same deterministic seeds, producing bit-identical results. The tree is then shared within the node via SharedRegion<T> (Shared Memory Aggregation §1.4), so only one copy resides in physical memory per node despite being readable by all ranks on that node.

Within-node partitioning formula. The natural work unit for opening tree generation is a single (opening_index, stage) pair. The total generation work is Σ_t N_t pairs (one per opening per stage). For uniform branching, this simplifies to N · T pairs. The partitioning distributes openings across ranks (not stages), because the stage dimension is the inner loop for each opening’s seed derivation:

| Parameter | Formula |
|---|---|
| Total openings | max_t N_t for variable branching; N for uniform |
| Openings per rank | ⌈N_open/R⌉ or ⌊N_open/R⌋ (contiguous block per Work Distribution §3.1) |
| Rank r's first opening | Per Work Distribution §3.1 |
| Rank r's last opening | Per Work Distribution §3.1 |
| Stage handling | Each rank generates all stages for its assigned openings (stages are the inner loop; openings are the outer loop) |

Why openings, not stages, are the distribution axis. Distributing by opening (rather than by stage) aligns with the stage-major memory layout (SS2.3). Each rank generates a contiguous block of openings across all stages, which maps to a contiguous memory region in the SharedRegion<T>. This enables each rank to write its generated data directly to the correct offset without interleaving with other ranks’ writes. If stages were the distribution axis, each rank’s writes would be scattered across the memory region (one block per stage), increasing the complexity of offset computation and potentially causing cache line conflicts during the parallel write phase.

Within-rank threading model. Within each rank, the assigned openings are distributed across Rayon worker threads. Each thread generates all stages for its assigned openings using the deterministic seed seed(base_seed, opening_index, stage) (SS2.2a). No thread synchronization is needed during generation because each thread writes to a disjoint portion of the SharedRegion<T>.

Reconciliation with shared memory protocol. This subsection provides the partitioning details referenced by Shared Memory Aggregation §1.4 step 3 (“contiguous block assignment”). The 6-step protocol in §1.4 remains the authoritative reference for the SharedRegion<T> allocation and fence sequence; this subsection documents the generation work distribution within that protocol.

Variable branching note. For variable branching (N_t varies per stage), the partitioning uses max_t N_t as the total opening count. Ranks assigned openings that exceed N_t for a particular stage simply skip that (opening_index, stage) pair. This produces at most a minor load imbalance (ranks with higher opening indices may skip more pairs) which is negligible for the one-time generation phase.
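A sketch of one rank's generation loop over its assigned opening block, including the variable-branching skip. Writing into a plain mutable slice stands in for the rank's SharedRegion<T> mapping, the correlation transform is elided, and opening_rng mirrors the SS2.2a encoding with the same zero-key assumption:

```rust
use std::hash::Hasher;

use rand::{Rng, SeedableRng};
use rand_distr::StandardNormal;
use rand_pcg::Pcg64;
use siphasher::sip::SipHasher13;

/// Opening tree RNG per SS2.2a (16-byte little-endian input).
fn opening_rng(base_seed: u64, opening_index: u32, stage: u32) -> Pcg64 {
    let mut h = SipHasher13::new_with_keys(0, 0);
    h.write(&base_seed.to_le_bytes());
    h.write(&opening_index.to_le_bytes());
    h.write(&stage.to_le_bytes());
    Pcg64::seed_from_u64(h.finish())
}

/// Fill this rank's assigned openings [first, last) across all stages.
/// `tree_data` stands in for the rank's writable mapping of the shared
/// region; offsets follow the stage-major layout of SS2.3/SS2.3a.
fn generate_openings(
    tree_data: &mut [f64],
    stage_offsets: &[usize],
    openings_per_stage: &[usize],
    dim: usize,
    base_seed: u64,
    first: usize,
    last: usize,
) {
    for opening_idx in first..last {                  // openings: outer loop
        for (stage, &n_t) in openings_per_stage.iter().enumerate() {
            if opening_idx >= n_t {
                // Variable branching: skip pairs beyond this stage's N_t.
                continue;
            }
            let offset = stage_offsets[stage] + opening_idx * dim;
            let mut rng = opening_rng(base_seed, opening_idx as u32, stage as u32);
            // Correlation transform elided: draw independent N(0, 1).
            for v in &mut tree_data[offset..offset + dim] {
                *v = rng.sample(StandardNormal);
            }
        }
    }
}
```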

2.4 Time-Varying Correlation Profiles

The correlation structure can vary across stages via the profile + schedule pattern defined in Input Scenarios SS5.3:

  • Profiles — Named correlation configurations (e.g., "default", "wet_season", "dry_season"), each defining correlation groups and matrices
  • Schedule — An optional array embedded in correlation.json that maps specific stage_id values to profile names. Stages not listed use the "default" profile.

During preprocessing, the spectral decomposition is computed once per profile (not per stage). At runtime, the scenario generator looks up the active profile for the current stage via the schedule and uses its pre-computed spectral factor.

3. Sampling Scheme Abstraction

3.1 Three Orthogonal Concerns

The SDDP algorithm has three independently configurable concerns that govern how scenarios are handled during training. Cobre formalizes these as distinct abstractions, following the design established by SDDP.jl:

| Concern | Abstraction | What It Controls | Default |
|---|---|---|---|
| Which noise is selected at each forward pass stage | Sampling Scheme | Forward scenario selection | InSample |
| Which LP model is solved in the forward pass | Forward Pass Model | Training LP vs alternative model | Default |
| Which noise terms are evaluated in the backward pass | Backward Sampling | Branching completeness | Complete |
These three concerns are orthogonal — each can be configured independently without affecting the others. The forward pass model and backward sampling are fixed in the initial implementation (Default and Complete respectively), with alternatives deferred. The sampling scheme is the primary configurable dimension.

Forward and backward noise source separation is a natural consequence of this design: the sampling scheme controls the forward pass noise source, while backward sampling always draws from the fixed opening tree (SS2.3). These two sources may differ — for example, the forward pass may sample from external scenarios while the backward pass evaluates all openings generated from a PAR model fitted to those scenarios.

3.2 Forward Sampling Schemes

The sampling scheme determines how the forward pass selects a scenario realization at each stage. Cobre supports four sampling schemes, configured independently per stochastic class (inflow, load, NCS) via per-class sub-objects in training.scenario_source (see Input Scenarios SS2.1):

InSample (Default)

At each stage t, sample a random index j ∈ {0, …, N_t − 1} and use the corresponding noise vector from the fixed opening tree (SS2.3). The noise values are injected into the stage LP by fixing the noise variable via patch_row_bounds on the AR dynamics constraint row (see Training Loop SS4.2a). The PAR model dynamics equation is already embedded in the LP as a constraint (LP Formulation SS5a); the solver evaluates the inflow realization implicitly when it solves the LP with the fixed noise.

  • Noise source: Opening tree (same as backward pass)
  • Realization computation: LP solve with noise epsilon fixed via patch_row_bounds on the AR dynamics constraint row
  • Use case: Standard SDDP training — forward and backward passes see the same noise distribution

This is SDDP.jl’s InSampleMonteCarlo: the forward pass samples from the same noise terms defined in the model.

OutOfSample

The forward pass draws from independently generated Monte Carlo noise that is distinct from the opening tree noise. At each stage t, a fresh noise vector is generated from the same PAR model used for the opening tree, but with independent random draws. The backward pass uses the same fixed opening tree as InSample.

  • Noise source: Independently generated Monte Carlo noise (not from the opening tree)
  • Realization computation: LP solve with noise epsilon fixed via patch_row_bounds, same as InSample
  • Backward pass interaction: The backward pass uses the fixed opening tree generated from the same PAR model
  • Use case: Out-of-sample forward evaluation to reduce in-sample bias

This corresponds to SDDP.jl’s OutOfSampleMonteCarlo: the forward pass uses noise terms different from those defined in the model, but drawn from the same distribution.

External

The forward pass draws from user-provided per-class scenario data (e.g., external_inflow_scenarios.parquet). At each stage t, a scenario is selected from the external set.

  • Noise source: External scenario values (not the opening tree)
  • Realization computation: The external scenario provides values (e.g., inflow), but these are not used directly as LP inputs. The SDDP LP formulation includes the AR dynamics equation as a constraint, with inflow lags and the current-stage noise term as variables fixed via fixing constraints. Therefore, external values must always be inverted to noise terms (epsilon) before they can be used in the LP. This noise inversion is described in SS4.3.
  • Backward pass interaction: The backward pass still uses the fixed opening tree. When external scenarios are the forward source, the opening tree noise is generated from a PAR model fitted to the external data, ensuring the backward branchings reflect the statistical properties of the external scenarios (see SS4.2)
  • Use case: Training with imported Monte Carlo scenarios or stress-test scenarios

Historical

Replay actual historical sequences mapped to stages via season_definitions. The forward pass deterministically follows historical data in order, cycling through available years when the number of forward passes exceeds the historical record.

  • Noise source: Historical values (mapped from inflow_history.parquet or equivalent to stage structure)
  • Realization computation: Like external scenarios, historical values must be inverted to noise terms (epsilon) before use in the LP. The same noise inversion procedure applies (see SS4.3).
  • Backward pass interaction: Same as External — the backward pass uses a PAR model fitted to the historical data
  • Use case: Policy validation against observed conditions, historical replay analysis

This corresponds to SDDP.jl’s Historical sampling scheme.

3.3 Forward Pass Model

The forward pass model determines which LP is solved at each stage during the forward pass. Cobre implements one model:

  • Default — Solve the training LP (the convex LP used for cut generation) with the scenario realization from the sampling scheme. This is the standard SDDP forward pass.

An Alternative Forward Pass model — solving a different LP (e.g., with simulation-only features like linearized head or unit commitment) to generate trial points, while keeping the training LP for cut generation — is deferred. See Deferred Features SSC.13.

3.4 Backward Sampling

The backward sampling scheme determines which noise terms are evaluated at each stage during the backward pass. Cobre implements one scheme:

  • Complete — Evaluate all noise vectors from the fixed opening tree. This is the standard SDDP backward pass that guarantees proper cut generation by considering every branching.

A MonteCarlo(n) backward sampling scheme — sampling openings with replacement instead of evaluating all — is deferred. See Deferred Features SSC.14.

3.5 Configuration

The sampling scheme is configured in config.json via training.scenario_source (for training) and simulation.scenario_source (for simulation). Each stochastic class (inflow, load, NCS) has its own scheme. When simulation.scenario_source is absent, it falls back to training.scenario_source. See Input Scenarios SS2.1 for the full schema.

Summary of scheme-to-config mapping (per class):

| Sampling Scheme | Per-class config example | Required inputs |
|---|---|---|
| InSample | { "scheme": "in_sample" } | Uncertainty models (SS1) or inflow history |
| OutOfSample | { "scheme": "out_of_sample" } | Same as InSample (independent noise from same model) |
| External | { "scheme": "external" } | Per-class external scenario file (e.g., external_inflow_scenarios.parquet) |
| Historical | { "scheme": "historical" } | inflow_history.parquet + season_definitions |
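An illustrative config.json fragment combining per-class schemes (shape per the table above and Input Scenarios SS2.1; exact field spellings should be checked against that schema):

```json
{
  "training": {
    "forward_passes": 192,
    "scenario_source": {
      "inflow": { "scheme": "external" },
      "load":   { "scheme": "in_sample" },
      "ncs":    { "scheme": "in_sample" }
    }
  },
  "simulation": {
    "scenario_source": {
      "inflow": { "scheme": "historical" },
      "load":   { "scheme": "in_sample" },
      "ncs":    { "scheme": "in_sample" }
    }
  }
}
```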

4. External Scenario Integration

4.1 External Scenario Sources

Cobre supports external (deterministic) scenarios as an alternative forward pass noise source. External scenarios can drive the forward pass in both training and simulation.

| Use Case | Description |
|---|---|
| Historical replay | Use actual historical inflows |
| Monte Carlo import | Pre-generated scenarios from external tool |
| Stress testing | Specific drought/flood scenarios |

Input files (per-class): scenarios/external_inflow_scenarios.parquet, scenarios/external_load_scenarios.parquet, scenarios/external_ncs_scenarios.parquet. Each class has its own file with class-specific entity ID and value columns. See Input Scenarios SS2.5.

4.2 Backward Pass with External Forward Scenarios

When the External or Historical sampling scheme is active during training, the forward and backward passes use different noise sources. The backward pass still requires proper probabilistic branchings for valid cut generation, so the opening tree noise must reflect the statistical properties of the external data.

The lifecycle is:

  1. PAR fitting — Fit a PAR model to the external scenario data (or historical inflow data), treating the external values as a synthetic history. The fitting follows the same procedure as SS1.4: seasonal statistics and AR coefficients are estimated from the external data.
  2. Opening tree generation — Generate the fixed opening tree (SS2.3) using noise from the fitted PAR model. The branchings reflect the distributional characteristics of the external scenarios.
  3. Training — Forward pass samples from external data; backward pass evaluates all openings from the PAR-fitted tree.

Rationale: Using external scenarios directly in the backward pass would violate SDDP’s requirement for proper probabilistic branchings. By fitting a PAR model to the external data, the backward pass preserves the statistical signature of the scenarios while producing valid cuts.

4.3 Noise Inversion for External Scenarios

When using external scenarios, Cobre must compute the implied noise values that would have generated those inflows under the PAR model. This is required because the backward pass needs noise values to construct the appropriate RHS perturbations for cut generation.

Given target inflow a_target[h][t] at stage t for hydro h with season m = season(t):

ε[h][t] = ( a_target[h][t] − base[h][t] − Σ_ℓ ψ[h][t][ℓ] · a[h][t−ℓ] ) / σ[h][t]

where base[h][t] is the precomputed base value (SS1.2, Step 3).

All quantities (a, ψ, σ) are in their respective units as described in PAR(p) Inflow Model SS2-3. The AR coefficients ψ are in original units; σ is the derived residual std.

The inversion proceeds sequentially through stages (each stage updates the lag buffer for the next):

  1. Initialize lag buffer from historical inflows or initial conditions
  2. For each stage t: compute the deterministic PAR component, solve for ε[h][t], update the lag buffer with a_target[h][t]
  3. Validate the inverted noise:
    • Warning if |ε[h][t]| is extreme (extreme noise suggests the external scenario deviates significantly from the PAR model)
    • Error if σ[h][t] = 0 but the residual exceeds a tolerance (the PAR model says this series is deterministic, but the external scenario disagrees)

After inversion, a JSON validation report is emitted with noise statistics (mean, std, min, max, extreme count), warnings, and an overall status.
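A sketch of the sequential inversion for a single hydro, following steps 1-2 above (names and array shapes illustrative; validation and reporting elided):

```rust
/// Invert external inflows to implied PAR noise, stage by stage.
/// `lags[0]` holds a[t-1]; `psi`, `sigma`, `base` are the precomputed
/// original-unit parameters for one hydro (illustrative sketch).
fn invert_noise(
    targets: &[f64],     // a_target[t] for t = 0..T
    psi: &[Vec<f64>],    // psi[t][l], index 0 == lag 1
    sigma: &[f64],
    base: &[f64],
    mut lags: Vec<f64>,  // step 1: initialized from history
) -> Vec<f64> {
    let mut eps = Vec::with_capacity(targets.len());
    for t in 0..targets.len() {
        // Deterministic PAR component at stage t.
        let det: f64 = base[t]
            + psi[t].iter().zip(&lags).map(|(p, a)| p * a).sum::<f64>();
        // Solve for the implied noise (sigma == 0 is a validation error).
        eps.push((targets[t] - det) / sigma[t]);
        // Step 2: shift the lag buffer with the realized target.
        lags.rotate_right(1);
        lags[0] = targets[t];
    }
    eps
}
```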

Critical: If AR order mismatches between the PAR model and a warm-start policy, noise inversion produces incorrect values. This is validated during policy loading (see Validation Architecture SS2.5b).

4.4 External Scenarios in Simulation

When a stochastic class uses the External sampling scheme during simulation, the forward pass returns values directly from the pre-loaded per-class data — no stochastic computation occurs. The forward pass iterates through the external scenarios deterministically.

5. Scenario Memory Layout

5.1 Memory Organization

Scenario data uses scenario-major ordering optimized for the forward pass access pattern:

Access Pattern in Forward Pass:
  - Outer loop: scenarios (parallel across threads)
  - Inner loop: stages (sequential within scenario)
  - Innermost: entities (sequential within stage)

Layout: [S0_T0_H0] [S0_T0_H1] ... [S0_TT_HN] [S1_T0_H0] ... [SS_TT_HN]

Benefits:

  • Each thread accesses contiguous memory for its assigned scenarios
  • No false sharing between threads (each thread’s scenarios are in separate memory regions)
  • Predictable prefetching within the stage sequence

Size estimate (hypothetical maximum): 200 scenarios × 120 stages × 160 hydros × 8 bytes = ~30 MB for inflows. At the production reference configuration (192 scenarios × 60 stages × 160 hydros × 8 bytes = ~14 MB), the cache fits comfortably in L3 per NUMA domain.

Backward pass consideration: The backward pass iterates over stages in reverse, accessing all scenarios at a given stage. With scenario-major layout, this accesses non-contiguous memory (stride = n_stages × n_entities). This is acceptable because: (a) the noise cache is relatively small and fits in L3, (b) the alternative (stage-major layout) would penalize the more latency-sensitive forward pass, and (c) the backward pass LP reuses basis and warm-starts from the previous opening solution, changing only the noise terms — so memory access latency is unlikely to be the bottleneck, though this should be validated with profiling once the solver is operational.
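A minimal indexing sketch for the scenario-major layout above (names hypothetical):

```rust
/// Flat index into scenario-major storage: [scenario][stage][entity].
fn scenario_major_index(
    scenario: usize,
    stage: usize,
    entity: usize,
    n_stages: usize,
    n_entities: usize,
) -> usize {
    (scenario * n_stages + stage) * n_entities + entity
}
```

The backward pass stride described above is visible here: holding (stage, entity) fixed and advancing scenario steps by n_stages × n_entities elements.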

5.2 Two-Level Work Distribution

Cobre distributes scenario work across two levels: MPI ranks (inter-node / inter-NUMA) and threads (intra-rank).

MPI rank level — deterministic distribution. Scenarios are assigned to MPI ranks as evenly as possible. If S total scenarios are distributed across R ranks, the first S mod R ranks receive ⌈S/R⌉ scenarios and the remaining ranks receive ⌊S/R⌋. The distribution is deterministic and based solely on rank index and total scenario count. Each rank stores only its assigned scenarios in memory. MPI gather operations use the per-rank counts and displacements to collect results (e.g., for upper bound evaluation or output aggregation).

Thread level — dynamic work-stealing. Within each rank, the assigned scenarios are processed by a thread pool using dynamic work-stealing scheduling. This is critical because per-scenario processing costs are not uniform — iteration counts for LP convergence vary depending on the noise realization, active constraints, and warm-start quality. Dynamic work-stealing at the thread level absorbs this variability without requiring inter-rank communication.

Per-thread noise generation calling convention. When a Rayon worker thread processes a forward trajectory, it generates the noise vector at each stage using the following deterministic sequence:

a. Seed derivation — Compute the deterministic seed via seed(base_seed, iteration, scenario_index, stage_id) using SipHash-1-3 (SS2.2a). The seed depends only on globally known constants — no thread-local or rank-local state is needed.

b. RNG initialization — Initialize a Pcg64 (or equivalent) pseudo-random number generator from the derived 64-bit seed.

c. Independent noise sampling — Generate the independent standard normal samples ε from the RNG, one per entity.

d. Correlation transform — Apply the spectral factor L for the active correlation profile at this stage (SS2.1): η = L·ε. The spectral factors are pre-computed during initialization and shared read-only across all threads.

e. LP noise term fixup — The resulting correlated noise vector η is used to fix the noise terms in the stage LP (via the PAR inflow equation in PAR(p) Inflow Model SS1).

Steps (a) through (e) are executed entirely within the thread. No inter-thread synchronization, no rank-to-rank communication, and no shared mutable state is involved. The only shared data accessed is the pre-computed spectral factors (read-only) and the PAR model parameters (read-only). This communication-free noise generation is a direct consequence of the deterministic seed derivation architecture (SS2.2, SS2.2a) and the work distribution model documented in SS2.2b.

InSample variant note. For the InSample sampling scheme, the forward pass does not generate noise at all. Instead, it uses the RNG to sample a random index into the pre-generated opening tree (SS2.3). The full five-step sequence above applies only to the External and Historical variants, which require noise inversion from raw inflow values (SS4.3). For InSample, step (c) is replaced by “sample a uniform integer in {0, …, N_t − 1}” and steps (d)-(e) are replaced by “look up the opening tree vector at the sampled index.”
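A sketch of steps (a) through (e) wired into Rayon for the noise-generating variants; tuple_rng is the helper from the SS2.2a sketch, and the profile lookup is simplified to a single pre-computed spectral factor:

```rust
use nalgebra::{DMatrix, DVector};
use rand::Rng;
use rand_distr::StandardNormal;
use rayon::prelude::*;

/// Run steps (a)-(e) for this rank's forward trajectories in parallel.
/// Because seeds depend only on (iteration, scenario, stage), any thread
/// on any rank produces identical noise for the same tuple.
fn forward_noise_pass(
    iteration: u32,
    base_seed: u64,
    my_scenarios: std::ops::Range<u32>,
    n_stages: u32,
    spectral_factor: &DMatrix<f64>, // pre-computed, read-only
) {
    my_scenarios.into_par_iter().for_each(|scenario| {
        for stage in 0..n_stages {
            // (a) seed derivation + (b) RNG initialization
            let mut rng = tuple_rng(base_seed, iteration, scenario, stage);
            // (c) independent standard normals, one per entity
            let eps = DVector::from_fn(spectral_factor.ncols(), |_, _| {
                rng.sample::<f64, _>(StandardNormal)
            });
            // (d) correlate via the active profile's spectral factor
            let eta = spectral_factor * eps;
            // (e) fix the LP noise terms with eta (LP patching elided)
            let _ = eta;
        }
    });
}
```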

Deployment strategy. The shared-memory architecture favors fewer MPI ranks with more threads per rank — typically one rank per NUMA domain (or per node). This minimizes MPI communication overhead while maximizing the benefit of thread-level dynamic scheduling and shared L3 cache for scenario data (see SS5.1).

5.3 NUMA-Aware Allocation

On NUMA architectures, scenario data is allocated with first-touch policy awareness. Each thread initializes (writes to) its portion of the scenario array during allocation, ensuring that the OS places those pages on the NUMA domain local to the thread’s CPU. This avoids remote memory access during the forward pass.
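A sketch of the first-touch pattern with Rayon, assuming scenario-major chunks per SS5.1 (illustrative only; the real allocator may differ):

```rust
use rayon::prelude::*;

/// First-touch initialization sketch: reserve uninitialized capacity,
/// write each scenario's chunk from a worker thread, then set the
/// length. The parallel write is the first touch, so the OS maps each
/// page on the writing thread's local NUMA domain.
fn first_touch_alloc(n_scenarios: usize, per_scenario_len: usize) -> Vec<f64> {
    let total = n_scenarios * per_scenario_len;
    let mut data: Vec<f64> = Vec::with_capacity(total);
    data.spare_capacity_mut()[..total]
        .par_chunks_mut(per_scenario_len) // one chunk per scenario
        .for_each(|chunk| {
            for v in chunk {
                v.write(0.0);
            }
        });
    // SAFETY: all `total` elements were initialized above.
    unsafe { data.set_len(total) };
    data
}
```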

6. Load Scenario Generation

6.1 Load Uncertainty Model

When load_seasonal_stats.parquet is provided, the system generates stochastic load realizations per (bus, stage) using the stored mean and standard deviation. Load models are typically independent (no AR structure) — each load realization is drawn as:

d[b][t] = μ_load[b][t] + σ_load[b][t] · ε[b][t]

where ε[b][t] is an independent standard normal noise term. See Input Scenarios SS3.3.

When load_seasonal_stats.parquet is absent, loads are treated as deterministic (taken from the demand values in the entity definitions).

6.2 Block Load Factors

The base load realization d[b][t] is a stage-level value in MW. Block-level loads are obtained by applying multiplicative block factors:

d[b][t][k] = d[b][t] · f[b][k]

where f[b][k] is the block factor from load_factors.json. If load_factors.json is absent, all block factors default to 1.0. See Input Scenarios SS4.
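A tiny sketch combining SS6.1 and SS6.2 for one bus and stage (symbols as above; function name hypothetical):

```rust
/// Draw a stochastic stage load and expand it to per-block loads.
fn stage_block_loads(mu: f64, sigma: f64, eps: f64, block_factors: &[f64]) -> Vec<f64> {
    let d_stage = mu + sigma * eps;                      // d[b][t], in MW
    block_factors.iter().map(|f| d_stage * f).collect()  // d[b][t][k]
}
```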

6.3 Load Correlation

If load entities are included in correlation groups defined in correlation.json, their noise terms are correlated with inflow noise via the same spectral transform described in SS2.1. Otherwise, load noise is independent of inflow noise.

7. Complete Tree Mode

7.1 Concept

In addition to standard SDDP sampling, Cobre supports a complete tree execution mode where the solver explores an explicit scenario tree — every branching at every stage is visited, with no sampling. This is the approach used by CEPEL’s DECOMP model for short-term hydrothermal dispatch.

In standard SDDP, the forward pass samples one branching per stage, so only a fraction of the scenario tree is explored per iteration. In complete tree mode, the full tree is enumerated, and each node corresponds to a deterministic subproblem. The Benders decomposition is still applied — cuts propagate backward through the tree — but there is no stochastic sampling; the solution is exact for the given tree.

7.2 Tree Structure

The scenario tree is defined by per-stage branching counts for :

  • Total nodes at stage :
  • Total leaf nodes:
  • Total tree nodes:

Each node at stage has children, each corresponding to a distinct realization (noise vector or external scenario value). The branchings at each stage are drawn from the opening tree (SS2.3), so when using uniform branching, or the tree can have variable branching factors per stage.

DECOMP special case: for all stages except the last (), where equals the number of external scenario branchings. This produces a deterministic trunk with branching only at the final stage — a common structure for weekly/monthly short-term planning where uncertainty is resolved at the end of the horizon.

DECOMP Special Case (N_t = 1 for t < T, N_T branchings at last stage):

Stage:  1        2        3        ...      T
        ●────────●────────●── ... ──●───────● branch 1
                                    ├───────● branch 2
                                    ├───────● branch 3
                                    └───────● branch N_T

General Case (N_t branchings per stage):

Stage:  1             2                  3
        ●─────────────●─────────────────●
        │             ├─────────────────●
        │             └─────────────────●
        ├─────────────●─────────────────●
        │             ├─────────────────●
        │             └─────────────────●
        └─────────────●─────────────────●
                      ├─────────────────●
                      └─────────────────●

Total nodes at stage 3: N_1 × N_2 × N_3

7.3 Relationship to SDDP with External Scenarios

Complete tree mode is closely related to standard SDDP with external scenarios. Consider the case where:

  • External scenarios are used in training (SS3.2 External scheme)
  • at the last stage
  • All branchings at the last stage are visited exactly once (no repetition)

This configuration approaches a complete tree solution for the last stage. The key difference is that standard SDDP sampling may repeat or skip branchings, while complete tree mode guarantees exhaustive coverage.

To bridge the two modes, the forward pass can be configured to force exhaustive visitation — cycling through all branching indices without replacement rather than sampling with replacement. When combined with a single forward pass per branching at the final stage, this degenerates exactly into the complete tree for the DECOMP special case.

7.4 Scope and Limitations

Complete tree mode is feasible only when the total number of tree nodes is computationally tractable. For a 5-stage problem with 20 branchings per stage, the tree has 20⁵ = 3.2 million leaf nodes — each requiring an LP solve. Production-scale SDDP problems (60-120 stages) make full trees intractable; the mode is intended for:

  • Short-horizon problems (5-12 stages, weekly resolution)
  • DECOMP-like configurations with deterministic trunks and branching only at specific stages
  • Validation and benchmarking against SDDP solutions on small instances

The solver integration for complete tree mode (tree enumeration, node-to-subproblem mapping, result aggregation) is documented in Deferred Features SSC.12.

Cross-References