Output Schemas

Purpose

This spec defines the Parquet schemas for all output files produced by Cobre during simulation (policy evaluation) and training (policy construction). It covers the output directory layout, column-level schemas for every entity type, and the dictionary/metadata files that make outputs self-documenting.

For output infrastructure (manifests, MPI partitioning, configuration, scale reference, validation), see Output Infrastructure.

1. Directory Structure

output/
├── simulation/                              # Hive-partitioned simulation results
│   ├── _manifest.json                       # Checksums, row counts, partitions
│   ├── _SUCCESS                             # Marker on successful completion
│   ├── costs/
│   │   └── scenario_id=XXXX/data.parquet
│   ├── hydros/
│   │   └── scenario_id=XXXX/data.parquet
│   ├── thermals/
│   │   └── scenario_id=XXXX/data.parquet
│   ├── exchanges/
│   │   └── scenario_id=XXXX/data.parquet
│   ├── buses/
│   │   └── scenario_id=XXXX/data.parquet
│   ├── pumping_stations/                    # Optional: only if entity exists
│   │   └── scenario_id=XXXX/data.parquet
│   ├── contracts/                           # Optional: only if entity exists
│   │   └── scenario_id=XXXX/data.parquet
│   ├── non_controllables/                   # Optional: only if entity exists
│   │   └── scenario_id=XXXX/data.parquet
│   ├── batteries/                           # DEFERRED (forward-compatible placeholder)
│   │   └── scenario_id=XXXX/data.parquet
│   ├── inflow_lags/                         # Optional: only if AR order > 0
│   │   └── scenario_id=XXXX/data.parquet
│   └── violations/
│       └── generic/
│           └── scenario_id=XXXX/data.parquet
│
├── stochastic/                              # Stochastic model fitting artifacts
│   ├── noise_openings.parquet               # Noise term realizations per opening
│   ├── inflow_seasonal_stats.parquet        # Monthly inflow statistics
│   ├── inflow_ar_coefficients.parquet       # Autoregressive model coefficients
│   ├── correlation.json                     # Cross-correlation structure
│   ├── load_seasonal_stats.parquet          # Monthly load statistics
│   └── fitting_report.json                  # Stochastic model fitting diagnostics
│
├── hydro_models/                            # Computed FPHA hyperplanes (optional)
│   └── fpha_hyperplanes.parquet             # Written when any hydro uses computed FPHA
│
└── training/                                # Training phase outputs
    ├── _manifest.json
    ├── _SUCCESS
    ├── convergence.parquet                  # Iteration-level convergence
    ├── scaling_report.json                  # LP scaling diagnostics
    ├── model_provenance.json                # Model provenance metadata
    ├── timing/
    │   ├── iterations.parquet               # Per-iteration timing breakdown
    │   └── mpi_ranks.parquet                # Per-rank timing statistics
    ├── solver/
    │   ├── iterations.parquet               # Per-iteration solver statistics
    │   └── retry_histogram.parquet          # Per-iteration retry level histogram
    ├── cut_selection/
    │   └── iterations.parquet               # Per-iteration cut selection stats
    ├── dictionaries/
    │   ├── codes.json                       # Categorical code mappings
    │   ├── bounds.parquet                   # Entity bounds by stage/block
    │   ├── state_dictionary.json            # State space definition
    │   ├── variables.csv                    # Variable metadata
    │   └── entities.csv                     # Entity metadata
    └── metadata.json                        # Run configuration and system info

Optional files: Simulation entity directories are only written if the corresponding entities exist in the input model. The hydro_models/ directory is only written when at least one hydro uses the computed-source FPHA model.

2. Design Principles

2.1 Hive Partitioning by Scenario

Simulation outputs use Hive-style partitioning by scenario_id:

  • Parallel writes: Each MPI rank writes exclusively to its assigned scenario partitions
  • Partition pruning: Queries filtering by scenario read only relevant files
  • Incremental updates: Individual scenarios can be recomputed without rewriting all data

Partition naming: {entity_type}/scenario_id={scenario_id:04d}/data.parquet

The scenario_id column is NOT stored in Parquet data — it is derived from the partition path.
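The partition naming rule can be sketched with two small helpers: one builds the relative path for a scenario, the other recovers `scenario_id` from a path, since the column is not stored in the file. The helper names `partition_path` and `scenario_id_from_path` are illustrative and not part of Cobre.

```python
import re
from pathlib import PurePosixPath

# Matches a Hive-style partition directory such as "scenario_id=0042".
_PARTITION_RE = re.compile(r"^scenario_id=(\d+)$")


def partition_path(entity_type: str, scenario_id: int) -> str:
    """Build the relative path of one scenario partition (hypothetical helper)."""
    if scenario_id < 0:
        raise ValueError("scenario_id must be non-negative")
    # :04d zero-pads to at least four digits; larger ids simply use more digits.
    return f"{entity_type}/scenario_id={scenario_id:04d}/data.parquet"


def scenario_id_from_path(path: str) -> int:
    """Recover scenario_id from the partition directory name."""
    for part in PurePosixPath(path).parts:
        match = _PARTITION_RE.match(part)
        if match:
            return int(match.group(1))
    raise ValueError(f"no scenario_id partition in {path!r}")
```

Readers that understand Hive partitioning usually perform the second step automatically; the explicit parser is only useful when files are opened one at a time.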

2.2 Categorical Encoding

All categorical columns use integer codes with mappings in dictionaries/codes.json:

  • Reduces storage (i8 vs variable-length strings)
  • Enables efficient filtering and grouping
  • Convention: categorical columns end with _code suffix (e.g., operative_state_code)

2.3 Constraint Violation Handling

Two mechanisms:

  1. Slack columns in entity files: Physical bound violations (e.g., turbined_slack_m3s) appear as dedicated columns, value 0 when no violation
  2. Generic violations file: User-defined generic constraint violations in violations/generic/

2.4 File Naming Conventions

| Convention | Example | Rationale |
|---|---|---|
| Plural entity names | hydros.parquet | Multiple records |
| Lowercase with underscores | pumping_stations/ | Filesystem-safe |
| data.parquet in partitions | scenario_id=0001/data.parquet | Standard Hive convention |

3. Categorical Codes (dictionaries/codes.json)

{
  "version": "1.0",
  "generated_at": "2026-01-18T12:00:00Z",
  "operative_state": {
    "0": "deactivated",
    "1": "maintenance",
    "2": "operating",
    "3": "saturated"
  },
  "storage_binding": {
    "0": "none",
    "1": "below_minimum",
    "2": "above_maximum",
    "3": "both"
  },
  "contract_type": {
    "0": "import",
    "1": "export"
  },
  "entity_type": {
    "0": "hydro",
    "1": "thermal",
    "2": "bus",
    "3": "line",
    "4": "pumping_station",
    "5": "contract",
    "7": "non_controllable"
  },
  "bound_type": {
    "0": "storage_min",
    "1": "storage_max",
    "2": "turbined_min",
    "3": "turbined_max",
    "4": "outflow_min",
    "5": "outflow_max",
    "6": "generation_min",
    "7": "generation_max",
    "8": "flow_min",
    "9": "flow_max"
  }
}

Forward-compatible codes: entity_type code 6 (battery) is reserved for future use. See Deferred Features for implementation timeline.

Usage in analysis:

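import json

import polars as pl

# Paths are relative to the output/ directory.
with open("training/dictionaries/codes.json") as f:
    codes = json.load(f)

# JSON object keys are strings; the Parquet column stores integer codes.
state_labels = {int(k): v for k, v in codes["operative_state"].items()}

# Glob over the Hive partitions; scenario_id is recovered from the path.
df = pl.read_parquet("simulation/hydros/**/*.parquet", hive_partitioning=True)
df = df.with_columns(
    pl.col("operative_state_code")
      # replace_strict (Polars >= 1.0) supersedes the removed map_dict.
      .replace_strict(state_labels, return_dtype=pl.String)
      .alias("operative_state")
)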

4. Dictionary Files

4.1 Bounds Dictionary (dictionaries/bounds.parquet)

Centralizes all entity bounds by stage and block, eliminating redundant bound columns from entity output files.

| Column | Type | Nullable | Description |
|---|---|---|---|
| entity_type_code | i8 | No | Entity type code (see codes.json) |
| entity_id | i32 | No | Entity identifier |
| stage_id | i32 | No | Stage index (0-based) |
| block_id | i32 | Yes | Block index (null = applies to all blocks) |
| bound_type_code | i8 | No | Bound type code (see codes.json) |
| bound_value | f64 | No | Bound value in native units |

Bounds are stored only when they differ from default/infinite values.
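A minimal lookup sketch over rows of bounds.parquet, represented here as plain dictionaries. It assumes that a block-specific row takes precedence over a row with a null `block_id`; the spec does not state the precedence, so treat that rule as an assumption. The function name `resolve_bound` is hypothetical.

```python
from typing import Optional


def resolve_bound(rows, entity_type_code, entity_id, stage_id, block_id,
                  bound_type_code) -> Optional[float]:
    """Return the bound for one (entity, stage, block), or None if not stored.

    None means no row exists, i.e. the bound keeps its default/infinite value.
    """
    key = (entity_type_code, entity_id, stage_id, bound_type_code)
    block_specific = None
    all_blocks = None
    for r in rows:
        row_key = (r["entity_type_code"], r["entity_id"],
                   r["stage_id"], r["bound_type_code"])
        if row_key != key:
            continue
        if r["block_id"] is None:
            all_blocks = r["bound_value"]      # null block: applies to all blocks
        elif r["block_id"] == block_id:
            block_specific = r["bound_value"]  # assumed to override the null row
    return block_specific if block_specific is not None else all_blocks
```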

4.2 State Dictionary (dictionaries/state_dictionary.json)

Documents the state space structure for the SDDP policy. See Input Constraints §4 for full state schema.

4.3 Variables Metadata (dictionaries/variables.csv)

| Column | Type | Description |
|---|---|---|
| file | string | Source file (e.g., hydros, thermals) |
| column | string | Column name |
| type | string | Data type (i8, i32, i64, f64, bool) |
| unit | string | Physical unit or null |
| description | string | Human-readable description |
| nullable | bool | Whether null values are allowed |

4.4 Entities Metadata (dictionaries/entities.csv)

| Column | Type | Description |
|---|---|---|
| entity_type_code | i8 | Entity type code |
| entity_id | i32 | Entity identifier |
| name | string | Entity name from input |
| bus_id | i32 | Connected bus (if applicable) |
| system_id | i32 | System/subsystem identifier |

5. Simulation Output Schemas

5.1 Costs (simulation/costs/)

Stage and block-level cost breakdown for economic analysis. Cost columns are organized by the three penalty categories defined in Penalty System §2.

| Column | Type | Nullable | Description |
|---|---|---|---|
| stage_id | i32 | No | Stage index (0-based) |
| block_id | i32 | Yes | Block index (null for stage-level aggregates) |
| total_cost | f64 | No | Total stage cost (all components) |
| immediate_cost | f64 | No | Stage immediate cost (excluding future cost) |
| future_cost | f64 | No | Future cost function value |
| discount_factor | f64 | No | Cumulative discount factor (see Discount Rate §5) |
| Resource costs | | | |
| thermal_cost | f64 | No | Thermal generation cost |
| contract_cost | f64 | No | Import/export contract cost (net: imports positive, exports negative) |
| Category 1: Recourse | | | |
| deficit_cost | f64 | No | Deficit (unmet demand) penalty, piecewise segments |
| excess_cost | f64 | No | Excess generation penalty |
| Category 2: Violations | | | |
| storage_violation_cost | f64 | No | Storage below minimum violation cost |
| filling_target_cost | f64 | No | Filling target shortfall cost |
| hydro_violation_cost | f64 | No | Sum of all hydro violation costs (aggregate of the 6 granular columns below) |
| outflow_violation_below_cost | f64 | No | Outflow below minimum violation cost (all hydros) |
| outflow_violation_above_cost | f64 | No | Outflow above maximum violation cost (all hydros) |
| turbined_violation_cost | f64 | No | Turbined flow below minimum violation cost (all hydros) |
| generation_violation_cost | f64 | No | Generation below minimum violation cost (all hydros) |
| evaporation_violation_cost | f64 | No | Evaporation constraint violation cost (all hydros) |
| withdrawal_violation_cost | f64 | No | Water withdrawal violation cost (all hydros) |
| inflow_penalty_cost | f64 | No | Inflow non-negativity penalty cost (see Inflow Non-Negativity) |
| generic_violation_cost | f64 | No | Generic constraint violation penalties |
| Category 3: Regularization | | | |
| spillage_cost | f64 | No | Spillage regularization cost (all hydros) |
| fpha_turbined_cost | f64 | No | FPHA turbined flow regularization cost (FPHA hydros only) |
| curtailment_cost | f64 | No | Non-controllable source curtailment cost |
| exchange_cost | f64 | No | Exchange regularization cost (all lines) |
| pumping_cost | f64 | No | Imputed pumping cost (marginal price × energy consumed, not a direct LP cost term; see Penalty System §8) |

Rows per scenario: num_stages × (1 + num_blocks) (stage-level + block-level rows)

Cost relationships:

total_cost = immediate_cost + discount_factor_applied_to_future * future_cost
immediate_cost = thermal_cost + contract_cost
               + deficit_cost + excess_cost
               + storage_violation_cost + filling_target_cost + hydro_violation_cost
               + inflow_penalty_cost + generic_violation_cost
               + spillage_cost + fpha_turbined_cost + curtailment_cost
               + exchange_cost + pumping_cost

hydro_violation_cost = outflow_violation_below_cost + outflow_violation_above_cost
                     + turbined_violation_cost + generation_violation_cost
                     + evaporation_violation_cost + withdrawal_violation_cost

Note on pumping_cost: Pumping stations have no explicit cost parameter in the LP. The pumping_cost column reports the imputed cost: marginal price at the connected bus × energy consumed. This is derived from dual variables after the solve, not from a direct LP cost term.

Note on storage_violation_cost and filling_target_cost: These penalties apply to end-of-stage storage (hm³) and appear outside the block summation in the LP objective. They are NOT per-block costs. In the costs output, they appear only in stage-level rows (where block_id is null).
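The two additive cost relationships above can be checked row by row. The sketch below works on one costs row held as a dictionary and reports which identities fail; the tolerances and the function name `check_cost_row` are assumptions chosen for illustration, not values taken from Cobre.

```python
import math

# Components of immediate_cost, in the order given in the cost relationships.
IMMEDIATE_COMPONENTS = (
    "thermal_cost", "contract_cost",
    "deficit_cost", "excess_cost",
    "storage_violation_cost", "filling_target_cost", "hydro_violation_cost",
    "inflow_penalty_cost", "generic_violation_cost",
    "spillage_cost", "fpha_turbined_cost", "curtailment_cost",
    "exchange_cost", "pumping_cost",
)

# The six granular columns aggregated into hydro_violation_cost.
HYDRO_VIOLATION_COMPONENTS = (
    "outflow_violation_below_cost", "outflow_violation_above_cost",
    "turbined_violation_cost", "generation_violation_cost",
    "evaporation_violation_cost", "withdrawal_violation_cost",
)


def check_cost_row(row, rel_tol=1e-9, abs_tol=1e-6):
    """Return the names of aggregate columns that do not match their parts."""
    problems = []
    immediate = sum(row[c] for c in IMMEDIATE_COMPONENTS)
    if not math.isclose(immediate, row["immediate_cost"],
                        rel_tol=rel_tol, abs_tol=abs_tol):
        problems.append("immediate_cost")
    hydro = sum(row[c] for c in HYDRO_VIOLATION_COMPONENTS)
    if not math.isclose(hydro, row["hydro_violation_cost"],
                        rel_tol=rel_tol, abs_tol=abs_tol):
        problems.append("hydro_violation_cost")
    return problems
```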

5.2 Hydros (simulation/hydros/)

| Column | Type | Nullable | Description |
|---|---|---|---|
| stage_id | i32 | No | Stage index (0-based) |
| block_id | i32 | Yes | Block index (null for stage-level) |
| hydro_id | i32 | No | Hydro plant identifier |
| turbined_m3s | f64 | No | Turbined outflow (m³/s) |
| spillage_m3s | f64 | No | Spillage (m³/s) |
| outflow_m3s | f64 | No | Total outflow: turbined + spillage (m³/s) |
| evaporation_m3s | f64 | Yes | Evaporation loss (m³/s), null if not modeled |
| diverted_inflow_m3s | f64 | Yes | Inflow diverted from upstream |
| diverted_outflow_m3s | f64 | Yes | Outflow diverted to downstream |
| incremental_inflow_m3s | f64 | No | Realized incremental inflow (m³/s) |
| inflow_m3s | f64 | No | Total inflow including upstream |
| storage_initial_hm3 | f64 | No | Storage at start (hm³) |
| storage_final_hm3 | f64 | No | Storage at end (hm³) |
| generation_mw | f64 | No | Power generation (MW) |
| generation_mwh | f64 | No | Energy generation (MWh) |
| productivity_mw_per_m3s | f64 | Yes | Effective productivity |
| spillage_cost | f64 | No | Spillage regularization cost (this plant) |
| water_value_per_hm3 | f64 | No | Marginal value of stored water ($/hm³) |
| storage_binding_code | i8 | No | Storage bound binding status |
| operative_state_code | i8 | No | Operative state |
| Violation slacks | | | |
| turbined_slack_m3s | f64 | No | Min turbined violation (0 if none) |
| outflow_slack_below_m3s | f64 | No | Min outflow violation (0 if none) |
| outflow_slack_above_m3s | f64 | No | Max outflow violation (0 if none) |
| generation_slack_mw | f64 | No | Min generation violation (0 if none) |
| storage_violation_below_hm3 | f64 | No | Storage below minimum violation (0 if none, stage-level only) |
| filling_target_violation_hm3 | f64 | No | Filling target shortfall (0 if none, filling hydros at terminal stage) |
| evaporation_violation_pos_m3s | f64 | No | Evaporation over-estimate violation slack (0 if none) |
| evaporation_violation_neg_m3s | f64 | No | Evaporation under-estimate violation slack (0 if none) |
| inflow_nonnegativity_slack_m3s | f64 | No | Inflow non-negativity slack (0 if none, stage-level) |
| water_withdrawal_violation_pos_m3s | f64 | No | Water withdrawal over-withdrawal violation slack (0 if none) |
| water_withdrawal_violation_neg_m3s | f64 | No | Water withdrawal under-withdrawal violation slack (0 if none) |

Rows per scenario: num_stages × num_blocks × num_hydros

Water balance: storage_final = storage_initial + (inflow - outflow - evaporation + diverted_inflow - diverted_outflow) × duration
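The water balance mixes flows in m³/s with storage in hm³, so a check needs a unit conversion. The sketch below assumes `duration` is expressed in seconds, that flows are constant over that duration, and that 1 hm³ = 10⁶ m³; the spec does not state the duration unit, so `duration_s` is an assumption. Null flow columns are treated as zero.

```python
M3_PER_HM3 = 1_000_000.0  # 1 hm³ = 10^6 m³


def water_balance_residual_hm3(row, duration_s):
    """Return storage_final minus the value implied by the water balance.

    A residual near zero means the row is consistent with the balance.
    duration_s is the (assumed) period length in seconds.
    """
    def flow(name):
        value = row.get(name)
        return 0.0 if value is None else value  # null: term not modeled

    net_m3s = (flow("inflow_m3s") - flow("outflow_m3s") - flow("evaporation_m3s")
               + flow("diverted_inflow_m3s") - flow("diverted_outflow_m3s"))
    expected_final = row["storage_initial_hm3"] + net_m3s * duration_s / M3_PER_HM3
    return row["storage_final_hm3"] - expected_final
```

For example, a net inflow of 20 m³/s over 30 days (2,592,000 s) adds 51.84 hm³ to storage.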

Slack interpretation: value > 0 means the corresponding constraint was relaxed. Storage and filling target slacks are stage-level (not per-block). See Penalty System §4 for the full violation catalogue.
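As a small illustration of the slack interpretation, the following sketch lists which constraints were relaxed in a single hydros row. The column names come from the table above; the helper name `relaxed_constraints` is hypothetical.

```python
# Slack columns of the hydros schema, as listed under "Violation slacks".
HYDRO_SLACK_COLUMNS = (
    "turbined_slack_m3s",
    "outflow_slack_below_m3s",
    "outflow_slack_above_m3s",
    "generation_slack_mw",
    "storage_violation_below_hm3",
    "filling_target_violation_hm3",
    "evaporation_violation_pos_m3s",
    "evaporation_violation_neg_m3s",
    "inflow_nonnegativity_slack_m3s",
    "water_withdrawal_violation_pos_m3s",
    "water_withdrawal_violation_neg_m3s",
)


def relaxed_constraints(row):
    """Return {slack column: value} for every slack that is strictly positive."""
    return {name: row[name] for name in HYDRO_SLACK_COLUMNS if row[name] > 0}
```

In practice a small positive tolerance may be preferable to a strict comparison with zero, depending on solver precision.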

5.3 Thermals (simulation/thermals/)

| Column | Type | Nullable | Description |
|---|---|---|---|
| stage_id | i32 | No | Stage index (0-based) |
| block_id | i32 | Yes | Block index |
| thermal_id | i32 | No | Thermal unit identifier |
| generation_mw | f64 | No | Power generation (MW) |
| generation_mwh | f64 | No | Energy generation (MWh) |
| generation_cost | f64 | No | Generation cost |
| is_gnl | bool | No | Whether unit has GNL configuration |
| gnl_committed_mw | f64 | Yes | GNL committed capacity (null if not GNL) |
| gnl_decision_mw | f64 | Yes | GNL decision for future stages (null if not GNL) |
| operative_state_code | i8 | No | Operative state |

Rows per scenario: num_stages × num_blocks × num_thermals

GNL notes: gnl_committed_mw is capacity committed previously and available now; gnl_decision_mw is the decision made now for future availability. GNL decisions are state variables coupling stages.

5.4 Exchanges (simulation/exchanges/)

| Column | Type | Nullable | Description |
|---|---|---|---|
| stage_id | i32 | No | Stage index (0-based) |
| block_id | i32 | Yes | Block index |
| line_id | i32 | No | Transmission line identifier |
| direct_flow_mw | f64 | No | Direct flow bus_from → bus_to (MW) |
| reverse_flow_mw | f64 | No | Reverse flow bus_to → bus_from (MW) |
| net_flow_mw | f64 | No | Net flow: direct_flow_mw − reverse_flow_mw (MW), derived |
| net_flow_mwh | f64 | No | Net energy flow (MWh) |
| losses_mw | f64 | No | Transmission losses (MW), derived |
| losses_mwh | f64 | No | Transmission losses energy (MWh) |
| exchange_cost | f64 | No | Exchange regularization cost (this line) |
| operative_state_code | i8 | No | Operative state |

Rows per scenario: num_stages × num_blocks × num_lines

Sign convention: net_flow_mw positive = bus_from → bus_to; negative = bus_to → bus_from.

Note: The LP decision variables are direct_flow_mw and reverse_flow_mw, both non-negative. net_flow_mw and the loss columns are derived for analysis convenience. See System Elements §4 for the exchange model.
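A short sketch of the derived net flow and its sign convention. It assumes the net flow is the direct flow minus the reverse flow, which is what the sign convention implies; the function names are illustrative only.

```python
def net_flow_mw(direct_flow_mw, reverse_flow_mw):
    """Net flow under the assumed definition: direct minus reverse (MW)."""
    if direct_flow_mw < 0 or reverse_flow_mw < 0:
        raise ValueError("LP flow variables are non-negative")
    return direct_flow_mw - reverse_flow_mw


def flow_direction(net_mw):
    """Map the sign of net_flow_mw to a direction label."""
    if net_mw > 0:
        return "bus_from->bus_to"
    if net_mw < 0:
        return "bus_to->bus_from"
    return "none"
```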

5.5 Buses (simulation/buses/)

| Column | Type | Nullable | Description |
|---|---|---|---|
| stage_id | i32 | No | Stage index (0-based) |
| block_id | i32 | Yes | Block index |
| bus_id | i32 | No | Bus identifier |
| load_mw | f64 | No | Realized load after curtailment (MW) |
| load_mwh | f64 | No | Realized load energy (MWh) |
| deficit_mw | f64 | No | Unmet demand (MW) |
| deficit_mwh | f64 | No | Unmet demand energy (MWh) |
| excess_mw | f64 | No | Excess generation (MW) |
| excess_mwh | f64 | No | Excess generation energy (MWh) |
| spot_price | f64 | No | Marginal cost of energy ($/MWh) |

Rows per scenario: num_stages × num_blocks × num_buses

Load balance: generation_total + imports − exports + deficit − excess = load

To compute generation by source, join with hydros, thermals, etc. using bus_id from entities.csv.
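The join described above can be sketched with plain dictionaries for one (stage, block) slice. Only hydros and thermals are included here; other sources would be added the same way. The entity type codes 0 and 1 come from codes.json, while the function names and the way imports and exports are supplied are assumptions for illustration.

```python
from collections import defaultdict

HYDRO, THERMAL = 0, 1  # entity_type codes from codes.json


def generation_by_bus(hydro_rows, thermal_rows, entities):
    """Sum generation_mw per bus, using bus_id from entities.csv rows."""
    bus_of = {(e["entity_type_code"], e["entity_id"]): e["bus_id"] for e in entities}
    totals = defaultdict(float)
    for row in hydro_rows:
        totals[bus_of[(HYDRO, row["hydro_id"])]] += row["generation_mw"]
    for row in thermal_rows:
        totals[bus_of[(THERMAL, row["thermal_id"])]] += row["generation_mw"]
    return dict(totals)


def load_balance_residual_mw(generation_total_mw, imports_mw, exports_mw, bus_row):
    """Left side minus right side of the load balance; zero when it holds."""
    return (generation_total_mw + imports_mw - exports_mw
            + bus_row["deficit_mw"] - bus_row["excess_mw"] - bus_row["load_mw"])
```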

5.6 Pumping Stations (simulation/pumping_stations/) — Optional

| Column | Type | Nullable | Description |
|---|---|---|---|
| stage_id | i32 | No | Stage index (0-based) |
| block_id | i32 | Yes | Block index |
| pumping_station_id | i32 | No | Pumping station identifier |
| pumped_flow_m3s | f64 | No | Pumped water flow (m³/s) |
| pumped_volume_hm3 | f64 | No | Pumped volume (hm³) |
| power_consumption_mw | f64 | No | Power consumed (MW) |
| energy_consumption_mwh | f64 | No | Energy consumed (MWh) |
| pumping_cost | f64 | No | Imputed pumping cost |
| operative_state_code | i8 | No | Operative state |

Rows per scenario: num_stages × num_blocks × num_pumping_stations

Note on pumping_cost: Pumping stations have no explicit cost parameter in the LP objective. This column reports the imputed cost: marginal price at the connected bus (dual of load balance constraint, $/MWh) × energy consumed (MWh). It is computed after the solve from dual variables, not from a direct LP cost term. See Penalty System §8.

5.7 Contracts (simulation/contracts/) — Optional

| Column | Type | Nullable | Description |
|---|---|---|---|
| stage_id | i32 | No | Stage index (0-based) |
| block_id | i32 | Yes | Block index |
| contract_id | i32 | No | Contract identifier |
| power_mw | f64 | No | Contracted power (MW) |
| energy_mwh | f64 | No | Contracted energy (MWh) |
| price_per_mwh | f64 | No | Contract price ($/MWh) |
| total_cost | f64 | No | Total contract cost |
| operative_state_code | i8 | No | Operative state |

Rows per scenario: num_stages × num_blocks × num_contracts

5.8 Non-Controllable Sources (simulation/non_controllables/) — Optional

| Column | Type | Nullable | Description |
|---|---|---|---|
| stage_id | i32 | No | Stage index (0-based) |
| block_id | i32 | Yes | Block index |
| non_controllable_id | i32 | No | Non-controllable source identifier |
| generation_mw | f64 | No | Dispatched generation (MW) |
| generation_mwh | f64 | No | Dispatched generation (MWh) |
| available_mw | f64 | No | Available generation from scenario (MW) |
| curtailment_mw | f64 | No | Curtailed generation (MW) |
| curtailment_mwh | f64 | No | Curtailed generation (MWh) |
| curtailment_cost | f64 | No | Curtailment regularization cost |
| operative_state_code | i8 | No | Operative state |

Rows per scenario: num_stages × num_blocks × num_non_controllable_sources

Non-controllable sources are fully defined in Input System Entities §7. Curtailment cost is a Category 3 regularization penalty — see Penalty System §2.

5.9 Batteries (simulation/batteries/) — DEFERRED

| Column | Type | Nullable | Description |
|---|---|---|---|
| stage_id | i32 | No | Stage index (0-based) |
| block_id | i32 | Yes | Block index |
| battery_id | i32 | No | Battery identifier |
| charge_mw | f64 | No | Charging power (MW) |
| discharge_mw | f64 | No | Discharging power (MW) |
| soc_initial_mwh | f64 | No | State of charge at start (MWh) |
| soc_final_mwh | f64 | No | State of charge at end (MWh) |
| cycle_cost | f64 | No | Cycling degradation cost |
| operative_state_code | i8 | No | Operative state |

Forward-compatible placeholder: The schema and entity_type code 6 are reserved. Exceptional validation rejects battery entities at input loading until the implementation is ready. See Deferred Features for implementation timeline.

5.10 Inflow Lags (simulation/inflow_lags/) — Optional

| Column | Type | Nullable | Description |
|---|---|---|---|
| stage_id | i32 | No | Stage index (0-based) |
| hydro_id | i32 | No | Hydro plant identifier |
| lag_index | i32 | No | Lag index (1 = t−1, 2 = t−2, …) |
| inflow_m3s | f64 | No | Inflow value for this lag (m³/s) |

Rows per scenario: num_stages × num_hydros × max_ar_order

lag_index uses 1-based indexing. Maximum lag index equals the AR model order. These values are state variables affecting inflow sampling.
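The 1-based lag convention can be illustrated by building the rows for one hydro at one stage from an inflow history. The list layout (oldest value first, so the last element is the inflow at t−1) and the function name `lag_rows` are assumptions made for this sketch.

```python
def lag_rows(stage_id, hydro_id, history_m3s, ar_order):
    """Build inflow_lags rows for one hydro at one stage.

    history_m3s is ordered oldest first: history_m3s[-1] is the inflow at t-1,
    history_m3s[-2] the inflow at t-2, and so on.
    """
    if len(history_m3s) < ar_order:
        raise ValueError("history is shorter than the AR order")
    return [
        {
            "stage_id": stage_id,
            "hydro_id": hydro_id,
            "lag_index": lag,                  # 1-based: 1 = t-1
            "inflow_m3s": history_m3s[-lag],
        }
        for lag in range(1, ar_order + 1)
    ]
```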

5.11 Generic Violations (simulation/violations/generic/)

| Column | Type | Nullable | Description |
|---|---|---|---|
| stage_id | i32 | No | Stage index (0-based) |
| block_id | i32 | Yes | Block index |
| constraint_id | i32 | No | Generic constraint identifier |
| slack_value | f64 | No | Violation amount (constraint units) |
| slack_cost | f64 | No | Penalty cost incurred |

Rows per scenario: num_stages × num_blocks × num_generic_constraints (only non-zero violations may be stored).

6. Training Output Schemas

6.1 Convergence Log (training/convergence.parquet)

| Column | Type | Nullable | Description |
|---|---|---|---|
| iteration | i32 | No | Iteration number (1-based) |
| lower_bound | f64 | No | Lower bound (expected cost-to-go from stage 0) |
| upper_bound_mean | f64 | No | Upper bound mean |
| upper_bound_std | f64 | No | Upper bound standard deviation |
| gap_percent | f64 | Yes | Optimality gap (null when not computable) |
| cuts_added | i32 | No | Cuts added this iteration |
| cuts_removed | i32 | No | Cuts removed by cut selection |
| cuts_active | i64 | No | Total active cuts across all stages |
| time_forward_ms | i64 | No | Forward pass wall time (ms) |
| time_backward_ms | i64 | No | Backward pass wall time (ms) |
| time_total_ms | i64 | No | Total iteration wall time (ms) |
| forward_passes | i32 | No | Forward scenarios this iteration |
| lp_solves | i64 | No | Total LP solves this iteration |

Rows: num_iterations

Upper bound mechanisms: Two distinct mechanisms populate the upper bound columns:

  1. Simulation-based (Monte Carlo): Runs the SDDP policy on sampled scenarios and averages costs. Provides upper_bound_mean and upper_bound_std. Valid only for risk-neutral problems — see Stopping Rules.
  2. SIDP deterministic (inner approximation): Vertex-based upper bound via Lipschitz interpolation. Provides a deterministic upper_bound_mean (no std). Valid for both risk-neutral and risk-averse problems — see Upper Bound Evaluation.

Which mechanism is active depends on configuration. Both may run simultaneously (simulation-based for reporting, SIDP for convergence).

Gap computation: gap_percent = (upper_bound_mean − lower_bound) / |lower_bound| × 100. If UB evaluation is disabled, gap_percent is null. If only LB is available, the gap is computed from the LB change between iterations (see Stopping Rules §3).
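The gap formula can be written as a small function. Returning `None` when the upper bound is unavailable mirrors the null case above; returning `None` for a zero lower bound is an additional assumption of this sketch, since the division is undefined there.

```python
def gap_percent(lower_bound, upper_bound_mean):
    """Optimality gap in percent, or None when it cannot be computed."""
    if upper_bound_mean is None:
        return None  # UB evaluation disabled
    if lower_bound == 0:
        return None  # assumed handling: relative gap undefined for LB = 0
    return (upper_bound_mean - lower_bound) / abs(lower_bound) * 100.0
```

The absolute value in the denominator keeps the sign of the gap meaningful when the lower bound is negative.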

Risk-averse interpretation: Under risk-averse (CVaR) settings, lower_bound is a convergence indicator, not a valid lower bound on the true risk-averse cost. It represents the first-stage objective value using the current cut approximation. For risk-averse convergence verification, the SIDP deterministic upper bound is required. gap_percent from the simulation-based UB is not meaningful for risk-averse problems. See Risk Measures §10.

6.2 Iteration Timing (training/timing/iterations.parquet)

| Column | Type | Nullable | Description |
|---|---|---|---|
| iteration | i32 | No | Iteration number (1-based) |
| forward_solve_ms | i64 | No | LP solve time in forward pass |
| forward_sample_ms | i64 | No | Scenario sampling time |
| backward_solve_ms | i64 | No | LP solve time in backward pass |
| backward_cut_ms | i64 | No | Cut computation and storage time |
| cut_selection_ms | i64 | No | Cut selection/pruning time |
| mpi_allreduce_ms | i64 | No | MPI AllReduce communication time |
| mpi_broadcast_ms | i64 | No | MPI Broadcast communication time |
| io_write_ms | i64 | No | Output writing time |
| state_exchange_ms | i64 | No | State exchange time between passes |
| cut_batch_build_ms | i64 | No | Cut batch construction time |
| rayon_overhead_ms | i64 | No | Rayon thread pool coordination overhead |
| overhead_ms | i64 | No | Unaccounted overhead |

Rows: num_iterations

6.3 MPI Rank Timing (training/timing/mpi_ranks.parquet)

| Column | Type | Nullable | Description |
|---|---|---|---|
| iteration | i32 | No | Iteration number (1-based) |
| rank | i32 | No | MPI rank (0-based) |
| forward_time_ms | i64 | No | Forward pass time on this rank |
| backward_time_ms | i64 | No | Backward pass time on this rank |
| communication_time_ms | i64 | No | MPI communication time |
| idle_time_ms | i64 | No | Time waiting for other ranks |
| lp_solves | i64 | No | LP solves on this rank |
| scenarios_processed | i32 | No | Scenarios processed on this rank |

Rows: num_iterations × num_mpi_ranks

Use idle_time_ms to identify load imbalance. Sum of scenarios_processed per iteration equals forward_passes. Communication patterns reveal MPI bottlenecks.
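A minimal sketch of both checks, operating on rows held as dictionaries: it aggregates idle time per iteration and verifies that the per-rank scenario totals match `forward_passes` from the convergence log. The function names and the summary layout are illustrative.

```python
from collections import defaultdict


def summarize_ranks(rank_rows):
    """Aggregate mpi_ranks rows into one summary dict per iteration."""
    summary = defaultdict(
        lambda: {"scenarios": 0, "max_idle_ms": 0, "min_idle_ms": None}
    )
    for row in rank_rows:
        s = summary[row["iteration"]]
        s["scenarios"] += row["scenarios_processed"]
        idle = row["idle_time_ms"]
        s["max_idle_ms"] = max(s["max_idle_ms"], idle)
        s["min_idle_ms"] = idle if s["min_idle_ms"] is None else min(s["min_idle_ms"], idle)
    return dict(summary)


def scenario_mismatches(rank_rows, convergence_rows):
    """Return iterations whose per-rank scenario total differs from forward_passes."""
    totals = summarize_ranks(rank_rows)
    return [
        row["iteration"]
        for row in convergence_rows
        if totals.get(row["iteration"], {}).get("scenarios") != row["forward_passes"]
    ]
```

A large spread between `max_idle_ms` and `min_idle_ms` within an iteration suggests that some ranks finish early and wait for others.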

6.4 Solver Statistics (training/solver/iterations.parquet)

Per-iteration, per-phase, per-stage solver statistics for diagnosing LP conditioning and retry behavior. One row per (iteration, phase, stage) triple.

| Column | Type | Nullable | Description |
|---|---|---|---|
| iteration | u32 | No | Iteration number |
| phase | string | No | SDDP phase: "forward", "backward", "lower_bound", or "simulation" |
| stage | i32 | No | Stage index |
| lp_solves | u32 | No | Total LP solves at this stage |
| lp_successes | u32 | No | LP solves that succeeded on first attempt |
| lp_retries | u32 | No | LP solves that succeeded after retry |
| lp_failures | u32 | No | LP solves that failed after all retry attempts |
| retry_attempts | u32 | No | Total retry attempts across all LP solves |
| basis_offered | u32 | No | LP solves where a warm-start basis was offered |
| basis_rejections | u32 | No | LP solves where the offered basis was rejected |
| simplex_iterations | u64 | No | Total simplex iterations across all LP solves |
| solve_time_ms | f64 | No | Cumulative LP solve wall time (ms) |
| load_model_time_ms | f64 | No | Cumulative model loading time (ms) |
| add_rows_time_ms | f64 | No | Cumulative row addition time (ms) |
| set_bounds_time_ms | f64 | No | Cumulative bound setting time (ms) |
| basis_set_time_ms | f64 | No | Cumulative basis setting time (ms) |

Rows: num_iterations × num_phases × num_stages

Phase values: The phase column identifies the SDDP algorithmic phase that produced the LP solves. "forward" and "backward" are the standard SDDP passes. "lower_bound" appears when a dedicated lower-bound evaluation solve is performed (separate from the forward pass). "simulation" appears when simulation-based upper bound evaluation is enabled (see Upper Bound Evaluation).

Retry-level histograms are stored separately in retry_histogram.parquet (§6.6). See Solver Interface for retry strategy details.

6.5 Cut Selection Statistics (training/cut_selection/iterations.parquet)

Per-iteration, per-stage cut selection statistics. One row per (iteration, stage) pair.

| Column | Type | Nullable | Description |
|---|---|---|---|
| iteration | i32 | No | Iteration number |
| stage | i32 | No | Stage index |
| cuts_populated | i32 | No | Total cuts in the pool before selection |
| cuts_active_before | i32 | No | Active cuts before this iteration’s selection |
| cuts_deactivated | i32 | No | Cuts deactivated by this iteration’s selection |
| cuts_active_after | i32 | No | Active cuts after this iteration’s selection |
| selection_time_ms | f64 | No | Wall-clock time for cut selection at this stage (ms) |

Rows: num_iterations × num_stages (only iterations where cut selection runs)

6.6 Retry Histogram (training/solver/retry_histogram.parquet)

Per-iteration, per-phase, per-stage histogram of retry escalation levels. One row per (iteration, phase, stage, retry_level) tuple where the count is non-zero.

| Column | Type | Nullable | Description |
|---|---|---|---|
| iteration | i32 | No | Iteration number |
| phase | string | No | SDDP phase: "forward", "backward", "lower_bound", or "simulation" |
| stage | i32 | No | Stage index |
| retry_level | u32 | No | Retry escalation level (0-based) |
| count | u64 | No | Number of LP solves that reached this retry level |

Rows: sparse — only (iteration, phase, stage, retry_level) tuples with count > 0 are stored.

The retry ladder applies increasingly aggressive solver reconfiguration at each level (e.g., scaling, presolve toggling, algorithm switching). See Solver Interface for the full retry strategy. This file complements the aggregate retry statistics in iterations.parquet (§6.4) by providing per-level granularity.
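Because the histogram is stored sparsely, plotting or comparing stages usually requires filling in the missing levels with zero. The sketch below does this for one (iteration, phase, stage) triple; the function name `dense_histogram` is hypothetical.

```python
def dense_histogram(rows, iteration, phase, stage):
    """Expand sparse retry_histogram rows into a dense list indexed by retry_level.

    Levels absent from the file are reported as 0. Returns an empty list when
    no row exists for the requested (iteration, phase, stage).
    """
    counts = {
        r["retry_level"]: r["count"]
        for r in rows
        if (r["iteration"], r["phase"], r["stage"]) == (iteration, phase, stage)
    }
    if not counts:
        return []
    return [counts.get(level, 0) for level in range(max(counts) + 1)]
```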

7. Structured Output vs Parquet Schemas

The Cobre output system produces two distinct categories of schemas with different purposes and access patterns:

| Category | Format | Transport | Purpose | Access Method |
|---|---|---|---|---|
| Parquet schemas (§5-6) | Apache Parquet | Disk files | Post-hoc analysis, archival, large tabular data | Arrow/Polars/Pandas libraries |
| Structured CLI output | JSON / JSON-lines | stdout | Real-time monitoring, programmatic result consumption | Standard JSON parsing |

Key distinction: Parquet schemas define the on-disk storage format for simulation results and training diagnostics. Structured CLI output defines the presentation format for CLI responses. They serve different audiences and use different data paths.

No JSON equivalents for simulation Parquet schemas: The 11 simulation entity schemas (§5.1-5.11) and 2 of the 3 training diagnostic schemas (§6.2-6.3) remain Parquet-only. Agents access them via Arrow libraries (available in Python, R, Julia, and Rust). Converting hundreds of MB of tabular data to JSON would be impractical and unnecessary.

Convergence log has a JSON equivalent: The convergence log schema (§6.1) has a JSON equivalent via the JSON-lines streaming protocol. The progress events emitted during training (see Convergence Monitoring §4.1) contain the same fields as the convergence.parquet columns. The Parquet file is the permanent record; JSON-lines is the ephemeral real-time stream.

MCP resource serialization: When the MCP server exposes Parquet data as resources (see MCP Server), it converts Parquet columns to JSON types using the following mapping:

| Parquet/Arrow Type | JSON Type | Notes |
|---|---|---|
| Int32, Int64 | number | |
| Float64 | number | NaN/Infinity serialized as null |
| Utf8 | string | |
| Boolean | boolean | |
| Timestamp | string | ISO 8601 format |
| Nullable columns | null | JSON null for missing values |

This conversion is performed on demand by the MCP server’s resource handlers, not stored as separate JSON files.
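The type mapping can be illustrated with a small converter over Python values as they would come out of an Arrow reader. This is a sketch of the mapping table only, not the MCP server's implementation; the function name `to_json_value` is hypothetical.

```python
import json
import math
from datetime import datetime, timezone


def to_json_value(value):
    """Convert one cell to a JSON-compatible value following the mapping table."""
    if value is None:
        return None                      # nullable column: JSON null
    if isinstance(value, bool):          # check bool before int (bool is an int subclass)
        return value
    if isinstance(value, int):
        return value                     # Int32 / Int64: number
    if isinstance(value, float):
        # NaN and +/-Infinity have no JSON representation: serialize as null.
        return value if math.isfinite(value) else None
    if isinstance(value, datetime):
        return value.isoformat()         # Timestamp: ISO 8601 string
    if isinstance(value, str):
        return value                     # Utf8: string
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Example: one record serialized as a JSON object.
record = {"stage_id": 0, "spot_price": float("nan"), "is_gnl": False}
encoded = json.dumps({k: to_json_value(v) for k, v in record.items()})
```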

8. Preprocessing Output Schemas

The following files are written during the Initialization phase (before training), not during simulation or training proper. They document the result of preprocessing steps so that users can inspect and, if needed, re-use computed artifacts.

8.1 Computed FPHA Hyperplanes (output/hydro_models/fpha_hyperplanes.parquet)

Written automatically when one or more hydros use the source: "computed" FPHA fitting path. The file uses the same 11-column schema as the input file system/fpha_hyperplanes.parquet and is round-trip compatible with the parser in cobre-io.

Implementation note (v0.1.4): This file is written by prepare_hydro_models in cobre-sddp on MPI rank 0 only, immediately after the fitting pipeline completes. It is not present when all FPHA hydros use source: "precomputed" or when no hydros use FPHA.

| Column | Type | Nullable | Description |
|---|---|---|---|
| hydro_id | i32 | No | Hydro plant identifier |
| stage_id | i32 or null | Yes | Stage this plane set applies to (null = all stages) |
| plane_id | i32 | No | Plane index within hydro (0-based) |
| gamma_0 | f64 | No | Intercept coefficient (MW) |
| gamma_v | f64 | No | Volume coefficient (MW/hm³) |
| gamma_q | f64 | No | Turbined flow coefficient (MW per m³/s) |
| gamma_s | f64 | No | Spillage coefficient (MW per m³/s, ≤ 0) |
| kappa | f64 | No | Correction factor κ (worst-case approach, in (0, 1]) |
| valid_v_min_hm3 | f64 | Yes | Volume validity minimum; null for all computed planes |
| valid_v_max_hm3 | f64 | Yes | Volume validity maximum; null for all computed planes |
| valid_q_max_m3s | f64 | Yes | Turbined flow validity maximum; null for all computed planes |

The validity range columns (valid_v_min_hm3, valid_v_max_hm3, valid_q_max_m3s) are always null for computed planes. They are populated only when hyperplanes are provided externally via system/fpha_hyperplanes.parquet and the supplier chooses to include them. See Input Hydro Extensions §3 for the complete schema description and validation rules.

Round-trip use: The exported file can be copied to system/fpha_hyperplanes.parquet in a subsequent run and referenced with source: "precomputed" to skip re-fitting. This is useful for large systems where fitting cost is non-trivial.

Cross-References