Output Infrastructure
Purpose
This spec defines the infrastructure layer for Cobre output: metadata files for reproducibility, _SUCCESS marker files for crash recovery, MPI-native Hive partitioning for parallel writes, and validation/integrity checks.
For output Parquet schemas (simulation and training column definitions), see Output Schemas.
For output configuration options within config.json, see Configuration Reference.
1. Completion Marker and Metadata Files
Run completion is tracked by two separate mechanisms: a per-phase metadata.json file containing run metadata (timing, configuration snapshot, problem dimensions), and a _SUCCESS marker file whose presence indicates successful completion. The old _manifest.json pattern with its status field has been removed.
1.1 Completion Marker (_SUCCESS)
Each output phase writes a zero-byte _SUCCESS marker file upon successful completion:
- `training/_SUCCESS` — written by rank 0 after all training outputs (Parquet files, metadata, policy checkpoint) are flushed.
- `simulation/_SUCCESS` — written by rank 0 after the simulation metadata and all simulation Parquet partitions are confirmed complete.
Crash Recovery Protocol:
- On startup, check whether `_SUCCESS` exists in the relevant output directory.
- If absent, the previous run did not complete successfully. Examine existing partition directories to identify completed work.
- Resume from incomplete scenarios/iterations.
- Write `_SUCCESS` only after all output files are confirmed flushed.
The _SUCCESS marker is the last file written in each output phase. Its presence is both necessary and sufficient to consider the output directory complete.
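The recovery check itself is small. A minimal sketch in Rust, where the helper name `completed_partitions` and its return shape are illustrative rather than part of the cobre-io API:

```rust
use std::{fs, io, path::Path};

/// Sketch of the crash-recovery check: `None` means `_SUCCESS` exists
/// and the output is complete; otherwise returns the Hive partitions
/// of one entity that finished before the crash, so the caller can
/// resume the remaining scenarios.
fn completed_partitions(phase_dir: &Path, entity: &str) -> io::Result<Option<Vec<String>>> {
    if phase_dir.join("_SUCCESS").exists() {
        return Ok(None); // previous run completed; nothing to resume
    }
    let mut done = Vec::new();
    for entry in fs::read_dir(phase_dir.join(entity))? {
        let entry = entry?;
        let name = entry.file_name().to_string_lossy().into_owned();
        // Under the atomic rename pattern (§3.2), any visible
        // data.parquet is complete, so its partition counts as done.
        if entry.file_type()?.is_dir()
            && name.starts_with("scenario_id=")
            && entry.path().join("data.parquet").exists()
        {
            done.push(name);
        }
    }
    Ok(Some(done))
}
```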
1.2 Simulation Metadata (simulation/metadata.json)
{
"$schema": "https://cobre.dev/schemas/v2/simulation_metadata.schema.json",
"version": "2.0.0",
"started_at": "2026-01-17T10:00:00Z",
"completed_at": "2026-01-17T10:15:00Z",
"scenarios": {
"total": 2000,
"completed": 2000
},
"partitions_written": ["scenario_id=0/", "scenario_id=1/", "..."],
"checksum": {
"algorithm": "xxhash64",
"value": "a1b2c3d4e5f6"
},
"mpi_info": {
"world_size": 128,
"ranks_participated": 128
}
}
| Field | Type | Description |
|---|---|---|
| `started_at` | string | ISO 8601 timestamp |
| `completed_at` | string | ISO 8601 timestamp |
| `scenarios.total` | i32 | Total scenarios to simulate |
| `scenarios.completed` | i32 | Successfully completed scenarios |
| `partitions_written` | array | List of Hive partition directories written |
| `checksum` | object | Integrity checksum for validation |
| `mpi_info.world_size` | i32 | Number of MPI ranks |
| `mpi_info.ranks_participated` | i32 | Ranks that wrote data |
1.3 Training Metadata (training/metadata.json)
Training metadata captures convergence outcome, iteration counts, and cut statistics. The detailed run metadata (timing, configuration snapshot, problem dimensions) is documented in §2.
{
"$schema": "https://cobre.dev/schemas/v2/training_metadata.schema.json",
"version": "2.0.0",
"started_at": "2026-01-17T08:00:00Z",
"completed_at": "2026-01-17T12:30:00Z",
"iterations": {
"max_iterations": 100,
"completed": 100,
"converged_at": 87
},
"convergence": {
"achieved": true,
"final_gap_percent": 0.45,
"termination_reason": "simulation"
},
"cuts": {
"total_generated": 1250000,
"total_active": 980000,
"peak_active": 1100000
},
"checksum": {
"algorithm": "xxhash64",
"policy_value": "f1e2d3c4b5a6",
"convergence_value": "1a2b3c4d5e6f"
},
"mpi_info": {
"world_size": 128,
"forward_passes_per_iteration": 8
}
}
| Field | Type | Description |
|---|---|---|
| `iterations.max_iterations` | i32 | Maximum iterations from the `iteration_limit` stopping rule |
| `iterations.completed` | i32 | Iterations actually run |
| `iterations.converged_at` | i32 | Iteration where a convergence-oriented rule triggered (null if terminated by a safety limit) |
| `convergence.achieved` | bool | Whether a convergence-oriented rule (`bound_stalling` or `simulation`) triggered, as opposed to a safety limit (`iteration_limit`, `time_limit`) |
| `convergence.final_gap_percent` | f64 | Final optimality gap (null if upper bound evaluation is disabled). Under CVaR risk measures, this gap is not a valid optimality bound; see Risk Measures §10 |
| `convergence.termination_reason` | string | One of: `"iteration_limit"`, `"time_limit"`, `"bound_stalling"`, `"simulation"`. See Stopping Rules |
| `cuts.total_generated` | i64 | Total cuts generated during training |
| `cuts.total_active` | i64 | Active cuts at termination |
| `cuts.peak_active` | i64 | Peak active cuts during training |
1.4 CLI Report Access
Metadata files are accessible via the report subcommand, which reads them from disk and returns structured JSON. This enables agents and scripts to inspect training status, convergence outcome, and simulation progress without parsing the file contents directly.
# Query training metadata
cobre report /path/to/output --output-format json --section convergence
# Query simulation metadata
cobre report /path/to/output --output-format json --section simulation
The report subcommand is a read-only operation that does not require MPI. It reads the metadata files documented in §1.2, §1.3, and §2, wraps them in the CLI response envelope (see CLI and Lifecycle §8 and Structured Output §4), and emits the result to stdout. The MCP tool cobre/query-convergence performs the same operation via the MCP protocol (see MCP Server).
2. Metadata File (training/metadata.json)
Comprehensive metadata for reproducibility, audit trails, and debugging.
{
"$schema": "https://cobre.dev/schemas/v2/training_metadata.schema.json",
"version": "2.0.0",
"run_info": {
"run_id": "uuid-v4-here",
"started_at": "2026-01-17T08:00:00Z",
"completed_at": "2026-01-17T12:30:00Z",
"duration_seconds": 16200,
"cobre_version": "2.0.0",
"solver": "highs",
"solver_version": "1.7.2",
"hostname": "compute-node-001",
"user": "scheduler"
},
"configuration_snapshot": {
"seed": 42,
"forward_passes": 192,
"stopping_rules": [
{ "type": "iteration_limit", "limit": 100 },
{
"type": "simulation",
"replications": 100,
"period": 20,
"bound_window": 5,
"distance_tol": 0.01,
"bound_tol": 0.0001
}
],
"stopping_mode": "any",
"cut_selection": {
"enabled": true,
"method": "level1"
},
"upper_bound_evaluation": {
"enabled": true,
"initial_iteration": 10,
"interval_iterations": 5
},
"policy_mode": "fresh"
},
"problem_dimensions": {
"num_stages": 12,
"num_scenarios": 2000,
"num_openings": 50,
"num_plants": 360,
"num_buses": 5
}
}
Notes on configuration_snapshot:
- This is an informational record of the training configuration, not a normative schema. The canonical config schema is defined in Configuration Reference.
- `stopping_rules` is recorded verbatim from `config.json` so that the termination behavior can be reconstructed from the output alone.
- `cut_selection.method` uses the values from Cut Management §9: `"level1"`, `"lml1"`, or `"domination"`.
- `upper_bound_evaluation` mirrors the config section from Input Directory Structure §2. Vertex-based (SIDP) upper bounds are enabled when the `upper_bound_evaluation` section is present with `enabled: true`; see Upper Bound Evaluation.
Notes on problem_dimensions:
- The 5 fields (`num_stages`, `num_scenarios`, `num_openings`, `num_plants`, `num_buses`) reflect the actual code. `num_plants` is the combined count of hydro and thermal plants.
- Additional dimension fields (e.g., `num_blocks_per_stage`, `num_lines`, `state_dimension`, `lp_dimensions`) are planned but not yet implemented in the code.
3. MPI Direct Hive Partitioning
Each MPI rank writes directly to Hive partition directories without coordination. For Hive partitioning design principles, see Output Schemas §2.1.
3.1 Directory Layout
simulation/
├── costs/
│ ├── scenario_id=0/data.parquet # Written by rank 0
│ ├── scenario_id=1/data.parquet # Written by rank 1
│ ├── scenario_id=2/data.parquet # Written by rank 2 (round-robin, §3.2)
│ └── ...
├── hydros/
│ ├── scenario_id=0/data.parquet
│ └── ...
├── metadata.json # Written by rank 0 only
└── _SUCCESS # Written by rank 0 after all partitions confirmed
3.2 Write Semantics
Scenario assignment: Round-robin — rank = scenario_id % world_size. Each rank writes only its assigned scenarios.
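Expressed as code, each rank's assignment is a strided range. A sketch (the helper is illustrative, not part of the API):

```rust
/// Round-robin ownership: rank r owns scenario s iff s % world_size == r.
/// With world_size = 4, rank 1 owns scenarios 1, 5, 9, ...
fn scenarios_for_rank(rank: usize, world_size: usize, total: usize) -> impl Iterator<Item = usize> {
    debug_assert!(world_size > 0 && rank < world_size);
    (rank..total).step_by(world_size)
}
```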
Write protocol:
- Each rank writes its assigned partitions independently (embarrassingly parallel — no inter-rank coordination during writes).
- All ranks synchronize at a barrier after writes complete.
- Rank 0 writes the metadata file and `_SUCCESS` marker after the barrier.
Atomic write pattern: Each file is written to a temporary path (data.parquet.tmp), flushed to disk, then atomically renamed to data.parquet. This prevents partial files from appearing as valid output.
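A sketch of the pattern using only the standard library (the helper name is illustrative; `fs::rename` is atomic when source and destination are on the same filesystem, which holds here since both paths live in the same partition directory):

```rust
use std::{fs, io::Write, path::Path};

/// Write `bytes` to `<target>.tmp`, flush to disk, then atomically
/// rename to `target` so readers never observe a partial file.
fn write_atomic(target: &Path, bytes: &[u8]) -> std::io::Result<()> {
    let tmp = target.with_extension("parquet.tmp"); // data.parquet -> data.parquet.tmp
    let mut f = fs::File::create(&tmp)?;
    f.write_all(bytes)?;
    f.sync_all()?; // ensure data reaches disk before the rename
    fs::rename(&tmp, target)
}
```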
Note — Intra-rank thread parallelism (pending HPC specs): The write protocol above describes rank-level granularity. However, Cobre uses hybrid MPI+OpenMP parallelism where multiple threads within each rank may independently process scenarios. If all thread-owned scenarios funnel through a single rank-level writer, this becomes a serialization bottleneck at scale. The actual write responsibility assignment (per-rank vs per-thread) and synchronization strategy will be defined in the HPC work distribution specs. The invariants that must be preserved regardless of the final design are: (1) each partition is written by exactly one writer, (2) writes use the atomic temp-file-then-rename pattern, and (3) the metadata file and `_SUCCESS` marker are written only after all partitions are confirmed complete.
3.3 Failure Handling
| Failure Type | Detection | Recovery |
|---|---|---|
| Rank crash mid-write | Missing _SUCCESS marker | Re-run failed scenarios only |
| Partial file write | Parquet read failure | Delete and re-write partition |
| Metadata corruption | JSON parse error | Rebuild from partition listing |
| Disk full | Write error | Alert, do not corrupt existing data |
4. Output Size Estimates
Reference output sizes for production-scale SDDP runs. For problem dimension profiles (Small through Extra Large), see Production Scale Reference.
| Output | Small | Medium | Large | Extra Large |
|---|---|---|---|---|
| `simulation/costs/` | 50 MB | 800 MB | 4 GB | 20 GB |
| `simulation/hydros/` | 200 MB | 5 GB | 30 GB | 150 GB |
| `simulation/thermals/` | 150 MB | 4 GB | 25 GB | 120 GB |
| `training/convergence.parquet` | 10 KB | 50 KB | 100 KB | 250 KB |
| `training/timing/` | 1 MB | 15 MB | 120 MB | 1.2 GB |
| `policy/` (cuts) | 500 MB | 8 GB | 40 GB | 200 GB |
| Total | ~1 GB | ~20 GB | ~100 GB | ~500 GB |
Storage recommendations:
- Use SSD/NVMe for training (frequent random writes)
- Network filesystem acceptable for simulation (sequential writes)
- Consider parallel filesystem (Lustre, GPFS) for >100 GB outputs
- Enable compression for network transfers
4.1 I/O Bandwidth Requirements
| Scale | Write Throughput | Duration | Bottleneck |
|---|---|---|---|
| Small | 50 MB/s | 20s | None |
| Medium | 200 MB/s | 100s | Network |
| Large | 500 MB/s | 200s | Filesystem |
| Extra Large | 1+ GB/s | 500s | Parallel FS |
5. Validation and Integrity
5.1 Schema Validation
Each output entity must conform to the Parquet schema defined in Output Schemas. Validation verifies column names, types, and nullability against the schema definitions.
5.2 Data Integrity Checks
| Check | Method | Frequency |
|---|---|---|
| Parquet file integrity | Footer checksum | On read |
| Partition completeness | _SUCCESS marker + metadata | Post-run |
| Row count consistency | Cross-entity validation | Post-run |
| Value range validation | Min/max from bounds.parquet | Optional |
Cross-entity validation: For each scenario, the number of rows in every entity output must be consistent with the stage and block counts for that scenario. For example, costs/ has one row per (stage, block), while hydros/ has one row per (stage, block, hydro). Missing or extra rows indicate a write failure.
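The check reduces to comparing observed row counts against counts derived from the stage/block structure. A sketch, with the helper name and inputs illustrative:

```rust
/// Cross-entity row-count check for one scenario. `stage_block_pairs`
/// is the sum over stages of the block count per stage.
fn validate_scenario_rows(
    stage_block_pairs: usize,
    num_hydros: usize,
    costs_rows: usize,
    hydros_rows: usize,
) -> Result<(), String> {
    // costs/: one row per (stage, block)
    if costs_rows != stage_block_pairs {
        return Err(format!(
            "costs/: expected {stage_block_pairs} rows, found {costs_rows}"
        ));
    }
    // hydros/: one row per (stage, block, hydro)
    let expected = stage_block_pairs * num_hydros;
    if hydros_rows != expected {
        return Err(format!(
            "hydros/: expected {expected} rows, found {hydros_rows}"
        ));
    }
    Ok(())
}
```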
5.3 Reproducibility Verification
Reproducibility can be verified by comparing output artifacts across runs:
- Given the same inputs, configuration, and random seed, the system must produce identical policy and convergence outputs.
- Two runs can be compared by computing checksums over policy files (`policy/cuts/stage_*.bin`) and `training/convergence.parquet`, as sketched below.
- If input data differs between runs, the policy outputs are not directly comparable.
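A comparison sketch. The `xxhash-rust` crate and the zero seed are assumptions; the spec fixes only the algorithm name (xxhash64) recorded in the `checksum` metadata field:

```rust
use std::{fs, io, path::Path};
use xxhash_rust::xxh64::xxh64;

/// Hash one artifact with xxHash64 (seed 0 is an assumption).
fn file_xxh64(path: &Path) -> io::Result<u64> {
    Ok(xxh64(&fs::read(path)?, 0))
}

/// Two runs are reproducible if their convergence logs hash equal.
fn same_convergence(run_a: &Path, run_b: &Path) -> io::Result<bool> {
    let rel = "training/convergence.parquet";
    Ok(file_xxh64(&run_a.join(rel))? == file_xxh64(&run_b.join(rel))?)
}
```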
6. Output Writer API
This section defines the Rust types and function signatures for writing all Cobre output: simulation Parquet files, training Parquet files, manifest files, metadata files, dictionary files, and policy checkpoints. These types live in cobre-io and are consumed by cobre-sddp (training loop and simulation phase) and cobre-cli (orchestration layer).
The API follows the same design pattern as the input loading API (Input Loading Pipeline §8.1): a top-level anchoring function, concrete writer types (not traits), and a dedicated error enum.
Design decisions:
- Separate concrete writers, not a single trait. The four output chains (simulation Parquet, training Parquet, metadata files, policy checkpoint) have different formats, lifecycles, and thread-safety requirements. A unified `OutputWriter` trait would impose artificial uniformity.
- Parquet library: `arrow-rs`. The `arrow-rs` ecosystem (`arrow` and `parquet` crates) is the Rust standard for Apache Arrow and Parquet I/O, with active maintenance and broad ecosystem support.
- Synchronous API. Writer methods are blocking. Async decoupling is provided at the architecture level by the bounded channel between simulation threads and the background I/O thread (Simulation Architecture §6.1).
- No column definitions here. Parquet column schemas are defined in Output Schemas §5–6. The writers reference those schemas but do not duplicate them.
- serde derives. The metadata structs (§1.2, §1.3, §2) derive `serde::Serialize` for JSON serialization. Parquet writers use Arrow `RecordBatch` arrays directly and do not require serde on per-row structs. FlatBuffers uses generated code from the `.fbs` schema (Binary Formats §3.1).
6.1 write_results Anchoring Function
write_results is the top-level entry point from cobre-cli (rank 0) into cobre-io for writing all output artifacts after training and optional simulation complete. It orchestrates the individual writers defined in §6.2–§6.7.
Function signature:
/// Write all output artifacts for a completed Cobre run.
///
/// This is the primary entry point from cobre-cli (rank 0) into cobre-io
/// for output writing. It orchestrates writing of training results,
/// simulation results (when present), metadata files, and
/// dictionary files.
///
/// # Parameters
///
/// - `output_dir` -- Root output directory. Training outputs are written
/// to `output_dir/training/` and simulation outputs to
/// `output_dir/simulation/`. The directory is created if it does not
/// exist.
///
/// - `training_output` -- Training results: convergence log, per-iteration
/// timing, per-rank timing. Always present (training always runs).
///
/// - `simulation_output` -- Simulation results. `None` when simulation is
/// disabled (`simulation.enabled = false` in config). When `Some`, the
/// simulation Parquet files have already been written by the streaming
/// I/O thread (§6.2); this function writes only the simulation metadata.
///
/// - `system` -- Shared reference to the loaded system. Used for
/// dictionary generation (entity names, IDs, bounds).
///
/// - `config` -- Run configuration. Used for metadata snapshot and
/// Parquet writer configuration (compression, row group size).
///
/// # Errors
///
/// Returns `OutputError` if any output file cannot be written.
/// Partial writes may leave some output files on disk; in that case
/// the `_SUCCESS` marker is never written, enabling crash recovery
/// on re-run (§1.1).
///
/// # Execution context
///
/// Called on rank 0 only, after the MPI barrier that confirms all
/// ranks have completed their partition writes. See
/// [Output Infrastructure §3.2](output-infrastructure.md) for the
/// write protocol.
pub fn write_results(
output_dir: &Path,
training_output: &TrainingOutput,
simulation_output: Option<&SimulationOutput>,
system: &System,
config: &Config,
ctx: &OutputContext,
) -> Result<(), OutputError>
write_results performs the following steps, in order:
1. Create `output_dir/training/` and `output_dir/simulation/` directories if they do not exist.
2. Write dictionary files via `write_dictionaries` (§6.5).
3. Write training Parquet files via `TrainingParquetWriter` (§6.3).
4. Write training metadata via `write_training_metadata` (§6.5).
5. If `simulation_output` is `Some`, write simulation metadata via `write_simulation_metadata` (§6.5).
6. Write `_SUCCESS` marker files.
write_results does NOT write simulation Parquet files. Those are written by the streaming I/O thread during simulation execution, using the SimulationParquetWriter (§6.2). By the time write_results is called, the simulation Parquet files are already on disk. write_results writes only the simulation metadata file (which requires the final scenario counts and checksums).
Input types:
TrainingOutput and SimulationOutput are aggregate types defined in cobre-sddp that carry all data needed for output writing. Their exact field definitions are determined by the training and simulation return types (Training Loop §2.1, Simulation Architecture §3.4.4). The key fields consumed by write_results are:
| Type | Key Fields |
|---|---|
| `TrainingOutput` | convergence log records, per-iteration timing, per-rank timing, cut statistics |
| `SimulationOutput` | scenario count, completion status, per-partition checksums, cost statistics |
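For concreteness, an illustrative body for `write_results` following the six steps above. The field names on `TrainingOutput` and `SimulationOutput` (`iteration_records`, `rank_timing`, `metadata`) are hypothetical, since the spec leaves the exact fields to cobre-sddp; this is a sketch, not the implementation:

```rust
use std::{fs, path::{Path, PathBuf}};

fn create_dir(p: PathBuf) -> Result<PathBuf, OutputError> {
    fs::create_dir_all(&p).map_err(|e| OutputError::IoError { path: p.clone(), source: e })?;
    Ok(p)
}

fn touch(path: PathBuf) -> Result<(), OutputError> {
    fs::File::create(&path).map_err(|e| OutputError::IoError { path, source: e })?;
    Ok(())
}

pub fn write_results(
    output_dir: &Path,
    training_output: &TrainingOutput,
    simulation_output: Option<&SimulationOutput>,
    system: &System,
    config: &Config,
    _ctx: &OutputContext,
) -> Result<(), OutputError> {
    // Step 1: create output directories.
    let training = create_dir(output_dir.join("training"))?;
    let simulation = create_dir(output_dir.join("simulation"))?;

    // Step 2: dictionary files.
    write_dictionaries(&training.join("dictionaries"), system, config)?;

    // Step 3: training Parquet files (field names hypothetical).
    let mut writer = TrainingParquetWriter::new(output_dir)?;
    for record in &training_output.iteration_records {
        writer.write_iteration(record)?;
    }
    writer.write_rank_timing(&training_output.rank_timing)?;
    writer.finalize()?;

    // Step 4: training metadata.
    write_training_metadata(&training.join("metadata.json"), &training_output.metadata)?;

    // Step 5: simulation metadata (Parquet already on disk per §6.2).
    if let Some(sim) = simulation_output {
        write_simulation_metadata(&simulation.join("metadata.json"), &sim.metadata)?;
    }

    // Step 6: _SUCCESS markers, strictly last (§1.1).
    touch(training.join("_SUCCESS"))?;
    if simulation_output.is_some() {
        touch(simulation.join("_SUCCESS"))?;
    }
    Ok(())
}
```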
6.2 SimulationParquetWriter
SimulationParquetWriter is the concrete writer for simulation Parquet files. It is used by the background I/O thread that receives SimulationScenarioResult values through the bounded channel (Simulation Architecture §6.1).
/// Writer for simulation Parquet files. Receives per-scenario results
/// and writes them to Hive-partitioned Parquet files under
/// `output_dir/simulation/`.
///
/// # Thread safety
///
/// `SimulationParquetWriter` implements `Send` because it is created
/// on the main thread and moved to the dedicated background I/O
/// thread. It does NOT implement `Sync` -- only one thread (the I/O
/// thread) accesses it at a time.
///
/// # Lifecycle
///
/// 1. Created via `new()` before the simulation phase begins.
/// 2. `write_scenario()` called once per completed scenario, in
/// arrival order (not necessarily scenario ID order).
/// 3. `finalize()` called after the channel is closed (all senders
/// dropped), flushing any buffered data and computing checksums.
pub struct SimulationParquetWriter { /* ... */ }
impl SimulationParquetWriter {
/// Create a new simulation Parquet writer.
///
/// # Parameters
///
/// - `output_dir` -- Root output directory. Parquet files are
/// written under `output_dir/simulation/{entity}/scenario_id=XXXX/`.
///
/// - `system` -- Shared reference to the system for entity
/// metadata (entity counts, block counts per stage, line loss
/// factors, block durations). Used to compute derived columns
/// (energy = power x duration, losses, net flows) during
/// Parquet writing. See [Simulation Architecture §3.4](../architecture/simulation-architecture.md)
/// for the list of excluded (derived) columns.
///
/// - `config` -- Parquet writer configuration: compression codec
/// (Zstd level 3), row group size (~100,000 rows), dictionary
/// encoding for categorical columns. See [Binary Formats §5](binary-formats.md).
///
/// # Errors
///
/// Returns `OutputError::IoError` if the output directory cannot
/// be created.
pub fn new(
output_dir: &Path,
system: &System,
config: &ParquetWriterConfig,
) -> Result<Self, OutputError>
/// Write one scenario's simulation results to Parquet files.
///
/// Each call writes one Hive partition per entity type:
/// `{entity}/scenario_id={id:04d}/data.parquet`. Files are written
/// atomically (write to `.tmp`, then rename) per the protocol in §3.2.
///
/// The writer converts the nested per-entity-type layout of
/// `SimulationScenarioResult` into the columnar Arrow `RecordBatch`
/// format, computing derived columns (MWh energy, net flow, losses)
/// from system metadata. Column schemas are defined in
/// [Output Schemas §5.1--5.11](output-schemas.md).
///
/// # Parameters
///
/// - `result` -- Complete simulation result for one scenario, as
/// produced by the simulation forward pass. See
/// [Simulation Architecture §3.4.3](../architecture/simulation-architecture.md).
///
/// # Errors
///
/// Returns `OutputError::IoError` on disk write failure or
/// `OutputError::SerializationError` on Arrow/Parquet encoding failure.
pub fn write_scenario(
&mut self,
result: SimulationScenarioResult,
) -> Result<(), OutputError>
/// Finalize the writer: flush any buffered data, compute checksums
/// over all written partitions, and return the manifest data.
///
/// This is a consuming method -- the writer cannot be used after
/// finalization. The returned `SimulationManifest` contains the
/// scenario counts, partition list, and checksums needed by
/// `write_simulation_metadata` (§6.5).
///
/// # Errors
///
/// Returns `OutputError::IoError` if final flush or checksum
/// computation fails.
pub fn finalize(self) -> Result<SimulationManifest, OutputError>
}
Concurrency model: The SimulationParquetWriter runs on a single dedicated I/O thread per MPI rank. Multiple simulation threads send SimulationScenarioResult values through the bounded channel; the I/O thread receives and writes them sequentially. There is no lock contention on the writer itself. Backpressure from the bounded channel (capacity configured via simulation.io_channel_capacity, default 64) throttles simulation threads when I/O falls behind.
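A sketch of this single-writer arrangement with a std bounded channel (the spawning helper is illustrative; the capacity of 64 mirrors the default `simulation.io_channel_capacity`):

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread::{self, JoinHandle};

/// Spawn the dedicated I/O thread that owns the writer. Simulation
/// threads clone the returned sender; when every sender is dropped,
/// the receive loop ends and the writer is finalized.
fn spawn_io_thread(
    mut writer: SimulationParquetWriter,
) -> (
    SyncSender<SimulationScenarioResult>,
    JoinHandle<Result<SimulationManifest, OutputError>>,
) {
    // Bounded capacity provides the backpressure described above:
    // send() blocks when the I/O thread falls 64 scenarios behind.
    let (tx, rx) = sync_channel(64);
    let handle = thread::spawn(move || {
        for result in rx {
            writer.write_scenario(result)?; // arrival order, §6.2
        }
        writer.finalize() // flush, checksum, return manifest data
    });
    (tx, handle)
}
```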
6.3 TrainingParquetWriter
TrainingParquetWriter writes the three training Parquet files: convergence log, iteration timing, and MPI rank timing.
/// Writer for training Parquet files. Writes convergence log,
/// iteration timing, and MPI rank timing under
/// `output_dir/training/`.
///
/// # Thread safety
///
/// `TrainingParquetWriter` runs on the main thread (rank 0) after
/// training completes. It does not need `Send` or `Sync`.
///
/// # Lifecycle
///
/// 1. Created via `new()` after training completes.
/// 2. `write_iteration()` called once per training iteration.
/// 3. `write_rank_timing()` called once with all rank timing records.
/// 4. `finalize()` called to flush and close all files.
pub struct TrainingParquetWriter { /* ... */ }
impl TrainingParquetWriter {
/// Create a new training Parquet writer.
///
/// # Parameters
///
/// - `output_dir` -- Root output directory. Files are written to
/// `output_dir/training/convergence.parquet`,
/// `output_dir/training/timing/iterations.parquet`, and
/// `output_dir/training/timing/mpi_ranks.parquet`.
///
/// # Errors
///
/// Returns `OutputError::IoError` if the output directories
/// cannot be created.
pub fn new(output_dir: &Path) -> Result<Self, OutputError>
/// Write one iteration's convergence and timing data.
///
/// Appends one row to `convergence.parquet` and one row to
/// `timing/iterations.parquet`. Column schemas are defined in
/// [Output Schemas §6.1](output-schemas.md) and
/// [Output Schemas §6.2](output-schemas.md).
///
/// # Parameters
///
/// - `record` -- Convergence and timing data for one iteration.
/// Contains all fields from the convergence log schema (§6.1)
/// and iteration timing schema (§6.2).
///
/// # Errors
///
/// Returns `OutputError::SerializationError` on Arrow encoding
/// failure.
pub fn write_iteration(
&mut self,
record: &IterationRecord,
) -> Result<(), OutputError>
/// Write MPI rank timing records for all iterations.
///
/// Writes all rows to `timing/mpi_ranks.parquet`. Column schema
/// is defined in [Output Schemas §6.3](output-schemas.md).
///
/// # Parameters
///
/// - `records` -- Rank timing records, one per (iteration, rank)
/// pair.
///
/// # Errors
///
/// Returns `OutputError::SerializationError` on Arrow encoding
/// failure.
pub fn write_rank_timing(
&mut self,
records: &[RankTimingRecord],
) -> Result<(), OutputError>
/// Finalize the writer: flush all buffered data and close files.
///
/// # Errors
///
/// Returns `OutputError::IoError` if final flush fails.
pub fn finalize(self) -> Result<(), OutputError>
}
6.4 OutputError
OutputError is the error type for all output writing operations. It mirrors the structure of LoadError (Input Loading Pipeline §8.1) with variants ordered by the phase in which they typically occur.
/// Errors that can occur during output writing.
#[derive(Debug, thiserror::Error)]
pub enum OutputError {
/// Filesystem write failure (permission denied, disk full, rename
/// failure during atomic write).
#[error("I/O error writing {path}: {source}")]
IoError {
path: PathBuf,
source: std::io::Error,
},
/// Arrow or Parquet encoding failure (schema mismatch between
/// constructed RecordBatch and expected schema, unsupported type
/// conversion).
#[error("serialization error for {entity}: {message}")]
SerializationError {
entity: String,
message: String,
},
/// Parquet schema validation failure (column count mismatch,
/// unexpected null in non-nullable column, data type mismatch
/// against the schemas defined in Output Schemas §5--6).
#[error("schema error in {file}: column {column}: {message}")]
SchemaError {
file: String,
column: String,
message: String,
},
/// Manifest construction or serialization failure (missing
/// required field, JSON serialization error, checksum computation
/// failure).
#[error("manifest error for {manifest_type}: {message}")]
ManifestError {
manifest_type: String,
message: String,
},
}
| Variant | Typical Trigger | Example |
|---|---|---|
| `IoError` | Filesystem operations (create dir, write file, atomic rename) | Disk full writing `hydros/scenario_id=0042/data.parquet` |
| `SerializationError` | Arrow RecordBatch construction or Parquet row group encoding | Float-to-int conversion failure in Arrow array builder |
| `SchemaError` | Column count or type mismatch during Parquet write | Expected 24 columns in hydros schema, RecordBatch has 23 |
| `ManifestError` | Manifest JSON serialization or checksum computation | xxhash64 checksum computation failed for simulation partitions |
6.5 Manifest, Metadata, and Dictionary Writers
These are standalone functions (not methods on a struct) because each writes a single file atomically.
/// Write the simulation metadata to `output_dir/simulation/metadata.json`.
///
/// The metadata schema is defined in §1.2. The metadata value is
/// produced by the simulation runner after all scenarios complete.
///
/// # Errors
///
/// Returns `OutputError::IoError` on write failure or
/// `OutputError::SerializationError` on JSON serialization failure.
pub fn write_simulation_metadata(
path: &Path,
metadata: &SimulationMetadata,
) -> Result<(), OutputError>
/// Write the training metadata to `output_dir/training/metadata.json`.
///
/// The metadata schema is defined in §2. The metadata struct
/// captures the run configuration snapshot, problem dimensions,
/// performance summary, data integrity hashes, and environment
/// information.
///
/// # Errors
///
/// Returns `OutputError::IoError` on write failure or
/// `OutputError::ManifestError` on JSON serialization failure.
pub fn write_training_metadata(
path: &Path,
metadata: &TrainingMetadata,
) -> Result<(), OutputError>
/// Write all dictionary files to `output_dir/training/dictionaries/`.
///
/// Produces the following files:
/// - `codes.json` -- categorical code mappings (§3)
/// - `bounds.parquet` -- entity bounds by stage/block (§4.1)
/// - `state_dictionary.json` -- state space definition (§4.2)
/// - `variables.csv` -- variable metadata (§4.3)
/// - `entities.csv` -- entity metadata (§4.4)
///
/// Dictionary schemas are defined in [Output Schemas §3--4](output-schemas.md).
///
/// # Parameters
///
/// - `path` -- Dictionary directory path
/// (`output_dir/training/dictionaries/`).
/// - `system` -- System reference for entity names, IDs, bus
/// assignments, and bounds.
/// - `config` -- Configuration for stage/block structure and
/// state variable definitions.
///
/// # Errors
///
/// Returns `OutputError::IoError` on write failure or
/// `OutputError::SerializationError` on encoding failure.
pub fn write_dictionaries(
path: &Path,
system: &System,
config: &Config,
) -> Result<(), OutputError>
serde derives for metadata types: The `SimulationMetadata` and `TrainingMetadata` structs derive `serde::Serialize` (and `serde::Deserialize`, to support crash-recovery reads). These structs map directly to the JSON schemas in §1.2 (simulation) and §1.3/§2 (training). The serde field names match the JSON field names using `#[serde(rename_all = "snake_case")]` (which is already the naming convention in the JSON schemas).
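As a sketch of what those derives look like against the §1.2 JSON (field subset and nesting follow the example; `rename_all` is omitted here since the field names are already snake_case, and only the `$schema` key needs an explicit rename):

```rust
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
pub struct SimulationMetadata {
    #[serde(rename = "$schema")]
    pub schema: String,
    pub version: String,
    pub started_at: String,
    pub completed_at: String,
    pub scenarios: ScenarioCounts,
    pub partitions_written: Vec<String>,
    pub checksum: Checksum,
    pub mpi_info: MpiInfo,
}

#[derive(Serialize, Deserialize)]
pub struct ScenarioCounts {
    pub total: i32,
    pub completed: i32,
}

#[derive(Serialize, Deserialize)]
pub struct Checksum {
    pub algorithm: String, // "xxhash64"
    pub value: String,
}

#[derive(Serialize, Deserialize)]
pub struct MpiInfo {
    pub world_size: i32,
    pub ranks_participated: i32,
}
```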
6.6 Policy Checkpoint Writer
/// Write a policy checkpoint to the policy directory.
///
/// Serializes the current cut pool, visited states, and solver basis
/// cache to FlatBuffers `.bin` files under `output_dir/policy/`.
/// The FlatBuffers schema is defined in [Binary Formats §3.1](binary-formats.md).
/// The directory structure follows [Binary Formats §3.2](binary-formats.md).
///
/// # Parameters
///
/// - `path` -- Policy directory path (`output_dir/policy/`).
/// - `stage_cuts` -- Per-stage cut collections. Each element is
/// serialized to `cuts/stage_{NNN}.bin`.
/// - `stage_bases` -- Per-stage cached solver bases. Each element is
/// serialized to `basis/stage_{NNN}.bin`.
/// - `metadata` -- Policy metadata (version, iteration count, bounds,
/// RNG state). Serialized to `metadata.json` (JSON, not FlatBuffers,
/// for human readability).
///
/// # Errors
///
/// Returns `OutputError::IoError` on write failure or
/// `OutputError::SerializationError` on FlatBuffers encoding failure.
pub fn write_policy_checkpoint(
path: &Path,
stage_cuts: &[StageCuts],
stage_bases: &[StageBasis],
metadata: &PolicyMetadata,
) -> Result<(), OutputError>
6.7 API Element Summary
The following table maps the 9 API elements identified in report-013 section 4.4 to their definitions in this section:
| # | API Element | Definition | Format |
|---|---|---|---|
| 1 | Simulation Parquet writer type | §6.2 | Arrow RecordBatch + Parquet |
| 2 | Training Parquet writer type | §6.3 | Arrow RecordBatch + Parquet |
| 3 | Manifest writer function | §6.5 | JSON via serde |
| 4 | Metadata writer function | §6.5 | JSON via serde |
| 5 | Dictionary writer functions | §6.5 | JSON + Parquet + CSV |
| 6 | FlatBuffers serialization function | §6.6 | FlatBuffers generated code |
| 7 | Output error type (OutputError) | §6.4 | thiserror enum |
| 8 | serde derives on output types | §6.5 (note) | serde::Serialize |
| 9 | Parquet library selection | §6 (intro) | arrow-rs ecosystem |
Cross-References
- Output Schemas — Parquet column definitions for all entity types and Hive partitioning design
- Binary Formats — Policy file format (FlatBuffers `.bin` files) and directory structure
- Input System Entities — Entity registries (entity IDs, names)
- Penalty System — Penalty costs affecting output values
- Configuration Reference — Full `config.json` reference including output configuration
- Input Directory Structure — `config.json` structure and `upper_bound_evaluation` section
- Production Scale Reference — Problem dimension profiles
- Stopping Rules — Termination criteria definitions
- Cut Management — Cut selection strategy names and parameters
- Risk Measures — CVaR and lower bound validity
- Upper Bound Evaluation — Simulation-based and SIDP upper bound mechanisms
- Block Formulations — Block mode definitions (per-stage)
- Design Principles — Overall design philosophy
- Input Loading Pipeline — `load_case`/`LoadError` pattern mirrored by `write_results`/`OutputError` (§8.1)
- Simulation Architecture — `SimulationScenarioResult` type (§3.4.3), bounded channel streaming (§6.1), `fn simulate()` signature (§3.4.6)
- Training Loop — Training iteration lifecycle and event emission (§2.1)