Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

PAR(p) Inflow Model

Purpose

This spec defines the Periodic Autoregressive model of order (PAR(p)) used to capture temporal correlation in inflow time series, including the model definition, parameter semantics, the relationship between stored and computed quantities, the fitting procedure, model order selection, and validation invariants.

1. Model Definition

The Periodic Autoregressive model of order p (PAR(p)) captures temporal correlation in inflow time series while accounting for seasonal variation in parameters. For hydro at stage corresponding to season :

where:

  • : Incremental inflow at stage (m³/s)
  • : Seasonal mean for season
  • : Autoregressive coefficient for lag in season
  • : Residual standard deviation for season (computed at runtime — see §3)
  • : Innovation (standardized noise)
  • : Season index for stage (e.g., month 1–12)

The model order can vary by season and by hydro plant.

2. Parameter Set

For each hydro and each season (e.g., for monthly, for weekly), the complete PAR(p) model requires:

ParameterSymbolDescription
Seasonal meanMean inflow for season
AR coefficientsAutoregressive coefficients
Residual standard deviationScale of innovation term

3. Stored vs. Computed Quantities

The data model stores seasonal sample statistics and standardized AR coefficients with an explicit residual fraction. The relationship between stored and computed quantities is: PAR model stored vs computed quantities — files on disk store scale-invariant ψ* and residual_std_ratio, runtime converts to original-unit ψ and σ using seasonal stats

Stored in input files

These are provided in inflow_seasonal_stats.parquet and inflow_ar_coefficients.parquet (see Input Scenarios §3.1–3.2):

Stored quantityColumnFileSymbolDescription
Seasonal sample meanmean_m3sinflow_seasonal_statsMean of historical observations for season
Seasonal sample stdstd_m3sinflow_seasonal_statsStandard deviation of historical observations for season
AR coefficientscoefficientinflow_ar_coefficientsAR coefficient standardized by seasonal std — the direct Yule-Walker output
Residual std ratioresidual_std_ratioinflow_ar_coefficientsResidual std as fraction of seasonal std, — a pure model property

The AR order is not stored explicitly. It is derived at runtime from the count of coefficient rows per (hydro_id, stage_id) group in inflow_ar_coefficients.parquet.

The standardized coefficient is the direct output of the Yule-Walker fitting procedure (see §5.4). It is dimensionless — the coefficient of the standardized process . The relationship to the original-unit coefficient used in the LP is:

Computed at runtime

From the stored quantities, the LP requires two additional quantities computed once at initialization:

Original-unit AR coefficients (for LP constraint matrix entries):

Residual standard deviation (for noise scaling):

No autocorrelation values are needed at runtime. All required quantities are derived solely from the stored seasonal stats and AR coefficient file.

Why store residual_std_ratio rather than directly? The residual std decomposes as , where is a conditioning quantity (swappable for climate scenario studies) and the ratio is a model dynamics property (fixed per PAR fit). Storing directly would bake in a specific : when the user swaps seasonal stats for a different climate scenario, the stored would be stale and noise scaling would be inconsistent with the new variability level. Storing the ratio preserves correct proportionality — if seasonal variability changes, noise scales proportionally. See also PAR Coefficient Storage design document §3.4.

LP coefficients

The stored standardized coefficients are converted to original-unit at runtime (see §7.2), and these enter the LP directly (see LP Formulation §5). The LP equation is:

where are state variables (lagged inflows) and is the sampled noise realization.

4. Model Order Selection

The PAR order can vary by season. Available selection criteria:

4.1 PACF (Periodic Partial Autocorrelation Function) – Default

The default method computes the periodic PACF via progressive periodic Yule-Walker matrix solves at orders , then selects the order using a significance threshold.

Algorithm:

  1. For each order from 1 to , build and solve the periodic Yule-Walker system (§5.4) at order . The last coefficient from the order- solution is the periodic PACF value at lag .

  2. Select the order as the maximum lag with significant PACF:

    where (95% confidence) and is the number of observations for season . If no lag is significant, (white noise).

  3. Estimate AR coefficients at the selected order using the periodic Yule-Walker system (§5.4).

Post-selection validation: After PACF selection, two rejection gates are applied iteratively:

  • Negative rejection: If (first AR coefficient is negative), the order is reduced. Negative contradicts the hydrological persistence property (inflows are positively autocorrelated at lag 1).
  • Contribution-based validation: The recursively-composed contributions for each lag are computed. If any contribution is negative (indicating potential model instability), the order is reduced to the maximum lag with non-negative contributions. This implements NEWAVE’s reducao_ordem algorithm.

The reduction process is iterative: after each reduction, the PACF selection and coefficient estimation are re-run at the new ceiling, and the validation checks are repeated until all seasons pass or reach order 0.

4.2 AIC (Akaike Information Criterion)

4.3 BIC (Bayesian Information Criterion)

4.4 Coefficient Significance

Include lag only if .

In all methods, is the number of historical observations for season .

5. Fitting Procedure

This section documents the five-step procedure for fitting PAR(p) parameters from historical inflow data. The fitting is performed when the system derives parameters from inflow_history.parquet (see Input Scenarios §2). When pre-computed parameters are provided directly in inflow_seasonal_stats.parquet and inflow_ar_coefficients.parquet, this procedure is not executed.

5.1 Notation

Let be the historical observations for season . Define:

SymbolDescription
Number of observations for season
Sample mean for season
Sample standard deviation for season
Autocovariance at lag for season
Autocorrelation at lag for season

5.2 Step 1 — Seasonal Means and Standard Deviations

Seasonal Mean:

Seasonal Standard Deviation:

5.3 Step 2 — Seasonal Autocorrelations

The autocorrelation at lag for season is computed from standardized deviations.

Cross-seasonal autocovariance:

For observations at season with lag reaching back to season (mod , where is the cycle length):

Autocorrelation:

where is the standard deviation of season (cyclically, so season 0 = season ).

5.4 Step 3 — Yule-Walker Equations

For each season , the PAR(p) coefficients in standardized form are found by solving the periodic Yule-Walker system. Unlike the classical (stationary) Yule-Walker equations where all rows use the same reference season, the periodic variant shifts the reference season per row. This correctly accounts for the non-Toeplitz covariance structure of periodic autoregressive processes.

Matrix construction: For row and column (0-indexed, ), the reference season is shifted by row index:

where is the number of seasons in the periodic cycle (e.g., 12 for monthly). The diagonal entries are always 1 (since for any season ). The matrix is symmetric but not Toeplitz when , because each row references a different season for its autocorrelation values.

RHS construction: Each RHS element also uses a shifted reference season:

This comes from column of the extended version of the periodic autocorrelation matrix.

The full system is:

where all season indices are taken modulo .

In matrix notation:

where:

  • is the periodic correlation matrix (symmetric but not Toeplitz for )
  • is the vector of target autocorrelations with per-row reference season shifting

Note: For a single-season model (), all rows use the same reference season and the matrix reduces to the classical Toeplitz Yule-Walker matrix. The periodic formulation is the general case that correctly handles multi-season (e.g., monthly) data.

Solution:

The system is solved via Gaussian elimination with partial pivoting (for small systems with , this is numerically adequate).

5.5 Step 4 — Store Standardized Coefficients and Residual Fraction

The Yule-Walker solution is in standardized form — the direct output of step 3. It is stored as-is in inflow_ar_coefficients.parquet. No conversion to original units is performed.

Compute and store the residual std ratio:

Both (one row per lag) and (repeated across all lag rows of the same (hydro, stage) group) are written to inflow_ar_coefficients.parquet.

5.6 Step 5 — Residual Standard Deviation

The residual standard deviation for season is recovered at runtime from the stored ratio (see §3):

For reference, the full expression in terms of fitting quantities is:

6. Validation Invariants

After fitting or loading pre-computed parameters, the following invariants must hold:

  1. Positive residual variance: for all seasons. If violated, the AR model explains all variance — likely overfitting.
  2. Stationarity: Roots of lie outside the unit circle. Ensures the AR process is stable and does not diverge.
  3. Correlation matrix positive definite: is invertible. Required for Yule-Walker solution to exist. If violated, the historical record may be too short for the requested AR order.
  4. No systematic bias: Residuals have mean near zero. Indicates the model captures the mean structure correctly.
  5. AR order derivation: The number of coefficient rows per (hydro_id, stage_id) in inflow_ar_coefficients.parquet determines the AR order . Lags must be contiguous: .
  6. Residual std ratio consistency: The residual_std_ratio value must be identical across all lag rows sharing the same (hydro_id, stage_id) group, and must lie in .

7. PAR-to-LP Transformation

This section derives the explicit algebraic transformation from the canonical PAR(p) model (section 1) into the form consumed by the LP subproblem. The derivation identifies three precomputable components that are cached once at initialization and reused at every forward-pass stage transition.

7.1 Canonical Standardized Form

The PAR(p) model (section 1) operates on deviations from the seasonal mean, scaled by the seasonal standard deviation. In fully standardized form:

where:

  • : AR coefficients in fully standardized form (correlations between normalized deviations)
  • : residual standard deviation for season (derived at runtime — see §3)
  • : innovation noise

The input files store (standardized by seasonal std , not residual std ). The next step converts these to original-unit for use in the LP.

7.2 Coefficient Conversion

The stored standardized coefficients are converted to original-unit coefficients at runtime using the seasonal standard deviations from inflow_seasonal_stats.parquet:

The residual standard deviation is also derived at this preprocessing step:

These conversions are performed once at LP construction time. They require only the seasonal stats () and the stored model quantities (, ) — no autocorrelation values, no historical data.

7.3 LP-Ready Form

Multiplying both sides of the canonical form (7.1) by and rearranging yields the LP-ready equation:

where and are derived from stored quantities as described in §7.2.

This decomposes the inflow into three additive components:

  1. Lag contribution: — linear function of past inflows (state variables or known values)
  2. Deterministic base: — constant offset per (stage, hydro), precomputed once
  3. Stochastic innovation: — noise draw scaled by the seasonal residual standard deviation

7.4 Deterministic Base

The deterministic base is defined as:

This is a precomputed constant per (stage, hydro) pair. It absorbs the mean-adjustment arithmetic that would otherwise be repeated at every forward-pass stage transition. With this definition, the LP-ready form (7.3) simplifies to:

7.5 LP RHS Patching Operation

The lagged inflows are LP variables, not substituted values. In the LP (see LP Formulation §5), they appear with coefficients in the AR dynamics constraint row, and separate equality constraints fix each lag variable to its incoming state value (see LP Formulation §5a):

where is patched per scenario to the actual lagged inflow from the trajectory record.

Because the lag contribution is carried by the constraint matrix (not the RHS), the AR dynamics constraint RHS reduces to:

where:

  • is read from PrecomputedParLp.deterministic_base[stage][hydro]
  • is read from PrecomputedParLp.sigma[stage][hydro]
  • is the scenario noise draw for this (stage, hydro)

The coefficients from PrecomputedParLp.psi[stage][hydro][lag] are written into the constraint matrix once at LP construction time as the coefficients on the lagged inflow variables; they are not recomputed per scenario.

No division, no mean subtraction, no repeated coefficient transformation — the precomputation in PrecomputedParLp eliminates all redundant arithmetic from the hot path.

7.6 Summary of LP Components

ComponentSymbolShape per stageLP RoleSource
Lag coefficientsOne per (hydro, lag)Constraint matrix (AR dynamics row)Derived from stored and at initialization (§7.2)
Deterministic baseOne per hydroAR dynamics constraint RHS (fixed term)Precomputed from and
Noise scaleOne per hydroAR dynamics constraint RHS (noise factor)Derived from stored ratio and at initialization (§7.2)

Cross-References

  • Input Scenarios §3.1–3.2 — Defines inflow_seasonal_stats.parquet (μ, s) and inflow_ar_coefficients.parquet (ψ* per lag, residual_std_ratio)
  • LP Formulation §5 — AR inflow dynamics in the LP: state expansion, lag fixing constraints, dual variables
  • Internal Structures §14PrecomputedParLp struct caching the three LP components derived in section 7
  • Inflow Non-Negativity — Methods for handling negative realizations produced by the PAR(p) model
  • Scenario Generation §4.2 — When external scenarios are used in training, a PAR model is fitted to the external data for backward pass opening tree generation. The fitting procedure (§5 above) applies equally to this derived model.
  • Notation Conventions — Defines inflow symbols (, , , ) and unit conventions