PAR(p) Inflow Model

Purpose

This spec defines the Periodic Autoregressive model of order $p$ (PAR(p)) used to capture temporal correlation in inflow time series, including the model definition, parameter semantics, the relationship between stored and computed quantities, the fitting procedure, model order selection, and validation invariants.

1. Model Definition

The Periodic Autoregressive model of order p (PAR(p)) captures temporal correlation in inflow time series while accounting for seasonal variation in parameters. For hydro $h$ at stage $t$ corresponding to season $m (t)$ :

$a_{h, t} = μ_{m (t)} + \sum_{ℓ = 1}^{p} ψ_{m (t), ℓ} (a_{h, t - ℓ} - μ_{m (t - ℓ)}) + σ_{m (t)} \cdot ε_{t}$

where:

$a_{h, t}$ : Incremental inflow at stage $t$ (m³/s)
$μ_{m (t)}$ : Seasonal mean for season $m (t)$
$ψ_{m (t), ℓ}$ : Autoregressive coefficient for lag $ℓ$ in season $m (t)$
$σ_{m (t)}$ : Residual standard deviation for season $m (t)$ (computed at runtime — see §3)
$ε_{t} \sim N (0, 1)$ : Innovation (standardized noise)
$m (t)$ : Season index for stage $t$ (e.g., month 1–12)

The model order $p$ can vary by season and by hydro plant.
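The recursion above can be sketched for a single hydro as follows. This is a minimal illustration, not the reference implementation: the function name `par_step`, the 0-based season indexing, and the sample numbers are assumptions made for the example.

```python
def par_step(mu, psi, sigma, history, season, eps):
    """Return a_{h,t} for one hydro from the PAR(p) recursion.

    mu[m], sigma[m]: seasonal mean and residual std (0-based seasons, assumed).
    psi[m]: AR coefficients for season m; the length p may differ by season.
    history: past inflows, most recent last, so history[-l] is a_{h,t-l}.
    eps: standardized innovation for this stage.
    """
    n_seasons = len(mu)
    value = mu[season]
    for lag, coef in enumerate(psi[season], start=1):
        lag_season = (season - lag) % n_seasons  # season of stage t - lag
        value += coef * (history[-lag] - mu[lag_season])
    return value + sigma[season] * eps


# Hypothetical two-season model with order 1 in both seasons.
mu = [100.0, 60.0]
psi = [[0.5], [0.4]]
sigma = [10.0, 5.0]
inflow = par_step(mu, psi, sigma, history=[70.0], season=0, eps=0.0)
```

With a zero innovation, the result is the seasonal mean plus the weighted deviation of the previous inflow from its own seasonal mean.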

2. Parameter Set

For each hydro $h$ and each season $m \in \{1, \dots, M\}$ (e.g., $M = 12$ for monthly, $M = 52$ for weekly), the complete PAR(p) model requires:

Parameter	Symbol	Description
Seasonal mean	$μ_{m}$	Mean inflow for season $m$
AR coefficients	$ψ_{m, 1}, \dots, ψ_{m, p}$	Autoregressive coefficients
Residual standard deviation	$σ_{m}$	Scale of innovation term

3. Stored vs. Computed Quantities

The data model stores seasonal sample statistics and standardized AR coefficients with an explicit residual fraction. Files on disk store the scale-invariant $ψ^{*}$ and residual_std_ratio; the runtime converts them to original-unit $ψ$ and $σ$ using the seasonal stats.

[Figure: PAR model stored vs. computed quantities]

Stored in input files

These are provided in inflow_seasonal_stats.parquet and inflow_ar_coefficients.parquet (see Input Scenarios §3.1–3.2):

Stored quantity	Column	File	Symbol	Description
Seasonal sample mean	`mean_m3s`	`inflow_seasonal_stats`	$μ_{m} = \bar{a}_{m}$	Mean of historical observations for season $m$
Seasonal sample std	`std_m3s`	`inflow_seasonal_stats`	$s_{m}$	Standard deviation of historical observations for season $m$
AR coefficients	`coefficient`	`inflow_ar_coefficients`	$ψ_{m, ℓ}^{*}$	AR coefficient standardized by seasonal std — the direct Yule-Walker output
Residual std ratio	`residual_std_ratio`	`inflow_ar_coefficients`	$σ_{m} / s_{m}$	Residual std as fraction of seasonal std, $\in (0, 1]$ — a pure model property

The AR order $p_{m}$ is not stored explicitly. It is derived at runtime from the count of coefficient rows per (hydro_id, stage_id) group in inflow_ar_coefficients.parquet.
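As a rough sketch of this derivation, the order can be obtained by counting rows per group. The rows are represented here as plain dictionaries, which is an assumption for illustration; only the `hydro_id` and `stage_id` keys come from the text, and `derive_ar_orders` is a hypothetical name.

```python
from collections import Counter


def derive_ar_orders(rows):
    """Map (hydro_id, stage_id) to the AR order, i.e. the number of rows.

    rows: iterable of dicts with at least hydro_id and stage_id keys
    (a hypothetical in-memory form of inflow_ar_coefficients.parquet).
    """
    return dict(Counter((row["hydro_id"], row["stage_id"]) for row in rows))


rows = [
    {"hydro_id": 1, "stage_id": 0},
    {"hydro_id": 1, "stage_id": 0},
    {"hydro_id": 1, "stage_id": 1},
]
orders = derive_ar_orders(rows)
```

A group with no rows does not appear in the result, so a caller would need its own rule for such groups.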

The standardized coefficient $ψ_{m, ℓ}^{*}$ is the direct output of the Yule-Walker fitting procedure (see §5.4). It is dimensionless — the coefficient of the standardized process $(a_{h, t} - μ_{m}) / s_{m}$ . The relationship to the original-unit coefficient $ψ_{m, ℓ}$ used in the LP is:

$ψ_{m, ℓ} = ψ_{m, ℓ}^{*} \cdot \frac{s _{m}}{s _{m - ℓ}}$

Computed at runtime

From the stored quantities, the LP requires two additional quantities computed once at initialization:

Original-unit AR coefficients (for LP constraint matrix entries):

$ψ_{m, ℓ} = ψ_{m, ℓ}^{*} \cdot \frac{s _{m}}{s _{m - ℓ}}$

Residual standard deviation (for noise scaling):

$σ_{m} = s_{m} \cdot \text{residual\_std\_ratio}_{m}$

No autocorrelation values are needed at runtime. All required quantities are derived solely from the stored seasonal stats and AR coefficient file.
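The two conversions can be sketched together as below. The function name, the list-of-lists layout, and the 0-based cyclic season indexing are assumptions made for this example.

```python
def to_original_units(psi_star, residual_std_ratio, s):
    """Convert stored quantities to original-unit psi and sigma.

    psi_star[m][l - 1]: standardized coefficient for season m, lag l.
    residual_std_ratio[m]: stored ratio sigma_m / s_m.
    s[m]: seasonal sample std (0-based seasons, cyclic; assumed layout).
    """
    n_seasons = len(s)
    psi = [
        [c * s[m] / s[(m - lag) % n_seasons]
         for lag, c in enumerate(coefs, start=1)]
        for m, coefs in enumerate(psi_star)
    ]
    sigma = [s[m] * residual_std_ratio[m] for m in range(n_seasons)]
    return psi, sigma


# Hypothetical two-season input.
psi, sigma = to_original_units(
    psi_star=[[0.5], [0.25]], residual_std_ratio=[0.5, 0.75], s=[20.0, 10.0]
)
```

Only the seasonal standard deviations and the two stored columns are read, which matches the statement that no autocorrelation values are needed at runtime.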

Why store residual_std_ratio rather than $σ_{m}$ directly? The residual std decomposes as $σ_{m} = s_{m} \cdot \text{residual\_std\_ratio}_{m}$, where $s_{m}$ is a conditioning quantity (swappable for climate scenario studies) and the ratio is a model dynamics property (fixed per PAR fit). Storing $σ_{m}$ directly would bake in a specific $s_{m}$: when the user swaps seasonal stats for a different climate scenario, the stored $σ_{m}$ would be stale and noise scaling would be inconsistent with the new variability level. Storing the ratio preserves correct proportionality: if seasonal variability changes, noise scales proportionally. For example, with a ratio of 0.6, raising $s_{m}$ from 100 to 120 m³/s raises $σ_{m}$ from 60 to 72 m³/s without touching the stored AR file. See also PAR Coefficient Storage design document §3.4.

LP coefficients

The stored standardized coefficients $ψ_{m, ℓ}^{*}$ are converted to original-unit $ψ_{m, ℓ}$ at runtime (see §7.2), and these enter the LP directly (see LP Formulation §5). The LP equation is:

$a_{h} = \underbrace{\left( μ_{m} - \sum_{ℓ = 1}^{p} ψ_{m, ℓ} μ_{m - ℓ} \right)}_{\text{deterministic base}} + \underbrace{\sum_{ℓ = 1}^{p} ψ_{m, ℓ} \cdot a_{h, ℓ}}_{\text{lag contribution}} + \underbrace{σ_{m} \cdot η_{t}}_{\text{stochastic innovation}}$

where $a_{h, ℓ}$ are state variables (lagged inflows) and $η_{t}$ is the sampled noise realization.

4. Model Order Selection

The PAR order $p$ can vary by season. Available selection criteria:

4.1 PACF (Periodic Partial Autocorrelation Function) – Default

The default method computes the periodic PACF via progressive periodic Yule-Walker matrix solves at orders $k = 1, 2, \dots, p_{\max}$, then selects the order using a significance threshold.

Algorithm:

For each order $k$ from 1 to $p_{\max}$, build and solve the periodic Yule-Walker system (§5.4) at order $k$. The last coefficient $\hat{ψ}_{m, k}^{*}$ from the order-$k$ solution is the periodic PACF value at lag $k$.
Select the order as the maximum lag with significant PACF:

$p_{m} = \max \left\{ k : | \text{PACF}_{m} (k) | > \frac{z_{α}}{\sqrt{N_{m}}} \right\}$

where $z_{α} = 1.96$ (95% confidence) and $N_{m}$ is the number of observations for season $m$ . If no lag is significant, $p_{m} = 0$ (white noise).
Estimate AR coefficients at the selected order using the periodic Yule-Walker system (§5.4).
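The selection rule in step 2 can be sketched as follows, assuming the usual large-sample threshold $z_{α} / \sqrt{N_{m}}$ and that the PACF values have already been computed. The function name and the sample values are hypothetical.

```python
import math


def select_order(pacf, n_obs, z_alpha=1.96):
    """Return the largest lag k whose |PACF(k)| exceeds z_alpha / sqrt(n_obs).

    pacf[k - 1] is the periodic PACF value at lag k for one season.
    Returns 0 when no lag is significant (white noise).
    """
    threshold = z_alpha / math.sqrt(n_obs)
    significant = [k for k, v in enumerate(pacf, start=1) if abs(v) > threshold]
    return max(significant, default=0)


# Hypothetical PACF values for one season with 49 observations.
order = select_order([0.6, 0.1, 0.35, 0.05], n_obs=49)
```

Because the rule takes the maximum significant lag, an insignificant lag 2 does not stop lag 3 from being selected. The post-selection gates described next are not part of this sketch.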

Post-selection validation: After PACF selection, two rejection gates are applied iteratively:

Negative $ϕ_{1}$ rejection: If $\hat{ψ}_{m, 1}^{*} < 0$ (first AR coefficient is negative), the order is reduced. Negative $ϕ_{1}$ contradicts the hydrological persistence property (inflows are positively autocorrelated at lag 1).
Contribution-based validation: The recursively-composed contributions for each lag are computed. If any contribution is negative (indicating potential model instability), the order is reduced to the maximum lag with non-negative contributions. This implements NEWAVE’s reducao_ordem algorithm.

The reduction process is iterative: after each reduction, the PACF selection and coefficient estimation are re-run at the new ceiling, and the validation checks are repeated until all seasons pass or reach order 0.

4.2 AIC (Akaike Information Criterion)

$\text{AIC}_{m} (p) = N_{m} \ln (\hat{σ}_{m}^{2}) + 2 p$

4.3 BIC (Bayesian Information Criterion)

$\text{BIC}_{m} (p) = N_{m} \ln (\hat{σ}_{m}^{2}) + p \ln (N_{m})$

4.4 Coefficient Significance

Include lag $ℓ$ only if $| \hat{ψ}_{m, ℓ} | > 2 / \sqrt{N_{m}}$.

In all methods, $N_{m}$ is the number of historical observations for season $m$ .
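A small sketch of the two information criteria follows. It assumes the usual convention that the selected order is the one minimizing the criterion, which the text does not state explicitly. The residual variances are made-up numbers.

```python
import math


def aic(n_obs, sigma2, p):
    """AIC_m(p) = N_m * ln(sigma^2) + 2p."""
    return n_obs * math.log(sigma2) + 2 * p


def bic(n_obs, sigma2, p):
    """BIC_m(p) = N_m * ln(sigma^2) + p * ln(N_m)."""
    return n_obs * math.log(sigma2) + p * math.log(n_obs)


def best_order(n_obs, sigma2_by_order, criterion):
    """Pick the order with the smallest criterion value (assumed convention).

    sigma2_by_order[p] is the residual variance estimate at order p.
    """
    return min(
        range(len(sigma2_by_order)),
        key=lambda p: criterion(n_obs, sigma2_by_order[p], p),
    )


# Hypothetical residual variances for orders 0..3 with 50 observations.
sigma2_by_order = [1.0, 0.60, 0.57, 0.565]
p_aic = best_order(50, sigma2_by_order, aic)
p_bic = best_order(50, sigma2_by_order, bic)
```

With these numbers AIC selects order 2 and BIC selects order 1, reflecting the heavier per-parameter penalty of BIC whenever $\ln (N_{m}) > 2$.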

5. Fitting Procedure

This section documents the five-step procedure for fitting PAR(p) parameters from historical inflow data. The fitting is performed when the system derives parameters from inflow_history.parquet (see Input Scenarios §2). When pre-computed parameters are provided directly in inflow_seasonal_stats.parquet and inflow_ar_coefficients.parquet, this procedure is not executed.

5.1 Notation

Let $Y_{m} = \{ a_{h, t} : m (t) = m \}$ be the historical observations for season $m$. Define:

Symbol	Description
$N_{m}$	Number of observations for season $m$
$\bar{a}_{m}$	Sample mean for season $m$
$s_{m}$	Sample standard deviation for season $m$
$γ_{m} (ℓ)$	Autocovariance at lag $ℓ$ for season $m$
$ρ_{m} (ℓ)$	Autocorrelation at lag $ℓ$ for season $m$

5.2 Step 1 — Seasonal Means and Standard Deviations

Seasonal Mean:

$\hat{μ}_{m} = \bar{a}_{m} = \frac{1}{N_{m}} \sum_{t : m (t) = m} a_{h, t}$

Seasonal Standard Deviation:

$\hat{s}_{m} = \sqrt{\frac{1}{N_{m} - 1} \sum_{t : m (t) = m} (a_{h, t} - \bar{a}_{m})^{2}}$
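Step 1 can be sketched as below for a single hydro. The mapping $m (t) = t \bmod M$ with 0-based seasons and the function name are assumptions for the example. The standard deviation uses the $N_{m} - 1$ denominator.

```python
import math


def seasonal_stats(inflows, n_seasons):
    """Return (means, stds) per season for one hydro.

    inflows[t] is the observation at stage t; its season is t % n_seasons
    (an assumed 0-based mapping for m(t)).
    """
    groups = [inflows[m::n_seasons] for m in range(n_seasons)]
    means = [sum(g) / len(g) for g in groups]
    stds = [
        math.sqrt(sum((a - mean) ** 2 for a in g) / (len(g) - 1))
        for g, mean in zip(groups, means)
    ]
    return means, stds


# Hypothetical history: three cycles of a two-season series.
means, stds = seasonal_stats([10.0, 4.0, 14.0, 6.0, 12.0, 8.0], n_seasons=2)
```

Each season needs at least two observations for the standard deviation to be defined.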

5.3 Step 2 — Seasonal Autocorrelations

The autocorrelation at lag $ℓ$ for season $m$ is computed from standardized deviations.

Cross-seasonal autocovariance:

For observations at season $m$ with lag $ℓ$ reaching back to season $m - ℓ$ (mod $M$ , where $M$ is the cycle length):

$\hat{γ}_{m} (ℓ) = \frac{1}{N_{m} - 1} \sum_{t : m (t) = m} (a_{h, t} - \bar{a}_{m}) (a_{h, t - ℓ} - \bar{a}_{m - ℓ})$

Autocorrelation:

$\hat{ρ}_{m} (ℓ) = \frac{\hat{γ}_{m} (ℓ)}{\hat{s}_{m} \cdot \hat{s}_{m - ℓ}}$

where $\hat{s}_{m - ℓ}$ is the standard deviation of season $m - ℓ$ (cyclically, so season 0 = season $M$).
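Step 2 can be sketched as follows for one hydro. Two assumptions are made here that the text does not spell out: seasons are 0-based with $m (t) = t \bmod M$, and stages whose lagged observation falls before the start of the record are skipped while the denominator stays $N_{m} - 1$.

```python
import math


def seasonal_autocorrelation(inflows, n_seasons, season, lag):
    """Estimate rho_m(lag) for one hydro, with season of stage t = t % n_seasons."""

    def stats(m):
        g = inflows[m::n_seasons]
        mean = sum(g) / len(g)
        std = math.sqrt(sum((a - mean) ** 2 for a in g) / (len(g) - 1))
        return mean, std, len(g)

    mean_m, std_m, n_m = stats(season)
    mean_l, std_l, _ = stats((season - lag) % n_seasons)
    # Only stages with an available lagged observation contribute (assumption).
    pairs = [
        (inflows[t], inflows[t - lag])
        for t in range(season, len(inflows), n_seasons)
        if t - lag >= 0
    ]
    gamma = sum((a - mean_m) * (b - mean_l) for a, b in pairs) / (n_m - 1)
    return gamma / (std_m * std_l)


inflows = [10.0, 4.0, 14.0, 6.0, 12.0, 8.0]
rho_1 = seasonal_autocorrelation(inflows, n_seasons=2, season=1, lag=1)
```

At lag 0 the estimator reduces to the seasonal variance divided by itself, so the value is 1, which is what the diagonal of the Yule-Walker matrix in the next step relies on.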

5.4 Step 3 — Yule-Walker Equations

For each season $m$ , the PAR(p) coefficients $ψ_{m, 1}^{*}, \dots, ψ_{m, p}^{*}$ in standardized form are found by solving the periodic Yule-Walker system. Unlike the classical (stationary) Yule-Walker equations where all rows use the same reference season, the periodic variant shifts the reference season per row. This correctly accounts for the non-Toeplitz covariance structure of periodic autoregressive processes.

Matrix construction: For row $i$ and column $j$ (0-indexed, $0 \leq i, j < p$ ), the reference season is shifted by row index:

$[R_{m}]_{i, j} = \hat{ρ}_{(m - i) \bmod M} (| j - i |)$

where $M$ is the number of seasons in the periodic cycle (e.g., 12 for monthly). The diagonal entries are always 1 (since $\hat{ρ}_{m'} (0) = 1$ for any season $m'$). The matrix is symmetric but not Toeplitz when $M > 1$, because each row references a different season for its autocorrelation values.

RHS construction: Each RHS element also uses a shifted reference season:

$[r_{m}]_{i} = \hat{ρ}_{(m - i) \bmod M} (p - i)$

This comes from column $p$ of the extended $(p + 1) \times (p + 1)$ version of the periodic autocorrelation matrix.

The full system is:

$\begin{bmatrix} 1 & \hat{ρ}_{m} (1) & \hat{ρ}_{m} (2) & \dots & \hat{ρ}_{m} (p - 1) \\ \hat{ρ}_{(m - 1)} (1) & 1 & \hat{ρ}_{(m - 1)} (1) & \dots & \hat{ρ}_{(m - 1)} (p - 2) \\ \hat{ρ}_{(m - 2)} (2) & \hat{ρ}_{(m - 2)} (1) & 1 & \dots & \hat{ρ}_{(m - 2)} (p - 3) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \hat{ρ}_{(m - p + 1)} (p - 1) & \hat{ρ}_{(m - p + 1)} (p - 2) & \hat{ρ}_{(m - p + 1)} (p - 3) & \dots & 1 \end{bmatrix} \begin{bmatrix} ψ_{m, 1}^{*} \\ ψ_{m, 2}^{*} \\ ψ_{m, 3}^{*} \\ \vdots \\ ψ_{m, p}^{*} \end{bmatrix} = \begin{bmatrix} \hat{ρ}_{m} (p) \\ \hat{ρ}_{(m - 1)} (p - 1) \\ \hat{ρ}_{(m - 2)} (p - 2) \\ \vdots \\ \hat{ρ}_{(m - p + 1)} (1) \end{bmatrix}$

where all season indices are taken modulo $M$ .

In matrix notation: $R_{m} ψ_{m}^{*} = r_{m}$

where:

$R_{m}$ is the $p \times p$ periodic correlation matrix (symmetric but not Toeplitz for $M > 1$ )
$r_{m}$ is the vector of target autocorrelations with per-row reference season shifting

Note: For a single-season model ( $M = 1$ ), all rows use the same reference season and the matrix reduces to the classical Toeplitz Yule-Walker matrix. The periodic formulation is the general case that correctly handles multi-season (e.g., monthly) data.

Solution:

$\hat{ψ}_{m}^{*} = R_{m}^{- 1} r_{m}$

The system is solved via Gaussian elimination with partial pivoting (for small systems with $p \leq 10$ , this is numerically adequate).
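The assembly and solve can be sketched in plain Python as below. The sketch follows the index formulas for $[R_{m}]_{i, j}$ and $[r_{m}]_{i}$ exactly as written in this section. The function name, the callable `rho`, the 0-based seasons, and the singularity tolerance are assumptions for illustration.

```python
def periodic_yule_walker(rho, season, order, n_seasons):
    """Solve R_m psi* = r_m for one season by Gaussian elimination.

    rho(m, lag): caller-supplied estimated autocorrelation of season m
    (0-based, assumed) at the given lag, with rho(m, 0) == 1.
    """
    # Augmented matrix [R | r]; row i uses reference season (season - i) mod M.
    a = [
        [rho((season - i) % n_seasons, abs(j - i)) for j in range(order)]
        + [rho((season - i) % n_seasons, order - i)]
        for i in range(order)
    ]
    for col in range(order):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, order), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:  # tolerance is an arbitrary choice
            raise ValueError("singular periodic Yule-Walker matrix")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, order):
            factor = a[r][col] / a[col][col]
            for c in range(col, order + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution on the upper-triangular system.
    psi = [0.0] * order
    for i in reversed(range(order)):
        tail = sum(a[i][j] * psi[j] for j in range(i + 1, order))
        psi[i] = (a[i][order] - tail) / a[i][i]
    return psi


# Hypothetical autocorrelation table for a two-season cycle.
TABLE = {0: [1.0, 0.6, 0.3], 1: [1.0, 0.5, 0.2]}
psi_star = periodic_yule_walker(lambda m, lag: TABLE[m][lag], 0, 2, 2)
```

Raising an error on a near-zero pivot corresponds to the invertibility requirement on $R_{m}$ listed among the validation invariants.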

5.5 Step 4 — Store Standardized Coefficients and Residual Fraction

The Yule-Walker solution $ψ_{m, ℓ}^{*}$ is in standardized form — the direct output of step 3. It is stored as-is in inflow_ar_coefficients.parquet. No conversion to original units is performed.

Compute and store the residual std ratio:

$\text{residual\_std\_ratio}_{m} = \sqrt{1 - ψ_{m}^{* ⊤} r_{m}} = \sqrt{1 - \sum_{ℓ = 1}^{p} ψ_{m, ℓ}^{*} \cdot \hat{ρ}_{m} (ℓ)}$

Both $ψ_{m, ℓ}^{*}$ (one row per lag) and $\text{residual\_std\_ratio}_{m}$ (repeated across all lag rows of the same (hydro, stage) group) are written to inflow_ar_coefficients.parquet.

5.6 Step 5 — Residual Standard Deviation

The residual standard deviation for season $m$ is recovered at runtime from the stored ratio (see §3):

$\hat{σ}_{m} = \hat{s}_{m} \cdot \text{residual\_std\_ratio}_{m}$

For reference, the full expression in terms of fitting quantities is:

$\hat{σ}_{m} = \hat{s}_{m} \sqrt{1 - r_{m}^{⊤} R_{m}^{- 1} r_{m}}$
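A minimal sketch of steps 4 and 5 follows. It assumes the ratio is the square root of the unexplained variance fraction $1 - ψ_{m}^{* ⊤} r_{m}$, so that multiplying by $\hat{s}_{m}$ yields a standard deviation. The function name and the numbers are hypothetical.

```python
import math


def residual_std_ratio(psi_star, r):
    """Return sqrt(1 - psi*^T r), the stored fraction sigma_m / s_m."""
    explained = sum(c * x for c, x in zip(psi_star, r))
    if not 0.0 <= explained < 1.0:
        # A ratio outside (0, 1] would violate the validation invariants.
        raise ValueError(f"explained variance fraction out of range: {explained}")
    return math.sqrt(1.0 - explained)


# Hypothetical order-1 fit: psi* = 0.6 and r = 0.6, seasonal std 50 m3/s.
ratio = residual_std_ratio(psi_star=[0.6], r=[0.6])
sigma = 50.0 * ratio
```

In this example the AR term explains 36% of the seasonal variance, so the residual standard deviation is about 80% of the seasonal standard deviation.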

6. Validation Invariants

After fitting or loading pre-computed parameters, the following invariants must hold:

Positive residual variance: $σ_{m}^{2} > 0$ for all seasons. If violated, the AR model explains all variance — likely overfitting.
Stationarity: Roots of $1 - \sum_{ℓ} ψ_{m, ℓ} z^{ℓ} = 0$ lie outside the unit circle. Ensures the AR process is stable and does not diverge.
Correlation matrix positive definite: $R_{m}$ is invertible. Required for Yule-Walker solution to exist. If violated, the historical record may be too short for the requested AR order.
No systematic bias: Residuals $ε_{t}$ have mean near zero. Indicates the model captures the mean structure correctly.
AR order derivation: The number of coefficient rows per (hydro_id, stage_id) in inflow_ar_coefficients.parquet determines the AR order $p_{m}$. Lags must be contiguous: $\{1, 2, \dots, p_{m}\}$.
Residual std ratio consistency: The residual_std_ratio value must be identical across all lag rows sharing the same (hydro_id, stage_id) group, and must lie in $(0, 1]$ .
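The last two invariants are structural and can be checked per group without any fitting quantities. The sketch below assumes each row is a dictionary with a `lag` key (a hypothetical column name) and the `residual_std_ratio` column named in the text. The stationarity and bias checks are not covered here.

```python
def validate_group(rows):
    """Return a list of violations for one (hydro_id, stage_id) group.

    rows: list of dicts with keys lag and residual_std_ratio
    (hypothetical in-memory form of the parquet rows).
    """
    errors = []
    lags = sorted(row["lag"] for row in rows)
    if lags != list(range(1, len(rows) + 1)):
        errors.append(f"lags not contiguous from 1: {lags}")
    ratios = {row["residual_std_ratio"] for row in rows}
    if len(ratios) > 1:
        errors.append(f"residual_std_ratio differs across rows: {sorted(ratios)}")
    for ratio in sorted(ratios):
        if not 0.0 < ratio <= 1.0:
            errors.append(f"residual_std_ratio outside (0, 1]: {ratio}")
    return errors


good = [
    {"lag": 1, "residual_std_ratio": 0.8},
    {"lag": 2, "residual_std_ratio": 0.8},
]
bad = [
    {"lag": 1, "residual_std_ratio": 0.8},
    {"lag": 3, "residual_std_ratio": 1.2},
]
good_errors = validate_group(good)
bad_errors = validate_group(bad)
```

Returning a list rather than raising on the first problem lets a loader report every violation in a file at once, which is a design choice of this sketch and not something the text prescribes.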

7. PAR-to-LP Transformation

This section derives the explicit algebraic transformation from the canonical PAR(p) model (section 1) into the form consumed by the LP subproblem. The derivation identifies three precomputable components that are cached once at initialization and reused at every forward-pass stage transition.

7.1 Canonical Standardized Form

The PAR(p) model (section 1) operates on deviations from the seasonal mean, scaled by the seasonal standard deviation. In fully standardized form:

$\frac{a_{h, t} - μ_{m (t)}}{σ_{m (t)}} = \sum_{ℓ = 1}^{p} ϕ_{m (t), ℓ} \frac{a_{h, t - ℓ} - μ_{m (t - ℓ)}}{σ_{m (t - ℓ)}} + ε_{t}$

where:

$ϕ_{m (t), ℓ}$ : AR coefficients in fully standardized form (correlations between normalized deviations)
$σ_{m (t)}$ : residual standard deviation for season $m (t)$ (derived at runtime — see §3)
$ε_{t} \sim N (0, 1)$ : innovation noise

The input files store $ψ_{m, ℓ}^{*}$ (standardized by seasonal std $s_{m}$ , not residual std $σ_{m}$ ). The next step converts these to original-unit $ψ_{m, ℓ}$ for use in the LP.

7.2 Coefficient Conversion

The stored standardized coefficients $ψ_{m, ℓ}^{*}$ are converted to original-unit coefficients $ψ_{m, ℓ}$ at runtime using the seasonal standard deviations from inflow_seasonal_stats.parquet:

$ψ_{m, ℓ} = ψ_{m, ℓ}^{*} \cdot \frac{s _{m}}{s _{m - ℓ}}$

The residual standard deviation is also derived at this preprocessing step:

$σ_{m} = s_{m} \cdot \text{residual\_std\_ratio}_{m}$

These conversions are performed once at LP construction time. They require only the seasonal stats ($s_{m}$) and the stored model quantities ($ψ_{m, ℓ}^{*}$, $\text{residual\_std\_ratio}_{m}$); no autocorrelation values or historical data are needed.

7.3 LP-Ready Form

Multiplying both sides of the canonical form (7.1) by $σ_{m (t)}$ and rearranging yields the LP-ready equation:

$a_{h, t} = \sum_{ℓ = 1}^{p} ψ_{m (t), ℓ} \cdot a_{h, t - ℓ} + \left[ μ_{m (t)} - \sum_{ℓ = 1}^{p} ψ_{m (t), ℓ} \cdot μ_{m (t - ℓ)} \right] + σ_{m (t)} \cdot ε_{t}$

where $ψ_{m (t), ℓ}$ and $σ_{m (t)}$ are derived from stored quantities as described in §7.2.

This decomposes the inflow into three additive components:

Lag contribution: $\sum_{ℓ = 1}^{p} ψ_{m (t), ℓ} \cdot a_{h, t - ℓ}$, a linear function of past inflows (state variables or known values)
Deterministic base: $μ_{m (t)} - \sum_{ℓ = 1}^{p} ψ_{m (t), ℓ} \cdot μ_{m (t - ℓ)}$, a constant offset per (stage, hydro), precomputed once
Stochastic innovation: $σ_{m (t)} \cdot ε_{t}$, a noise draw scaled by the seasonal residual standard deviation

7.4 Deterministic Base

The deterministic base is defined as:

$b_{h, m (t)} = μ_{m (t)} - \sum_{ℓ = 1}^{p} ψ_{m (t), ℓ} \cdot μ_{m (t - ℓ)}$

This is a precomputed constant per (stage, hydro) pair. It absorbs the mean-adjustment arithmetic that would otherwise be repeated at every forward-pass stage transition. With this definition, the LP-ready form (7.3) simplifies to:

$a_{h, t} = \sum_{ℓ = 1}^{p} ψ_{m (t), ℓ} \cdot a_{h, t - ℓ} + b_{h, m (t)} + σ_{m (t)} \cdot ε_{t}$
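The deterministic base can be sketched for one hydro as follows. The function name and the 0-based cyclic season indexing are assumptions for the example. The coefficients passed in are the original-unit $ψ$, not the stored $ψ^{*}$.

```python
def deterministic_base(mu, psi, season):
    """Return b_{h,m} = mu_m - sum_l psi_{m,l} * mu_{m-l} for one hydro.

    mu[m]: seasonal mean; psi[m]: original-unit AR coefficients for season m
    (0-based seasons, cyclic; assumed layout).
    """
    n_seasons = len(mu)
    return mu[season] - sum(
        coef * mu[(season - lag) % n_seasons]
        for lag, coef in enumerate(psi[season], start=1)
    )


# Hypothetical two-season model with order 1 in both seasons.
mu = [100.0, 60.0]
psi = [[0.5], [0.25]]
base_0 = deterministic_base(mu, psi, season=0)
base_1 = deterministic_base(mu, psi, season=1)
```

For a season with order 0 the sum is empty and the base equals the seasonal mean.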

7.5 LP RHS Patching Operation

The lagged inflows $a_{h, t - ℓ}$ are LP variables, not substituted values. In the LP (see LP Formulation §5), they appear with coefficients $- ψ_{m (t), ℓ}$ in the AR dynamics constraint row, and separate equality constraints fix each lag variable to its incoming state value (see LP Formulation §5a):

$a_{h, t - ℓ} = \hat{a}_{h, t - ℓ}$

where $\hat{a}_{h, t - ℓ}$ is patched per scenario to the actual lagged inflow from the trajectory record.

Because the lag contribution $\sum_{ℓ} ψ \cdot a_{h, t - ℓ}$ is carried by the constraint matrix (not the RHS), the AR dynamics constraint RHS reduces to:

$RHS_{h, t} = b_{h, m (t)} + σ_{m (t)} \cdot ε_{t}$

where:

$b_{h, m (t)}$ is read from PrecomputedParLp.deterministic_base[stage][hydro]
$σ_{m (t)}$ is read from PrecomputedParLp.sigma[stage][hydro]
$ε_{t}$ is the scenario noise draw for this (stage, hydro)

The $ψ_{m (t), ℓ}$ coefficients from PrecomputedParLp.psi[stage][hydro][lag] are written into the constraint matrix once at LP construction time as the coefficients on the lagged inflow variables; they are not recomputed per scenario.

No division, no mean subtraction, no repeated coefficient transformation — the precomputation in PrecomputedParLp eliminates all redundant arithmetic from the hot path.
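The per-scenario patch can be illustrated with a small Python stand-in for the cache. `PrecomputedParLpSketch` and `ar_rhs` are hypothetical names; only the `deterministic_base` and `sigma` fields and their `[stage][hydro]` indexing are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class PrecomputedParLpSketch:
    """Hypothetical Python stand-in for the PrecomputedParLp cache."""

    deterministic_base: list  # indexed [stage][hydro]
    sigma: list  # indexed [stage][hydro]


def ar_rhs(cache, stage, noise):
    """Return the AR dynamics constraint RHS for every hydro at one stage.

    noise[hydro] is the scenario noise draw for this (stage, hydro).
    """
    return [
        base + scale * eps
        for base, scale, eps in zip(
            cache.deterministic_base[stage], cache.sigma[stage], noise
        )
    ]


# Hypothetical cache with one stage and two hydros.
cache = PrecomputedParLpSketch(
    deterministic_base=[[70.0, 20.0]], sigma=[[10.0, 4.0]]
)
rhs = ar_rhs(cache, stage=0, noise=[1.5, -0.5])
```

Each RHS entry costs one multiplication and one addition per scenario. The lag coefficients never appear in this sketch because they sit in the constraint matrix.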

7.6 Summary of LP Components

Component	Symbol	Shape per stage	LP Role	Source
Lag coefficients	$ψ_{m (t), ℓ}$	One per (hydro, lag)	Constraint matrix (AR dynamics row)	Derived from stored $ψ^{*}$ and $s_{m}$ at initialization (§7.2)
Deterministic base	$b_{h, m (t)}$	One per hydro	AR dynamics constraint RHS (fixed term)	Precomputed from $μ$ and $ψ$
Noise scale	$σ_{m (t)}$	One per hydro	AR dynamics constraint RHS (noise factor)	Derived from stored ratio and $s_{m}$ at initialization (§7.2)

Cross-References

Input Scenarios §3.1–3.2 — Defines inflow_seasonal_stats.parquet (μ, s) and inflow_ar_coefficients.parquet (ψ* per lag, residual_std_ratio)
LP Formulation §5 — AR inflow dynamics in the LP: state expansion, lag fixing constraints, dual variables
Internal Structures §14 — PrecomputedParLp struct caching the three LP components derived in section 7
Inflow Non-Negativity — Methods for handling negative realizations produced by the PAR(p) model
Scenario Generation §4.2 — When external scenarios are used in training, a PAR model is fitted to the external data for backward pass opening tree generation. The fitting procedure (§5 above) applies equally to this derived model.
Notation Conventions — Defines inflow symbols ( $a_{h, t}$ , $μ_{m}$ , $ψ_{m, ℓ}$ , $σ_{m}$ ) and unit conventions


Cobre Methodology Reference