Skip to content

PAR(p) Inflow Model

This spec defines the Periodic Autoregressive model of order pp (PAR(p)) used to capture temporal correlation in inflow time series, including the model definition, parameter semantics, the relationship between stored and computed quantities, the fitting procedure, model order selection, and validation invariants. Section 9 describes the optional PAR(p)-A extension that adds a single annual coefficient on top of the periodic AR structure to capture multi-year hydrological persistence.

The Periodic Autoregressive model of order p (PAR(p)) captures temporal correlation in inflow time series while accounting for seasonal variation in parameters. For hydro hh at stage tt corresponding to season m(t)m(t):

ah,t=μm(t)+=1pψm(t),(ah,tμm(t))+σm(t)εta_{h,t} = \mu_{m(t)} + \sum_{\ell=1}^{p} \psi_{m(t),\ell} \left( a_{h,t-\ell} - \mu_{m(t-\ell)} \right) + \sigma_{m(t)} \cdot \varepsilon_t

where:

  • ah,ta_{h,t}: Incremental inflow at stage tt (m³/s)
  • μm(t)\mu_{m(t)}: Seasonal mean for season m(t)m(t)
  • ψm(t),\psi_{m(t),\ell}: Autoregressive coefficient for lag \ell in season m(t)m(t)
  • σm(t)\sigma_{m(t)}: Residual standard deviation for season m(t)m(t) (computed at runtime — see section 3)
  • εtN(0,1)\varepsilon_t \sim \mathcal{N}(0, 1): Innovation (standardized noise)
  • m(t)m(t): Season index for stage tt (e.g., month 1–12)

The model order pp can vary by season and by hydro plant.

For each hydro hh and each season m{1,,M}m \in \{1, \ldots, M\} (e.g., M=12M = 12 for monthly, M=52M = 52 for weekly), the complete PAR(p) model requires:

ParameterSymbolDescription
Seasonal meanμm\mu_mMean inflow for season mm
AR coefficientsψm,1,,ψm,p\psi_{m,1}, \ldots, \psi_{m,p}Autoregressive coefficients
Residual standard deviationσm\sigma_mScale of innovation term

The data model stores seasonal sample statistics and standardized AR coefficients with an explicit residual fraction. The relationship between stored and computed quantities is:

Stored vs. computed quantities — the files on disk hold the scale-invariant ψ* and residual_std_ratio; at runtime these are converted to original-unit ψ and σ using the seasonal stats, then consumed by the LP stage subproblem.

Storage format — files on diskRuntime format — in-memory for LPconsumed by LP stage subprobleminflow_seasonal_statsinflow_ar_coefficientsOriginal-unit AR coeff: ψₘ,ℓ = ψ*ₘ,ℓ · sₘ / sₘ₋ℓOriginal-unit residual std: σₘ = sₘ · (σₘ / sₘ)μₘ — seasonal meansₘ — seasonal std (sample)ψ*ₘ,ℓ — standardized AR coeffσₘ / sₘ — residual std ratiopₘ — AR order × sₘ / sₘ₋ℓ× sₘ

These are provided in inflow_seasonal_stats.parquet and inflow_ar_coefficients.parquet:

Stored quantityColumnFileSymbolDescription
Seasonal sample meanmean_m3sinflow_seasonal_statsμm=aˉm\mu_m = \bar{a}_mMean of historical observations for season mm
Seasonal sample stdstd_m3sinflow_seasonal_statssms_mStandard deviation of historical observations for season mm
AR coefficientscoefficientinflow_ar_coefficientsψm,\psi^*_{m,\ell}AR coefficient standardized by seasonal std — the direct Yule-Walker output
Residual std ratioresidual_std_ratioinflow_ar_coefficientsσm/sm\sigma_m / s_mResidual std as fraction of seasonal std, (0,1]\in (0, 1] — a pure model property

The AR order pmp_m is not stored explicitly. It is derived at runtime from the count of coefficient rows per (hydro_id, stage_id) group in inflow_ar_coefficients.parquet.

The standardized coefficient ψm,\psi^*_{m,\ell} is the direct output of the Yule-Walker fitting procedure (see section 5.4). It is dimensionless — the coefficient of the standardized process (ah,tμm)/sm(a_{h,t} - \mu_m) / s_m. The relationship to the original-unit coefficient ψm,\psi_{m,\ell} used in the LP is:

ψm,=ψm,smsm\psi_{m,\ell} = \psi^*_{m,\ell} \cdot \frac{s_m}{s_{m-\ell}}

From the stored quantities, the LP requires two additional quantities computed once at initialization:

Original-unit AR coefficients (for LP constraint matrix entries):

ψm,=ψm,smsm\psi_{m,\ell} = \psi^*_{m,\ell} \cdot \frac{s_m}{s_{m-\ell}}

Residual standard deviation (for noise scaling):

σm=smresidual_std_ratiom\sigma_m = s_m \cdot \texttt{residual\_std\_ratio}_m

No autocorrelation values are needed at runtime. All required quantities are derived solely from the stored seasonal stats and AR coefficient file.

The stored standardized coefficients ψm,\psi^*_{m,\ell} are converted to original-unit ψm,\psi_{m,\ell} at runtime (see section 7.2), and these enter the LP directly (see LP Formulation). The LP equation is:

ah=(μm=1pψm,μm)deterministic base+=1pψm,ah,lag contribution+σmηtstochastic innovationa_h = \underbrace{\left( \mu_m - \sum_{\ell=1}^{p} \psi_{m,\ell} \mu_{m-\ell} \right)}_{\text{deterministic base}} + \underbrace{\sum_{\ell=1}^{p} \psi_{m,\ell} \cdot a_{h,\ell}}_{\text{lag contribution}} + \underbrace{\sigma_m \cdot \eta_t}_{\text{stochastic innovation}}

where ah,a_{h,\ell} are state variables (lagged inflows) and ηt\eta_t is the sampled noise realization.

The PAR order pp can vary by season. Available selection criteria:

4.1 PACF (Periodic Partial Autocorrelation Function) — Default

Section titled “4.1 PACF (Periodic Partial Autocorrelation Function) — Default”

The default method computes the periodic PACF via progressive periodic Yule-Walker matrix solves at orders k=1,2,,pmaxk = 1, 2, \ldots, p_{max}, then selects the order using a significance threshold.

Algorithm:

  1. For each order kk from 1 to pmaxp_{max}, build and solve the periodic Yule-Walker system (section 5.4) at order kk. The last coefficient ψ^m,k\hat{\psi}^*_{m,k} from the order-kk solution is the periodic PACF value at lag kk.

  2. Select the order as the maximum lag with significant PACF:

    pm=max{k:PACFm(k)>zαNm}p_m = \max \left\{ k : |\text{PACF}_m(k)| > \frac{z_\alpha}{\sqrt{N_m}} \right\}

    where zα=1.96z_\alpha = 1.96 (95% confidence) and NmN_m is the number of observations for season mm. If no lag is significant, pm=0p_m = 0 (white noise).

  3. Estimate AR coefficients at the selected order using the periodic Yule-Walker system (section 5.4).

Post-selection validation — Maceira-Damazio iterative reduction: After PACF selection, the recursively-composed contributions of each lag through the periodic monthly chain are computed. A negative composed contribution flags potential model instability — under SDDP the corresponding Benders cut can carry the negative composition into the future-cost recursion. When any season’s composed contribution is negative, the offending season’s AR ceiling is reduced and the PACF selection plus Yule-Walker fit are re-run at the new ceiling. The reduction iterates across all seasons until every season’s contribution recursion yields non-negative entries, or every offending season has been reduced to order 0.

For the PAR(p)-A path (section 9), two additional rules extend the PACF gate:

  • Structural-zero short-circuit at lag 1. When the conditional FACP value at lag 1 is exactly zero — which happens when the standardised annual noise series collapses, typically because a degenerate HistoryClass::Constant or HistoryClass::Saturated bucket has zeroed the seasonal std (section 5.7) — the selected order is forced to 0 (white noise). This blocks degenerate buckets from injecting spurious AR structure.
  • Minimum order 1 when lag 1 is well defined. When the lag-1 conditional FACP is non-zero but no lag exceeds the significance threshold, the model defaults to order 1 rather than order 0. Hydrological persistence makes a strict order-0 fit a poor default unless the lag-1 value is structurally absent.
AICm(p)=Nmln(σ^m2)+2p\text{AIC}_m(p) = N_m \ln(\hat{\sigma}_m^2) + 2p BICm(p)=Nmln(σ^m2)+pln(Nm)\text{BIC}_m(p) = N_m \ln(\hat{\sigma}_m^2) + p \ln(N_m)

Include lag \ell only if ψ^m,>2/Nm|\hat{\psi}_{m,\ell}| > 2 / \sqrt{N_m}.

In all methods, NmN_m is the number of historical observations for season mm.

For multi-resolution studies (monthly→quarterly aggregation), the same fitting procedure applies after duration-weighted aggregation; see Multi-resolution studies.

This section documents the five-step procedure for fitting PAR(p) parameters from historical inflow data. The fitting is performed when the system derives parameters from inflow_history.parquet. When pre-computed parameters are provided directly in inflow_seasonal_stats.parquet and inflow_ar_coefficients.parquet, this procedure is not executed.

Let Ym={ah,t:m(t)=m}Y_m = \{a_{h,t} : m(t) = m\} be the historical observations for season mm. Define:

SymbolDescription
NmN_mNumber of observations for season mm
aˉm\bar{a}_mSample mean for season mm
sms_mSample standard deviation for season mm
γm()\gamma_m(\ell)Autocovariance at lag \ell for season mm
ρm()\rho_m(\ell)Autocorrelation at lag \ell for season mm

5.2 Step 1 — Seasonal Means and Standard Deviations

Section titled “5.2 Step 1 — Seasonal Means and Standard Deviations”

Seasonal Mean:

μ^m=aˉm=1Nmt:m(t)=mah,t\hat{\mu}_m = \bar{a}_m = \frac{1}{N_m} \sum_{t: m(t) = m} a_{h,t}

Seasonal Standard Deviation:

s^m=1Nmt:m(t)=m(ah,taˉm)2\hat{s}_m = \sqrt{\frac{1}{N_m} \sum_{t: m(t) = m} (a_{h,t} - \bar{a}_m)^2}

The estimator uses the population divisor 1/Nm1/N_m, not the Bessel-corrected 1/(Nm1)1/(N_m - 1). This matches the Maceira-Damazio convention and is shared by the classical PAR(p) and PAR(p)-A paths. The population divisor is required for self-consistent conditional FACP values and selected orders on the PAR(p)-A path — under a Bessel correction the sample-vs-population scale factor leaks through every Z⊗A cross-correlation. Using the same divisor for the classical path keeps the two paths’ seasonal-stats output reusable across configurations.

The autocorrelation at lag \ell for season mm is computed from standardized deviations.

Cross-seasonal autocovariance:

For observations at season mm with lag \ell reaching back to season mm - \ell (mod MM, where MM is the cycle length):

γ^m()=1Nm()t:m(t)=m(ah,taˉm)(ah,taˉm)\hat{\gamma}_m(\ell) = \frac{1}{N_m^{(\ell)}} \sum_{t: m(t) = m} \left( a_{h,t} - \bar{a}_m \right) \left( a_{h,t-\ell} - \bar{a}_{m-\ell} \right)

where Nm()N_m^{(\ell)} is the number of year-aligned valid pairs at lag \ell for reference season mm. The estimator uses the population divisor 1/Nm()1/N_m^{(\ell)}, matching the convention adopted in section 5.2 and shared by the classical and PAR(p)-A paths.

Autocorrelation:

ρ^m()=γ^m()s^ms^m\hat{\rho}_m(\ell) = \frac{\hat{\gamma}_m(\ell)}{\hat{s}_m \cdot \hat{s}_{m-\ell}}

where s^m\hat{s}_{m-\ell} is the standard deviation of season mm - \ell (cyclically, so season 0 = season MM).

For each season mm, the PAR(p) coefficients ψm,1,,ψm,p\psi_{m,1}^*, \ldots, \psi_{m,p}^* in standardized form are found by solving the periodic Yule-Walker system. Unlike the classical (stationary) Yule-Walker equations where all rows use the same reference season, the periodic variant shifts the reference season per row. This correctly accounts for the non-Toeplitz covariance structure of periodic autoregressive processes.

Matrix construction: For row ii and column jj (0-indexed, 0i,j<p0 \leq i,j < p), the reference season is shifted by row index:

[Rm]i,j=ρ^(mi)modM(ji)[\mathbf{R}_m]_{i,j} = \hat{\rho}_{(m-i) \bmod M}(|j - i|)

where MM is the number of seasons in the periodic cycle (e.g., 12 for monthly). The diagonal entries are always 1 (since ρ^m(0)=1\hat{\rho}_{m'}(0) = 1 for any season mm'). The matrix is symmetric but not Toeplitz when M>1M > 1, because each row references a different season for its autocorrelation values.

RHS construction: Each RHS element also uses a shifted reference season:

[rm]i=ρ^(mi)modM(pi)[\boldsymbol{r}_m]_i = \hat{\rho}_{(m-i) \bmod M}(p - i)

This comes from column pp of the extended (p+1)×(p+1)(p{+}1) \times (p{+}1) version of the periodic autocorrelation matrix.

The full system is:

(1ρ^m(1)ρ^m(2)ρ^m(p1)ρ^(m1)(1)1ρ^(m1)(1)ρ^(m1)(p2)ρ^(m2)(2)ρ^(m2)(1)1ρ^(m2)(p3)ρ^(mp+1)(p1)ρ^(mp+1)(p2)ρ^(mp+1)(p3)1)(ψm,1ψm,2ψm,3ψm,p)=(ρ^m(p)ρ^(m1)(p1)ρ^(m2)(p2)ρ^(mp+1)(1))\begin{pmatrix} 1 & \hat{\rho}_{m}(1) & \hat{\rho}_{m}(2) & \cdots & \hat{\rho}_{m}(p{-}1) \\ \hat{\rho}_{(m-1)}(1) & 1 & \hat{\rho}_{(m-1)}(1) & \cdots & \hat{\rho}_{(m-1)}(p{-}2) \\ \hat{\rho}_{(m-2)}(2) & \hat{\rho}_{(m-2)}(1) & 1 & \cdots & \hat{\rho}_{(m-2)}(p{-}3) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \hat{\rho}_{(m-p+1)}(p{-}1) & \hat{\rho}_{(m-p+1)}(p{-}2) & \hat{\rho}_{(m-p+1)}(p{-}3) & \cdots & 1 \end{pmatrix} \begin{pmatrix} \psi_{m,1}^* \\ \psi_{m,2}^* \\ \psi_{m,3}^* \\ \vdots \\ \psi_{m,p}^* \end{pmatrix} = \begin{pmatrix} \hat{\rho}_{m}(p) \\ \hat{\rho}_{(m-1)}(p{-}1) \\ \hat{\rho}_{(m-2)}(p{-}2) \\ \vdots \\ \hat{\rho}_{(m-p+1)}(1) \end{pmatrix}

where all season indices are taken modulo MM.

In matrix notation: Rmψm=rm\mathbf{R}_m \boldsymbol{\psi}_m^* = \boldsymbol{r}_m

where:

  • Rm\mathbf{R}_m is the p×pp \times p periodic correlation matrix (symmetric but not Toeplitz for M>1M > 1)
  • rm\boldsymbol{r}_m is the vector of target autocorrelations with per-row reference season shifting

Solution:

ψ^m=Rm1rm\hat{\boldsymbol{\psi}}_m^* = \mathbf{R}_m^{-1} \boldsymbol{r}_m

The system is solved via Gaussian elimination with partial pivoting (for small systems with p10p \leq 10, this is numerically adequate).

5.5 Step 4 — Store Standardized Coefficients and Residual Fraction

Section titled “5.5 Step 4 — Store Standardized Coefficients and Residual Fraction”

The Yule-Walker solution ψm,\psi_{m,\ell}^* is in standardized form — the direct output of step 3. It is stored as-is in inflow_ar_coefficients.parquet. No conversion to original units is performed.

Compute and store the residual std ratio:

residual_std_ratio^m=1ψmrm=1=1pψm,ρ^m()\widehat{\texttt{residual\_std\_ratio}}_m = \sqrt{1 - \boldsymbol{\psi}_m^{*\top} \boldsymbol{r}_m} = \sqrt{1 - \sum_{\ell=1}^{p} \psi_{m,\ell}^* \cdot \hat{\rho}_m(\ell)}

Both ψm,\psi^*_{m,\ell} (one row per lag) and residual_std_ratio^m\widehat{\texttt{residual\_std\_ratio}}_m (repeated across all lag rows of the same (hydro, stage) group) are written to inflow_ar_coefficients.parquet.

5.6 Step 5 — Residual Standard Deviation

Section titled “5.6 Step 5 — Residual Standard Deviation”

The residual standard deviation for season mm is recovered at runtime from the stored ratio (see section 3):

σ^m=s^mresidual_std_ratio^m\hat{\sigma}_m = \hat{s}_m \cdot \widehat{\texttt{residual\_std\_ratio}}_m

For reference, the full expression in terms of fitting quantities is:

σ^m=s^m1rmRm1rm\hat{\sigma}_m = \hat{s}_m \sqrt{1 - \boldsymbol{r}_m^\top \mathbf{R}_m^{-1} \boldsymbol{r}_m}

Before the seasonal stats and AR coefficients are used by the order-selection rules, each per-(hydro, season) historical bucket is classified by the shape of its observations. The classification can override the empirical (μ^m,s^m)(\hat{\mu}_m, \hat{s}_m) for fitting purposes, and the override propagates to both the classical PAR(p) and the PAR(p)-A paths because both paths share the seasonal-stats producer.

Four classes are defined:

ClassDetection ruleOverride applied
DefaultNone of the conditions belowNone — use empirical (μ^m,s^m)(\hat{\mu}_m, \hat{s}_m)
ConstantEvery observation equals the same value within float tolerance(μ^m,s^m)(value,0)(\hat{\mu}_m, \hat{s}_m) \leftarrow (\text{value}, 0)
ManyNegativeStrictly negative observations exceed 10% of the bucketNone — diagnostic only, fit proceeds on the empirical stats
SaturatedThe modal value (rounded to m³/s) occupies more than 50% of observations(μ^m,s^m)(cap,0)(\hat{\mu}_m, \hat{s}_m) \leftarrow (\text{cap}, 0)

The classifier runs in the priority order ConstantManyNegativeSaturatedDefault. Constancy takes precedence over negative-pathology detection, which in turn takes precedence over saturation.

Why s^m=0\hat{s}_m = 0 short-circuits the fit

Section titled “Why s^m=0\hat{s}_m = 0s^m​=0 short-circuits the fit”

When the override sets s^m=0\hat{s}_m = 0 for a season, every downstream fitter degenerates predictably:

  • On the classical PAR(p) path, the periodic autocorrelation ρ^m()\hat{\rho}_m(\ell) becomes zero by the zero-std guard in section 5.3, so the PACF selection (section 4.1) reports no significant lag and returns order 0 implicitly.
  • On the PAR(p)-A path, the standardised noise series collapses, the conditional FACP at lag 1 evaluates to exactly zero, and the structural-zero short-circuit (section 4.1) returns order 0 explicitly.

Either way, the bucket cannot inject spurious autoregressive structure into adjacent months’ PACFs, and no spatial-correlation contribution flows from it during scenario generation.

  • Constant captures plants whose incremental inflow is structurally constant for a given month — typically regulated or transposed flows where the upstream subtraction yields the same value every year. Forcing (value,0)(\text{value}, 0) records the deterministic level without inventing autoregressive dynamics.
  • Saturated captures flow caps (turbine or reservoir capacity) and low-flow constants (transposed ecological flows). The modal value is treated as the cap. There is no magnitude threshold — a cap of 0 m³/s qualifies just as readily as a cap at installed capacity.
  • ManyNegative flags buckets that the upstream incremental-inflow construction has driven below zero for more than 10% of observations. The condition is recorded for operator diagnostics but does not override the fit — the cause is upstream-data quality, not a methodological signal.
  • Default is the standard path; the empirical stats and the chosen order-selection rule decide the order.

5.8 Partial-year studies and the pre-study lag window

Section titled “5.8 Partial-year studies and the pre-study lag window”

A study horizon may be narrower than the seasonal cycle — e.g. a monthly model (M=12M = 12) running only September–December. The per-season fitting described above must then handle seasons that have few or no in-window observations. Two rules keep it well-defined.

Lag-reachability. A season is lag-reachable only if some stage of the (extended) horizon carries it. Each historical observation is resolved to a season from the stage date ranges, falling back to the season-map calendar for dates predating the horizon; an observation whose resolved season has no stage at all is skipped — its statistics would never be consumed. Full-cycle history therefore does not perturb a partial-year fit.

Pre-study lag synthesis (for p>0p > 0). The first study stage’s autoregressive lags reach back to seasons before the study start. For each lag k=1,,min(p,M1)k = 1, \ldots, \min(p,\, M - 1), the season kk calendar positions before the first study season is introduced as a pre-study season (modular on the true cycle length MM) — unless that season is already covered by a study stage (an in-window wrap lag, handled by the cycle-correct lag lookup). The seasonal statistics (μ^m,s^m)(\hat{\mu}_m, \hat{s}_m) of those out-of-window seasons are estimated from history exactly as for in-window seasons, then feed the lag terms of the opening study stages — both the coefficient conversion ψm,=ψm,s^m/s^m\psi_{m,\ell} = \psi^*_{m,\ell}\, \hat{s}_m / \hat{s}_{m-\ell} and the deterministic base.

The wrap uses the true cycle length MM (the number of seasons in the season map), not the number of seasons in the study window, together with a season offset equal to the season of the first study stage — so, e.g., a March-start study maps the lag-1 season to February, not December.

Full-cycle invariance. When the study spans the full cycle (every season already has a study stage) or carries no out-of-window history, nothing is synthesized and the fit is bit-identical to before.

After fitting or loading pre-computed parameters, the following invariants must hold:

  1. Positive residual variance: σm2>0\sigma_m^2 > 0 for all seasons. If violated, the AR model explains all variance — likely overfitting.
  2. Stationarity: Roots of 1ψm,z=01 - \sum_\ell \psi_{m,\ell} z^\ell = 0 lie outside the unit circle. Ensures the AR process is stable and does not diverge.
  3. Correlation matrix positive definite: Rm\mathbf{R}_m is invertible. Required for Yule-Walker solution to exist. If violated, the historical record may be too short for the requested AR order.
  4. No systematic bias: Residuals εt\varepsilon_t have mean near zero. Indicates the model captures the mean structure correctly.
  5. AR order derivation: The number of coefficient rows per (hydro_id, stage_id) in inflow_ar_coefficients.parquet determines the AR order pmp_m. Lags must be contiguous: {1,2,,pm}\{1, 2, \ldots, p_m\}.
  6. Residual std ratio consistency: The residual_std_ratio value must be identical across all lag rows sharing the same (hydro_id, stage_id) group, and must lie in (0,1](0, 1].

This section derives the explicit algebraic transformation from the canonical PAR(p) model (section 1) into the form consumed by the LP subproblem. The derivation identifies three precomputable components that are cached once at initialization and reused at every forward-pass stage transition.

The PAR(p) model (section 1) operates on deviations from the seasonal mean, scaled by the seasonal standard deviation. In fully standardized form:

ah,tμm(t)σm(t)==1pϕm(t),ah,tμm(t)σm(t)+εt\frac{a_{h,t} - \mu_{m(t)}}{\sigma_{m(t)}} = \sum_{\ell=1}^{p} \phi_{m(t),\ell} \frac{a_{h,t-\ell} - \mu_{m(t-\ell)}}{\sigma_{m(t-\ell)}} + \varepsilon_t

where:

  • ϕm(t),\phi_{m(t),\ell}: AR coefficients in fully standardized form (correlations between normalized deviations)
  • σm(t)\sigma_{m(t)}: residual standard deviation for season m(t)m(t) (derived at runtime — see section 3)
  • εtN(0,1)\varepsilon_t \sim \mathcal{N}(0, 1): innovation noise

The input files store ψm,\psi^*_{m,\ell} (standardized by seasonal std sms_m, not residual std σm\sigma_m). The next step converts these to original-unit ψm,\psi_{m,\ell} for use in the LP.

The stored standardized coefficients ψm,\psi^*_{m,\ell} are converted to original-unit coefficients ψm,\psi_{m,\ell} at runtime using the seasonal standard deviations from inflow_seasonal_stats.parquet:

ψm,=ψm,smsm\psi_{m,\ell} = \psi^*_{m,\ell} \cdot \frac{s_m}{s_{m-\ell}}

The residual standard deviation is also derived at this preprocessing step:

σm=smresidual_std_ratiom\sigma_m = s_m \cdot \texttt{residual\_std\_ratio}_m

These conversions are performed once at LP construction time. They require only the seasonal stats (sms_m) and the stored model quantities (ψm,\psi^*_{m,\ell}, residual_std_ratiom\texttt{residual\_std\_ratio}_m) — no autocorrelation values, no historical data.

Multiplying both sides of the canonical form (section 7.1) by σm(t)\sigma_{m(t)} and rearranging yields the LP-ready equation:

ah,t==1pψm(t),ah,t+[μm(t)=1pψm(t),μm(t)]+σm(t)εta_{h,t} = \sum_{\ell=1}^{p} \psi_{m(t),\ell} \cdot a_{h,t-\ell} + \left[ \mu_{m(t)} - \sum_{\ell=1}^{p} \psi_{m(t),\ell} \cdot \mu_{m(t-\ell)} \right] + \sigma_{m(t)} \cdot \varepsilon_t

where ψm(t),\psi_{m(t),\ell} and σm(t)\sigma_{m(t)} are derived from stored quantities as described in section 7.2.

This decomposes the inflow into three additive components:

  1. Lag contribution: =1pψm(t),ah,t\displaystyle\sum_{\ell=1}^{p} \psi_{m(t),\ell} \cdot a_{h,t-\ell} — linear function of past inflows (state variables or known values)
  2. Deterministic base: μm(t)=1pψm(t),μm(t)\displaystyle\mu_{m(t)} - \sum_{\ell=1}^{p} \psi_{m(t),\ell} \cdot \mu_{m(t-\ell)} — constant offset per (stage, hydro), precomputed once
  3. Stochastic innovation: σm(t)εt\sigma_{m(t)} \cdot \varepsilon_t — noise draw scaled by the seasonal residual standard deviation

The deterministic base is defined as:

bh,m(t)=μm(t)=1pψm(t),μm(t)b_{h,m(t)} = \mu_{m(t)} - \sum_{\ell=1}^{p} \psi_{m(t),\ell} \cdot \mu_{m(t-\ell)}

This is a precomputed constant per (stage, hydro) pair. It absorbs the mean-adjustment arithmetic that would otherwise be repeated at every forward-pass stage transition. With this definition, the LP-ready form (section 7.3) simplifies to:

ah,t==1pψm(t),ah,t+bh,m(t)+σm(t)εta_{h,t} = \sum_{\ell=1}^{p} \psi_{m(t),\ell} \cdot a_{h,t-\ell} + b_{h,m(t)} + \sigma_{m(t)} \cdot \varepsilon_t

For partial-year studies, the lag-season means μm(t)\mu_{m(t-\ell)} for seasons preceding the study start are sourced from the pre-study lag window (section 5.8); when no such statistic exists for a given lag season, that lag’s mean contribution is treated as zero.

The lagged inflows ah,ta_{h,t-\ell} are LP variables, not substituted values. In the LP (see LP Formulation), they appear with coefficients ψm(t),-\psi_{m(t),\ell} in the AR dynamics constraint row, and separate equality constraints fix each lag variable to its incoming state value:

ah,t=a^h,ta_{h,t-\ell} = \hat{a}_{h,t-\ell}

where a^h,t\hat{a}_{h,t-\ell} is patched per scenario to the actual lagged inflow from the trajectory record.

Because the lag contribution ψah,t\sum_\ell \psi \cdot a_{h,t-\ell} is carried by the constraint matrix (not the RHS), the AR dynamics constraint RHS reduces to:

RHSh,t=bh,m(t)+σm(t)εt\text{RHS}_{h,t} = b_{h,m(t)} + \sigma_{m(t)} \cdot \varepsilon_t

where:

  • bh,m(t)b_{h,m(t)} is the deterministic base for (stage, hydro), precomputed once at LP construction (section 7.4)
  • σm(t)\sigma_{m(t)} is the noise scale for (stage, hydro), derived from stored ratio at initialization (section 7.2)
  • εt\varepsilon_t is the scenario noise draw for this (stage, hydro)

The ψm(t),\psi_{m(t),\ell} coefficients are written into the constraint matrix once at LP construction time as the coefficients on the lagged inflow variables; they are not recomputed per scenario.

No division, no mean subtraction, no repeated coefficient transformation — the three precomputed LP components eliminate all redundant arithmetic from the hot path.

ComponentSymbolShape per stageLP RoleSource
Lag coefficientsψm(t),\psi_{m(t),\ell}One per (hydro, lag)Constraint matrix (AR dynamics row)Derived from stored ψ\psi^* and sms_m at initialization (section 7.2)
Deterministic basebh,m(t)b_{h,m(t)}One per hydroAR dynamics constraint RHS (fixed term)Precomputed from μ\mu and ψ\psi
Noise scaleσm(t)\sigma_{m(t)}One per hydroAR dynamics constraint RHS (noise factor)Derived from stored ratio and sms_m at initialization (section 7.2)

The PAR(p) fitting procedure (section 5) produces per-hydro noise terms εt\varepsilon_t that are treated as independent across hydro plants. Generating spatially correlated scenarios requires factorising the cross-hydro correlation matrix CC so that a vector of independent standard normal draws can be mapped to correlated noise. This section documents the choice of factorisation method and the rationale.

The classical approach applies Cholesky factorisation: given C=LLC = L L^\top with LL lower-triangular, correlated noise is obtained as LzL z where zN(0,I)z \sim \mathcal{N}(0, I). Cholesky requires CC to be strictly positive-definite. In practice, estimated correlation matrices from hydro inflow series are frequently near-singular or rank-deficient for two reasons:

  • Short sample records: Brazilian hydro series commonly span 80–90 years, yielding a historical record length NN that is comparable to the number of hydro plants in some subsystems. When NN is close to the matrix dimension, the sample eigenvalues of CC cluster near zero.
  • Heterogeneous series: Plants with near-identical hydrological regimes (upstream–downstream pairs, same river basin) produce columns that are nearly linearly dependent, reducing the effective rank of CC below its nominal dimension.

A near-singular CC causes Cholesky to fail or to produce numerically degenerate lower triangular factors. A separate filtering pass to remove “degenerate” hydros would be required before the factorisation, discarding information and introducing a non-transparent pre-processing decision.

8.2 Eigendecomposition with Clipped Square Root

Section titled “8.2 Eigendecomposition with Clipped Square Root”

Cobre uses the symmetric matrix square root via eigendecomposition. The correlation matrix is decomposed as:

C=UΛUC = U \Lambda U^\top

where UU is the orthogonal matrix of eigenvectors and Λ=diag(λ1,,λn)\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n) is the diagonal matrix of eigenvalues. The symmetric square root is then:

C1/2=UΛ1/2UC^{1/2} = U \Lambda^{1/2} U^\top

To handle near-singular matrices, any eigenvalue λi<0\lambda_i < 0 (arising from floating-point rounding in the sample estimate) is clipped to zero before taking the square root:

Λ~1/2=diag ⁣(max(λ1,0),,max(λn,0))\tilde{\Lambda}^{1/2} = \mathrm{diag}\!\left(\sqrt{\max(\lambda_1, 0)},\, \ldots,\, \sqrt{\max(\lambda_n, 0)}\right)

Correlated noise is then generated as C1/2zC^{1/2} z where zN(0,I)z \sim \mathcal{N}(0, I).

The spectral form handles rank-deficient correlation matrices natively: eigenvectors corresponding to clipped (zero) eigenvalues contribute nothing to the factorisation, which is the correct behaviour for directions of zero variance. No prior filtering of degenerate hydro plants is needed.

The clipping threshold acts as a single, transparent parameter controlling which near-zero eigenvalues are treated as structural zeros. The cross-entity correlation structure is preserved for all eigenvalues above the threshold.

PropertyEigendecomposition (Cobre)Cholesky
Handles rank-deficient CCYes — clipping makes it robustNo — requires positive-definiteness
Computational costHigher (full eigendecomposition)Lower on well-conditioned matrices
Degenerate-hydro filtering passNot requiredRequired for near-singular CC
Transparency of approximationSingle clipping thresholdOpaque numerical failure or pivot

The higher computational cost is acceptable because the factorisation is performed once per study configuration and not on the hot path of the forward pass.

The classical PAR(p) of section 1 captures temporal dependence at lags up to a small pp (typically 4\leq 4 for monthly cycles, since the periodic Yule-Walker system becomes ill-conditioned at higher orders). On long Brazilian hydro series this is enough to reproduce the within-year persistence but not the multi-year persistence visible in dry/wet super-periods of the historical record. The PAR(p)-A extension adds a single annual coefficient on top of the periodic AR structure to capture that longer-range persistence without inflating the AR order.

The extension is selected by the order-selection method pacf_annual. When active, the model carries one additional triple per (hydro, season) on top of the classical parameter set.

Let Ah,t1A_{h,t-1} denote the rolling 12-month average of incremental inflows ending one stage before tt:

Ah,t1=112j=112ah,tjA_{h,t-1} = \frac{1}{12} \sum_{j=1}^{12} a_{h,\, t-j}

The PAR(p)-A model augments section 1 with the standardised deviation of Ah,t1A_{h,t-1} from its own seasonal mean:

ah,t  =  μm(t)  +  =1pψm(t),(ah,tμm(t))  +  ψ^m(t)(Ah,t1μm(t)1A)  +  σm(t)εta_{h,t} \;=\; \mu_{m(t)} \;+\; \sum_{\ell=1}^{p} \psi_{m(t),\ell}\,(a_{h,t-\ell} - \mu_{m(t-\ell)}) \;+\; \hat{\psi}_{m(t)}\,(A_{h,t-1} - \mu^A_{m(t)-1}) \;+\; \sigma_{m(t)} \cdot \varepsilon_t

where:

  • μmA\mu^A_{m}, σmA\sigma^A_{m}: seasonal sample mean and population-divisor standard deviation of Ah,A_{h, \cdot} at season mm
  • ψ^m(t)\hat{\psi}_{m(t)}: original-unit annual coefficient at season m(t)m(t) — derived at runtime from the standardised stored coefficient (section 9.4)
  • All other symbols carry their classical meaning from section 1

When the PAR(p)-A extension is inactive, the annual term is absent and the model reduces exactly to section 1.

For each (hydro, season) the PAR(p)-A path stores three additional quantities:

QuantitySymbolDescription
Standardised annual coefficientψ\psiYule-Walker output for the annual term — dimensionless
Annual seasonal meanμmA\mu^A_mSample mean of Ah,A_{h, \cdot} at season mm (m³/s)
Annual seasonal stdσmA\sigma^A_mPopulation-divisor std of Ah,A_{h, \cdot} at season mm (m³/s, >0> 0)

The standardised coefficient ψ\psi is the direct output of the extended periodic Yule-Walker system below (section 9.5). Storage of μmA\mu^A_m and σmA\sigma^A_m alongside the seasonal statistics of ah,a_{h, \cdot} enables the runtime unit conversion of section 9.4 without re-reading the historical record.

9.3 Estimating μmA\mu^A_m and σmA\sigma^A_m

Section titled “9.3 Estimating μmA\mu^A_mμmA​ and σmA\sigma^A_mσmA​”

Group rolling-window values At=112j=011ah,t11+jA_t = \frac{1}{12} \sum_{j=0}^{11} a_{h,\, t - 11 + j} by the season of their PDF time-index t1t-1 (the most recent observation contributing to the window). For each (hydro, season) bucket of values {A(i)}\{A^{(i)}\}:

μ^mA  =  1NmAiA(i)σ^mA  =  1NmAi(A(i)μ^mA)2\hat{\mu}^A_m \;=\; \frac{1}{N^A_m} \sum_{i} A^{(i)} \qquad \hat{\sigma}^A_m \;=\; \sqrt{\frac{1}{N^A_m} \sum_{i} \bigl(A^{(i)} - \hat{\mu}^A_m\bigr)^2}

Both estimators use the population divisor 1/NmA1/N^A_m, matching the convention of section 5.2 and ensuring no sample-vs-population scale factor leaks into the conditional FACP of section 9.5. At least 13 chronological observations are required for a hydro to participate in PAR(p)-A — that is the minimum needed to form one rolling 12-month average.

The stored standardised coefficient ψ\psi is converted to the original-unit coefficient ψ^\hat{\psi} at LP construction time using the seasonal stats and annual stats:

ψ^m  =  ψsmσmA\hat{\psi}_{m} \;=\; \psi \cdot \frac{s_m}{\sigma^A_m}

The conversion mirrors section 7.2 for the classical AR coefficients. The runtime annual term entering the LP RHS is then ψ^m(t)(Ah,t1μm(t)1A)\hat{\psi}_{m(t)} \cdot \bigl(A_{h,t-1} - \mu^A_{m(t)-1}\bigr), where the lagged rolling-window value Ah,t1A_{h,t-1} is itself derived from the lag state variables already carried by the LP.

9.5 Order Selection and Coefficient Estimation

Section titled “9.5 Order Selection and Coefficient Estimation”

PAR(p)-A order selection conditions on the annual noise series. The order-selection input is the conditional FACP at lag kk, defined as the partial autocorrelation between the standardised current-season residual and the standardised residual at lag kk, conditioned on the intermediate standardised annual noise series ZZ and the previous annual innovation At1A_{t-1}. Computing the conditional FACP requires a partitioned covariance decomposition that distinguishes ZZZ \otimes Z, ZAZ \otimes A, and AZ1A \otimes Z_{-1} blocks.

The conditional FACP feeds the PACF order-selection rule of section 4.1, with the two PAR(p)-A-specific extensions (structural-zero short-circuit and minimum-order-1) already described there.

The ZZZ \otimes Z block uses the same year-aligned population divisor as the classical autocovariance (section 5.3). The ZAZ \otimes A and AZ1A \otimes Z_{-1} blocks use a max-bucket-size divisor:

γ^ZA()  =  1max(A,Z)i(Z(i)Zˉ)(A(i)Aˉ)\hat{\gamma}_{Z \otimes A}(\ell) \;=\; \frac{1}{\max(|A|,\, |Z|)} \sum_i \bigl(Z^{(i)} - \bar{Z}\bigr)\bigl(A^{(i)} - \bar{A}\bigr)

The max-bucket convention is required because AA excludes the first year of ZZ by construction (a rolling 12-month window cannot anchor in the first 12 observations). The strict-pair count would distort the scale of the cross-correlations and bias the conditional FACP. The PAR(p) path never uses Z⊗A cross-correlations, so the divisor question is PAR(p)-A-specific.

Once the order pp is selected, the coefficients (ψm,1,,ψm,p,ψ)(\psi^*_{m,1}, \ldots, \psi^*_{m,p}, \psi) are recovered by solving the extended periodic Yule-Walker system:

Rmext(ψm,1ψm,pψ)=rmext\mathbf{R}^{\,\text{ext}}_m \begin{pmatrix} \psi^*_{m,1} \\ \vdots \\ \psi^*_{m,p} \\ \psi \end{pmatrix} = \boldsymbol{r}^{\,\text{ext}}_m

where Rmext\mathbf{R}^{\,\text{ext}}_m is the (p+1)×(p+1)(p+1) \times (p+1) partitioned covariance whose first pp rows replicate the classical periodic Yule-Walker rows (section 5.4) and whose last row adds the ZAZ \otimes A and AAA \otimes A entries. The RHS rmext\boldsymbol{r}^{\,\text{ext}}_m appends the AZ1A \otimes Z_{-1} target. The residual std ratio is recovered from the inner product of the solution and the RHS, exactly as in the classical case.

The Maceira-Damazio iterative reduction of section 4.1 is applied across the full periodic cycle on the PAR(p)-A path as well: after the initial fit, the recursively-composed contributions of each AR lag through the periodic monthly chain are evaluated, and any negative-contribution season has its AR ceiling reduced before refit. The annual coefficient ψ\psi does not enter the contribution-chain check — it is anchored to the rolling annual mean and so does not propagate through the lag chain.

ConfigurationPath
order_selection: "pacf"Classical PAR(p) — annual triple absent (section 5)
order_selection: "pacf_annual"PAR(p)-A — annual triple required for every (hydro, season)
Bucket flagged HistoryClass::Constant or Saturated (section 5.7)Effective order 0 on either path; annual term suppressed when the seasonal std collapses
Hydro with fewer than 13 observations on PAR(p)-A pathHard failure during fitting (no silent fallback to classical)

The two paths share the seasonal-stats producer of section 5.2; switching between them does not silently change μ^m\hat{\mu}_m or s^m\hat{s}_m. The PAR(p)-A path uses the same spatial-correlation factorisation as the classical path (section 8); the extension affects only the temporal model.

  • LP Formulation — AR inflow dynamics in the LP: state expansion, lag column pinning, reduced-cost extraction
  • Inflow Non-Negativity — Methods for handling negative realizations produced by the PAR(p) model
  • Scenario Generation — When external scenarios are used in training, a PAR model is fitted to the external data for backward pass opening tree generation. The fitting procedure (section 5 above) applies equally to this derived model.
  • Notation Conventions — Defines inflow symbols (ah,ta_{h,t}, μm\mu_m, ψm,\psi_{m,\ell}, σm\sigma_m) and unit conventions