PAR(p) Autoregressive Models

What Is a PAR(p) Model?

A Periodic Autoregressive model of order p (PAR(p)) is a time series model designed for data with strong seasonal patterns. It extends the classical autoregressive (AR) model by allowing every parameter to vary by season — the coefficients that govern January inflows are different from those that govern July inflows.

The “order p” indicates how many past time steps the model looks back. A PAR(3) model for a given month predicts the current inflow using the inflows from the previous three months. The order can differ by season: January might need only one lag while April might need four, reflecting different hydrological dynamics across the year.

The PAR(p) Equation

For hydro plant $h$ at stage $t$ falling in season $m(t)$, the PAR(p) model is:

$a_{h, t} = μ_{m (t)} + \sum_{ℓ = 1}^{p} ψ_{m (t), ℓ} (a_{h, t - ℓ} - μ_{m (t - ℓ)}) + σ_{m (t)} \cdot ε_{t}$

In words: the inflow at stage $t$ equals the seasonal mean, plus a weighted combination of how much recent inflows deviated from their seasonal means, plus a random shock.
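The equation can be evaluated directly for a single plant. The following is a minimal sketch, not Cobre's implementation: the function name `par_step`, the season mapping `m(t) = t % n_seasons`, and all numbers are assumptions chosen for illustration.

```python
def par_step(mu, psi, sigma, history, t, eps, n_seasons):
    """Evaluate the PAR(p) equation for one plant at stage t.

    mu[m], sigma[m]: seasonal mean and residual std for season m.
    psi[m]: list of AR coefficients for season m (lag 1 first).
    history[k]: inflow at stage k; every stage t - lag must be present.
    Assumes the season of stage k is k % n_seasons (illustrative choice).
    """
    m = t % n_seasons
    value = mu[m]
    for lag, coeff in enumerate(psi[m], start=1):
        m_lag = (t - lag) % n_seasons
        # Weighted deviation of a past inflow from its own seasonal mean.
        value += coeff * (history[t - lag] - mu[m_lag])
    return value + sigma[m] * eps


# Two-season toy system with made-up parameters.
mu = [100.0, 50.0]
psi = [[0.5], [0.2, 0.1]]   # season 0 uses one lag, season 1 uses two
sigma = [10.0, 5.0]
history = [110.0, 60.0]     # inflows at stages 0 and 1

# With eps = 0 the result is the conditional mean for stage 2 (season 0).
mean_forecast = par_step(mu, psi, sigma, history, t=2, eps=0.0, n_seasons=2)
shocked = par_step(mu, psi, sigma, history, t=2, eps=1.0, n_seasons=2)
```

Here the stage-1 inflow sits 10 units above its seasonal mean, so the season-0 coefficient 0.5 lifts the conditional mean by 5 units; a unit shock then adds one residual standard deviation on top.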

Parameters by Season

Each season $m$ has its own set of parameters:

Parameter	Symbol	Role
Seasonal mean	$μ_{m}$	Expected inflow for season $m$
AR coefficients	$ψ_{m, 1}, \dots, ψ_{m, p}$	Weights on past deviations from the mean
Residual std	$σ_{m}$	Scale of the random innovation
Innovation	$ε_{t} \sim N (0, 1)$	Standardized random shock

The seasonal mean $μ_{m}$ and sample standard deviation $s_{m}$ are estimated from historical data. The AR coefficients $ψ_{m, ℓ}$ are fitted using the Yule-Walker equations (see below). The residual standard deviation $σ_{m}$ is derived at runtime from the other parameters (it is not stored independently).

How Lags Become State Variables

In the SDDP framework, decisions at each stage depend on a set of state variables that summarize everything the optimizer needs to know from the past. For the PAR(p) model, the state variables are the lagged inflows:

$\text{State at stage } t: \; (v_{h, t}, a_{h, t - 1}, a_{h, t - 2}, \dots, a_{h, t - p_{\max}})$

where $v_{h, t}$ is the reservoir volume and $a_{h, t - ℓ}$ are the lagged inflows needed by the autoregressive equation. Each lag adds one state variable per hydro plant to the SDDP subproblem.

This is significant for problem size: a system with 150 hydro plants and a maximum PAR order of 6 adds up to $150 \times 6 = 900$ state variables beyond the reservoir volumes. The LP formulation includes constraints that “shift” lagged inflows forward from one stage to the next, ensuring the autoregressive structure is respected across the Bellman recursion.
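The "shift" of lagged inflows can be pictured outside the LP as a simple list update. This is a hedged illustration only: in the LP the same relationship is expressed through equality constraints, and the helper names below are hypothetical.

```python
def shift_lags(lags, new_inflow):
    """Return the lag vector carried to the next stage.

    lags[0] is the inflow at t-1, lags[1] at t-2, and so on.
    The realized inflow at t becomes the new lag 1; the oldest lag drops out.
    """
    if not lags:
        return []  # order-0 model: no inflow state to carry
    return [new_inflow] + lags[:-1]


def inflow_state_count(n_plants, max_order):
    """Upper bound on lag state variables: one per plant per lag."""
    return n_plants * max_order


next_lags = shift_lags([3.0, 2.0, 1.0], 4.0)
extra_states = inflow_state_count(150, 6)
```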

Stored vs. Computed Quantities

Cobre stores the natural outputs of the fitting process:

Stored: seasonal means ($μ_{m}$), seasonal sample standard deviations ($s_{m}$), AR order ($p_{m}$), standardized AR coefficients ($ψ_{m, ℓ}^{*}$, the direct Yule-Walker output), and residual_std_ratio (DEC-020)
Computed at runtime: original-unit AR coefficients ($ψ_{m, ℓ}$) and the residual standard deviation $σ_{m}$, derived from the stored standardized quantities and conditioning statistics

This design separates the swappable seasonal conditioning from the fixed model dynamics (DEC-020).

Yule-Walker Fitting Procedure

When fitting PAR(p) parameters from historical inflow data, the AR coefficients are estimated by solving the Yule-Walker equations — a linear system that relates the autocorrelations of the data to the model coefficients. The procedure has six steps.

Implementation status: As of v0.1.1, this full fitting procedure is implemented in cobre-stochastic’s estimation module. Steps 1–5 are carried out by the seasonal statistics, autocorrelation, and AR coefficient estimators; Step 6 selects the model order via partial autocorrelation function (PACF) significance testing before the final coefficients are computed.

Step 1 — Seasonal Statistics

For each season $m$, compute the sample mean and standard deviation from historical observations $\{a_{h, t} : m (t) = m\}$:

$\hat{μ}_{m} = \frac{1}{N_{m}} \sum_{t : m (t) = m} a_{h, t}$

$\hat{s}_{m} = \sqrt{\frac{1}{N_{m} - 1} \sum_{t : m (t) = m} (a_{h, t} - \hat{μ}_{m})^{2}}$

where $N_{m}$ is the number of historical observations for season $m$.
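A minimal sketch of Step 1 for one plant, assuming the historical series starts in season 0 so that the season of observation `t` is `t % n_seasons`. The function name and the data are illustrative, not taken from Cobre.

```python
import math


def seasonal_stats(inflows, n_seasons):
    """Return (means, stds): per-season sample mean and standard deviation.

    Assumes inflows[t] belongs to season t % n_seasons and that every
    season has at least two observations (needed for the N_m - 1 divisor).
    """
    means, stds = [], []
    for m in range(n_seasons):
        obs = inflows[m::n_seasons]          # all observations of season m
        n = len(obs)
        mean = sum(obs) / n
        var = sum((x - mean) ** 2 for x in obs) / (n - 1)
        means.append(mean)
        stds.append(math.sqrt(var))
    return means, stds


# Made-up record: three "years" of a two-season series.
means, stds = seasonal_stats([1.0, 10.0, 3.0, 14.0, 5.0, 18.0], 2)
```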

Step 2 — Seasonal Autocorrelations

Compute the cross-seasonal autocorrelation at lag $ℓ$ for season $m$. The cross-seasonal structure arises because lag $ℓ$ at season $m$ reaches back to season $m - ℓ$ (cyclically):

$\hat{γ}_{m} (ℓ) = \frac{1}{N_{m} - 1} \sum_{t : m (t) = m} (a_{h, t} - \hat{μ}_{m}) (a_{h, t - ℓ} - \hat{μ}_{m - ℓ})$

$\hat{ρ}_{m} (ℓ) = \frac{\hat{γ}_{m} (ℓ)}{\hat{s}_{m} \cdot \hat{s}_{m - ℓ}}$

Note that $\hat{s}_{m - ℓ}$ is the standard deviation of season $m - ℓ$, not of season $m$. This is the defining feature of a periodic (as opposed to stationary) autoregressive model.
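The two formulas can be sketched as follows. Two choices here are assumptions for illustration: observations whose lagged partner falls before the start of the record are skipped, and the divisor stays $N_{m} - 1$ as in the formula above. The function name is hypothetical.

```python
import math


def seasonal_autocorr(inflows, n_seasons, m, lag):
    """Cross-seasonal autocorrelation of season m at the given lag.

    Assumes inflows[t] belongs to season t % n_seasons.
    """
    def stats(season):
        obs = inflows[season::n_seasons]
        mean = sum(obs) / len(obs)
        var = sum((x - mean) ** 2 for x in obs) / (len(obs) - 1)
        return mean, math.sqrt(var), len(obs)

    m_lag = (m - lag) % n_seasons            # season reached by the lag (cyclic)
    mean_m, std_m, n_m = stats(m)
    mean_lag, std_lag, _ = stats(m_lag)

    cov_sum = 0.0
    for t in range(m, len(inflows), n_seasons):
        if t - lag >= 0:                     # skip pairs before the record starts
            cov_sum += (inflows[t] - mean_m) * (inflows[t - lag] - mean_lag)
    gamma = cov_sum / (n_m - 1)
    # Normalize by the std of BOTH seasons, not season m twice.
    return gamma / (std_m * std_lag)


# Made-up record in which season 1 moves in lockstep with season 0.
rho = seasonal_autocorr([1.0, 10.0, 3.0, 14.0, 5.0, 18.0], 2, m=1, lag=1)
```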

Step 3 — Yule-Walker System

For each season $m$, the coefficients in standardized form $ψ_{m, 1}^{*}, \dots, ψ_{m, p}^{*}$ satisfy:

$R_{m} ψ_{m}^{*} = r_{m}$

where:

$R_{m} = \begin{bmatrix} 1 & \hat{ρ}_{m - 1} (1) & \cdots & \hat{ρ}_{m - p + 1} (p - 1) \\ \hat{ρ}_{m} (1) & 1 & \cdots & \hat{ρ}_{m - p + 2} (p - 2) \\ \vdots & \vdots & \ddots & \vdots \\ \hat{ρ}_{m} (p - 1) & \hat{ρ}_{m - 1} (p - 2) & \cdots & 1 \end{bmatrix}, \quad r_{m} = \begin{bmatrix} \hat{ρ}_{m} (1) \\ \hat{ρ}_{m} (2) \\ \vdots \\ \hat{ρ}_{m} (p) \end{bmatrix}$

The solution is:

$\hat{ψ}_{m}^{*} = R_{m}^{- 1} r_{m}$

The matrix $R_{m}$ is not a standard Toeplitz matrix (because consecutive rows use different seasons’ correlations), but it has a similar structure. The correlation matrix must be positive definite for the solution to exist; if not, the historical record may be too short for the requested order.

LU factorization: In the implementation, the periodic Yule-Walker system is solved via LU factorization with partial pivoting rather than direct matrix inversion or the classical Levinson-Durbin recursion. The Levinson-Durbin recursion assumes a stationary Toeplitz covariance structure, which does not hold for the periodic correlation matrix $R_{m}$ (whose consecutive rows use different seasons’ correlations). LU factorization with partial pivoting handles the general (non-Toeplitz) case correctly in $O (p^{3})$ time. For the per-season orders typical in hydro studies ($p \leq 12$), this cost is negligible.
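As an illustration of solving a small non-Toeplitz system, the sketch below uses Gaussian elimination with partial pivoting, which performs the same row operations as an LU factorization applied directly to the right-hand side. It is a plain-Python stand-in written for this example, not the routine used in cobre-stochastic; the function name and the matrix entries are made up.

```python
def solve_partial_pivot(a, b, tol=1e-12):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    a: square matrix as a list of rows; b: right-hand side list.
    Inputs are copied, so the caller's data is left untouched.
    """
    n = len(b)
    lu = [row[:] for row in a]
    x = b[:]
    for k in range(n):
        # Partial pivoting: bring the largest remaining entry of column k up.
        pivot = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if abs(lu[pivot][k]) < tol:
            raise ValueError("matrix is singular to working precision")
        if pivot != k:
            lu[k], lu[pivot] = lu[pivot], lu[k]
            x[k], x[pivot] = x[pivot], x[k]
        for i in range(k + 1, n):
            factor = lu[i][k] / lu[k][k]
            for j in range(k, n):
                lu[i][j] -= factor * lu[k][j]
            x[i] -= factor * x[k]
    # Back substitution on the upper-triangular system.
    for i in reversed(range(n)):
        s = sum(lu[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (x[i] - s) / lu[i][i]
    return x


# Illustrative 2x2 system; the off-diagonal entries differ on purpose.
R = [[1.0, 0.4], [0.5, 1.0]]
r = [0.6, 0.5]
psi_star = solve_partial_pivot(R, r)
```

For these numbers the solution is approximately `[0.5, 0.25]`, which can be confirmed by multiplying `R` by the result and comparing against `r`.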

Step 4 — Residual Standard Deviation

After solving the Yule-Walker system, the residual standard deviation for season $m$ is:

$\hat{σ}_{m} = \hat{s}_{m} \sqrt{1 - ψ_{m}^{* ⊤} r_{m}}$

This equals $\hat{s}_{m}$ times the square root of the unexplained variance fraction. If $\hat{σ}_{m}^{2} \leq 0$, the model overfits: it explains all historical variance, leaving no room for the noise term.
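A small sketch of this step, assuming the standardized coefficients and the right-hand-side correlations are already available as plain lists. The function name and the choice to raise an error on non-positive residual variance are illustrative, not Cobre's behavior.

```python
import math


def residual_std(s_m, psi_star, r):
    """Residual std: s_m * sqrt(1 - psi_star . r).

    psi_star: standardized AR coefficients for season m.
    r: right-hand side of the Yule-Walker system for season m.
    """
    explained = sum(p * ri for p, ri in zip(psi_star, r))
    unexplained = 1.0 - explained        # unexplained variance fraction
    if unexplained <= 0.0:
        raise ValueError("non-positive residual variance: model overfits")
    return s_m * math.sqrt(unexplained)


# Made-up AR(1) case: lag-1 correlation 0.6 leaves 64% of variance unexplained.
sigma_m = residual_std(10.0, [0.6], [0.6])
```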

Step 5 — Convert to Original Units

The Yule-Walker solution yields coefficients in standardized form $ψ_{m, ℓ}^{*}$ (dimensionless, relating standardized deviations). The LP requires original-unit coefficients:

$ψ_{m, ℓ} = ψ_{m, ℓ}^{*} \cdot \frac{s _{m}}{s _{m - ℓ}}$

These are computed once at initialization and used directly as LP constraint matrix entries.
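The conversion is a per-lag rescaling by the ratio of seasonal standard deviations. A minimal sketch, with a hypothetical function name and made-up values:

```python
def to_original_units(psi_star_m, stds, m):
    """Convert season m's standardized coefficients to original units.

    psi_star_m: standardized coefficients for season m (lag 1 first).
    stds: seasonal sample standard deviations, indexed by season.
    """
    n_seasons = len(stds)
    return [
        coeff * stds[m] / stds[(m - lag) % n_seasons]   # s_m / s_{m - lag}
        for lag, coeff in enumerate(psi_star_m, start=1)
    ]


# Two seasons with stds 2 and 4; season 1 has two standardized coefficients.
psi_m = to_original_units([0.5, 0.25], [2.0, 4.0], m=1)
```

Lag 1 reaches back to season 0, so its coefficient is scaled by 4/2; lag 2 wraps around to season 1 again, so its coefficient is unchanged.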

Step 6 — Model Order Selection (PACF)

Before Steps 3–5 are applied at the final model order, the implementation selects the order $p_{m}$ for each season $m$ using partial autocorrelation function (PACF) significance testing. The procedure fits Yule-Walker systems at increasing orders $p = 1, 2, \dots, p_{\max}$ and examines the last coefficient $ψ_{m, p}^{*}$ at each order, which is the partial autocorrelation at lag $p$. Under the null hypothesis that the true order is less than $p$, the partial autocorrelation is approximately normally distributed with mean zero and standard error $1 / \sqrt{N}$, where $N$ is the number of historical observations for the season.

The selected order is the largest $p$ whose partial autocorrelation is significant:

$p_{m}^{*} = \max \{ p \in \{1, \dots, p_{\max}\} : |ψ_{m, p}^{*}| > z_{α /2} / \sqrt{N} \}$

where $z_{α /2}$ is the critical value for the chosen significance level (typically $z_{0.025} = 1.96$ for a 95% confidence band). If no lag is significant, the selected order is $p_{m}^{*} = 0$ (white-noise model, no autoregressive structure).

Implementation: This procedure is implemented in select_order_pacf in the cobre-stochastic estimation module. The function evaluates PACF significance for each candidate order and returns the selected order. AIC and BIC are recognized alternatives but are not implemented.
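The selection rule itself is short once the partial autocorrelations are known. The sketch below assumes they have already been computed (the last Yule-Walker coefficient at each order) and uses a two-sided test against the $1/\sqrt{N}$ standard error. It is a hypothetical helper and does not reproduce the signature of select_order_pacf.

```python
import math


def select_order_sketch(pacf_values, n_obs, z=1.96):
    """Return the largest order whose partial autocorrelation is significant.

    pacf_values[p - 1]: partial autocorrelation at lag p, for p = 1..p_max.
    n_obs: number of historical observations for the season.
    z: critical value (1.96 for a 95% band).
    Returns 0 when no lag is significant (white-noise model).
    """
    threshold = z / math.sqrt(n_obs)
    selected = 0
    for p, value in enumerate(pacf_values, start=1):
        if abs(value) > threshold:
            selected = p                 # keep the largest significant lag
    return selected


# With 49 observations the band is 1.96 / 7 = 0.28 (made-up PACF values).
order = select_order_sketch([0.6, 0.1, 0.35, 0.05], n_obs=49)
no_structure = select_order_sketch([0.1, -0.2], n_obs=49)
```

Note that the rule picks lag 3 in the first call even though lag 2 is insignificant, since it takes the largest significant lag rather than stopping at the first insignificant one.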

Key Properties

Periodicity: All parameters vary by season, matching the strong seasonality of hydrological data.
Parsimony: The model order $p$ is selected per season using PACF significance testing (implemented via select_order_pacf). AIC and BIC are recognized alternatives but are not implemented.
Stationarity: Fitted models are validated to ensure the AR process does not diverge — the characteristic polynomial roots must lie outside the unit circle.
Positive residual variance: After fitting, $σ_{m}^{2} > 0$ must hold for all seasons. A zero or negative residual variance indicates overfitting.

Cobre Methodology Reference