Parametric Modeling and System Identification

Parametric Modeling

  • ARX Model Structure
    • System identification with noise.
    • Data: $\{(u(t), y(t))\}$
    • ARX model:
      • $A(q^{-1})\,y(t) = B(q^{-1})\,u(t) + e(t)$
      • $[1 + a_1 q^{-1} + \dots + a_{na} q^{-na}]\,y(t) = [b_0 + b_1 q^{-1} + \dots + b_{nb} q^{-nb}]\,u(t) + e(t)$
      • $q^{-1}$ is the delay operator.
      • $y(t)$ is the output at time $t$.
      • $u(t)$ is the input at time $t$.
      • $e(t)$ is the noise at time $t$.
      • $a_i$ are the coefficients of the polynomial $A$.
      • $b_i$ are the coefficients of the polynomial $B$.
    • Regression form:
      • $y(t) = \phi^T(t)\,\theta + e(t)$
      • $\phi(t)$ is the regressor vector (past inputs and outputs).
      • $\theta$ is the parameter vector.
    • Regressors: $\phi(t)$
    • Parameters: $\theta$
    • $A, B \implies \theta$
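As a concrete sketch of this structure, the following simulates a hypothetical first-order ARX system ($na = nb = 1$) and forms the regressor vector. The parameter values, noise level, and use of NumPy are assumptions made purely for illustration:

```python
import numpy as np

# Hypothetical first-order ARX system (na = 1, nb = 1), written out from the
# polynomial form: y(t) = -a1*y(t-1) + b0*u(t) + b1*u(t-1) + e(t).
rng = np.random.default_rng(0)
a1, b0, b1 = 0.5, 1.0, 0.5           # illustrative "true" parameters theta_0
N = 200
u = rng.standard_normal(N)           # input sequence
e = 0.1 * rng.standard_normal(N)     # white measurement noise
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a1 * y[t-1] + b0 * u[t] + b1 * u[t-1] + e[t]

theta0 = np.array([a1, b0, b1])

# Regressor vector phi(t) = [-y(t-1), u(t), u(t-1)], so y(t) = phi(t)^T theta_0 + e(t).
def phi(t):
    return np.array([-y[t-1], u[t], u[t-1]])

# With the true parameters, the one-step residual recovers exactly the noise sample:
print(y[10] - phi(10) @ theta0)      # equals e[10]
```

The sign convention matters: the past outputs enter $\phi(t)$ with a minus sign so that the $a_i$ appear in $\theta$ with their natural sign.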

ARX Representation

  • Equivalent Form
    • $y(t; \theta) = \phi^T(t)\,\theta$
    • $\theta = \begin{bmatrix} a_1 & \dots & a_{na} & b_0 & \dots & b_{nb} \end{bmatrix}^T$
    • $\phi(t) = \begin{bmatrix} -y(t-1) & \dots & -y(t-na) & u(t) & \dots & u(t-nb) \end{bmatrix}^T$
  • Multiple Linear Regression
    • $J(\theta) = \frac{1}{N} \sum_{t=0}^{N-1} \epsilon^2(t; \theta)$
    • One-step-ahead prediction error: $\epsilon(t; \theta) = y(t) - y(t; \theta)$
    • Prediction error.
    • Mean-squared error (MSE) cost function.

Cost Function and Linear System

  • Expressing the cost function:
    • $J(\theta) = \frac{1}{N} \sum_{t=0}^{N-1} \epsilon^2(t; \theta)$
  • Taking the derivative to find the minimum:
    • $\hat{\theta} = \arg\min_{\theta} J(\theta) \implies \sum_{t=0}^{N-1} \phi(t)\,\epsilon(t; \theta) = 0$
  • This leads to a linear system of equations.
  • Solution:
    • $\hat{\theta}_N = R^{-1}(N)\,f(N)$
    • $R(N) = \sum_{t=0}^{N-1} \phi(t)\,\phi^T(t)$
    • $f(N) = \sum_{t=0}^{N-1} \phi(t)\,y(t)$
  • If the data is "rich enough", $R(N)$ will be full rank (invertible).
    • $R(N)$ is a $p \times p$ matrix.
    • $f(N)$ is a $p \times 1$ vector.
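A minimal sketch of forming $R(N)$ and $f(N)$ and solving the resulting linear system, reusing the same assumed first-order example (all names and values are illustrative, not part of the notes):

```python
import numpy as np

# Simulate the assumed first-order ARX system y(t) = -a1*y(t-1) + b0*u(t) + b1*u(t-1) + e(t).
rng = np.random.default_rng(1)
a1, b0, b1 = 0.5, 1.0, 0.5
N = 500
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a1 * y[t-1] + b0 * u[t] + b1 * u[t-1] + e[t]

# Stack the regressors phi(t)^T as rows, for t = 1 .. N-1.
Phi = np.column_stack([-y[:-1], u[1:], u[:-1]])
R = Phi.T @ Phi                     # R(N) = sum_t phi(t) phi(t)^T   (p x p)
f = Phi.T @ y[1:]                   # f(N) = sum_t phi(t) y(t)       (p x 1)
theta_hat = np.linalg.solve(R, f)   # solve R*theta = f rather than forming R^{-1}
print(theta_hat)                    # close to the true [0.5, 1.0, 0.5]
```

Numerically one solves the linear system instead of explicitly inverting $R(N)$, which is cheaper and better conditioned.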

Ordinary Least Squares (OLS) Estimation

  • LS Estimate
    • $\hat{\theta}_N = R^{-1}(N)\,f(N)$
  • Consistency: Is OLS a "good" estimate?
  • Assumption: The data-generating process is an ARX model.
    • $y(t) = \phi^T(t)\,\theta_0 + e(t)$
    • $\theta_0$ represents the ground-truth parameters.
  • Ideally, we want $\hat{\theta}_N = \theta_0$.

OLS and Estimation Error

  • Substituting the Data Generating Process
    • $\hat{\theta}_{LS} = R^{-1}(N) \sum_{t=0}^{N-1} \phi(t)\,y(t)$
    • $\hat{\theta}_{LS} = R^{-1}(N) \sum_{t=0}^{N-1} \phi(t)\,[\phi^T(t)\,\theta_0 + e(t)]$
    • $\hat{\theta}_{LS} = \theta_0 + R^{-1}(N) \sum_{t=0}^{N-1} \phi(t)\,e(t)$
  • Estimation Error
    • $\tilde{\theta}_N = \hat{\theta}_N - \theta_0 = R^{-1}(N)\,f_e(N)$
    • $f_e(N) = \sum_{t=0}^{N-1} \phi(t)\,e(t)$
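The error decomposition can be checked numerically. This sketch (same illustrative first-order system as above) verifies that $\hat{\theta}_N - \theta_0 = R^{-1}(N)\,f_e(N)$ holds to floating-point precision:

```python
import numpy as np

# Simulate the assumed first-order ARX example with known theta_0.
rng = np.random.default_rng(2)
theta0 = np.array([0.5, 1.0, 0.5])         # [a1, b0, b1], illustrative values
N = 300
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -theta0[0] * y[t-1] + theta0[1] * u[t] + theta0[2] * u[t-1] + e[t]

Phi = np.column_stack([-y[:-1], u[1:], u[:-1]])
R = Phi.T @ Phi
f_e = Phi.T @ e[1:]                        # f_e(N) = sum_t phi(t) e(t)
theta_hat = np.linalg.solve(R, Phi.T @ y[1:])
# Both sides of the decomposition agree up to floating-point error:
print(theta_hat - theta0)
print(np.linalg.solve(R, f_e))
```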

Consistency of OLS

  • Definition of Consistency
    • An estimate $\hat{\theta}_N$ is consistent if $\lim_{N \to \infty} \hat{\theta}_N = \theta_0$
  • Question: Is OLS consistent?
  • Analysis
    • We need $\lim_{N \to \infty} R^{-1}(N)\,f_e(N) = 0$.
    • $\lim_{N \to \infty} \frac{R(N)}{N} = \lim_{N \to \infty} \frac{1}{N} \sum_{t=0}^{N-1} \phi(t)\,\phi^T(t)$
    • $\lim_{N \to \infty} \frac{1}{N} \sum_{t=0}^{N-1} \begin{bmatrix} -y(t-1) \\ \vdots \\ -y(t-na) \\ u(t) \\ \vdots \\ u(t-nb) \end{bmatrix} \begin{bmatrix} -y(t-1) & \dots & -y(t-na) & u(t) & \dots & u(t-nb) \end{bmatrix}$
  • Covariance Matrices
    • $R_y(\tau) = \lim_{N \to \infty} \frac{1}{N} \sum_{t=0}^{N-1} y(t+\tau)\,y(t)$

Covariance Matrix and Open-Loop Experiment

  • $\lim_{N \to \infty} \frac{R(N)}{N} = \begin{bmatrix} R_y(0) & R_y(1) & \dots & R_y(na) & R_{yu}(0) & \dots & R_{yu}(nb) \\ R_y(1) & R_y(0) & \dots & R_y(na-1) & R_{yu}(1) & \dots & R_{yu}(nb) \\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ R_y(na) & R_y(na-1) & \dots & R_y(0) & R_{yu}(na) & \dots & R_{yu}(nb) \\ R_{uy}(0) & R_{uy}(1) & \dots & R_{uy}(na) & R_u(0) & \dots & R_u(nb) \\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ R_{uy}(nb) & R_{uy}(nb-1) & \dots & R_{uy}(na) & R_u(nb) & \dots & R_u(0) \end{bmatrix}$
  • $\lim_{N \to \infty} \frac{f_e(N)}{N} = \lim_{N \to \infty} \frac{1}{N} \sum_{t=0}^{N-1} \begin{bmatrix} -y(t-1) \\ \vdots \\ -y(t-na) \\ u(t) \\ \vdots \\ u(t-nb) \end{bmatrix} e(t) = \begin{bmatrix} -R_{ye}(1) \\ \vdots \\ -R_{ye}(na) \\ R_{ue}(0) \\ \vdots \\ R_{ue}(nb) \end{bmatrix} = 0$
  • Since $e(t)$ is assumed to be white noise, it is uncorrelated with the past outputs in $\phi(t)$, so $R_{ye}(\tau) = 0$ for $\tau \geq 1$.
  • In any open-loop experiment, the input is also uncorrelated with $e$, so $R_{ue}(\tau) = 0$.
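Under these assumptions (white $e$, open-loop white-noise input), consistency can be observed empirically. This Monte-Carlo-style sketch on the illustrative first-order system shows the estimation error shrinking as $N$ grows:

```python
import numpy as np

def ls_error(N, seed):
    """Norm of the LS estimation error for one realization of length N."""
    rng = np.random.default_rng(seed)
    theta0 = np.array([0.5, 1.0, 0.5])     # illustrative true parameters
    u = rng.standard_normal(N)             # open-loop white-noise input
    e = 0.3 * rng.standard_normal(N)       # white noise
    y = np.zeros(N)
    for t in range(1, N):
        y[t] = -theta0[0] * y[t-1] + theta0[1] * u[t] + theta0[2] * u[t-1] + e[t]
    Phi = np.column_stack([-y[:-1], u[1:], u[:-1]])
    theta_hat, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
    return np.linalg.norm(theta_hat - theta0)

for N in (100, 1000, 10000):
    print(N, ls_error(N, seed=0))          # error decreases roughly like 1/sqrt(N)
```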

Regularization and Prediction Error

  • Regularization
    • $J(\theta) = \mathrm{MSE}(\theta) + \lambda \|\theta\|^2$
    • Adds bias but controls variance when $N \approx p$.
  • Prediction
    • $\hat{y}(t \mid t-1) = E[\,y(t) \mid \Omega(t-1)\,]$
    • $\epsilon(t) = y(t) - \hat{y}(t \mid t-1)$
    • $\Omega(t-1)$ is the information set up to time $t-1$.
    • $\epsilon(t)$ is the "new" information in $y(t)$ that was not already in $\Omega(t-1)$.
    • If the model has learned well, $\epsilon(t)$ should be uncorrelated with $\Omega(t-1)$.
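The regularized cost above has a closed-form minimizer: setting the gradient to zero gives $(\Phi^T\Phi + N\lambda I)\,\theta = \Phi^T y$. A sketch on the illustrative first-order system, with a deliberately short record so that regularization matters; the value of $\lambda$ is an arbitrary tuning choice:

```python
import numpy as np

# Ridge-regularized LS on the assumed first-order ARX example, with few samples
# (N close to p is the regime where regularization helps).
rng = np.random.default_rng(3)
theta0 = np.array([0.5, 1.0, 0.5])
N = 30
u = rng.standard_normal(N)
e = 0.3 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -theta0[0] * y[t-1] + theta0[1] * u[t] + theta0[2] * u[t-1] + e[t]

Phi = np.column_stack([-y[:-1], u[1:], u[:-1]])
lam = 0.1                                   # regularization weight (a tuning choice)
p = Phi.shape[1]
# Zero gradient of J(theta) = MSE(theta) + lam*||theta||^2 gives the two solves below.
theta_ls    = np.linalg.solve(Phi.T @ Phi, Phi.T @ y[1:])
theta_ridge = np.linalg.solve(Phi.T @ Phi + N * lam * np.eye(p), Phi.T @ y[1:])
print(theta_ls, theta_ridge)                # ridge shrinks the estimate toward zero
```

The shrinkage toward zero is exactly the bias that is traded for lower variance.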

Prediction Error Framework

  • General Principle
    • Good Model = Good Prediction
    • Minimize prediction error.
    • $V_N(\theta) = \frac{1}{N} \sum_{t=1}^{N} \epsilon^2(t, \theta)$
  • For ARX Models
    • Prediction error should be uncorrelated with past data.
    • $\sum_{t=1}^{N} \epsilon(t)\,\phi(t) = 0$
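This orthogonality condition is exactly the normal-equation property of the LS fit, and can be verified directly (same illustrative setup; all numbers are assumptions):

```python
import numpy as np

# Fit the assumed first-order ARX example by LS and check sum_t eps(t) phi(t) = 0.
rng = np.random.default_rng(4)
theta0 = np.array([0.5, 1.0, 0.5])
N = 400
u = rng.standard_normal(N)
e = 0.2 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -theta0[0] * y[t-1] + theta0[1] * u[t] + theta0[2] * u[t-1] + e[t]

Phi = np.column_stack([-y[:-1], u[1:], u[:-1]])
theta_hat, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
eps = y[1:] - Phi @ theta_hat
print(Phi.T @ eps)   # numerically zero: the residuals carry no information left in phi
```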

Model Mismatch

  • Question: What happens if we fit an ARX model to data coming from a non-ARX system?
  • Scenario: True system is not ARX.
    • $y(t) = \phi^T(t)\,\theta_0 + v(t)$
    • $v(t)$ is colored noise.
  • Model: ARX model.
    • $y(t; \theta) = \phi^T(t)\,\theta + e(t)$
  • Model Mismatch
    • True system: $A_0\,y(t) = B_0\,u(t) + C_0\,e(t)$
    • Fitted model: $A\,y(t) = B\,u(t) + e(t)$
    • We can always rearrange the true system into an ARX-looking form, except that the resulting noise term won't be white.

Consequences of Model Mismatch

  • LS Estimate
    • $\hat{\theta}_{LS} = R^{-1}(N)\,f(N)$
    • $\hat{\theta}_{LS} = \theta_0 + R^{-1}(N)\,f_v(N)$, where $f_v(N) = \sum_{t=0}^{N-1} \phi(t)\,v(t)$
  • $\lim_{N \to \infty} R^{-1}(N)\,f_v(N) \neq 0$
    • Not generally zero, since $\phi(t)$ contains past outputs that are correlated with the colored noise $v$.
    • Bias if $v$ is colored.
    • $\hat{\theta}$ won't be consistent, even if the experiment is open-loop.
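The bias can be seen numerically. In this sketch the true system carries an assumed MA(1) colored-noise term $v(t) = e(t) + 0.8\,e(t-1)$ (all values illustrative); even with a very long record, the LS estimate does not converge to $\theta_0$:

```python
import numpy as np

# Same illustrative first-order structure, but the disturbance v(t) is colored.
# phi(t) is then correlated with v(t) through y(t-1), so OLS stays biased.
rng = np.random.default_rng(5)
a1, b0, b1 = 0.5, 1.0, 0.5
N = 100_000                         # large N: what remains is bias, not variance
u = rng.standard_normal(N)
e = 0.5 * rng.standard_normal(N)
v = e.copy()
v[1:] += 0.8 * e[:-1]               # colored (MA(1)) noise
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a1 * y[t-1] + b0 * u[t] + b1 * u[t-1] + v[t]

Phi = np.column_stack([-y[:-1], u[1:], u[:-1]])
theta_hat, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
print(theta_hat - np.array([a1, b0, b1]))   # the a1 component stays far from zero
```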

Instrumental Variable (IV) Method

  • System and Model
    • Assumed system: $y(t) = \phi^T(t)\,\theta_0 + v(t)$
    • Model: $y(t; \theta) = \phi^T(t)\,\theta + e(t)$
  • Instead of minimizing MSE
    • Solve $\sum_{t=0}^{N-1} \zeta(t)\,\epsilon(t; \theta) = 0$
    • $\zeta(t)$ is the instrumental variable (I.V.).
    • The instruments must satisfy $E[\zeta(t)\,e(t; \theta)] = 0$.
  • For now, assume $\zeta(t)$ is given.

IV Estimation

  • Cost Function
    • $J(\theta) = \frac{1}{N} \sum_{t=0}^{N-1} \zeta(t)\,[y(t) - \phi^T(t)\,\theta]$
  • $\hat{\theta}_{IV} = \Bigg[\sum_{t=0}^{N-1} \zeta(t)\,\phi^T(t)\Bigg]^{-1} \Bigg[\sum_{t=0}^{N-1} \zeta(t)\,y(t)\Bigg]$
  • Asymptotic Analysis
    • $\lim_{N \to \infty} \frac{1}{N} \sum_{t=0}^{N-1} \zeta(t)\,v(t) = 0$
    • $\lim_{N \to \infty} \hat{\theta}_{IV} = \theta_0 + \big[R_{\zeta\phi}\big]^{-1} \big[R_{\zeta v}\big]$

Designing the Instrumental Variable

  • $\lim_{N \to \infty} \hat{\theta}_{IV} = \theta_0 + \big[R_{\zeta\phi}\big]^{-1} \big[R_{\zeta v}\big]$
  • We want:
    • $R_{\zeta v} = 0$
      • $\zeta$ should be uncorrelated with $v$.
    • $R_{\zeta \phi}$ invertible, and ideally well-conditioned
      • $\zeta$ should be correlated with $\phi$.