Parametric Modeling and System Identification
Parametric Modeling
- ARX Model Structure
- System identification with noise.
- Data: {(u(t), y(t))}
- ARX model:
- A(q^{-1})y(t) = B(q^{-1})u(t) + e(t)
- [1 + a_1 q^{-1} + … + a_{na} q^{-na}] y(t) = [b_0 + b_1 q^{-1} + … + b_{nb} q^{-nb}] u(t) + e(t)
- q^{-1} is the delay operator: q^{-1} y(t) = y(t-1).
- y(t) is the output at time t.
- u(t) is the input at time t.
- e(t) is the noise at time t.
- a_i are the coefficients of the polynomial A.
- b_i are the coefficients of the polynomial B.
- Regression form:
- y(t) = \phi^T(t) \theta + e(t)
- \phi(t) is the regressor vector (past inputs and outputs).
- \theta is the parameter vector.
- Regressors: \phi(t)
- Parameters: \theta
- The coefficients of A and B are stacked into the parameter vector \theta.
ARX Representation
- Equivalent Form
- \hat{y}(t; \theta) = \phi^T(t) \theta
- \theta = \begin{bmatrix} a_1 & … & a_{na} & b_0 & … & b_{nb} \end{bmatrix}^T
- \phi(t) = \begin{bmatrix} -y(t-1) & … & -y(t-na) & u(t) & … & u(t-nb) \end{bmatrix}^T
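The regressor and predictor above can be sketched in NumPy. This is a minimal illustration; the function names and array-based interface are choices made here, not part of the notes.

```python
import numpy as np

def arx_regressor(y, u, na, nb, t):
    """phi(t) = [-y(t-1), ..., -y(t-na), u(t), ..., u(t-nb)]^T."""
    past_y = [-y[t - i] for i in range(1, na + 1)]
    past_u = [u[t - j] for j in range(0, nb + 1)]
    return np.array(past_y + past_u)

def arx_predict(theta, y, u, na, nb, t):
    """One-step prediction y_hat(t; theta) = phi^T(t) theta."""
    return arx_regressor(y, u, na, nb, t) @ theta
```

Note that the first na entries of phi carry a minus sign, matching the convention that the a_i coefficients appear on the left-hand side of the difference equation.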
- Multiple Linear Regression
- J(\theta) = \frac{1}{N} \sum_{t=0}^{N-1} \epsilon^2(t; \theta)
- One-step-ahead prediction error: \epsilon(t; \theta) = y(t) - \hat{y}(t; \theta)
- J is the mean-squared error (MSE) cost function over the prediction errors.
Cost Function and Linear System
- Expressing the cost function:
- J(\theta) = \frac{1}{N} \sum_{t=0}^{N-1} \epsilon^2(t; \theta)
- Taking the derivative to find the minimum:
- \hat{\theta} = \arg\min_\theta J(\theta) \implies \sum_{t=0}^{N-1} \phi(t) \epsilon(t; \theta) = 0
- This leads to a linear system of equations.
- Solution:
- \hat{\theta}_N = R^{-1}(N) f(N)
- R(N) = \sum_{t=0}^{N-1} \phi(t) \phi^T(t)
- f(N) = \sum_{t=0}^{N-1} \phi(t) y(t)
- If the data is "rich enough", R(N) will be full rank (invertible).
- R(N) is a p \times p matrix, with p = na + nb + 1.
- f(N) is a p \times 1 vector.
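The normal-equation solution can be written compactly in NumPy. A sketch, assuming zero-based arrays and discarding the first max(na, nb) samples where the regressor is incomplete; the function name is illustrative.

```python
import numpy as np

def arx_ols(y, u, na, nb):
    """Least-squares ARX fit: solve R(N) theta = f(N)."""
    t0 = max(na, nb)               # first index with a full regressor
    Phi = np.array([
        # phi(t) = [-y(t-1), ..., -y(t-na), u(t), ..., u(t-nb)]
        np.concatenate([-y[t - na:t][::-1], u[t - nb:t + 1][::-1]])
        for t in range(t0, len(y))
    ])
    R = Phi.T @ Phi                # R(N) = sum_t phi(t) phi^T(t)
    f = Phi.T @ y[t0:]             # f(N) = sum_t phi(t) y(t)
    return np.linalg.solve(R, f)   # theta_hat = R^{-1}(N) f(N)
```

On noise-free data generated by an ARX system, this recovers the true coefficients exactly (up to rounding), provided the input is rich enough for R(N) to be invertible.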
Ordinary Least Squares (OLS) Estimation
- LS Estimate
- \hat{\theta}_N = R^{-1}(N) f(N)
- Consistency: Is OLS a "good" estimate?
- Assumption: The data-generating process is an ARX model.
- y(t) = \phi^T(t) \theta_0 + e(t)
- \theta_0 represents the ground-truth parameters.
- Ideally, we want \hat{\theta}_N = \theta_0.
OLS and Estimation Error
- Substituting the Data Generating Process
- \hat{\theta}_{LS} = R^{-1}(N) \sum_{t=0}^{N-1} \phi(t) y(t)
- \hat{\theta}_{LS} = R^{-1}(N) \sum_{t=0}^{N-1} \phi(t) [\phi^T(t) \theta_0 + e(t)]
- \hat{\theta}_{LS} = \theta_0 + R^{-1}(N) \sum_{t=0}^{N-1} \phi(t) e(t)
- Estimation Error
- \tilde{\theta}_N = \hat{\theta}_N - \theta_0 = R^{-1}(N) f_e(N)
- f_e(N) = \sum_{t=0}^{N-1} \phi(t) e(t)
Consistency of OLS
- Definition of Consistency
- An estimate \hat{\theta}_N is consistent if \lim_{N \to \infty} \hat{\theta}_N = \theta_0.
- Question: Is OLS consistent?
- Analysis
- We need \lim_{N \to \infty} R^{-1}(N) f_e(N) = 0.
- \lim_{N \to \infty} \frac{R(N)}{N} = \lim_{N \to \infty} \frac{1}{N} \sum_{t=0}^{N-1} \phi(t) \phi^T(t)
- \lim_{N \to \infty} \frac{1}{N} \sum_{t=0}^{N-1} \begin{bmatrix} -y(t-1) \\ \vdots \\ -y(t-na) \\ u(t) \\ \vdots \\ u(t-nb) \end{bmatrix} \begin{bmatrix} -y(t-1) & … & -y(t-na) & u(t) & … & u(t-nb) \end{bmatrix}
- Covariance Matrices
- R_y(\tau) = \lim_{N \to \infty} \frac{1}{N} \sum_{t=0}^{N-1} y(t+\tau) y(t)
- Cross-covariances R_{yu}(\tau), R_{ye}(\tau), R_{ue}(\tau) are defined analogously.
Covariance Matrix and Open-Loop Experiment
- \lim_{N \to \infty} \frac{R(N)}{N} = \begin{bmatrix} R_y(0) & R_y(1) & … & R_y(na) & R_{yu}(0) & … & R_{yu}(nb) \\ R_y(1) & R_y(0) & … & R_y(na-1) & R_{yu}(1) & … & R_{yu}(nb) \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ R_y(na) & R_y(na-1) & … & R_y(0) & R_{yu}(na) & … & R_{yu}(nb) \\ R_{uy}(0) & R_{uy}(1) & … & R_{uy}(na) & R_u(0) & … & R_u(nb) \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ R_{uy}(nb) & R_{uy}(nb-1) & … & R_{uy}(na) & R_u(nb) & … & R_u(0) \end{bmatrix}
- \lim_{N \to \infty} \frac{f_e(N)}{N} = \lim_{N \to \infty} \frac{1}{N} \sum_{t=0}^{N-1} \begin{bmatrix} -y(t-1) \\ \vdots \\ -y(t-na) \\ u(t) \\ \vdots \\ u(t-nb) \end{bmatrix} e(t) = \begin{bmatrix} -R_{ye}(1) \\ \vdots \\ -R_{ye}(na) \\ R_{ue}(0) \\ \vdots \\ R_{ue}(nb) \end{bmatrix} = 0
- R_{ye}(\tau) = 0 for \tau \geq 1, because e(t) is assumed white and hence uncorrelated with past outputs.
- R_{ue}(\tau) = 0 in any open-loop experiment, because the input is chosen independently of the noise.
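The consistency argument above can be checked empirically. A simulation sketch (the parameter values, noise level, and sample sizes are illustrative): with white noise and an open-loop random input, the OLS error shrinks as N grows.

```python
import numpy as np

rng = np.random.default_rng(1)
a1, b0 = -0.7, 2.0                        # true theta_0 = [a1, b0]

def simulate_and_fit(N):
    """Simulate y(t) + a1*y(t-1) = b0*u(t) + e(t), then fit by OLS."""
    u = rng.standard_normal(N)            # open-loop input
    e = 0.1 * rng.standard_normal(N)      # white noise
    y = np.zeros(N)
    for t in range(1, N):
        y[t] = -a1 * y[t - 1] + b0 * u[t] + e[t]
    Phi = np.column_stack([-y[:-1], u[1:]])   # phi(t) = [-y(t-1), u(t)]
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ y[1:])

# Estimation error ||theta_hat_N - theta_0|| for growing N
errs = [np.linalg.norm(simulate_and_fit(N) - [a1, b0]) for N in (200, 50_000)]
```

The error at N = 50,000 should be far smaller than at N = 200, consistent with \hat{\theta}_N \to \theta_0.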
Regularization and Prediction Error
- Regularization
- J(\theta) = MSE(\theta) + \lambda ||\theta||^2
- Adds bias but controls variance, which helps when N is not much larger than p.
- Prediction
- \hat{y}(t|t-1) = E[y(t) | \Omega(t-1)]
- \epsilon(t) = y(t) - \hat{y}(t|t-1)
- \Omega(t-1) is the information set up to time t-1.
- \epsilon(t) is the "new" information in y(t) that did not exist in \Omega(t-1).
- If the model learned well, \epsilon(t) should be uncorrelated with \Omega(t-1).
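The regularized cost above has a closed-form minimizer. A sketch: setting the gradient of J(\theta) = MSE + \lambda||\theta||^2 to zero gives (\Phi^T\Phi + N\lambda I)\theta = \Phi^T y; the function name and interface are choices made here.

```python
import numpy as np

def arx_ridge(Phi, y, lam):
    """Ridge-regularized least squares for J = MSE + lam * ||theta||^2.
    Phi: (N, p) regressor matrix; y: (N,) outputs; lam: regularization weight.
    """
    N, p = Phi.shape
    # Normal equations with the N*lam*I term from differentiating the MSE
    return np.linalg.solve(Phi.T @ Phi + N * lam * np.eye(p), Phi.T @ y)
```

With lam = 0 this reduces to ordinary least squares; increasing lam shrinks \theta toward zero, trading bias for variance.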
Prediction Error Framework
- General Principle
- Good Model = Good Prediction
- Minimize prediction error.
- V_N(\theta) = \frac{1}{N} \sum_{t=1}^{N} \epsilon^2(t; \theta)
- For ARX Models
- Prediction error should be uncorrelated with past data.
- \sum_{t=1}^{N} \epsilon(t) \phi(t) = 0
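The orthogonality condition above is exactly the first-order condition of the least-squares fit, and it holds numerically for any fitted model. A quick check with synthetic data (values illustrative):

```python
import numpy as np

# After a least-squares fit, the residuals are orthogonal to the
# regressors: sum_t epsilon(t) phi(t) = 0 (up to rounding error).
rng = np.random.default_rng(2)
Phi = rng.standard_normal((500, 3))            # stand-in regressor matrix
y = Phi @ np.array([0.5, -1.0, 2.0]) + 0.1 * rng.standard_normal(500)

theta = np.linalg.lstsq(Phi, y, rcond=None)[0]
eps = y - Phi @ theta                           # prediction errors
orth = Phi.T @ eps                              # should be ~ 0
```

If `orth` were far from zero, there would still be information in the past data that the model failed to extract.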
Model Mismatch
- Question: What happens if we fit an ARX model to data coming from a non-ARX system?
- Scenario: True system is not ARX.
- y(t) = \phi^T(t) \theta_0 + v(t)
- v(t) is colored noise.
- Model: ARX model.
- y(t; \theta) = \phi^T(t) \theta + e(t)
- Model Mismatch
- True system (ARMAX): A_0(q^{-1}) y(t) = B_0(q^{-1}) u(t) + C_0(q^{-1}) e(t)
- Fitted model (ARX): A(q^{-1}) y(t) = B(q^{-1}) u(t) + e(t)
- We can always re-arrange into an ARX-looking form, except that the noise won't be white.
Consequences of Model Mismatch
- LS Estimate
- \hat{\theta}_{LS} = R^{-1}(N) f(N)
- \hat{\theta}_{LS} = \theta_0 + R^{-1}(N) f_v(N)
- f_v(N) = \sum_{t=0}^{N-1} \phi(t) v(t)
- \lim_{N \to \infty} R^{-1}(N) f_v(N) \neq 0
- Not generally zero.
- Bias if v is colored.
- \hat{\theta} is not consistent even if the experiment is open-loop: the colored noise v(t) is correlated with the past outputs inside \phi(t).
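The bias can be seen in simulation. A sketch (parameter values illustrative): the true system is ARMAX with v(t) = e(t) + c_1 e(t-1), but we fit an ARX model by OLS; the estimate of a_1 stays biased even for large N and an open-loop input.

```python
import numpy as np

rng = np.random.default_rng(3)
a1, b0, c1 = -0.8, 1.0, 0.9          # true system; c1 makes the noise colored
N = 200_000

u = rng.standard_normal(N)           # open-loop input
e = rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    # A0 y = B0 u + C0 e  with  v(t) = e(t) + c1*e(t-1)
    y[t] = -a1 * y[t - 1] + b0 * u[t] + e[t] + c1 * e[t - 1]

# OLS fit of an ARX(1, 0) model
Phi = np.column_stack([-y[:-1], u[1:]])          # phi(t) = [-y(t-1), u(t)]
theta = np.linalg.solve(Phi.T @ Phi, Phi.T @ y[1:])
bias = theta[0] - a1                 # does not vanish as N grows
```

The a_1 estimate is biased because v(t) shares the e(t-1) term with y(t-1); the b_0 estimate remains close to b_0 since u(t) is independent of the noise.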
Instrumental Variable (IV) Method
- System and Model
- Assume System: y(t) = \phi^T(t) \theta_0 + v(t)
- Model: \hat{y}(t; \theta) = \phi^T(t) \theta + e(t)
- Instead of minimizing MSE
- \sum_{t=0}^{N-1} \zeta(t) \epsilon(t; \theta) = 0
- \zeta(t) is the instrumental variable.
- The idea: choose the instrumental variable \zeta(t) so that it is uncorrelated with the error term.
- For now, assume \zeta(t) is given.
IV Estimation
- Estimating Equations
- Instead of minimizing a cost, solve the correlation equations: \frac{1}{N} \sum_{t=0}^{N-1} \zeta(t) [y(t) - \phi^T(t) \theta] = 0
- \hat{\theta}_{IV} = \Bigg[\sum_{t=0}^{N-1} \zeta(t) \phi^T(t) \Bigg]^{-1} \Bigg[\sum_{t=0}^{N-1} \zeta(t) y(t) \Bigg]
- Asymptotic Analysis
- \lim_{N \to \infty} \frac{1}{N} \sum_{t=0}^{N-1} \zeta(t) v(t) = 0
- \lim_{N \to \infty} \hat{\theta}_{IV} = \theta_0 + R_{\zeta \phi}^{-1} R_{\zeta v}
Designing the Instrumental Variable
- \lim_{N \to \infty} \hat{\theta}_{IV} = \theta_0 + R_{\zeta \phi}^{-1} R_{\zeta v}
- We want:
- R_{\zeta v} = 0
- \zeta should be uncorrelated with v.
- R_{\zeta \phi} should be invertible, and ideally well-conditioned.
- \zeta should be correlated with \phi.
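A common design meeting both requirements is to build \zeta(t) from delayed inputs: in open loop they are uncorrelated with the noise, yet correlated with \phi(t) through the system dynamics. A simulation sketch (parameter values and instrument choice illustrative), using the same kind of colored-noise system that biases OLS:

```python
import numpy as np

rng = np.random.default_rng(4)
a1, b0, c1 = -0.8, 1.0, 0.9          # true system with colored noise
N = 200_000

u = rng.standard_normal(N)           # open-loop input
e = rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a1 * y[t - 1] + b0 * u[t] + e[t] + c1 * e[t - 1]

# For t = 2, ..., N-1:
Phi = np.column_stack([-y[1:-1], u[2:]])   # phi(t)  = [-y(t-1), u(t)]
Z = np.column_stack([u[1:-1], u[2:]])      # zeta(t) = [u(t-1), u(t)]

# IV estimate: [sum zeta phi^T]^{-1} [sum zeta y]
theta_iv = np.linalg.solve(Z.T @ Phi, Z.T @ y[2:])
```

Here u(t-1) is correlated with y(t-1) (it drives it through b_0) but independent of v(t), so R_{\zeta v} = 0 while R_{\zeta \phi} stays invertible, and \hat{\theta}_{IV} recovers \theta_0 where OLS does not.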