Parametric Modeling and System Identification

Parametric Modeling

  • ARX Model Structure
    • System identification with noise.
    • Data: {(u(t), y(t))}
    • ARX model:
      • A(q^{-1})y(t) = B(q^{-1})u(t) + e(t)
      • [1 + a_1 q^{-1} + … + a_{na} q^{-na}] y(t) = [b_0 + b_1 q^{-1} + … + b_{nb} q^{-nb}] u(t) + e(t)
      • q^{-1} is the backward-shift (delay) operator: q^{-1} y(t) = y(t-1).
      • y(t) is the output at time t.
      • u(t) is the input at time t.
      • e(t) is the noise at time t.
      • a_i are the coefficients of the polynomial A.
      • b_i are the coefficients of the polynomial B.
    • Regression form:
      • y(t) = \phi^T(t) \theta + e(t)
      • \phi(t) is the regressor vector (past inputs and outputs).
      • \theta is the parameter vector.
    • Regressors: \phi(t)
    • Parameters: \theta
    • The coefficients of A and B stack into \theta.

ARX Representation

  • Equivalent Form
    • \hat{y}(t; \theta) = \phi^T(t) \theta
    • \theta = \begin{bmatrix} a_1 \\ … \\ a_{na} \\ b_0 \\ … \\ b_{nb} \end{bmatrix}
    • \phi(t) = \begin{bmatrix} -y(t-1) \\ … \\ -y(t-na) \\ u(t) \\ … \\ u(t-nb) \end{bmatrix}
  • Multiple Linear Regression
    • Mean-squared error (MSE) cost function: J(\theta) = \frac{1}{N} \sum_{t=0}^{N-1} \epsilon^2(t; \theta)
    • One-step-ahead prediction error: \epsilon(t; \theta) = y(t) - \hat{y}(t; \theta)
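The regression form above can be made concrete with a short numpy sketch; the toy data values and the coefficients in `theta` are illustrative assumptions, not from the notes.

```python
import numpy as np

def arx_regressor(y, u, t, na, nb):
    """Build phi(t) = [-y(t-1), ..., -y(t-na), u(t), ..., u(t-nb)]."""
    past_y = [-y[t - i] for i in range(1, na + 1)]
    past_u = [u[t - j] for j in range(0, nb + 1)]
    return np.array(past_y + past_u)

# toy signals with na = nb = 1 (hypothetical values)
y = np.array([0.0, 1.0, 0.5, 0.25])
u = np.array([1.0, 0.0, 0.0, 0.0])
theta = np.array([0.5, 1.0, 0.0])   # [a1, b0, b1]

phi = arx_regressor(y, u, t=2, na=1, nb=1)
y_hat = phi @ theta                 # one-step-ahead prediction phi^T(t) theta
eps = y[2] - y_hat                  # prediction error epsilon(t; theta)
```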

Cost Function and Linear System

  • Expressing the cost function:
    • J(\theta) = \frac{1}{N} \sum_{t=0}^{N-1} \epsilon^2(t; \theta)
  • Taking the derivative to find the minimum:
    • \hat{\theta} = \arg\min_{\theta} J(\theta) \implies \sum_{t=0}^{N-1} \phi(t) \epsilon(t; \theta) = 0
  • This leads to a linear system of equations.
  • Solution:
    • \hat{\theta}_N = R^{-1}(N) f(N)
    • R(N) = \sum_{t=0}^{N-1} \phi(t) \phi^T(t)
    • f(N) = \sum_{t=0}^{N-1} \phi(t) y(t)
  • If the data is "rich enough" (persistently exciting), R(N) will be full rank (invertible).
    • R(N) is a p \times p matrix, where p = na + nb + 1.
    • f(N) is a p \times 1 vector.
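As a sanity check, R(N) and f(N) can be accumulated and solved directly with numpy. This is a minimal sketch on a simulated first-order ARX system whose coefficients (0.7, 1.0, 0.5) are assumptions for illustration; note a_1 = -0.7 in the A(q^{-1}) = 1 + a_1 q^{-1} convention.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 200, 3                       # na = 1, nb = 1, so p = na + nb + 1 = 3

# simulate y(t) = 0.7 y(t-1) + 1.0 u(t) + 0.5 u(t-1) + e(t)
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.7 * y[t - 1] + 1.0 * u[t] + 0.5 * u[t - 1] + e[t]

# accumulate R(N) and f(N) with phi(t) = [-y(t-1), u(t), u(t-1)]
R = np.zeros((p, p))
f = np.zeros(p)
for t in range(1, N):
    phi = np.array([-y[t - 1], u[t], u[t - 1]])
    R += np.outer(phi, phi)
    f += phi * y[t]

theta_hat = np.linalg.solve(R, f)   # [a1, b0, b1], close to [-0.7, 1.0, 0.5]
```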

Ordinary Least Squares (OLS) Estimation

  • LS Estimate
    • \hat{\theta}_N = R^{-1}(N) f(N)
  • Consistency: Is OLS a "good" estimate?
  • Assumption: The data-generating process is an ARX model.
    • y(t) = \phi^T(t) \theta_0 + e(t)
    • \theta_0 represents the ground-truth parameters.
  • Ideally, we want \hat{\theta}_N = \theta_0

OLS and Estimation Error

  • Substituting the Data Generating Process
    • \hat{\theta}_{LS} = R^{-1}(N) \sum_{t=0}^{N-1} \phi(t) y(t)
    • \hat{\theta}_{LS} = R^{-1}(N) \sum_{t=0}^{N-1} \phi(t) [\phi^T(t) \theta_0 + e(t)]
    • \hat{\theta}_{LS} = \theta_0 + R^{-1}(N) \sum_{t=0}^{N-1} \phi(t) e(t)
  • Estimation Error
    • \tilde{\theta}_N = \hat{\theta}_N - \theta_0 = R^{-1}(N) f_e(N)
    • f_e(N) = \sum_{t=0}^{N-1} \phi(t) e(t)

Consistency of OLS

  • Definition of Consistency
    • An estimate \hat{\theta}_N is consistent if \lim_{N \to \infty} \hat{\theta}_N = \theta_0
  • Question: Is OLS consistent?
  • Analysis
    • Need: \lim_{N \to \infty} R^{-1}(N) f_e(N) = 0
    • \lim_{N \to \infty} \frac{R(N)}{N} = \lim_{N \to \infty} \frac{1}{N} \sum_{t=0}^{N-1} \phi(t) \phi^T(t)
    • = \lim_{N \to \infty} \frac{1}{N} \sum_{t=0}^{N-1} \begin{bmatrix} -y(t-1) \\ … \\ -y(t-na) \\ u(t) \\ … \\ u(t-nb) \end{bmatrix} \begin{bmatrix} -y(t-1) & … & -y(t-na) & u(t) & … & u(t-nb) \end{bmatrix}
  • Covariance Matrices
    • R_y(\tau) = \lim_{N \to \infty} \frac{1}{N} \sum_{t=0}^{N-1} y(t+\tau) y(t)
    • Cross-covariances R_{yu}(\tau), R_{ye}(\tau), R_{ue}(\tau) are defined analogously.

Covariance Matrix and Open-Loop Experiment

  • \lim_{N \to \infty} \frac{R(N)}{N} = \begin{bmatrix} R_y(0) & R_y(1) & … & R_y(na) & R_{yu}(0) & … & R_{yu}(nb) \\ R_y(1) & R_y(0) & … & R_y(na-1) & R_{yu}(1) & … & R_{yu}(nb) \\ \vdots & & & & & & \vdots \\ R_y(na) & R_y(na-1) & … & R_y(0) & R_{yu}(na) & … & R_{yu}(nb) \\ R_{uy}(0) & R_{uy}(1) & … & R_{uy}(na) & R_u(0) & … & R_u(nb) \\ \vdots & & & & & & \vdots \\ R_{uy}(nb) & R_{uy}(nb-1) & … & R_{uy}(na) & R_u(nb) & … & R_u(0) \end{bmatrix}
  • \lim_{N \to \infty} \frac{f_e(N)}{N} = \lim_{N \to \infty} \frac{1}{N} \sum_{t=0}^{N-1} \begin{bmatrix} -y(t-1) \\ … \\ -y(t-na) \\ u(t) \\ … \\ u(t-nb) \end{bmatrix} e(t) = \begin{bmatrix} -R_{ye}(1) \\ … \\ -R_{ye}(na) \\ R_{ue}(0) \\ … \\ R_{ue}(nb) \end{bmatrix} = 0
  • Since e(t) is white noise, past outputs y(t-\tau), \tau \ge 1, depend only on earlier noise, so R_{ye}(\tau) = 0.
  • In any open-loop experiment, u is generated independently of e, so R_{ue}(\tau) = 0 as well; hence OLS is consistent.
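This limit can be checked numerically: with white noise and an open-loop input, the sample average f_e(N)/N shrinks as N grows. A minimal sketch, with the system coefficients (0.5, 1.0) assumed for illustration:

```python
import numpy as np

def fe_over_N(N, seed=0):
    """Norm of (1/N) sum_t phi(t) e(t) for a simulated first-order ARX system."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(N)        # open-loop input, independent of the noise
    e = rng.standard_normal(N)        # white noise
    y = np.zeros(N)
    for t in range(1, N):
        y[t] = 0.5 * y[t - 1] + u[t] + e[t]
    acc = np.zeros(2)
    for t in range(1, N):
        acc += np.array([-y[t - 1], u[t]]) * e[t]
    return float(np.linalg.norm(acc / N))

print(fe_over_N(100), fe_over_N(100_000))  # the average shrinks toward 0 as N grows
```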

Regularization and Prediction Error

  • Regularization
    • J(\theta) = MSE(\theta) + \lambda ||\theta||^2
    • Adds bias but controls variance when N \approx p
  • Prediction
    • \hat{y}(t|t-1) = E[y(t) | \Omega(t-1)]
    • \epsilon(t) = y(t) - \hat{y}(t|t-1)
    • \Omega(t-1) is the information set up to time t-1.
    • \epsilon(t) is the "new" information in y(t) that did not exist in \Omega(t-1).
    • If the model learned well, \epsilon(t) should be uncorrelated with \Omega(t-1).
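The regularized cost above, J(\theta) = MSE(\theta) + \lambda ||\theta||^2, has a closed-form minimizer (ridge regression). A minimal sketch; the function name and test data are hypothetical:

```python
import numpy as np

def ridge_ls(Phi, y, lam):
    """Minimize (1/N) ||y - Phi @ theta||^2 + lam * ||theta||^2 in closed form."""
    N, p = Phi.shape
    # setting the gradient to zero gives (Phi^T Phi + N lam I) theta = Phi^T y
    return np.linalg.solve(Phi.T @ Phi + N * lam * np.eye(p), Phi.T @ y)

rng = np.random.default_rng(0)
Phi = rng.standard_normal((20, 5))                    # N only slightly above p
y = Phi @ np.ones(5) + 0.1 * rng.standard_normal(20)

t_ols = ridge_ls(Phi, y, 0.0)    # plain least squares
t_reg = ridge_ls(Phi, y, 10.0)   # shrunk toward zero (added bias, lower variance)
```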

Prediction Error Framework

  • General Principle
    • Good Model = Good Prediction
    • Minimize prediction error.
    • V_N(\theta) = \frac{1}{N} \sum_{t=1}^{N} \epsilon^2(t, \theta)
  • For ARX Models
    • Prediction error should be uncorrelated with past data.
    • \sum_{t=1}^{N} \epsilon(t) \phi(t) = 0
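For an OLS fit, this orthogonality holds by construction: it is exactly the normal-equations condition. A quick numerical check on simulated ARX data (coefficients assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500
u = rng.standard_normal(N)
e = 0.2 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.6 * y[t - 1] + u[t] + e[t]

# stack regressors phi(t) = [-y(t-1), u(t)] and fit by least squares
Phi = np.array([[-y[t - 1], u[t]] for t in range(1, N)])
theta_hat, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
eps = y[1:] - Phi @ theta_hat

# the residuals are orthogonal to every regressor column: sum_t eps(t) phi(t) = 0
print(Phi.T @ eps)   # numerically zero
```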

Model Mismatch

  • Question: What happens if we fit an ARX model to data coming from a non-ARX system?
  • Scenario: True system is not ARX.
    • y(t) = \phi^T(t) \theta_0 + v(t)
    • v(t) is colored noise.
  • Model: ARX model.
    • y(t; \theta) = \phi^T(t) \theta + e(t)
  • Model Mismatch
    • True system (e.g., ARMAX): A_0(q^{-1}) y(t) = B_0(q^{-1}) u(t) + C_0(q^{-1}) e(t)
    • Fitted model (ARX): A(q^{-1}) y(t) = B(q^{-1}) u(t) + e(t)
    • We can always re-arrange into an ARX-looking form, except that the noise won't be white.

Consequences of Model Mismatch

  • LS Estimate
    • \hat{\theta}_{LS} = R^{-1}(N) f(N)
    • \hat{\theta}_{LS} = \theta_0 + R^{-1}(N) f_v(N)
  • \lim_{N \to \infty} R^{-1}(N) f_v(N) \neq 0
    • Not generally zero.
    • Bias if v is colored: \phi(t) contains past outputs, which are correlated with v(t).
    • \hat{\theta} won't be consistent, even in an open-loop experiment.
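The bias is easy to see in simulation. Below, the noise is MA(1) (colored) but we fit an ARX model by OLS; all coefficients are illustrative assumptions. Even with an open-loop input and a large N, the estimate of a_1 lands well away from the true value:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100_000
u = rng.standard_normal(N)                       # open-loop input
w = rng.standard_normal(N)
v = w + 0.9 * np.concatenate(([0.0], w[:-1]))    # colored noise v(t) = w(t) + 0.9 w(t-1)

y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.5 * y[t - 1] + u[t] + v[t]          # true a1 = -0.5, b0 = 1.0

Phi = np.array([[-y[t - 1], u[t]] for t in range(1, N)])
theta_ls, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
# y(t-1) inside phi(t) is correlated with v(t), so theta_ls[0] is biased
print(theta_ls[0])   # noticeably away from -0.5
```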

Instrumental Variable (IV) Method

  • System and Model
    • Assume System: y(t) = \phi^T(t) \theta_0 + v(t)
    • Model: \hat{y}(t; \theta) = \phi^T(t) \theta + e(t)
  • Instead of minimizing MSE
    • \sum_{t=0}^{N-1} \zeta(t) \epsilon(t; \theta) = 0
    • \zeta(t) is the instrumental variable.
    • Idea: choose \zeta so that E[\zeta(t) \epsilon(t; \theta_0)] = 0.
  • For now, assume the instrumental variable \zeta(t) is given.

IV Estimation

  • Estimating Equation (in place of a minimized cost)
    • \frac{1}{N} \sum_{t=0}^{N-1} \zeta(t) [y(t) - \phi^T(t) \theta] = 0
  • \hat{\theta}_{IV} = \Bigg[\sum_{t=0}^{N-1} \zeta(t) \phi^T(t) \Bigg]^{-1} \Bigg[\sum_{t=0}^{N-1} \zeta(t) y(t) \Bigg]
  • Asymptotic Analysis
    • If \lim_{N \to \infty} \frac{1}{N} \sum_{t=0}^{N-1} \zeta(t) v(t) = 0, the bias term vanishes.
    • In general: \lim_{N \to \infty} \hat{\theta}_{IV} = \theta_0 + R_{\zeta \phi}^{-1} R_{\zeta v}

Designing the Instrumental Variable

  • \lim_{N \to \infty} \hat{\theta}_{IV} = \theta_0 + R_{\zeta \phi}^{-1} R_{\zeta v}
  • We want:
    • R_{\zeta v} = 0: \zeta should be uncorrelated with the noise v.
    • R_{\zeta \phi} invertible (and ideally well-conditioned): \zeta should be strongly correlated with the regressors \phi.
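Putting these design rules together: delayed inputs are a common instrument choice in open loop, since they are independent of the noise yet correlated with the regressors. A sketch on the same kind of colored-noise system as above (all coefficients assumed); unlike OLS, the IV estimate recovers the true parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100_000
u = rng.standard_normal(N)
w = rng.standard_normal(N)
v = w + 0.9 * np.concatenate(([0.0], w[:-1]))    # colored noise
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.5 * y[t - 1] + u[t] + v[t]          # true theta_0 = [-0.5, 1.0]

Phi  = np.array([[-y[t - 1], u[t]] for t in range(2, N)])
Zeta = np.array([[u[t - 1], u[t]] for t in range(2, N)])   # instruments: (delayed) inputs

# solve sum_t zeta(t) [y(t) - phi^T(t) theta] = 0
theta_iv = np.linalg.solve(Zeta.T @ Phi, Zeta.T @ y[2:])
print(theta_iv)   # close to [-0.5, 1.0]
```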