Parametric Modeling and System Identification

Parametric Modeling

  • ARX Model Structure
    • System identification with noise.
    • Data: $\{(u(t), y(t))\}$
    • ARX model:
      • $A(q^{-1})\,y(t) = B(q^{-1})\,u(t) + e(t)$
      • $[1 + a_1 q^{-1} + \dots + a_{na} q^{-na}]\,y(t) = [b_0 + b_1 q^{-1} + \dots + b_{nb} q^{-nb}]\,u(t) + e(t)$
      • $q^{-1}$ is the delay operator.
      • $y(t)$ is the output at time $t$.
      • $u(t)$ is the input at time $t$.
      • $e(t)$ is the noise at time $t$.
      • $a_i$ are the coefficients of the polynomial $A$.
      • $b_i$ are the coefficients of the polynomial $B$.
    • Regression form:
      • $y(t) = \phi^T(t)\,\theta + e(t)$
      • $\phi(t)$ is the regressor vector (past inputs and outputs).
      • $\theta$ is the parameter vector.
    • Regressors: $\phi(t)$
    • Parameters: $\theta$
    • $A, B \implies \theta$
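As a concrete sketch of this structure, the following simulates a hypothetical first-order ARX system ($na = nb = 1$) and forms the regressor vector. The parameter values, noise level, and use of NumPy are assumptions made purely for illustration:

```python
import numpy as np

# Hypothetical first-order ARX system (na = 1, nb = 1), written out from the
# polynomial form: y(t) = -a1*y(t-1) + b0*u(t) + b1*u(t-1) + e(t).
rng = np.random.default_rng(0)
a1, b0, b1 = 0.5, 1.0, 0.5           # illustrative "true" parameters theta_0
N = 200
u = rng.standard_normal(N)           # input sequence
e = 0.1 * rng.standard_normal(N)     # white measurement noise
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a1 * y[t-1] + b0 * u[t] + b1 * u[t-1] + e[t]

theta0 = np.array([a1, b0, b1])

# Regressor vector phi(t) = [-y(t-1), u(t), u(t-1)], so y(t) = phi(t)^T theta_0 + e(t).
def phi(t):
    return np.array([-y[t-1], u[t], u[t-1]])

# With the true parameters, the one-step residual recovers exactly the noise sample:
print(y[10] - phi(10) @ theta0)      # equals e[10]
```

The sign convention matters: the past outputs enter $\phi(t)$ with a minus sign so that the $a_i$ appear in $\theta$ with their natural sign.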

ARX Representation

  • Equivalent Form
    • $y(t; \theta) = \phi^T(t)\,\theta$
    • $\theta = \begin{bmatrix} a_1 & \dots & a_{na} & b_0 & \dots & b_{nb} \end{bmatrix}^T$
    • $\phi(t) = \begin{bmatrix} -y(t-1) & \dots & -y(t-na) & u(t) & \dots & u(t-nb) \end{bmatrix}^T$
  • Multiple Linear Regression
    • $J(\theta) = \frac{1}{N} \sum_{t=0}^{N-1} \epsilon^2(t; \theta)$
    • One-step-ahead prediction error: $\epsilon(t; \theta) = y(t) - y(t; \theta)$
    • Prediction error.
    • Mean-squared error (MSE) cost function.

Cost Function and Linear System

  • Expressing the cost function:
    • $J(\theta) = \frac{1}{N} \sum_{t=0}^{N-1} \epsilon^2(t; \theta)$
  • Taking the derivative to find the minimum:
    • $\hat{\theta} = \arg\min_{\theta} J(\theta) \implies \sum_{t=0}^{N-1} \phi(t)\,\epsilon(t; \theta) = 0$
  • This leads to a linear system of equations.
  • Solution:
    • $\hat{\theta}_N = R^{-1}(N)\,f(N)$
    • $R(N) = \sum_{t=0}^{N-1} \phi(t)\,\phi^T(t)$
    • $f(N) = \sum_{t=0}^{N-1} \phi(t)\,y(t)$
  • If the data is "rich enough", $R(N)$ will be full rank (invertible).
    • $R(N)$ is a $p \times p$ matrix.
    • $f(N)$ is a $p \times 1$ vector.
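A minimal sketch of forming $R(N)$ and $f(N)$ and solving the resulting linear system, reusing the same assumed first-order example (all names and values are illustrative, not part of the notes):

```python
import numpy as np

# Simulate the assumed first-order ARX system y(t) = -a1*y(t-1) + b0*u(t) + b1*u(t-1) + e(t).
rng = np.random.default_rng(1)
a1, b0, b1 = 0.5, 1.0, 0.5
N = 500
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a1 * y[t-1] + b0 * u[t] + b1 * u[t-1] + e[t]

# Stack the regressors phi(t)^T as rows, for t = 1 .. N-1.
Phi = np.column_stack([-y[:-1], u[1:], u[:-1]])
R = Phi.T @ Phi                     # R(N) = sum_t phi(t) phi(t)^T   (p x p)
f = Phi.T @ y[1:]                   # f(N) = sum_t phi(t) y(t)       (p x 1)
theta_hat = np.linalg.solve(R, f)   # solve R*theta = f rather than forming R^{-1}
print(theta_hat)                    # close to the true [0.5, 1.0, 0.5]
```

Numerically one solves the linear system instead of explicitly inverting $R(N)$, which is cheaper and better conditioned.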

Ordinary Least Squares (OLS) Estimation

  • LS Estimate
    • $\hat{\theta}_N = R^{-1}(N)\,f(N)$
  • Consistency: Is OLS a "good" estimate?
  • Assumption: The data-generating process is an ARX model.
    • $y(t) = \phi^T(t)\,\theta_0 + e(t)$
    • $\theta_0$ represents the ground-truth parameters.
  • Ideally, we want $\hat{\theta}_N = \theta_0$.

OLS and Estimation Error

  • Substituting the Data Generating Process
    • $\hat{\theta}_{LS} = R^{-1}(N) \sum_{t=0}^{N-1} \phi(t)\,y(t)$
    • $\hat{\theta}_{LS} = R^{-1}(N) \sum_{t=0}^{N-1} \phi(t)\,[\phi^T(t)\,\theta_0 + e(t)]$
    • $\hat{\theta}_{LS} = \theta_0 + R^{-1}(N) \sum_{t=0}^{N-1} \phi(t)\,e(t)$
  • Estimation Error
    • $\tilde{\theta}_N = \hat{\theta}_N - \theta_0 = R^{-1}(N)\,f_e(N)$
    • $f_e(N) = \sum_{t=0}^{N-1} \phi(t)\,e(t)$
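The error decomposition can be checked numerically. This sketch (same illustrative first-order system as above) verifies that $\hat{\theta}_N - \theta_0 = R^{-1}(N)\,f_e(N)$ holds to floating-point precision:

```python
import numpy as np

# Simulate the assumed first-order ARX example with known theta_0.
rng = np.random.default_rng(2)
theta0 = np.array([0.5, 1.0, 0.5])         # [a1, b0, b1], illustrative values
N = 300
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -theta0[0] * y[t-1] + theta0[1] * u[t] + theta0[2] * u[t-1] + e[t]

Phi = np.column_stack([-y[:-1], u[1:], u[:-1]])
R = Phi.T @ Phi
f_e = Phi.T @ e[1:]                        # f_e(N) = sum_t phi(t) e(t)
theta_hat = np.linalg.solve(R, Phi.T @ y[1:])
# Both sides of the decomposition agree up to floating-point error:
print(theta_hat - theta0)
print(np.linalg.solve(R, f_e))
```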

Consistency of OLS

  • Definition of Consistency
    • An estimate $\hat{\theta}_N$ is consistent if $\lim_{N \to \infty} \hat{\theta}_N = \theta_0$
  • Question: Is OLS consistent?
  • Analysis
    • We need $\lim_{N \to \infty} R^{-1}(N)\,f_e(N) = 0$.
    • $\lim_{N \to \infty} \frac{R(N)}{N} = \lim_{N \to \infty} \frac{1}{N} \sum_{t=0}^{N-1} \phi(t)\,\phi^T(t)$
    • $\lim_{N \to \infty} \frac{1}{N} \sum_{t=0}^{N-1} \begin{bmatrix} -y(t-1) \\ \vdots \\ -y(t-na) \\ u(t) \\ \vdots \\ u(t-nb) \end{bmatrix} \begin{bmatrix} -y(t-1) & \dots & -y(t-na) & u(t) & \dots & u(t-nb) \end{bmatrix}$
  • Covariance Matrices
    • $R_y(\tau) = \lim_{N \to \infty} \frac{1}{N} \sum_{t=0}^{N-1} y(t+\tau)\,y(t)$

Covariance Matrix and Open-Loop Experiment

  • $\lim_{N \to \infty} \frac{R(N)}{N} = \begin{bmatrix} R_y(0) & R_y(1) & \dots & R_y(na) & R_{yu}(0) & \dots & R_{yu}(nb) \\ R_y(1) & R_y(0) & \dots & R_y(na-1) & R_{yu}(1) & \dots & R_{yu}(nb) \\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ R_y(na) & R_y(na-1) & \dots & R_y(0) & R_{yu}(na) & \dots & R_{yu}(nb) \\ R_{uy}(0) & R_{uy}(1) & \dots & R_{uy}(na) & R_u(0) & \dots & R_u(nb) \\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ R_{uy}(nb) & R_{uy}(nb-1) & \dots & R_{uy}(na) & R_u(nb) & \dots & R_u(0) \end{bmatrix}$
  • $\lim_{N \to \infty} \frac{f_e(N)}{N} = \lim_{N \to \infty} \frac{1}{N} \sum_{t=0}^{N-1} \begin{bmatrix} -y(t-1) \\ \vdots \\ -y(t-na) \\ u(t) \\ \vdots \\ u(t-nb) \end{bmatrix} e(t) = \begin{bmatrix} -R_{ye}(1) \\ \vdots \\ -R_{ye}(na) \\ R_{ue}(0) \\ \vdots \\ R_{ue}(nb) \end{bmatrix} = 0$
  • Since $e(t)$ is assumed to be white noise, it is uncorrelated with the past outputs in $\phi(t)$, so $R_{ye}(\tau) = 0$ for $\tau \geq 1$.
  • In any open-loop experiment, the input is also uncorrelated with $e$, so $R_{ue}(\tau) = 0$.
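Under these assumptions (white $e$, open-loop white-noise input), consistency can be observed empirically. This Monte-Carlo-style sketch on the illustrative first-order system shows the estimation error shrinking as $N$ grows:

```python
import numpy as np

def ls_error(N, seed):
    """Norm of the LS estimation error for one realization of length N."""
    rng = np.random.default_rng(seed)
    theta0 = np.array([0.5, 1.0, 0.5])     # illustrative true parameters
    u = rng.standard_normal(N)             # open-loop white-noise input
    e = 0.3 * rng.standard_normal(N)       # white noise
    y = np.zeros(N)
    for t in range(1, N):
        y[t] = -theta0[0] * y[t-1] + theta0[1] * u[t] + theta0[2] * u[t-1] + e[t]
    Phi = np.column_stack([-y[:-1], u[1:], u[:-1]])
    theta_hat, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
    return np.linalg.norm(theta_hat - theta0)

for N in (100, 1000, 10000):
    print(N, ls_error(N, seed=0))          # error decreases roughly like 1/sqrt(N)
```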

Regularization and Prediction Error

  • Regularization
    • $J(\theta) = \mathrm{MSE}(\theta) + \lambda \|\theta\|^2$
    • Adds bias but controls variance when $N \approx p$.
  • Prediction
    • $\hat{y}(t \mid t-1) = E[\,y(t) \mid \Omega(t-1)\,]$
    • $\epsilon(t) = y(t) - \hat{y}(t \mid t-1)$
    • $\Omega(t-1)$ is the information set up to time $t-1$.
    • $\epsilon(t)$ is the "new" information in $y(t)$ that was not already in $\Omega(t-1)$.
    • If the model has learned well, $\epsilon(t)$ should be uncorrelated with $\Omega(t-1)$.
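The regularized cost above has a closed-form minimizer: setting the gradient to zero gives $(\Phi^T\Phi + N\lambda I)\,\theta = \Phi^T y$. A sketch on the illustrative first-order system, with a deliberately short record so that regularization matters; the value of $\lambda$ is an arbitrary tuning choice:

```python
import numpy as np

# Ridge-regularized LS on the assumed first-order ARX example, with few samples
# (N close to p is the regime where regularization helps).
rng = np.random.default_rng(3)
theta0 = np.array([0.5, 1.0, 0.5])
N = 30
u = rng.standard_normal(N)
e = 0.3 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -theta0[0] * y[t-1] + theta0[1] * u[t] + theta0[2] * u[t-1] + e[t]

Phi = np.column_stack([-y[:-1], u[1:], u[:-1]])
lam = 0.1                                   # regularization weight (a tuning choice)
p = Phi.shape[1]
# Zero gradient of J(theta) = MSE(theta) + lam*||theta||^2 gives the two solves below.
theta_ls    = np.linalg.solve(Phi.T @ Phi, Phi.T @ y[1:])
theta_ridge = np.linalg.solve(Phi.T @ Phi + N * lam * np.eye(p), Phi.T @ y[1:])
print(theta_ls, theta_ridge)                # ridge shrinks the estimate toward zero
```

The shrinkage toward zero is exactly the bias that is traded for lower variance.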

Prediction Error Framework

  • General Principle
    • Good Model = Good Prediction
    • Minimize prediction error.
    • $V_N(\theta) = \frac{1}{N} \sum_{t=1}^{N} \epsilon^2(t, \theta)$
  • For ARX Models
    • Prediction error should be uncorrelated with past data.
    • $\sum_{t=1}^{N} \epsilon(t)\,\phi(t) = 0$
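This orthogonality condition is exactly the normal-equation property of the LS fit, and can be verified directly (same illustrative setup; all numbers are assumptions):

```python
import numpy as np

# Fit the assumed first-order ARX example by LS and check sum_t eps(t) phi(t) = 0.
rng = np.random.default_rng(4)
theta0 = np.array([0.5, 1.0, 0.5])
N = 400
u = rng.standard_normal(N)
e = 0.2 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -theta0[0] * y[t-1] + theta0[1] * u[t] + theta0[2] * u[t-1] + e[t]

Phi = np.column_stack([-y[:-1], u[1:], u[:-1]])
theta_hat, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
eps = y[1:] - Phi @ theta_hat
print(Phi.T @ eps)   # numerically zero: the residuals carry no information left in phi
```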

Model Mismatch

  • Question: What happens if we fit an ARX model to data coming from a non-ARX system?
  • Scenario: True system is not ARX.
    • $y(t) = \phi^T(t)\,\theta_0 + v(t)$
    • $v(t)$ is colored noise.
  • Model: ARX model.
    • $y(t; \theta) = \phi^T(t)\,\theta + e(t)$
  • Model Mismatch
    • True system: $A_0\,y(t) = B_0\,u(t) + C_0\,e(t)$
    • Fitted model: $A\,y(t) = B\,u(t) + e(t)$
    • We can always rearrange the true system into an ARX-looking form, except that the resulting noise term won't be white.

Consequences of Model Mismatch

  • LS Estimate
    • $\hat{\theta}_{LS} = R^{-1}(N)\,f(N)$
    • $\hat{\theta}_{LS} = \theta_0 + R^{-1}(N)\,f_v(N)$, where $f_v(N) = \sum_{t=0}^{N-1} \phi(t)\,v(t)$
  • $\lim_{N \to \infty} R^{-1}(N)\,f_v(N) \neq 0$
    • Not generally zero, since $\phi(t)$ contains past outputs that are correlated with the colored noise $v$.
    • Bias if $v$ is colored.
    • $\hat{\theta}$ won't be consistent, even if the experiment is open-loop.
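The bias can be seen numerically. In this sketch the true system carries an assumed MA(1) colored-noise term $v(t) = e(t) + 0.8\,e(t-1)$ (all values illustrative); even with a very long record, the LS estimate does not converge to $\theta_0$:

```python
import numpy as np

# Same illustrative first-order structure, but the disturbance v(t) is colored.
# phi(t) is then correlated with v(t) through y(t-1), so OLS stays biased.
rng = np.random.default_rng(5)
a1, b0, b1 = 0.5, 1.0, 0.5
N = 100_000                         # large N: what remains is bias, not variance
u = rng.standard_normal(N)
e = 0.5 * rng.standard_normal(N)
v = e.copy()
v[1:] += 0.8 * e[:-1]               # colored (MA(1)) noise
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a1 * y[t-1] + b0 * u[t] + b1 * u[t-1] + v[t]

Phi = np.column_stack([-y[:-1], u[1:], u[:-1]])
theta_hat, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
print(theta_hat - np.array([a1, b0, b1]))   # the a1 component stays far from zero
```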

Instrumental Variable (IV) Method

  • System and Model
    • Assumed system: $y(t) = \phi^T(t)\,\theta_0 + v(t)$
    • Model: $y(t; \theta) = \phi^T(t)\,\theta + e(t)$
  • Instead of minimizing MSE
    • Solve $\sum_{t=0}^{N-1} \zeta(t)\,\epsilon(t; \theta) = 0$
    • $\zeta(t)$ is the instrumental variable (I.V.).
    • The instruments must satisfy $E[\zeta(t)\,e(t; \theta)] = 0$.
  • For now, assume $\zeta(t)$ is given.

IV Estimation

  • Cost Function
    • $J(\theta) = \frac{1}{N} \sum_{t=0}^{N-1} \zeta(t)\,[y(t) - \phi^T(t)\,\theta]$
  • $\hat{\theta}_{IV} = \Bigg[\sum_{t=0}^{N-1} \zeta(t)\,\phi^T(t)\Bigg]^{-1} \Bigg[\sum_{t=0}^{N-1} \zeta(t)\,y(t)\Bigg]$
  • Asymptotic Analysis
    • $\lim_{N \to \infty} \frac{1}{N} \sum_{t=0}^{N-1} \zeta(t)\,v(t) = 0$
    • $\lim_{N \to \infty} \hat{\theta}_{IV} = \theta_0 + \big[R_{\zeta\phi}\big]^{-1} \big[R_{\zeta v}\big]$

Designing the Instrumental Variable

  • $\lim_{N \to \infty} \hat{\theta}_{IV} = \theta_0 + \big[R_{\zeta\phi}\big]^{-1} \big[R_{\zeta v}\big]$
  • We want:
    • $R_{\zeta v} = 0$
      • $\zeta$ should be uncorrelated with $v$.
    • $R_{\zeta \phi}$ invertible, and ideally well-conditioned
      • $\zeta$ should be correlated with $\phi$.