Model Validation and Prediction Errors
Model Validation
R² (Coefficient of Determination)
- R² represents the proportion of variance in the dependent variable (y) that can be predicted from the independent variable(s) (x).
- It assesses how well a model explains the variability of the data.
- Formula: $R^2 = 1 - \dfrac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$
- Where:
- $y_i$: Actual values.
- $\hat{y}_i$: Predicted values.
- $\bar{y}$: Mean of the actual values.
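The formula above can be sketched in Python with NumPy (`r_squared` is a hypothetical helper name, not from the original notes):

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: fraction of the variance in y
    that the predictions y_hat account for."""
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
print(r_squared(y, y))                       # perfect fit: R^2 = 1
print(r_squared(y, np.full(4, y.mean())))    # mean predictor: R^2 = 0
```

Note that a model worse than simply predicting the mean yields a negative R².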
Conditions Affecting R²
- Linear Regression (ARX):
- Under linear regression, particularly Autoregressive with eXogenous inputs (ARX) models, specific conditions can inflate the R² value.
- No Cross-Validation (C.V.): if the training data equals the test data, R² measures goodness of fit rather than predictive power and is optimistically biased.
Pearson Correlation Coefficient
- R² is the square of the Pearson Correlation Coefficient.
- $R^2 = (\text{Pearson correlation coefficient})^2$
- It indicates the proportion of variance in y(t) explained by the model.
Correlation Coefficient
Definition
- For two random variables, X and Y, the correlation coefficient (r) measures the strength and direction of a linear relationship between them.
- Formula:
- $r = \dfrac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$
- Where:
- $\mathrm{Cov}(X, Y) = \frac{1}{N-1}\sum_i (x_i - \bar{x})(y_i - \bar{y})$
- $\mathrm{Var}(X) = \frac{1}{N-1}\sum_i (x_i - \bar{x})^2$
- $\mathrm{Var}(Y) = \frac{1}{N-1}\sum_i (y_i - \bar{y})^2$
- The correlation coefficient always satisfies $r \in [-1, 1]$.
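A minimal NumPy sketch of the sample estimates above (`pearson_r` is a hypothetical helper name; the $(N-1)$ normalization matches the formulas):

```python
import numpy as np

def pearson_r(x, y):
    """Sample correlation: r = Cov(X, Y) / sqrt(Var(X) * Var(Y))."""
    n = len(x)
    xc, yc = x - x.mean(), y - y.mean()
    cov = np.sum(xc * yc) / (n - 1)
    var_x = np.sum(xc ** 2) / (n - 1)
    var_y = np.sum(yc ** 2) / (n - 1)
    return cov / np.sqrt(var_x * var_y)

x = np.array([1.0, 2.0, 3.0, 4.0])
print(pearson_r(x, 2 * x + 1))   # exact positive linear relation: r = 1
print(pearson_r(x, -x))          # exact negative linear relation: r = -1
```

Squaring either result recovers R² = 1, consistent with $R^2 = r^2$ for a perfect linear fit.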
Application
- For a time series y(t), the prediction y^(t+1∣t) is made based on past values.
- Decomposition of Variance:
- $\mathrm{Var}(y) = \mathrm{Var}(\hat{y}) + \mathrm{Var}(\epsilon)$
- Where $\epsilon$ represents the error term (unexplained variance), and it is uncorrelated with the predicted values.
- Fraction of variance explained by the model: $\mathrm{Var}(\hat{y}) / \mathrm{Var}(y) = R^2$.
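The decomposition can be verified numerically for an ordinary least-squares fit, where the residuals are uncorrelated with the fitted values by construction (a sketch; the synthetic data and seed are illustrative):

```python
import numpy as np

# With an intercept in the regressor matrix, OLS residuals have zero mean
# and are orthogonal to the fitted values, so the decomposition is exact.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)

X = np.column_stack([np.ones_like(x), x])      # intercept + regressor
theta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares fit
y_hat = X @ theta
eps = y - y_hat

print(np.var(y) - (np.var(y_hat) + np.var(eps)))  # ~ 0 (exact decomposition)
print(np.var(y_hat) / np.var(y))                  # fraction explained = R^2
```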
Smoothness and Over-sampling
Over-sampled Data
- Over-sampled data can lead to an inflated R² value, even for poorly performing models.
Prediction
- Prediction of y(t) based on past values:
- $\hat{y}(t+1 \mid t) = E[\,y(t+1) \mid y(t), y(t-1), \dots, u(t), u(t-1), \dots\,]$
- Even the naive persistence predictor $\hat{y}(t+H \mid t) = y(t)$ can appear better than it is, because smoothness makes consecutive samples nearly equal.
- Random Walk Example: $y(t) = y(t-1) + e(t)$, for which persistence is in fact the optimal predictor.
- Model Comparison: Compare the model against the trivial random-walk (persistence) predictor.
- k-step Ahead Prediction: Evaluate the model's performance using k-step ahead predictions, which are much harder to inflate through smoothness alone.
- Model Differencing: Apply differencing to the time series and evaluate on $\Delta y(t) = y(t) - y(t-k)$.
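The inflation effect is easy to reproduce (a sketch; the random-walk data and seed are illustrative):

```python
import numpy as np

# On a smooth / over-sampled series, the trivial persistence predictor
# y_hat(t) = y(t-1) already achieves R^2 close to 1, so a high R^2 alone
# does not prove the model has learned anything.
rng = np.random.default_rng(1)
y = np.cumsum(rng.normal(size=5000))   # random walk: y(t) = y(t-1) + e(t)

y_true, y_hat = y[1:], y[:-1]          # persistence: predict previous value
r2 = 1 - np.sum((y_true - y_hat) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
print(r2)                              # typically very close to 1

# Remedy: evaluate on the differenced series, where persistence predicts
# the mean and its R^2 collapses to about zero.
```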
Prediction Errors
Assumptions
- Assume the true system is represented as:
- $y(t) = G_0(q)\,u(t) + H_0(q)\,e(t)$
- Where:
- $G_0(q)$: Transfer function of the true system.
- $H_0(q)$: Noise model of the true system.
- $u(t)$: Input.
- $e(t)$: White noise.
- If the estimated model perfectly matches the true system:
- $G(q; \theta) = G_0(q)$
- $H(q; \theta) = H_0(q)$
Error Analysis
- One-step prediction error: $\epsilon(t) = y(t) - \hat{y}(t \mid t-1) = H(q;\theta)^{-1}\left[\,y(t) - G(q;\theta)\,u(t)\,\right]$
- If $G(q;\theta) = G_0(q)$ and $H(q;\theta) = H_0(q)$, then $\epsilon = e$: the error is white noise.
- If $G(q;\theta) = G_0(q)$ but $H(q;\theta) = 1$ (an output-error model), then $\epsilon = H_0(q)\,e$: the error may be small, but it is not necessarily white.
Checking for Whiteness
Methods
- Direct Inspection: Examine the error sequence ϵ(t) directly.
- MATLAB Demo
- Autocorrelation Function (ACF): Compute and analyze the ACF of the error sequence.
- $\hat{R}_\epsilon(\tau) = \dfrac{1}{N_\text{test}} \sum_t \epsilon(t)\,\epsilon(t-\tau)$
- If ϵ is white noise, the ACF should be close to zero for all non-zero lags.
- Statistical Hypothesis Testing
Statistical Test
- Under the null hypothesis that ϵ is white noise, the following statistic can be used:
- $\sqrt{N_\text{test}}\;\dfrac{\hat{R}_\epsilon(\tau)}{\hat{R}_\epsilon(0)} \sim N(0, 1)$ for each lag $\tau \neq 0$
- This test assumes that the errors are independent and identically distributed (i.i.d.).
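The per-lag test can be sketched as follows (the function name, residual sequence, and seed are illustrative; real residuals would come from an identified model):

```python
import numpy as np

def acf(eps, max_lag):
    """Normalized sample autocorrelations R_eps(tau) / R_eps(0)."""
    n = len(eps)
    r0 = np.sum(eps * eps) / n
    return np.array([np.sum(eps[tau:] * eps[:n - tau]) / n / r0
                     for tau in range(1, max_lag + 1)])

rng = np.random.default_rng(2)
white = rng.normal(size=2000)        # stand-in for a residual sequence
rho = acf(white, 20)

# Under whiteness, sqrt(N) * rho(tau) ~ N(0, 1), so roughly 95% of the
# normalized autocorrelations should fall inside +/- 1.96 / sqrt(N).
band = 1.96 / np.sqrt(len(white))
print(np.mean(np.abs(rho) < band))   # fraction of lags inside the band
```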
Central Limit Theorem (CLT)
- Given i.i.d. random variables $X_1, X_2, \dots, X_n$:
- If $E[X_i] = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$,
- Then $\dfrac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} \to N(0, 1)$ as $n \to \infty$.
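A quick Monte Carlo illustration of the statement (sample sizes and seed are arbitrary choices):

```python
import numpy as np

# CLT sketch: standardized sums of i.i.d. Uniform(0, 1) variables
# approach a standard normal distribution.
rng = np.random.default_rng(3)
n, reps = 500, 5000
mu, sigma = 0.5, np.sqrt(1.0 / 12.0)       # mean and std of Uniform(0, 1)

sums = rng.uniform(size=(reps, n)).sum(axis=1)
z = (sums - n * mu) / (sigma * np.sqrt(n))  # standardized sums

print(z.mean(), z.std())                    # close to 0 and 1
print(np.mean(np.abs(z) < 1.96))            # close to 0.95
```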
Chi-Squared Test
Test Statistic
- Test statistic for whiteness:
- $N_\text{test} \sum_{\tau=1}^{T_{\max}} \dfrac{\hat{R}_\epsilon^2(\tau)}{\hat{R}_\epsilon^2(0)} \sim \chi^2(T_{\max})$
- This statistic follows a chi-squared distribution with $T_{\max}$ degrees of freedom.
- If the errors are white, the test statistic will be small; whiteness is rejected when it exceeds the chi-squared critical value at the chosen significance level.
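A sketch of the pooled test (function name, data, and seed are illustrative; the 95% critical value of $\chi^2(10)$ is about 18.31):

```python
import numpy as np

def whiteness_statistic(eps, max_lag):
    """N * sum of squared normalized autocorrelations over lags 1..max_lag.
    Approximately chi^2(max_lag) distributed when eps is white."""
    n = len(eps)
    r0 = np.sum(eps * eps) / n
    rho = np.array([np.sum(eps[tau:] * eps[:n - tau]) / n / r0
                    for tau in range(1, max_lag + 1)])
    return n * np.sum(rho ** 2)

rng = np.random.default_rng(4)
white = rng.normal(size=2000)
colored = np.convolve(white, [1.0, 0.9], mode="valid")  # MA(1): correlated

# Compare each statistic against the chi^2(10) 95% critical value (~18.31).
print(whiteness_statistic(white, 10))    # typically below the critical value
print(whiteness_statistic(colored, 10))  # far above: whiteness rejected
```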