Pros of Polynomials
Can approximate any (smooth enough) function
Linear model - "closed form" solution
Well understood numerical problem
Many software packages
Explicit - Very basic, transparent and understandable
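As a minimal sketch of the "closed form" point, assuming the simplest case of a degree-1 polynomial in one predictor: the coefficients come straight from a formula, with no iterative search. The function name `fit_line` and the sample data are made up for illustration.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = a + b*x (one predictor)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx
    a = y_mean - b * x_mean  # the fitted line passes through the means
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x
a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a, b)
```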
Cons of Polynomials
Matrix inversion - cubic in computation time
For an m x m matrix, doubling m requires 4 times more memory (m² entries) and takes about 8 times longer (m³ operations)
Most terms irrelevant - unnecessary complexity
Leads to problems for high degree and dimensions
Number of coefficients = (p + d) choose d = (p + d)! / (p! d!)
p = number of predictors
d = degree
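A short sketch of how quickly the coefficient count grows, using the formula above. The helper name `num_coefficients` and the example values of p and d are chosen only for illustration.

```python
from math import comb

def num_coefficients(p, d):
    """Number of terms (including the intercept) in a full polynomial
    of degree d in p predictors: (p + d) choose d."""
    return comb(p + d, d)

for p, d in [(2, 2), (10, 3), (100, 3)]:
    print(f"p={p}, d={d}: {num_coefficients(p, d)} coefficients")
# p=2, d=2: 6 coefficients
# p=10, d=3: 286 coefficients
# p=100, d=3: 176851 coefficients
```

For p = 2 and d = 2 the six terms are 1, x1, x2, x1², x1·x2 and x2².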
Gaussian Radial Basis Functions
Radially symmetric - only the distance from the "centre" is important
Formula: φ(x) = exp(−(x − μ_j)² / (2 σ_j²))
σ_j = width parameter
μ_j = centre parameter
Decay with distance from μ_j
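The formula can be sketched in a few lines of Python for a single scalar input. The function name `gaussian_rbf` and the chosen centre and width values are assumptions for illustration only.

```python
import math

def gaussian_rbf(x, centre, width):
    """Gaussian radial basis function: exp(-(x - centre)^2 / (2 * width^2))."""
    return math.exp(-((x - centre) ** 2) / (2.0 * width ** 2))

centre, width = 0.0, 1.0
points = (-2.0, -1.0, 0.0, 1.0, 2.0)
values = [gaussian_rbf(x, centre, width) for x in points]
for x, v in zip(points, values):
    print(x, round(v, 4))
```

The value is 1 at the centre, is the same at equal distances on either side, and decays as the distance grows.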
Local Minima
The Empirical Cost Function is RSS / MSE
For linear regression the cost function is convex, so any minimum found is the global minimum (unique when the predictors are not collinear)
For other models the cost function is not guaranteed to be convex
Numerical algorithms such as Gradient Descent can get stuck in local minima
Local minima can also occur for linear models when the cost function used is non convex
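A minimal sketch of gradient descent getting stuck, assuming a made-up one-parameter non-convex cost (not an actual RSS). The function `cost`, the learning rate and the starting points are all hypothetical choices for illustration.

```python
def cost(w):
    # Hypothetical non-convex cost with two minima (not an RSS)
    return w ** 4 - 3 * w ** 2 + w

def grad(w):
    # Derivative of the cost above
    return 4 * w ** 3 - 6 * w + 1

def gradient_descent(w, learning_rate=0.01, steps=2000):
    """Plain gradient descent from starting point w."""
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w

w_left = gradient_descent(-2.0)   # ends near w = -1.30 (global minimum)
w_right = gradient_descent(2.0)   # ends near w = 1.13 (local minimum)
print(w_left, cost(w_left))
print(w_right, cost(w_right))
```

Both runs converge, but only the start at -2.0 reaches the lower of the two minima, so the result depends on the starting point.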
Training, testing and deploying models
A model is at most as good as the data used to create it, and it is usually not applicable outside that data range. Training, testing and deploying models should be done on consistent data ranges
Overfitting the training data
Occurs when the model is fitted so closely to the training data that it fails to generalise to new data instances, even though they come from the same domain.
An overfitted model also learns the noise and random fluctuations in the training dataset
Implies that the RSS on the training data is zero or very close to zero
More likely with nonparametric and nonlinear models
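A small sketch of the zero-RSS symptom, assuming made-up noisy samples of the line y = x. A polynomial with as many coefficients as training points (built here by Lagrange interpolation, a technique chosen only for this illustration) passes through every training point, noise included.

```python
def lagrange_predict(xs, ys, x):
    """Evaluate at x the degree n-1 polynomial through all n points (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def rss(xs, ys, predict):
    """Residual sum of squares of predict() on the given data."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical training data: y = x plus noise
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [0.3, 0.8, 2.4, 2.7, 4.2]
# Hypothetical new data from the same domain, without noise
test_x = [0.5, 1.5, 2.5, 3.5]
test_y = [0.5, 1.5, 2.5, 3.5]

def overfit(x):
    return lagrange_predict(train_x, train_y, x)

train_rss = rss(train_x, train_y, overfit)
test_rss = rss(test_x, test_y, overfit)
print("training RSS:", train_rss)
print("test RSS:", test_rss)
```

The training RSS is zero because the noise has been fitted exactly, while the RSS on the new points is not.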
Underfitting the training data
Occurs when the model is too simple to model the domain accurately and hence cannot generalise to new data
Poor performance on training data
Hold Out Validation
Also known as the train/test approach
Data is randomly split into training and test sets - typically 70:30 or 80:20
The training dataset is made up of known data which is used to train the model.
The test dataset is made up of data not seen by the machine learning methodology during training. It is used to validate the model
Need to randomise the sample by predictors
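A minimal sketch of a randomised 70:30 split using only the standard library. The function name `hold_out_split`, the seed and the toy rows are assumptions for illustration.

```python
import random

def hold_out_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows, then split it into (train, test) lists."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(10))             # stand-in for 10 observations
train, test = hold_out_split(rows, test_fraction=0.3)
print(len(train), len(test))       # 7 3
```

Every observation lands in exactly one of the two sets, so the test rows are never seen during training.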
Hold Out Advantages
Computationally fast
Works well when the dataset is very large and the training and test samples have the same distribution
Hold Out Disadvantages
With limited data, some information about the data might be missed during training resulting in high bias.
Not ideal for tuning hyperparameters
K-Fold Cross Validation
First randomise the sample by predictors then:
Divide a set of n observations into K groups of equal size
Train K models, each on (K-1) of the groups, and validate the performance of each model on the single group left out
Use the average performance of the model validations for the assessment
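The steps above can be sketched as follows, assuming a deliberately trivial "model" that predicts the mean of its training targets and MSE as the performance measure. The names `k_fold_indices` and `cross_val_mse` and the data are hypothetical.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) for each of the k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)         # randomise the sample first
    folds = [idx[i::k] for i in range(k)]    # k groups of (near-)equal size
    for i in range(k):
        val = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, val

def cross_val_mse(ys, k, seed=0):
    """Average validation MSE of a predict-the-mean model over k folds."""
    scores = []
    for train, val in k_fold_indices(len(ys), k, seed):
        mean = sum(ys[j] for j in train) / len(train)   # "train" on K-1 groups
        mse = sum((ys[j] - mean) ** 2 for j in val) / len(val)
        scores.append(mse)                              # validate on the group left out
    return sum(scores) / k

ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
score = cross_val_mse(ys, k=3)
print(score)
```

Each observation is used for validation exactly once and for training K-1 times.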
Leave-one-out Cross Validation
K-Fold Cross Validation taken to the extreme, where K is equal to n - the number of data points in the set
More computationally demanding than K-Folds
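A minimal sketch of why this is demanding: one model is fitted per data point. As an assumption for illustration, the "model" here just predicts the mean of its training targets; `loocv_mse` is a hypothetical helper name.

```python
def loocv_mse(ys):
    """Leave-one-out CV error of a predict-the-mean model (n fits for n points)."""
    n = len(ys)
    total = 0.0
    for i in range(n):
        rest = ys[:i] + ys[i + 1:]           # train on every point except i
        prediction = sum(rest) / len(rest)
        total += (ys[i] - prediction) ** 2   # validate on the single held-out point
    return total / n

print(loocv_mse([1.0, 2.0, 3.0]))  # 1.5
```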
Key differences between Hold Out and Cross Validation
Hold-out validation never trains on the held-out data, which is wasteful when data is in short supply
Cross-validation provides a way of using all data for both training and testing and gives a more accurate estimate of generalisation performance.