Ensemble Learning
train a group of predictors and combine their predictions, e.g. into a voting classifier
Voting classifier
aggregates the predictions of several classifiers and predicts by vote; comes in two variants: hard voting and soft voting
Hard voting
use the predicted class from each model and return the majority vote
Soft voting
average the class probabilities estimated by each classifier and return the class with the highest combined probability
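A minimal scikit-learn sketch of both voting modes (the make_moons toy data and the three base models are just illustrative choices, not part of the cards):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

# Hard voting: each model casts one vote for a class, the majority wins
hard_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(random_state=42)),
        ("dt", DecisionTreeClassifier(random_state=42)),
        ("svc", SVC(random_state=42)),
    ],
    voting="hard",
)
hard_clf.fit(X, y)

# Soft voting: average class probabilities, predict the most probable class
# (every base model must expose predict_proba, hence probability=True for SVC)
soft_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(random_state=42)),
        ("dt", DecisionTreeClassifier(random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),
    ],
    voting="soft",
)
soft_clf.fit(X, y)
```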
Bagging
bootstrap aggregating: create random training sets by resampling the original training dataset (with replacement)
Bootstrap
resampling with replacement: randomly select a sample and return it to the original set before the next draw, so subsequent draws may pick samples that have already been chosen
Pasting
creating training sets by random sampling without replacement, so a training set contains no duplicate samples
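A sketch of bagging vs pasting with scikit-learn's BaggingClassifier; the only real difference is the bootstrap flag (the toy dataset and hyperparameter values are illustrative assumptions):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

# Bagging: bootstrap=True -> each tree sees a sample drawn WITH replacement
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    max_samples=100, bootstrap=True, random_state=42,
)
bag_clf.fit(X, y)

# Pasting: bootstrap=False -> sampling WITHOUT replacement, no duplicates
paste_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    max_samples=100, bootstrap=False, random_state=42,
)
paste_clf.fit(X, y)
```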
Bias
error due to wrong assumptions in the model; high bias leads to underfitting
Variance
error due to sensitivity to small fluctuations in the training set; high variance leads to overfitting
Out-of-bag evaluation
because bagging samples with replacement, about 37% of the original samples are never drawn for a given predictor; these out-of-bag samples can be used to evaluate that predictor without a separate validation set
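A sketch of OOB evaluation, assuming the same toy setup as above; oob_score=True asks scikit-learn to score each predictor on the samples it never saw:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

# oob_score=True evaluates each predictor on the ~37% of samples it never
# trained on, so no separate validation set is needed
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    bootstrap=True, oob_score=True, random_state=42,
)
bag_clf.fit(X, y)
print(bag_clf.oob_score_)  # out-of-bag accuracy estimate
```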
Random patches and subspaces
a bagging classifier can also sample random subsets of the features in the dataset; useful when the data has a large feature set
Random patches
random samples and random features
Random subspace
keep all samples and select random features
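A sketch of both variants with BaggingClassifier: max_features/bootstrap_features control the feature sampling, max_samples/bootstrap the instance sampling (the 40-feature toy dataset and the 0.5/0.7 ratios are made-up values):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# A toy dataset with many features, where feature sampling pays off
X, y = make_classification(n_samples=500, n_features=40, random_state=42)

# Random patches: sample both the training instances and the features
patches_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=300,
    max_samples=0.7, bootstrap=True,
    max_features=0.5, bootstrap_features=True, random_state=42,
)
patches_clf.fit(X, y)

# Random subspaces: keep all training instances, sample only the features
subspaces_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=300,
    max_samples=1.0, bootstrap=False,
    max_features=0.5, bootstrap_features=True, random_state=42,
)
subspaces_clf.fit(X, y)
```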
Random Forest
based on decision trees; brings the techniques above together in a single classifier, using random patches: bagging of the samples plus random feature subsets
Random Forest Method
has all the hyperparameters of decision trees (plus those of a bagging ensemble). At each node a random subset of features is considered when searching for the best split; these random feature subsets introduce randomness across the generated trees. Greater tree diversity trades a higher bias for a lower variance
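A minimal RandomForestClassifier sketch (the hyperparameter values are illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

# max_features controls the size of the random feature subset tried at each
# split; the remaining hyperparameters are the usual decision-tree ones
rnd_clf = RandomForestClassifier(
    n_estimators=500, max_leaf_nodes=16, max_features="sqrt",
    n_jobs=-1, random_state=42,
)
rnd_clf.fit(X, y)
```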
Extra-Trees
Extremely Randomised Trees: at each node a random subset of features is used and a random threshold is drawn for each, then the best of these random splits is selected. Much faster to train, since the optimal threshold is not searched for
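An ExtraTreesClassifier sketch; it is a drop-in replacement for RandomForestClassifier (values again illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

# Same API as RandomForestClassifier, but splits use random thresholds
# instead of searching for the best threshold per feature
ext_clf = ExtraTreesClassifier(
    n_estimators=500, max_leaf_nodes=16, n_jobs=-1, random_state=42,
)
ext_clf.fit(X, y)
```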
Feature Importance
useful if you need to understand which features contribute the most to the model's predictions
Random Forests for Feature Importance
each node splits on a feature to reduce impurity; a feature's importance is estimated as the weighted average (weighted by the number of samples reaching each node) of the impurity reduction that feature achieves across all trained decision trees
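A sketch of reading the impurity-based importances that scikit-learn exposes as feature_importances_, here on the iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rnd_clf = RandomForestClassifier(n_estimators=500, random_state=42)
rnd_clf.fit(iris.data, iris.target)

# feature_importances_ holds the impurity-based importance scores,
# which sum to 1 across all features
for name, score in zip(iris.feature_names, rnd_clf.feature_importances_):
    print(f"{name}: {score:.3f}")
```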
Boosting
combine multiple weak learners into a strong learner. Predictors are trained sequentially, each one trying to correct its predecessors
AdaBoost
adaptive boosting. Subsequent weak learners are adapted in favour of the samples that were misclassified by previous classifiers. The overall output is the weighted sum of all predictors. After training a weak classifier, the weights of the misclassified training samples are increased so the next classifier focuses on them
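A minimal AdaBoostClassifier sketch using decision stumps as the weak learners (n_estimators and learning_rate are illustrative values):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

# Decision stumps (max_depth=1) are the classic weak learner for AdaBoost;
# each stump focuses more on the samples its predecessors got wrong
ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    n_estimators=200, learning_rate=0.5, random_state=42,
)
ada_clf.fit(X, y)
```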
Gradient Boosting
sequential training of predictors. Each new predictor is trained on the residual errors of the previous predictor
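A hand-rolled sketch of the residual-fitting idea with three regression trees on a made-up quadratic dataset; scikit-learn's GradientBoostingRegressor packages the same loop:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(42)
X = rng.rand(100, 1) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * rng.randn(100)

# Each new tree is fit to the residual errors left by the previous ones
tree1 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, y)
residuals1 = y - tree1.predict(X)
tree2 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, residuals1)
residuals2 = residuals1 - tree2.predict(X)
tree3 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, residuals2)

# The ensemble prediction is the sum of the individual trees' predictions
X_new = np.array([[0.4]])
y_pred = sum(tree.predict(X_new) for tree in (tree1, tree2, tree3))
```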
XGBoost
based on GBDT (gradient-boosted decision trees); a scalable and highly accurate implementation of gradient boosting. Key features: regularisation, handling sparse data, weighted quantile sketch, and parallel learning (each defined below, with a usage sketch after them)
Regularisation
penalise complex models and prevent overfitting
Handling sparse data
handles missing/sparse values natively, learning a default split direction for them
Weighted quantile sketch
an approximate split-finding algorithm that works with weighted data when proposing candidate split points
Parallel learning
utilise multi-core CPUs/GPUs to improve training speed
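A sketch tying the features above to XGBClassifier parameters, assuming the xgboost package is installed (the concrete values and the moons data are illustrative stand-ins):

```python
import xgboost as xgb
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

xgb_clf = xgb.XGBClassifier(
    n_estimators=200,
    learning_rate=0.1,
    reg_lambda=1.0,    # L2 regularisation term on the leaf weights
    n_jobs=-1,         # parallel learning across CPU cores
    random_state=42,
)
# Missing values (np.nan) in X are handled natively by the sparsity-aware splits
xgb_clf.fit(X_train, y_train)
print(xgb_clf.score(X_test, y_test))
```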
Stacking
training another stronger/higher-level predictor (a blender/meta-learner) on the outputs (not the errors) of the weaker/lower-level predictors
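A StackingClassifier sketch where a logistic regression blender is trained on the outputs of two lower-level models (the choice of base models is arbitrary):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

# The final_estimator (blender) is trained on out-of-fold predictions
# produced by the lower-level estimators
stack_clf = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack_clf.fit(X, y)
```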