Decision trees
Widely used models for classification and regression. Internally they are a flowchart-like structure in which each node represents a test on an attribute. When trained, they learn a hierarchy of if/else questions that lead to a decision.
Training
Structure
graphviz package.
Gini impurity
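Measures how impure a node is: 0 when every sample in the node belongs to the same class, larger when the classes are more mixed.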
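A minimal sketch of the standard Gini impurity formula, G = 1 - sum(p_i^2), where p_i is the fraction of samples in the node that belong to class i. The function name `gini` and the list-of-labels input are assumptions made for illustration; they are not part of the original card.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node, given the class labels of its samples."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention chosen here: an empty node counts as pure
    counts = Counter(labels)
    # 1 minus the sum of squared class proportions
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

pure_node = gini(["a", "a", "a"])        # all one class
mixed_node = gini(["a", "b", "a", "b"])  # two classes, evenly split
```

A pure node scores 0, and an even two-class split scores 0.5, the maximum for two classes.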
CART Training algorithm
Classification and Regression Tree
CART Objective
Select a feature k and a threshold t_k to form the rule: if (feature k < t_k) then left else right. Search over all pairs (k, t_k) and select the one that minimises the cost function.
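The search over (feature, threshold) pairs can be sketched as below. This is an illustration under stated assumptions, not a full CART implementation: the cost is taken to be the size-weighted Gini impurity of the two child nodes, candidate thresholds are midpoints between consecutive distinct feature values, and the names `gini` and `best_split` are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return (cost, k, t) for the rule 'feature k < t' with the lowest cost.

    Returns None if no feature has at least two distinct values.
    """
    m = len(y)
    best = None
    for k in range(len(X[0])):
        values = sorted(set(row[k] for row in X))
        # Assumed candidate thresholds: midpoints between distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[k] < t]
            right = [label for row, label in zip(X, y) if row[k] >= t]
            # Weighted impurity of the two children
            cost = len(left) / m * gini(left) + len(right) / m * gini(right)
            if best is None or cost < best[0]:
                best = (cost, k, t)
    return best

# Toy data: feature 0 separates the two classes at 2.5
X = [[1.0, 5.0], [2.0, 4.0], [3.0, 7.0], [4.0, 6.0]]
y = ["a", "a", "b", "b"]
cost, k, t = best_split(X, y)
```

Applying `best_split` recursively to the left and right subsets would grow the tree one level at a time.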
Entropy
A concept that originated in thermodynamics as a measure of disorder. In information theory it measures the information in a system, calculated from the probabilities of the different outcomes within that system.
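A small sketch of the usual Shannon entropy formula applied to a tree node, H = -sum(p_i * log2(p_i)), where p_i is the fraction of samples in class i. The function name `entropy` and the choice of base-2 logarithm (entropy in bits) are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the class labels in a node."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention chosen here: an empty node has zero entropy
    counts = Counter(labels)
    # Only classes that actually occur are counted, so log2(0) never arises.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

pure_node = entropy(["a", "a", "a"])        # a single class
mixed_node = entropy(["a", "b", "a", "b"])  # two classes, evenly split
```

As with Gini impurity, a pure node scores 0; an even two-class split scores 1 bit.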
Gini v Entropy
Gini impurity is faster to compute. It tends to isolate the most frequent class in its own branch of the tree.
Entropy v Gini
Entropy tends to produce more balanced trees.
Hyperparameter optimisation
Automates the process of testing different combinations of a model's hyperparameters.
Grid search
Exhaustively searches all combinations of the parameter values we want to explore. For each combination, it trains and tests the model with cross-validation to estimate its performance.
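The enumeration step can be sketched in plain Python. This is an illustration only: `grid_search` and `toy_score` are hypothetical names, and `toy_score` is a made-up stand-in for the cross-validated score a real model would produce, so the sketch runs without a dataset.

```python
from itertools import product

def grid_search(param_grid, evaluate):
    """Evaluate every combination in param_grid; return (best_score, best_params)."""
    names = list(param_grid)
    best_score, best_params = None, None
    # product() yields one tuple per combination of the listed values
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # stand-in for a cross-validated score
        if best_score is None or score > best_score:
            best_score, best_params = score, params
    return best_score, best_params

# Hypothetical scoring function: peaks at max_depth=4, min_samples_leaf=2.
def toy_score(params):
    return -abs(params["max_depth"] - 4) - abs(params["min_samples_leaf"] - 2)

param_grid = {"max_depth": [2, 4, 6], "min_samples_leaf": [1, 2, 5]}
best_score, best_params = grid_search(param_grid, toy_score)
```

The number of evaluations is the product of the list lengths (here 3 x 3 = 9), which is why the cost grows quickly as parameters are added. In practice, scikit-learn's `GridSearchCV` performs this search with cross-validation built in.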
Randomised Search
An alternative method that tests random combinations of the hyperparameters. You specify the number of iterations, and in each iteration it selects a random value for each parameter and tests that combination. This is appropriate if you have multiple parameters with a wide range of possible values.
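A minimal sketch of the sampling loop, under the same assumptions as any toy example: `random_search` and `toy_score` are hypothetical names, `toy_score` stands in for a cross-validated score, and each parameter is sampled uniformly from a list of candidate values.

```python
import random

def random_search(param_space, evaluate, n_iter, seed=0):
    """Evaluate n_iter random combinations; return (best_score, best_params)."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    best_score, best_params = None, None
    for _ in range(n_iter):
        # Pick one random value per parameter
        params = {name: rng.choice(values) for name, values in param_space.items()}
        score = evaluate(params)  # stand-in for a cross-validated score
        if best_score is None or score > best_score:
            best_score, best_params = score, params
    return best_score, best_params

# Hypothetical scoring function: peaks at max_depth=4, min_samples_leaf=2.
def toy_score(params):
    return -abs(params["max_depth"] - 4) - abs(params["min_samples_leaf"] - 2)

# 20 x 50 = 1000 possible combinations, but only 25 are evaluated.
param_space = {
    "max_depth": list(range(1, 21)),
    "min_samples_leaf": list(range(1, 51)),
}
best_score, best_params = random_search(param_space, toy_score, n_iter=25)
```

The cost is fixed by `n_iter` rather than by the size of the search space, at the price of possibly missing the best combination. scikit-learn offers this as `RandomizedSearchCV`.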