Decision Trees and SVM

55 Terms

1. Support Vector Machine

A supervised ML method that finds the best separating hyperplane to categorize new examples.

2. Hyperplane

A decision surface of one dimension lower than the data: a line for 2D data or a point for 1D data.

3. Linearly separable

Data that can be separated by a straight line.

4. Non-linearly separable

Data that cannot be separated by a straight line.

5. Margin

The distance from the hyperplane to the closest data points; SVM chooses the hyperplane that maximizes this distance.

6. Support Vectors

The data points closest to the hyperplane; these are the only points the SVM uses to define the decision boundary.

7. Dot Product

An operation that projects one vector onto another; SVM uses the dot product w·x in its decision function.

8. Decision Rule

In SVM, classify a point x as positive if w·x + b ≥ 0; training finds the (w, b) that maximize the margin subject to every training point satisfying its class constraint.
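
As a sketch, the decision rule and the optimization behind it can be written out directly, assuming NumPy (not named in the cards) and an already-learned weight vector w and bias b; the solver itself is omitted:

import numpy as np

def svm_predict(w, b, x):
    # Decision rule: classify as +1 if w.x + b >= 0, otherwise -1.
    return 1 if np.dot(w, x) + b >= 0 else -1

# Training finds (w, b) by solving:
#   minimize ||w||^2 / 2   subject to   y_i * (w.x_i + b) >= 1 for every training point i,
# which is equivalent to maximizing the margin 2 / ||w||.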

9. Soft Margins

In SVM, allowing some data points to fall on the wrong side of the margin or separating line, at the cost of a penalty in the objective.

10. Hyperparameter

A parameter set before training. In soft-margin SVM it weights the misclassification penalty, which is 0 when a point is classified correctly and grows with the point's distance past the boundary when it is not.

11. Kernel Trick

Implicitly mapping data from a lower-dimensional space into a higher-dimensional space using kernel functions, so that data that is not linearly separable becomes separable in SVM.

12. Polynomial Kernel

An SVM kernel of the form K(x, y) = (x·y + c)^d, where the degree d is a hyperparameter.

13. Radial Basis Function Kernel

An SVM kernel, K(x, y) = exp(-γ||x - y||²), that creates non-linear combinations of the features so the classes can be separated.
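
Both kernels above are short formulas; a minimal sketch assuming NumPy (not named in the cards), with c and d for the polynomial kernel and the width gamma for the RBF kernel as hyperparameters:

import numpy as np

def polynomial_kernel(x, y, c=1.0, d=3):
    # K(x, y) = (x.y + c)^d, with degree hyperparameter d
    return (np.dot(x, y) + c) ** d

def rbf_kernel(x, y, gamma=0.5):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    return np.exp(-gamma * np.linalg.norm(x - y) ** 2)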

14. Pros of SVMs

Effective in high dimensional spaces, memory-efficient, and works well with clear margins.

15. Cons of SVMs

Perform poorly on large datasets and on overlapping classes, and do not provide direct probability estimates.
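
The SVM cards above map onto a single library call; a minimal usage sketch assuming scikit-learn (not named in the cards), where C sets the soft-margin penalty and gamma the RBF kernel width:

from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # soft margin via C, RBF kernel via gamma
clf.fit(X, y)
print(clf.support_vectors_.shape)  # the support vectors found during training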

16. Decision Trees

An intuitive classification algorithm that assigns labels by applying a sequence of splits on feature values.

17. Entropy

Measures how mixed (impure) the class labels at a node of a decision tree are; low entropy means a homogeneous node.

18. Information Gain

In decision trees, the reduction in impurity (e.g. entropy) achieved by a split; the split with the highest gain is chosen.
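
Entropy and information gain are short formulas; a minimal sketch assuming NumPy, where entropy is computed over the class labels reaching a node:

import numpy as np

def entropy(labels):
    # H = -sum(p_i * log2(p_i)) over the classes present at the node
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    # Reduction in entropy achieved by splitting `parent` into `left` and `right`.
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted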

19. Boosting + Bagging

Ensemble methods that overcome the limitations of weak learners by combining many models.

20. Boosting

Choosing the next learner based on the errors of the last learner, such as in gradient boosted decision trees.

21. Bagging

Training each learner independently on a randomly drawn sample of the data, as seen in random forests.
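
Both ensemble styles are available as ready-made models; a minimal sketch assuming scikit-learn (not named in the cards):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

bagging = RandomForestClassifier(n_estimators=100).fit(X, y)        # learners trained on random samples of the data
boosting = GradientBoostingClassifier(n_estimators=100,
                                      learning_rate=0.1).fit(X, y)  # each learner fits the errors of the previous ones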

22. Boosted Decision Trees

A method that uses a learning rate to train new models based on errors of previous models.

23. Learning rate

A parameter that controls how much the weights of the model are adjusted with respect to the loss gradient.

24. New model

A model trained in boosting that learns from the errors of the previous model.

25. Previous model

The model that precedes the current model in boosting.

26. Sub model of errors

A model that focuses on the errors made by the previous model in boosting.
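
Cards 22-26 describe a single loop; a toy regression sketch, assuming scikit-learn decision trees as the weak learners:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, n_trees=50, learning_rate=0.1):
    prediction = np.full(len(y), y.mean())    # start from a constant "previous model"
    trees = []
    for _ in range(n_trees):
        residuals = y - prediction            # errors of the model so far
        tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)  # sub model of errors
        prediction += learning_rate * tree.predict(X)                # new model = previous model + scaled correction
        trees.append(tree)
    return trees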

27. Loss

The measure of how well a model predicts the expected outcome.

28. Number of trees

The quantity of decision trees in an ensemble model.

29. Decision trees

Models that work by splitting data to maximize information gain.

30. Support Vector Machines (SVMs)

Powerful classical machine learning approaches for supervised learning that find an optimal hyperplane to separate data.

31. Neural Networks

Models in which each neuron produces a single output from its inputs, weights, and bias.

32. Deep Neural Networks (DNN)

Neural networks with multiple layers that can learn complex patterns in data.

33. Activation functions

Non-linear functions applied to each neuron's weighted input to determine its output (e.g. ReLU, sigmoid, tanh).
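
ReLU and sigmoid are typical examples; a minimal sketch assuming NumPy:

import numpy as np

def relu(x):
    # Outputs x for positive inputs and 0 otherwise.
    return np.maximum(0, x)

def sigmoid(x):
    # Squashes any input into the range (0, 1).
    return 1 / (1 + np.exp(-x))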

34. Backpropagation

A method for adjusting the weights of a neural network based on the gradient of the loss function.

35. Optimisation

The process of adjusting the model to reduce errors and improve performance.

36. Regularisation

Techniques used to prevent overfitting in machine learning models.

37. Hidden layers

Layers in a neural network between the input and output layers where the complex patterns are learned.

38. Dense layers

Also known as fully connected layers, where each neuron is connected to every neuron in the previous layer.

39. Sequential (Neural Network Programming)

A quick and easy way to program a neural network in TensorFlow/Keras as a single linear stack of layers.

40. Functional (Neural Network Programming)

A more complex but flexible way to program a neural network in TensorFlow/Keras, allowing branching, multiple inputs, and multiple outputs.
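
A minimal sketch of the same small network written both ways in TensorFlow/Keras (layer sizes are illustrative):

import tensorflow as tf

# Sequential: a single linear stack of layers.
seq_model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Functional: layers are called on tensors, which allows branching and multiple inputs/outputs.
inputs = tf.keras.Input(shape=(10,))
x = tf.keras.layers.Dense(32, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
func_model = tf.keras.Model(inputs, outputs)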

41. Vanishing gradient problem

A problem in neural networks where gradients become extremely small, hindering learning.

42. Back Propagation

The process of calculating the gradient of the loss function with respect to the neural network's parameters to update them efficiently.

43. Loss Function

A function that measures how well a neural network model predicts the expected outcome.

44. Stochastic Gradient Descent (SGD)

An optimization algorithm that computes the gradient of the loss on a single sample at a time rather than over the entire dataset.

45. Mini-batch SGD

An optimization technique that calculates the loss gradient on batches of a set size, combining benefits of both gradient descent methods.
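
A minimal mini-batch SGD sketch for a linear model with squared-error loss, assuming NumPy (the model and loss are illustrative, not from the cards):

import numpy as np

def minibatch_sgd(X, y, lr=0.01, batch_size=32, epochs=10):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        idx = np.random.permutation(len(y))            # shuffle once per epoch
        for start in range(0, len(y), batch_size):
            batch = idx[start:start + batch_size]
            grad = X[batch].T @ (X[batch] @ w - y[batch]) / len(batch)  # gradient on this batch only
            w -= lr * grad
    return w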

46. Momentum

An optimization technique that reduces oscillations in the gradient descent process, aiding convergence.

47. Nesterov Accelerated Gradient

An optimization method that corrects the momentum direction to prevent overshooting the minimum during parameter updates.
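
The two update rules differ only in where the gradient is evaluated; a minimal sketch where the gradient function `grad` is an assumed callable supplied by the caller (an assumption for illustration):

def momentum_step(w, v, grad, lr=0.01, mu=0.9, nesterov=False):
    # Nesterov evaluates the gradient at the look-ahead point w + mu*v; plain momentum uses w itself.
    g = grad(w + mu * v) if nesterov else grad(w)
    v = mu * v - lr * g        # velocity accumulates past gradients and damps oscillations
    return w + v, v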

48. Adaptive Methods

Optimization techniques that adjust learning rates for different parameters based on their update frequencies.

49. Adagrad

An adaptive optimization algorithm that modifies the learning rate at each time step based on past gradients computed for each parameter.

50. AdaDelta

An optimization method that stores gradients from a limited number of previous steps to prevent the continuous decay of the learning rate.

51. Adam

An optimization algorithm that combines features of AdaDelta and momentum to achieve efficient parameter updates in neural networks.
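
A single Adam update can be sketched directly, assuming NumPy; m and v are the running first and second moment estimates and t is the step count:

import numpy as np

def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g          # momentum-like first moment
    v = beta2 * v + (1 - beta2) * g * g      # adaptive second moment, as in AdaDelta/RMSProp
    m_hat = m / (1 - beta1 ** t)             # bias correction for the early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v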

52. Dropout

A regularization technique that randomly sets a fraction of input units to zero during training to prevent overfitting.
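
In TensorFlow/Keras, dropout is added as a layer; a minimal sketch (the 0.5 rate and layer sizes are illustrative):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),   # zeroes half of the units at random during training only
    tf.keras.layers.Dense(1, activation="sigmoid"),
])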

53. Adding to Memory

Combining and storing new information.

54. Forgetting Operation

Removing outdated information from memory.

55. Updating Memory

Performing the forgetting and adding operations on the memory.