Decision Trees and SVM

55 Terms

1

Support Vector Machine

Supervised ML method that finds the best hyperplanes to categorize new examples.

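A minimal sketch with scikit-learn's SVC (assuming scikit-learn is installed; the toy data below is invented for illustration):

```python
# Fit a linear SVM on two small, linearly separable classes (toy data).
from sklearn import svm

X = [[0, 0], [1, 1], [1, 0], [4, 4], [5, 5], [4, 5]]  # 2D feature vectors
y = [0, 0, 0, 1, 1, 1]                                # class labels

clf = svm.SVC(kernel="linear")  # find the maximum-margin separating hyperplane
clf.fit(X, y)

print(clf.support_vectors_)     # the points closest to the hyperplane
print(clf.predict([[2, 2]]))    # classify a new example
```
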
2

Hyperplane

A decision boundary of one dimension lower than the data: a line for 2D data, a plane for 3D data, or a point for 1D data.

3

Linearly separable

Data that can be separated by a straight line.

4

Non-linearly separable

Data that cannot be separated by a straight line.

5

Margin

The distance from the hyperplane to the closest data points; SVM chooses the hyperplane that maximizes this distance.

6

Support Vectors

The data points closest to the hyperplane; they alone determine where the SVM decision boundary lies.

7

Dot Product

An operation that measures the projection of one vector onto another; SVM classifies a point x from the sign of w · x + b.

8

Decision Rule

In SVM, classify a point x by the sign of w · x + b, where (w, b) are found by maximizing the margin d subject to the constraint that training points fall on the correct side.

9

Soft Margins

In SVM, allowing some data points to fall on the wrong side of the line.

10

Hyperparameter

A setting chosen before training rather than learned from the data. In soft-margin SVM, the hyperparameter C weights the misclassification penalty, which is 0 if a point is classified correctly and grows with its distance onto the wrong side otherwise.

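In scikit-learn the soft-margin trade-off is exposed through the regularisation hyperparameter C; a small sketch (the values are only illustrative):

```python
from sklearn.svm import SVC

# Small C: softer margin, more points tolerated on the wrong side of the line.
soft_clf = SVC(kernel="linear", C=0.1)

# Large C: harder margin, misclassifications are penalised much more heavily.
hard_clf = SVC(kernel="linear", C=100.0)
```
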
11

Kernel Trick

Implicitly mapping data from a lower-dimensional space into a higher-dimensional space by replacing dot products with a kernel function, so that data which is not linearly separable in the original space can be separated by a hyperplane.

12

Polynomial Kernel

A kernel of the form K(x, y) = (x · y + c)^d, where the degree d is a hyperparameter controlling how flexible the decision boundary can be.

13

Radial Basis Function Kernel

A kernel K(x, y) = exp(-gamma * ||x - y||^2) that creates non-linear combinations of the features, allowing separation of data that is not linearly separable.

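In scikit-learn both kernels are selected through the kernel argument of SVC; a sketch with illustrative hyperparameter values:

```python
from sklearn.svm import SVC

# Polynomial kernel: K(x, y) = (gamma * x.y + coef0) ** degree, with degree d as a hyperparameter.
poly_clf = SVC(kernel="poly", degree=3)

# RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2); gamma controls how local each point's influence is.
rbf_clf = SVC(kernel="rbf", gamma=0.5)
```
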
14

Pros of SVMs

Effective in high dimensional spaces, memory-efficient, and works well with clear margins.

15

Cons of SVMs

Perform poorly on large datasets and heavily overlapping classes, and do not provide direct probability estimates.

16

Decision Trees

An intuitive classification algorithm that repeatedly splits the data on feature values, forming a tree of decisions.

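A minimal decision-tree sketch with scikit-learn (toy data invented for illustration):

```python
from sklearn.tree import DecisionTreeClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]  # toy feature vectors
y = [0, 0, 1, 1]                      # toy labels

# Splits are chosen greedily to maximize information gain (criterion="entropy").
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3)
tree.fit(X, y)
print(tree.predict([[1, 1]]))
```
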
17

Entropy

A measure of the impurity of the labels at a node, H = -sum(p_i * log2(p_i)); it is 0 for a pure (homogeneous) node and largest when the classes are evenly mixed.

18

Information Gain

In decision trees, the benefit of a split measured with an impurity function: the parent node's impurity (e.g., entropy) minus the weighted impurity of its children.

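A small self-contained sketch of entropy and information gain for a binary split (the function names are my own, not from any library):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels: -sum(p * log2(p))."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = ["yes", "yes", "yes", "no", "no", "no"]
left, right = ["yes", "yes", "yes"], ["no", "no", "no"]
print(information_gain(parent, left, right))  # 1.0: a perfect split of balanced classes
```
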
19

Boosting + Bagging

Ensemble methods that overcome the limitations of individual weak learners by combining many models.

20

Boosting

Choosing the next learner based on the errors of the last learner, such as in gradient boosted decision trees.

21

Bagging

Training each learner independently on a stochastically chosen (bootstrap) sample of the data and aggregating their predictions, as seen in random forests.

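A minimal random-forest sketch in scikit-learn, where each tree sees a bootstrap sample of the data and predictions are combined by voting (toy data for illustration):

```python
from sklearn.ensemble import RandomForestClassifier

X = [[0, 0], [1, 1], [1, 0], [4, 4], [5, 5], [4, 5]]
y = [0, 0, 0, 1, 1, 1]

forest = RandomForestClassifier(n_estimators=100, bootstrap=True)  # 100 bagged trees
forest.fit(X, y)
print(forest.predict([[2, 2]]))  # majority vote across the trees
```
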
22

Boosted Decision Trees

A method that trains each new tree on the errors of the previous trees, with a learning rate scaling how much each new tree contributes to the ensemble.

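A gradient-boosted-trees sketch in scikit-learn, where learning_rate scales each new tree's correction and n_estimators sets the number of trees (values are illustrative):

```python
from sklearn.ensemble import GradientBoostingClassifier

X = [[0, 0], [1, 1], [1, 0], [4, 4], [5, 5], [4, 5]]
y = [0, 0, 0, 1, 1, 1]

# Each new tree is fit to the errors of the ensemble built so far.
gbt = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, max_depth=2)
gbt.fit(X, y)
print(gbt.predict([[2, 2]]))
```
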
23

Learning rate

A parameter that controls how much the weights of the model are adjusted with respect to the loss gradient.

24

New model

A model trained in boosting that learns from the errors of the previous model.

25

Previous model

The model that precedes the current model in boosting.

26

Sub model of errors

A model that focuses on the errors made by the previous model in boosting.

27

Loss

The measure of how well a model predicts the expected outcome.

28

Number of trees

The quantity of decision trees in an ensemble model.

29

Decision trees

Models that work by splitting data to maximize information gain.

30

Support Vector Machines (SVMs)

Powerful classical machine learning approaches for supervised learning that find an optimal hyperplane to separate data.

31

Neural Networks

Models built from layers of connected neurons; each neuron produces a single output from its inputs, weights, and bias via an activation function.

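A single-neuron sketch in NumPy: the output is an activation applied to the weighted sum of the inputs plus a bias (the numbers are made up):

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0])  # inputs
w = np.array([0.1, 0.4, -0.3])  # weights
b = 0.2                         # bias

z = w @ x + b                   # weighted sum of inputs plus bias
output = 1 / (1 + np.exp(-z))   # sigmoid activation
print(output)                   # a single output in (0, 1)
```
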
32

Deep Neural Networks (DNN)

Neural networks with multiple layers that can learn complex patterns in data.

33

Activation functions

Non-linear functions (e.g., sigmoid, tanh, ReLU) applied to each neuron's weighted input to determine its output, letting the network model non-linear relationships.

34

Backpropagation

A method for adjusting the weights of a neural network based on the gradient of the loss function.

35

Optimisation

The process of adjusting the model to reduce errors and improve performance.

36

Regularisation

Techniques used to prevent overfitting in machine learning models.

37

Hidden layers

Layers in a neural network between the input and output layers where the complex patterns are learned.

38

Dense layers

Also known as fully connected layers, where each neuron is connected to every neuron in the previous layer.

39

Sequential (Neural Network Programming)

A quick and easy way to define a neural network in TensorFlow/Keras as a linear stack of layers.

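A minimal Sequential sketch in TensorFlow/Keras (layer sizes and the loss are illustrative):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),    # hidden (dense) layer
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output layer
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```
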
40

Functional (Neural Network Programming)

A more verbose but flexible way to define a neural network in TensorFlow/Keras, supporting multiple inputs, multiple outputs, and branching layer graphs.

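The same network written with the Functional API, which also allows multiple inputs, multiple outputs, and branching (sizes are illustrative):

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(10,))
hidden = tf.keras.layers.Dense(64, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(hidden)

model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```
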
41

Vanishing gradient problem

A problem in neural networks where gradients become extremely small, hindering learning.

42

Back Propagation

The process of calculating the gradient of the loss function with respect to the neural network's parameters to update them efficiently.

43

Loss Function

A function that measures how well a neural network model predicts the expected outcome.

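A sketch of one training step in TensorFlow: compute the loss, backpropagate to get the gradients, and update the parameters (the model and data are made-up placeholders):

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])  # tiny placeholder model
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

x = tf.random.normal((8, 3))  # a toy batch of inputs
y = tf.random.normal((8, 1))  # toy targets

with tf.GradientTape() as tape:
    predictions = model(x)
    loss = loss_fn(y, predictions)  # how far predictions are from the targets

grads = tape.gradient(loss, model.trainable_variables)            # backpropagation
optimizer.apply_gradients(zip(grads, model.trainable_variables))  # parameter update
```
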
44

Stochastic Gradient Descent (SGD)

An optimization algorithm that calculates the loss gradient and updates the parameters one sample at a time rather than over the entire dataset.

45

Mini-batch SGD

An optimization technique that calculates the loss gradient on batches of a fixed size, combining the stability of full-batch gradient descent with the speed of per-sample SGD.

46

Momentum

An optimization technique that reduces oscillations in the gradient descent process, aiding convergence.

47

Nesterov Accelerated Gradient

An optimization method that corrects the momentum direction to prevent overshooting the minimum during parameter updates.

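In Keras these optimisation choices appear as optimizer settings; a sketch with illustrative hyperparameter values:

```python
import tensorflow as tf

sgd      = tf.keras.optimizers.SGD(learning_rate=0.01)                               # plain SGD
momentum = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)                 # with momentum
nesterov = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True)  # Nesterov accelerated gradient

# The mini-batch size is chosen when fitting, e.g. model.fit(X, y, batch_size=32).
```
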
48

Adaptive Methods

Optimization techniques that adjust learning rates for different parameters based on their update frequencies.

49

Adagrad

An adaptive optimization algorithm that modifies the learning rate at each time step based on past gradients computed for each parameter.

50

AdaDelta

An optimization method that stores gradients from a limited number of previous steps to prevent the continuous decay of the learning rate.

51

Adam

An optimization algorithm that combines features of AdaDelta and momentum to achieve efficient parameter updates in neural networks.

52

Dropout

A regularization technique that randomly sets a fraction of input units to zero during training to prevent overfitting.

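A sketch combining Adam with Dropout in Keras (the rates and sizes are illustrative):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),                    # randomly zero 50% of units during training
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # adaptive rates + momentum
    loss="binary_crossentropy",
)
```
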
53

Adding to Memory

Combining and storing new information

54

Forgetting Operation

Removing outdated information from memory

55

Updating Memory

Performing forgetting and adding operations on the memory
