ML Finals

25 Terms

1

Nonlinear

Means that you can't accurately predict the label with a traditional linear model (a weighted sum of the features); the sketch below illustrates this
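
A minimal illustration of that idea, not taken from the deck: a plain linear model fit to XOR-style toy data (a hypothetical dataset I chose) can do no better than predicting 0.5 everywhere.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# XOR-style toy data: the label is not a weighted sum of the features.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

model = LinearRegression().fit(X, y)
print(model.predict(X))  # roughly [0.5, 0.5, 0.5, 0.5]: the linear model can't separate the labels
```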

2

Neural Networks

A family of model architectures designed to find nonlinear patterns in data; during training, the model automatically learns the optimal feature crosses to perform on the input data

3

Hidden Layers

Additional layers between the input and output layer

4

Neurons

The nodes in the hidden layers

5

Nonlinear mathematical operations

Operations placed between a neural network's layers so that it can learn nonlinear relationships between values, for example:

  • Sigmoid Function

6

Activation Function

A function that enables neural networks to learn nonlinear (complex) relationships between the features and the label. Common choices (sketched in code after this list):

  • Sigmoid

  • TanH

  • ReLU
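
A minimal NumPy sketch of the three activations listed above (the function names are my own, not from the deck):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squishes any input into (0, 1)

def tanh(x):
    return np.tanh(x)                 # squishes any input into (-1, +1)

def relu(x):
    return np.maximum(0.0, x)         # 0 for negative inputs, the input itself otherwise

print(sigmoid(0.0), tanh(0.0), relu(-2.0))   # 0.5 0.0 0.0
```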

7

Sigmoid Function

A mathematical function that squishes an input value into a constrained range, typically 0 to 1

  • It converts the raw output of a logistic regression model to a probability

  • It acts as an activation function in some neural networks

  • The term generally refers to any S-shaped function

8

TanH function

It's a mathematical function that squishes an input value into the range -1 to +1

9

ReLU

It is an activation function (Rectified Linear Unit) that transforms its input into an output using the following rule:

  • If the input value x is less than 0, return 0

  • If the input value x is greater than or equal to 0, return the input value

It is less susceptible to the vanishing gradient problem during NN training

10

Vanishing Gradient Problem

The tendency for the gradients of the early hidden layers of some deep neural networks to become very low or flat. Very low gradients result in smaller changes to the weights on those nodes, leading to little or no learning.

This is when gradient values approach 0 for the lower layers

ReLU activation functions can help prevent this
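
An illustrative back-of-the-envelope calculation (my own, assuming a hypothetical 10-layer chain of derivative factors): backpropagation multiplies one derivative factor per layer, the sigmoid derivative never exceeds 0.25, while the ReLU derivative is 1 for positive inputs.

```python
layers = 10                    # hypothetical network depth
sigmoid_max_grad = 0.25        # maximum of the sigmoid derivative (at x = 0)
relu_grad_positive = 1.0       # ReLU derivative for any positive input

print(sigmoid_max_grad ** layers)    # ~9.5e-07: the gradient all but vanishes
print(relu_grad_positive ** layers)  # 1.0: no shrinkage from the activation itself
```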

11

Backpropagation

This is the algorithm that implements gradient descent in neural networks by computing how the loss changes with respect to each weight

Keras now implements this for you (see the sketch below)

Intuitively, it is like adjusting the weight on each input so that the network arrives at a better output
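
A minimal Keras sketch of that point (the layer sizes and random data are placeholders I chose, not from the deck); backpropagation and the gradient-descent weight updates both happen inside fit():

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),                        # placeholder feature count
    tf.keras.layers.Dense(16, activation='relu'),      # hidden layer
    tf.keras.layers.Dense(1, activation='sigmoid'),    # output layer
])
model.compile(optimizer='sgd', loss='binary_crossentropy')

X = np.random.rand(32, 4)                   # placeholder features
y = np.random.randint(0, 2, size=(32, 1))   # placeholder binary labels
model.fit(X, y, epochs=5, verbose=0)        # backpropagation runs here, weight by weight
```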

12

Exploding Gradients

If the weights in a network are very large, then the gradients for the lower layers involve products of many large terms.

This is when gradients grow too large for training to converge

Batch normalization can help with this, as can lowering the learning rate (see the sketch below)
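
A hedged Keras sketch of the two mitigations named above (all layer sizes and the 0.001 learning rate are placeholder choices):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),                       # placeholder feature count
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),              # batch normalization between layers
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# A lower learning rate keeps each gradient step small.
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
              loss='binary_crossentropy')
```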

13

Dead ReLU Units

Once the weighted sum for a ReLU unit falls below 0, the ReLU unit can get stuck. It outputs 0, contributing nothing to the output, and gradients can no longer flow through it during backpropagation.
Lowering the learning rate can keep ReLU units from falling into this problem

14

Dropout Regularization

It is another form of regularization that works by randomly dropping out unit activations in a network for a single gradient step. The more you drop out, the stronger the regularization
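
A hedged Keras sketch of dropout between layers (the 0.2 rate and layer sizes are placeholders); each Dropout layer randomly zeroes that fraction of activations at each training step:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),                    # placeholder feature count
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.2),                   # drops 20% of activations per training step
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
```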

15

Multi-class classification model

A classification model that picks from more than two possible classes, rather than just the two of binary classification

16

One vs all

It uses binary classification to make a series of yes/no predictions: given a picture of a fruit, the training asks "is it an image of fruit 1?", "is it an image of fruit 2?", and so on

  • Increasingly inefficient as the number of classes rises

17

One vs one (softmax)

It still uses the same architecture as one vs all, but the difference is that it applies the softmax activation as the output transform, so the class probabilities sum to 1 (see the sketch below)
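
A hedged Keras sketch of that contrast (the class count, feature size, and hidden width are placeholders): the two heads share the same architecture and differ only in the output transform.

```python
import tensorflow as tf

num_classes = 5   # placeholder class count

def make_classifier(output_activation):
    # Same architecture for both; only the output transform differs.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(10,)),                   # placeholder feature count
        tf.keras.layers.Dense(16, activation='relu'),
        tf.keras.layers.Dense(num_classes, activation=output_activation),
    ])

one_vs_all = make_classifier('sigmoid')   # an independent yes/no probability per class
one_vs_one = make_classifier('softmax')   # class probabilities that sum to 1
```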

18

Full softmax

The variant of softmax that calculates a probability for every class
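
A minimal NumPy sketch of full softmax (my own helper, not a library call):

```python
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)   # subtract the max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs, probs.sum())   # one probability per class; they sum to 1
```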

19

Candidate sampling

This means that softmax calculates a probability for all the positive labels but only for a random sample of negative labels.

  • This can improve efficiency in problems having a large number of classes

20

Embedding

A vector representation of data in embedding space.

A model finds potential ___ by projecting the high-dimensional space of initial data vectors into a lower-dimensional space.
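
A hedged Keras sketch (the vocabulary size of 10,000 and the 8-dimensional embedding are placeholder choices): the Embedding layer maps each word ID from the high-dimensional vocabulary to a low-dimensional vector that is learned during training.

```python
import numpy as np
import tensorflow as tf

vocab_size = 10000   # placeholder size of the high-dimensional input space
embedding_dim = 8    # placeholder size of the lower-dimensional embedding space

embedding = tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim)
word_ids = np.array([[3, 17, 256]])   # a batch with one three-word example
vectors = embedding(word_ids)
print(vectors.shape)                  # (1, 3, 8): one 8-dimensional vector per word
```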

21

Word2vec

Technique for creating vector representations of words

22

Dimensionality Reduction Techniques

This is a common way to get embeddings: start from bag-of-words vectors, then find the most important patterns and keep those (see the sketch below)
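
A hedged scikit-learn sketch of that recipe (the tiny corpus and the choice of 2 components are mine): build bag-of-words vectors, then keep only the strongest patterns with a truncated SVD.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

corpus = ["the cat sat on the mat", "the dog sat on the log", "cats and dogs"]

bow = CountVectorizer().fit_transform(corpus)   # bag-of-words count vectors
svd = TruncatedSVD(n_components=2)              # keep the 2 strongest patterns
embeddings = svd.fit_transform(bow)
print(embeddings.shape)                         # (3, 2): one 2-dimensional embedding per document
```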

23

Static embedding

This kind of embedding gives one vector per word, no matter the context, even though many words are ambiguous

24

Contextual embeddings

This kind of embedding solves the static-embedding problem by generating a different vector depending on the sentence the word appears in

25

Transformers

Models that produce contextual embeddings using attention and positional information