Nonlinear
Means that you can't accurately predict a label with a linear model (a simple weighted sum of the features)
Neural Networks
A family of model architectures designed to find nonlinear patterns in data; the model automatically learns which feature crosses to perform
Hidden Layers
Additional layers between the input and output layer
Neurons
The nodes in the hidden layers
Nonlinear mathematical operations
Adding these to a neural network lets it learn nonlinear relationships between values
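A minimal Keras sketch of these ideas (assuming TensorFlow is installed; the layer sizes and activations are illustrative): a network with hidden layers of neurons whose nonlinear activations let it learn nonlinear relationships.

```python
import tensorflow as tf

# Illustrative network: two hidden layers of neurons with nonlinear
# (ReLU) activations between a 10-feature input and a single output.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10,)),
    tf.keras.layers.Dense(16, activation="relu"),   # hidden layer 1
    tf.keras.layers.Dense(8, activation="relu"),    # hidden layer 2
    tf.keras.layers.Dense(1, activation="sigmoid")  # output layer
])
model.summary()
```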
Activation Function
A function that enables neural networks to learn nonlinear (complex) relationships between features and label
Sigmoid
TanH
ReLU
Sigmoid Function
A mathematical function that squishes an input value into a constrained range, typically 0 to 1
It converts the raw output of a logistic regression model to a probability
Acting as an activation function in some neural networks
The term generally refers to any S-shaped function
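A minimal plain-Python sketch of the sigmoid formula, sigma(x) = 1 / (1 + e^-x):

```python
import math

def sigmoid(x):
    """Squash x into the range (0, 1): sigma(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(-4.0), sigmoid(0.0), sigmoid(4.0))  # ~0.018, 0.5, ~0.982
```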
TanH function
It's a mathematical function that squishes an input value into the range -1 to +1
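A tiny sketch using Python's math module, showing the -1 to +1 range:

```python
import math

# tanh(x) = (e^x - e^-x) / (e^x + e^-x), bounded between -1 and +1
print(math.tanh(-3.0), math.tanh(0.0), math.tanh(3.0))  # ~-0.995, 0.0, ~0.995
```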
ReLU
It is an activation function that transforms its input using the following rule (see the sketch below):
If the input value x is less than 0, return 0
If the input value x is greater than or equal to 0, return the input value
It is less susceptible to the vanishing gradient problem during NN training
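A direct translation of that rule into plain Python (a sketch, not tied to any library):

```python
def relu(x):
    """Return 0 for negative inputs, otherwise return the input unchanged."""
    return 0.0 if x < 0 else x

print(relu(-2.5), relu(0.0), relu(3.7))  # 0.0 0.0 3.7
```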
Vanishing Gradient Problem
The tendency for the gradients of early hidden layers of some deep NN to become low or flat. Very low gradients result in smaller changes to weights on nodes, leading to little or no learning.
This is when gradient values approach 0 for the lower layers
ReLU activation functions can help prevent this
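A toy numeric illustration (pure Python; the numbers are made up): backpropagation multiplies local derivatives layer by layer, so many small factors shrink the gradient reaching the early layers toward 0.

```python
# Hypothetical per-layer derivative of a saturating activation (sigmoid's
# derivative is at most 0.25). Chaining many layers multiplies these factors.
local_derivative = 0.25
gradient = 1.0
for layer in range(10):          # 10 hidden layers, illustrative
    gradient *= local_derivative
print(gradient)                  # 0.25**10 ~= 9.5e-07 -> almost no learning
```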
Backpropagation
This is the algorithm that implements gradient descent in NNs
Keras now implements this for you
Intuitively, it adjusts the weights on each input so the network arrives at a better output
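A hedged Keras sketch (assuming TensorFlow is installed; the data is synthetic): compile() and fit() run gradient descent with backpropagation internally, so the weight adjustments happen for you.

```python
import numpy as np
import tensorflow as tf

# Toy data: 200 examples with 10 features each, binary labels.
features = np.random.rand(200, 10).astype("float32")
labels = np.random.randint(0, 2, size=(200, 1))

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# fit() computes gradients via backpropagation and updates the weights;
# you never code the weight updates by hand.
model.compile(optimizer="sgd", loss="binary_crossentropy")
model.fit(features, labels, epochs=5, batch_size=32, verbose=0)
```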
Exploding Gradients
If the weights in a network are very large, then the gradients for the lower layers involve products of many large terms.
This is when gradients get too large for training to converge
Batch normalization and lowering the learning rate can help prevent this
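A hedged Keras sketch of both mitigations (layer sizes and the learning rate are illustrative): BatchNormalization layers between Dense layers, plus a smaller learning rate on the optimizer.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.BatchNormalization(),   # keeps activations in a stable range
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# A smaller learning rate also keeps gradient updates from blowing up.
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
              loss="binary_crossentropy")
```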
Dead ReLU Units
Once the weighted sum for a ReLU unit falls below 0, the ReLU unit can get stuck. It outputs 0, contributing nothing to the output, and gradients can no longer flow through it during backpropagation.
Lowering the learning rate can keep ReLU units from falling into this problem
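A tiny sketch of why the unit gets stuck: when the weighted sum is negative, ReLU's gradient is 0, so no gradient flows back and the weights feeding the unit stop updating.

```python
def relu_grad(weighted_sum):
    """Derivative of ReLU with respect to its input: 1 if positive, else 0."""
    return 1.0 if weighted_sum > 0 else 0.0

print(relu_grad(-0.3))  # 0.0 -> no gradient flows back, the unit stays stuck
print(relu_grad(0.7))   # 1.0 -> gradient flows normally
```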
Dropout Regularization
It is another form of regularization that works by randomly dropping out unit activations in a network for a single gradient step. The more you drop out, the stronger the regularization
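A hedged Keras sketch: a Dropout layer with rate 0.3 randomly zeroes 30% of the previous layer's activations on each gradient step (the rate is illustrative; higher rates regularize more strongly).

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.3),   # drop 30% of activations per gradient step
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```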
Multi-class classification model
A classification model that can pick from more than two possible classes
One vs all
It uses binary classification for a series of yes-or-no predictions (sketched below); given a picture of fruit, training asks: is it an image of fruit 1? Is it an image of fruit 2? And so on
Increasingly inefficient as the number of classes rises
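A hedged scikit-learn sketch of one-vs-all (assuming scikit-learn is installed; the data is synthetic): one binary yes/no classifier is trained per class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Synthetic data: 300 examples, 4 features, 3 fruit classes (0, 1, 2).
X = np.random.rand(300, 4)
y = np.random.randint(0, 3, size=300)

# Trains one yes/no (binary) classifier per class under the hood.
clf = OneVsRestClassifier(LogisticRegression()).fit(X, y)
print(clf.predict(X[:5]))
```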
One vs one (softmax)
It uses the same architecture as one-vs-all, but the difference is that it applies a softmax activation as the output transform, so the class probabilities sum to 1
Full softmax
The variant of softmax that calculates a probability for every class
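A minimal NumPy sketch of full softmax: it turns raw scores (logits) for every class into probabilities that sum to 1.

```python
import numpy as np

def softmax(logits):
    """Probability for every class: exp(z_i) / sum_j exp(z_j)."""
    exps = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return exps / exps.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # e.g. ~[0.66, 0.24, 0.10]
```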
Candidate sampling
This means that softmax calculates a probability for all the positive labels but only for a random sample of negative labels.
This can improve efficiency in problems having a large number of classes
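A rough NumPy illustration of the idea (not a real training loss; the class count and sample size are made up): score the positive class plus only a random sample of negative classes instead of all of them.

```python
import numpy as np

num_classes = 100_000          # full softmax would score every one of these
positive_class = 42
num_sampled_negatives = 10     # score only a handful of negatives instead

logits = np.random.randn(num_classes)   # pretend model outputs
negatives = np.random.choice(
    [c for c in range(num_classes) if c != positive_class],
    size=num_sampled_negatives, replace=False)

sampled = np.concatenate(([positive_class], negatives))
exps = np.exp(logits[sampled] - logits[sampled].max())
probs = exps / exps.sum()      # softmax over positive + sampled negatives only
print(probs[0])                # approximate probability of the positive class
```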
Embedding
Is a vector representation of data in embedding space.
A model finds potential ___ by projecting the high-dimensional space of initial data vectors into a lower-dimensional space.
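A hedged Keras sketch: an Embedding layer maps integer IDs from a 10,000-item vocabulary into learned 8-dimensional vectors (both sizes are illustrative).

```python
import numpy as np
import tensorflow as tf

# Map integer word IDs from a 10,000-word vocabulary into 8-dimensional vectors.
embedding = tf.keras.layers.Embedding(input_dim=10_000, output_dim=8)

word_ids = np.array([[3, 74, 912]])   # a "sentence" of three word IDs
vectors = embedding(word_ids)
print(vectors.shape)                  # (1, 3, 8): one 8-d vector per word
```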
Word2vec
Technique for creating vector representations of words
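A hedged gensim sketch (assuming gensim 4.x is installed; the corpus is a toy): training word2vec on tokenized sentences yields one static vector per word.

```python
from gensim.models import Word2Vec

sentences = [["the", "dog", "chased", "the", "ball"],
             ["the", "cat", "chased", "the", "mouse"]]

# vector_size, window, and min_count are illustrative hyperparameters.
model = Word2Vec(sentences, vector_size=32, window=2, min_count=1)
print(model.wv["dog"].shape)   # (32,): the learned vector for "dog"
```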
Dimensionality Reduction Techniques
This is a common way to get embeddings: start from bag-of-words vectors, then find the most important patterns and keep those
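A hedged scikit-learn sketch of that recipe: build bag-of-words vectors, then keep the strongest patterns with truncated SVD (the number of output dimensions is illustrative).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["the dog chased the ball",
        "the cat chased the mouse",
        "stocks fell sharply today"]

bow = CountVectorizer().fit_transform(docs)        # sparse bag-of-words vectors
embeddings = TruncatedSVD(n_components=2).fit_transform(bow)
print(embeddings.shape)                            # (3, 2): one low-d vector per doc
```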
Static embedding
This embedding gives one vector per word, no matter the context, even though many words are ambiguous
Contextual embeddings
This embedding solves the static embedding problem by generating a different vector depending on the sentence the word appears in
Transformers
They produce contextual embeddings using attention and positional information
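A hedged Hugging Face sketch (assuming the transformers library, PyTorch, and the bert-base-uncased checkpoint are available): the same word gets a different vector depending on the sentence it appears in.

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

for sentence in ["I sat by the river bank.", "I deposited cash at the bank."]:
    inputs = tokenizer(sentence, return_tensors="pt")
    outputs = model(**inputs)
    # last_hidden_state holds one contextual vector per token;
    # the vector for "bank" differs between the two sentences.
    print(outputs.last_hidden_state.shape)
```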