machine learning is the study of _________ that improve their __________ at some ________ with ___________
algorithms; performance; task; experience
well-defined learning task: < ____ >
P, T, E
machine learning is good at recognizing ________, recognizing _________, and _________
patterns; anomalies; prediction
deep learning is a type of __________ ________ _________
artificial neural network
more than 2 hidden layers makes it a _____ _______ _________ (____)
deep neural network (DNN)
the most popular machine learning algorithm
deep learning
the ______ in Python is important because it indicates a block of code
indentation
comments in python use ___; block comments use three ___ or ___
#; '; "
variables in python are _____ _________ and must start with a ______ or the ________ character, no _________
case sensitive; letter; underscore; numbers
boolean in Python is declared as _______
bool
Python for loop syntax through myList
for x in myList:
Python for loop syntax for range of 10
for x in range(10):
used in Python to store data values in key:value pairs; they are _______ and do not allow ________
dictionaries; ordered; duplicates
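A small sketch of the dictionary behaviour described above. The `ages` data is made up for illustration, and "ordered" here assumes Python 3.7 or later, where insertion order is preserved.

```python
# A repeated key does not create a duplicate: the later value overwrites the earlier one
ages = {"ann": 30, "bob": 25, "ann": 31}
ages["cy"] = 40           # new keys are appended in insertion order

keys = list(ages)         # iterating a dict yields its keys in insertion order
print(keys, ages["ann"])
```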
Python function definition
def myFunction(input):
what do you add to your parameter for an arbitrary number of arguments?
*
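As a minimal illustration of the `*` parameter, the hypothetical function `total` below accepts any number of positional arguments.

```python
def total(*values):
    """Sum an arbitrary number of positional arguments."""
    result = 0
    for v in values:      # inside the function, values is a tuple
        result += v
    return result

print(total(1, 2, 3))
```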
declare arr [1 2 3 4] as a numpy array
arr = np.array([1, 2, 3, 4])
declare 2×2 matrix (mat) as numpy array
mat = np.array([[1, 1], [2, 2]])
check the dimension of numpy array (arr) TWO WAYS
arr.ndim; arr.shape
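A short sketch of the array cards above, assuming the usual `import numpy as np` alias. Note that a matrix is passed as one nested list, with one inner list per row.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])          # 1D array
mat = np.array([[1, 1], [2, 2]])      # 2x2 matrix: a list of rows

print(arr.ndim, arr.shape)            # number of dimensions, size along each dimension
print(mat.ndim, mat.shape)
```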
comprehensive library for creating static, animated, and interactive visualizations in Python
matplotlib
a vector is a ____ _____
1D array
matrix transpose is an operator that _____ the matrix over its _______, in turn switching the _____ and ______
flips; diagonal; rows; columns
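The transpose can be tried out with NumPy's `.T` attribute. The matrix `m` is an arbitrary example.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])   # 2 rows, 3 columns
t = m.T                     # rows become columns: 3 rows, 2 columns

print(t)
```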
v = [a,b]
f(v) = a² + b²
what is f’(v) with respect to v?
f’(v) = [2a, 2b]
for derivatives with a matrix or vector, we normally multiply the ________ and the ____ ________
transpose; one vector
python code to find magnitude of vector x
y = x**2
s = np.sum(y)
d = np.sqrt(s)
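The three steps above can be checked against NumPy's built-in `np.linalg.norm`, which computes the same L2 magnitude by default. The vector `x` is an arbitrary example.

```python
import numpy as np

x = np.array([3.0, 4.0])

d = np.sqrt(np.sum(x ** 2))   # square, sum, square root
d_norm = np.linalg.norm(x)    # built-in L2 norm

print(d, d_norm)
```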
python add/subtract vectors x and y
x + y; x - y
numpy dot product for x and y
np.dot(x,y)
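A quick sketch of the vector operations above on two example vectors (the values are arbitrary).

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

added = x + y            # element-wise addition
subtracted = x - y       # element-wise subtraction
dot = np.dot(x, y)       # 1*4 + 2*5 + 3*6

print(added, subtracted, dot)
```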
matplotlib plot function for x, y
plt.plot(x, y, label='My Plot', linewidth=2.0)
KNN is ___ __________ _________
non parameter learning
non-parameter learning y = _____
parameter learning y = ____
f(X, X_train); f(X,W)
non-parameter learning needs the _____ ________ ________ and is very slow in ______ with almost no ______ ________
entire training dataset; inferring; training process
similar to using a dictionary to find definitions or synonyms
non-parameter learning
parameter learning requires the ____, is very _____ in _______, but takes more ______ in _______
weight; fast; inferring; time; training
similar to having the word in your brain to recognize it at once
parameter learning
measures how far the prediction is from the ground truth
loss function
common loss function
Loss(y, y^) = sum((y - y^)²)
with different combinations of theta0 and theta1, we obtain different ______ ______, it is a ____ surface
loss values; 3D
loss value shows how close your _________ __________ ___________ is to the ________ ________
machine learning algorithm; ground truth
for a loss value, the _______ the ________
lower; better
machine learning aims to find the best ________ that ____ ________ could obtain the _______ value
parameters; loss function; lowest
how do we get the smallest loss value
gradient descent
each step of gradient descent uses all of the training examples - this is known as …
batch gradient descent
your step size in gradient descent is known as the _________ _____
learning rate
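A minimal sketch of batch gradient descent for a line y^ = theta0 + theta1 * x. The toy data, the learning rate of 0.05, and the step count are assumptions chosen for illustration, and the loss here is the mean (rather than the sum) of squared errors so the step size does not depend on the dataset size.

```python
import numpy as np

# Toy data generated from y = 1 + 2x (assumed for illustration)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

theta0, theta1 = 0.0, 0.0
learning_rate = 0.05        # the step size

for step in range(5000):
    y_hat = theta0 + theta1 * x
    error = y_hat - y
    # Gradients of the mean squared error, computed over ALL examples (batch)
    grad0 = 2.0 * np.mean(error)
    grad1 = 2.0 * np.mean(error * x)
    # Move against the gradient to lower the loss
    theta0 -= learning_rate * grad0
    theta1 -= learning_rate * grad1

print(theta0, theta1)
```

A learning rate that is too large makes the updates overshoot and diverge; one that is too small makes convergence slow.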
output is discrete in _________
classification
output is continuous in _________
regression
machine learning is a ____-______ approach
data driven
data: any __________ fact, value, text, sound, or picture not being _______ and __________
unprocessed; interpreted; analyzed
a set of data collected for machine learning based task
dataset
a set of data used to discover predictive relationships
training dataset
a set of data used to assess the strength and utility of a predictive relationship
test dataset
the attributes to each data sample
features
KNN stands for:
k nearest neighbors
for KNN:
calculate the ____ _________ for every _____ _______
select the ____ data points with the _________ _________
________ based on the k point (new data point should belong to same category as the _______ )
L-2 distance; data point; K; smallest distance; voting; majority
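The three KNN steps above can be sketched as follows. The function name `knn_predict`, the toy data, and the default `k=3` are hypothetical choices for illustration. The example uses two features per point, so the distance is computed over both.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # 1. L2 (Euclidean) distance from x_new to every training point
    distances = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))
    # 2. Indices of the k points with the smallest distance
    nearest = np.argsort(distances)[:k]
    # 3. Majority vote among the labels of those k points
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                    [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([2.0, 2.0])))
```

There is no training step: every prediction scans the whole training set, which is why inference is slow.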
can you still use KNN if there is more than one feature for distance calculations?
yes
when setting up KNN, you can choose two parameters:
the best ______ of ___ for ________
the best _______ for ________
value; k; voting; distance; measuring
the parameters you set for KNN are known as _____________ and are not ________ by the machine learning _______ itself
hyperparameters; adapted; algorithm
a set of examples used to tune the hyperparameters
validation dataset
never use _____ data to _____ _______
test; train model
cross validation: when dataset is ______, ______ data, try each fold as _______ and _______
small; split; validation; average
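One way to sketch the fold splitting is shown below. `k_fold_indices` is a hypothetical helper, not a library function. In practice each (train, validation) pair would be used to fit and score a model, and the validation scores would then be averaged.

```python
import numpy as np

def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs, using each fold once for validation."""
    folds = np.array_split(np.arange(n_samples), n_folds)
    for i in range(n_folds):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, val_idx

splits = list(k_fold_indices(10, 5))
for train_idx, val_idx in splits:
    print(train_idx, val_idx)
```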
cross validation is _________ in deep learning
uncommon
learning from labeled examples
supervised learning
draws inferences from datasets consisting of input data without labeled responses
unsupervised learning
supervised learning has pairs with an ______ object and a desired ______ value
input; output
unsupervised learning finds ______ ________ or _________ in data
hidden patterns; grouping
K-Means Algorithm:
initialize ____ _______ _______
assign _____ ______ to ________ clusters
update _______ _________ by calculating _________
repeat ___ and ___ until _________
select optimal number of ________
K center centroids; data points; nearest; center centroids; average; 2; 3; convergence; clusters
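A minimal sketch of steps 1 to 4 above, assuming the number of clusters k is already chosen. The function name `k_means`, the initialization from randomly picked data points, and the toy data are illustrative assumptions.

```python
import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialize the centroids by picking k distinct data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # 2. Assign each data point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # 3. Move each centroid to the average of its assigned points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # 4. Stop once the centroids no longer move (convergence)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = k_means(X, k=2)
print(centroids, labels)
```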
non-parameter learning requires computation of all of the _______ ________, taking more ______ and ________
training dataset; time; memory
non-parameter/parameter, supervised/unsupervised
KNN:
K-Means:
Linear Regression:
non-parameter, supervised; non-parameter, unsupervised; parameter, supervised
KNN is a __________ task, K-Means is a __________ task, and linear regression is a _________ task
classification; clustering; regression
linear regression steps
propose model; gradient descent; get parameters and test
image recognition is _______; stock price prediction is ________
classification; regression
softmax classifier: build upon ________ _________; _____ score of class k to __________ of being in this class; __________ of being in different classes sum up to ____
linear classification; map; probability; probabilities; 1
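A small sketch of the softmax mapping from class scores to probabilities. The scores are made-up values. Subtracting the maximum score is a common numerical-stability trick and does not change the result.

```python
import numpy as np

def softmax(scores):
    # Shifting by the max leaves the output unchanged but avoids overflow in exp
    shifted = scores - np.max(scores)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs, probs.sum())
```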
loss over the dataset is the _________ ______ for all _________
average loss; examples
three loss functions
MAE; MSE; Cross Entropy
MAE: ______ ________ __________
Equation:
mean absolute error; abs(y^ - y)
MSE: ______ _________ _________
Equation:
mean square error; (y^ - y)²
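Both losses can be computed in one line each, averaging over all examples. The arrays `y` and `y_hat` are made-up values for illustration.

```python
import numpy as np

y = np.array([3.0, -0.5, 2.0, 7.0])       # ground truth
y_hat = np.array([2.5, 0.0, 2.0, 8.0])    # predictions

mae = np.mean(np.abs(y_hat - y))          # mean absolute error
mse = np.mean((y_hat - y) ** 2)           # mean square error

print(mae, mse)
```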
Cross Entropy is the _________ _____ likelihood of the __________ ________ as the loss
negative log; correct class
cross entropy for the following:
true label: [1 0 0 0 0]
softmax: [0.1 0.5 0.1 0.1 0.2]
-(1*log(0.1) + 0*log(0.5) + 0*log(0.1) + 0*log(0.1) + 0*log(0.2)) = -log(0.1)
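The same calculation in NumPy, assuming the natural logarithm. Only the term for the correct class survives, because every other label entry is 0.

```python
import numpy as np

true_label = np.array([1, 0, 0, 0, 0])
softmax_out = np.array([0.1, 0.5, 0.1, 0.1, 0.2])

# Negative log likelihood of the correct class
loss = -np.sum(true_label * np.log(softmax_out))
print(loss)
```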
Regularization:
- it is likely that different ___ have the same _____
- regularization helps to _______ ________ and avoid _________
W; loss; express preference; overfitting
L(W) including regularization
L(W) = data loss + regularization
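A sketch of how regularization expresses a preference between weights that give the same data loss. L2 regularization, the strength of 0.1, and the weight vectors are assumptions for illustration; the cards do not name a specific regularizer.

```python
import numpy as np

def total_loss(data_loss, W, reg_strength=0.1):
    # L2 penalty (assumed here): the sum of squared weights
    reg = reg_strength * np.sum(W ** 2)
    return data_loss + reg

x = np.ones(4)
W1 = np.array([1.0, 0.0, 0.0, 0.0])
W2 = np.array([0.25, 0.25, 0.25, 0.25])

# Both weight vectors give the same score on x, so the same data loss
print(np.dot(W1, x), np.dot(W2, x))
# The penalty prefers the spread-out weights W2
print(total_loss(0.5, W1), total_loss(0.5, W2))
```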
overfitting: model tries to fit not only the __________ relation between _____ and ______ but also the _______ ________; ________ ______________ helps select simple models
regular; inputs; outputs; sampling errors; weight regularization
numerical gradient: __________, ______, easy to _____
analytic gradient: ______, _______, _______ prone
—> in practice we use _______ but check with _________
approximate; slow; write; exact; fast; error; analytic; numerical
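A minimal gradient check, using the sum of squares as the example function. The step size `h` and the helper names are illustrative choices. The numerical version needs two function evaluations per input, which is why it is slow.

```python
import numpy as np

def f(v):
    return np.sum(v ** 2)

def analytic_grad(v):
    # Derived by hand: d/dv of sum(v^2) is 2v
    return 2 * v

def numerical_grad(func, v, h=1e-5):
    # Central difference, one input dimension at a time
    grad = np.zeros_like(v)
    for i in range(v.size):
        step = np.zeros_like(v)
        step[i] = h
        grad[i] = (func(v + step) - func(v - step)) / (2 * h)
    return grad

v = np.array([3.0, -2.0])
print(analytic_grad(v), numerical_grad(f, v))
```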
with backpropagation, given f(x, y, z), you’ll end up getting which derivatives
df/dx; df/dy; df/dz
in backpropagation, multiply the _________ by the ______ ___________
upstream; local gradient
tool used for forward and back propagation
computational graph
the local gradient is the _________
derivative
the input to the local gradient is found from __________-____________
forward-propagation
current gradient =
local gradient * upstream gradient
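The rule above can be traced by hand on a tiny computational graph. The function f = (x + y) * z and the input values are an assumed example, not taken from the cards.

```python
# Forward pass through the graph f = (x + y) * z
x, y, z = -2.0, 5.0, -4.0
q = x + y            # intermediate node
f = q * z            # output node

# Backward pass: current gradient = local gradient * upstream gradient
df_df = 1.0          # gradient of the output with respect to itself
df_dq = z * df_df    # local gradient of q * z with respect to q is z
df_dz = q * df_df    # local gradient of q * z with respect to z is q
df_dx = 1.0 * df_dq  # local gradient of x + y with respect to x is 1
df_dy = 1.0 * df_dq  # local gradient of x + y with respect to y is 1

print(df_dx, df_dy, df_dz)
```

The local gradients use q and z, values that are only known after the forward pass.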
what do we assume to begin backpropagation if the forward pass is not clear
2
the input layer for a neural network
the first layer
the output layer for a neural network
the last layer
layers in between input and output layers of neural networks
hidden layers
neurons between ________ layers are typically connected, neurons _______ the ______ layer are not connected
adjacent; within; same
the input layer of a neural network is _________ meaning the ________ is the input
transparent; output
two parts in the neurons of hidden layer
accumulation of product; activation function
connection among neurons has a _______ and it is the parameter that should be ________
weight; learned
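A sketch of the two parts of a hidden-layer neuron: the accumulation of products, followed by an activation function. The bias term, the ReLU activation, and all values are assumptions added for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def neuron(inputs, weights, bias, activation):
    # Part 1: accumulate the products of weights and inputs (plus an assumed bias)
    z = np.dot(weights, inputs) + bias
    # Part 2: pass the sum through the activation function
    return activation(z)

inputs = np.array([1.0, 2.0])
out_a = neuron(inputs, np.array([0.5, -1.0]), 0.5, relu)   # weighted sum is -1
out_b = neuron(inputs, np.array([0.5, 1.0]), 0.5, relu)    # weighted sum is 3

print(out_a, out_b)
```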
which layer has a “special” activation function
output
squashes numbers [0, 1] and is popular, used in RNN
sigmoid
squashes numbers [-1,1] and zero-centered, used in RNN
tanh
squashes numbers [0, infinity] and does not saturate, used in CNN and FCN
RELU
squashes numbers [-infinity, infinity], does not saturate
leaky RELU
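The four activation functions above can be written in a few lines of NumPy. The leaky ReLU slope `alpha=0.01` is a commonly used value but is an assumption here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))        # output between 0 and 1

def tanh(z):
    return np.tanh(z)                      # output between -1 and 1, zero-centered

def relu(z):
    return np.maximum(0.0, z)              # negative inputs become 0

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)   # small slope for negative inputs

z = np.array([-1.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z), leaky_relu(z))
```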
for choosing activation functions, typically choose _____, then _____ _______ if that doesn’t work; sometimes _____, not normally _______
RELU; leaky RELU; tanh; sigmoid