Data Mining - Final

What is the motivation behind an activation function?

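To introduce non-linearity: each neuron's linear output (Wx + b) is passed through a non-linear function, so stacked layers can model non-linear relationships instead of collapsing into a single linear map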
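A minimal sketch of why the non-linearity matters, using scalar "layers" for simplicity. The weights, biases, and function names are illustrative assumptions, not taken from the course material. Two linear layers with nothing in between reduce to one linear layer; inserting tanh breaks that.

```python
import math

def linear(w, b, x):
    # One scalar "layer": w * x + b
    return w * x + b

def two_linear_layers(x):
    # Layer 2 applied directly to layer 1, with no activation in between
    return linear(3.0, -1.0, linear(2.0, 0.5, x))

def collapsed(x):
    # 3 * (2x + 0.5) - 1 simplifies to the single linear layer 6x + 0.5
    return 6.0 * x + 0.5

def two_layers_with_tanh(x):
    # Same weights, but with tanh between the layers
    return linear(3.0, -1.0, math.tanh(linear(2.0, 0.5, x)))

def is_affine_on(f, a, b, c):
    # For equally spaced a, b, c an affine function satisfies f(a) + f(c) == 2 * f(b)
    return abs(f(a) + f(c) - 2.0 * f(b)) < 1e-9

print(is_affine_on(two_linear_layers, 0.0, 1.0, 2.0))
print(is_affine_on(two_layers_with_tanh, 0.0, 1.0, 2.0))
```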


What are the 5 steps of DNN training?

1. Randomly initialize the weights W and biases b
2. Run the forward pass (feed-forward) to compute the predicted output ŷ (y hat)
3. Compute the loss, then back-propagate to obtain the gradients dW and db
4. Update W and b using gradient descent
5. Repeat steps 2, 3, and 4 until convergence
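The five steps can be sketched on the smallest possible model. Assumptions: a single linear neuron stands in for a full DNN, the loss is mean squared error, and the name `train`, the learning rate, the tolerance, and the toy data are all illustrative.

```python
import random

def train(xs, ys, lr=0.05, max_iters=5000, tol=1e-9, seed=0):
    rng = random.Random(seed)
    # Step 1: randomly initialize the weight w and bias b
    w, b = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    n = len(xs)
    for _ in range(max_iters):  # Step 5: repeat steps 2-4
        # Step 2: forward pass to estimate y hat
        y_hat = [w * x + b for x in xs]
        # Step 3: gradients of the mean squared error with respect to w and b
        dw = sum(2.0 * (yh - y) * x for yh, y, x in zip(y_hat, ys, xs)) / n
        db = sum(2.0 * (yh - y) for yh, y in zip(y_hat, ys)) / n
        # Convergence check (assumed criterion: both gradients near zero)
        if abs(dw) < tol and abs(db) < tol:
            break
        # Step 4: gradient descent update
        w -= lr * dw
        b -= lr * db
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b = train(xs, ys)
print(round(w, 4), round(b, 4))
```

In a real DNN, step 3 uses the chain rule through every layer; here the gradient has a closed form because there is only one layer.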


What are the nonlinear activation functions and their output ranges?

Sigmoid (0, 1)

Tanh (-1, 1)

ReLU [0, inf)

Leaky ReLU (-inf, inf)

ELU (-alpha, inf), where alpha > 0 is a hyperparameter (commonly 1)
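A short sketch of these five functions, handy for checking the ranges by hand. The default `slope=0.01` for Leaky ReLU and `alpha=1.0` for ELU are common choices assumed here, not values given in the cards.

```python
import math

def sigmoid(z):
    # Numerically stable form; output lies strictly between 0 and 1
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def tanh(z):
    # Output lies strictly between -1 and 1
    return math.tanh(z)

def relu(z):
    # Output is 0 for negative inputs, z otherwise
    return max(0.0, z)

def leaky_relu(z, slope=0.01):
    # Small negative slope instead of a hard zero; unbounded in both directions
    return z if z > 0 else slope * z

def elu(z, alpha=1.0):
    # Smoothly approaches -alpha for large negative inputs
    return z if z > 0 else alpha * (math.exp(z) - 1.0)

for z in (-5.0, 0.0, 5.0):
    print(z, sigmoid(z), tanh(z), relu(z), leaky_relu(z), elu(z))
```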


What is underfitting?

The model is too simple to capture the underlying pattern, so the error is high on both the training and the test data


What is overfitting?

The model fits the training data too closely, noise included, so the training error is low but it generalizes poorly to unseen test data


What is dropout and what is its purpose?

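Randomly deactivating (zeroing out) a fraction of the neurons at each training step so the network cannot rely on any single neuron. Its purpose is to prevent overfitting; all neurons are used at inference time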
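A minimal sketch of dropout applied to one layer's activations. It assumes the "inverted dropout" variant, where kept activations are scaled by 1 / (1 - p_drop) during training so that nothing needs rescaling at inference; the function name and parameters are illustrative.

```python
import random

def dropout(activations, p_drop=0.5, training=True, rng=None):
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    # At inference time every neuron is kept and nothing is rescaled
    if not training or p_drop == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - p_drop
    # Each neuron is kept with probability `keep`; survivors are scaled up
    # so the expected activation matches the no-dropout case
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

h = [1.0] * 1000
out = dropout(h, p_drop=0.5, rng=random.Random(0))
print(out.count(0.0), "of", len(out), "activations were dropped")
```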


What is an ensemble?

An alternative to dropout: multiple networks are trained, and their predictions are combined (for example by majority vote) to reach the final decision
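The voting step can be sketched as below, assuming each trained network has already produced one class label per sample. The function name and the toy predictions are illustrative.

```python
from collections import Counter

def majority_vote(predictions):
    # predictions[m][i] is the label model m assigns to sample i.
    # zip(*predictions) regroups the labels per sample.
    # On a tie, Counter.most_common returns the label seen first.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

model_predictions = [
    ["cat", "dog", "dog"],  # network 1
    ["cat", "cat", "dog"],  # network 2
    ["dog", "cat", "dog"],  # network 3
]
final = majority_vote(model_predictions)
print(final)
```

For regression or class probabilities, averaging the outputs plays the same role as voting.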


What is batch normalization?

Within each batch, each feature is normalized across the samples in the batch: its z-score is computed from the mean and standard deviation of that same feature over all samples in the batch
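A sketch of the normalization step only, on a batch stored as a list of samples. Assumptions: the learned scale and shift parameters that usually follow the normalization are omitted, `eps` is a small constant added for numerical stability, and the names are illustrative.

```python
import math

def batch_norm(batch, eps=1e-5):
    # batch[i][j] is feature j of sample i
    n = len(batch)
    columns = list(zip(*batch))  # one tuple per feature
    means = [sum(col) / n for col in columns]
    variances = [sum((v - m) ** 2 for v in col) / n
                 for col, m in zip(columns, means)]
    # Each value is z-scored with the statistics of ITS FEATURE across the batch
    return [[(v - m) / math.sqrt(var + eps)
             for v, m, var in zip(row, means, variances)]
            for row in batch]

batch = [[1.0, 100.0],
         [2.0, 200.0],
         [3.0, 300.0]]
normalized = batch_norm(batch)
for row in normalized:
    print([round(v, 3) for v in row])
```

Both features end up on the same scale even though the second is 100 times larger in the raw data.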


What are the advantages of batch normalization?

Speeds up training

Handles internal covariate shift

Makes training stable


What is layer normalization?

Each sample is normalized across its own features: the z-score of a value is computed from the mean and standard deviation of all features of that same sample
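A sketch of the normalization step only, with the usual learned scale and shift omitted. The `eps` stability constant and the names are assumptions. The statistics come from a single sample, so the result for that sample does not depend on what else is in the batch.

```python
import math

def layer_norm(batch, eps=1e-5):
    # batch[i][j] is feature j of sample i
    out = []
    for sample in batch:
        d = len(sample)
        mean = sum(sample) / d
        var = sum((v - mean) ** 2 for v in sample) / d
        # Each value is z-scored with the statistics of ITS OWN SAMPLE
        out.append([(v - mean) / math.sqrt(var + eps) for v in sample])
    return out

alone = layer_norm([[1.0, 2.0, 3.0]])
in_batch = layer_norm([[1.0, 2.0, 3.0], [10.0, 0.0, 5.0]])
print(alone[0] == in_batch[0])
```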


What are the advantages of layer normalization?

Removes the dependency on batch statistics, so it behaves the same for any batch size, including a single sample

Easier to apply to networks where batch statistics are awkward to maintain, such as recurrent neural networks


What is a minibatch and what is its purpose?

Purpose: more efficient/faster training

A minibatch is a small subset of the training set. If we have a huge data set, say n=100,000, we partition it into m batches and use one batch per training iteration, so each weight update only needs a fraction of the data
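A sketch of the partitioning step, assuming the data is shuffled once before being cut into batches (a common practice, not something the card states). The batch size of 1,000, giving m = 100 batches, and the function name are illustrative.

```python
import random

def minibatches(data, batch_size, seed=0):
    # Shuffle the indices so each batch is a random subset of the data
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    # Yield consecutive slices of the shuffled indices, one batch at a time
    for start in range(0, len(indices), batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]

data = list(range(100_000))  # stand-in for n = 100,000 samples
batches = list(minibatches(data, batch_size=1_000))
print(len(batches), "batches of", len(batches[0]), "samples")
```

One training iteration would consume one element of `batches`; a full pass over all of them is one epoch.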


What is the learning rate?

The multiplier applied to the gradient that sets how far we move against the gradient direction on each update. It can be kept constant or reduced over time as we get closer to convergence
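A small sketch of the effect of the learning rate, minimizing the toy function f(w) = (w - 3)^2. The function, the rates, and the 1 / (1 + decay * t) schedule are assumptions chosen for illustration; other schedules exist.

```python
def gradient_descent(lr, steps=50, w=0.0, decay=0.0):
    for t in range(steps):
        grad = 2.0 * (w - 3.0)               # derivative of (w - 3)^2
        step_size = lr / (1.0 + decay * t)   # constant when decay == 0
        w -= step_size * grad                # move against the gradient
    return w

print(gradient_descent(lr=0.1))              # small rate: approaches 3
print(gradient_descent(lr=1.1))              # rate too large: diverges
print(gradient_descent(lr=1.1, decay=0.5))   # same start, decayed rate: recovers
```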


What is the advantage of a convolutional neural network over an MLP?

An MLP needs the input (for example an image) flattened into a vector, which can destroy spatial information, whereas a CNN operates on the grid directly and maintains spatial information
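A sketch contrasting the two views of the same input. Assumptions: a 4x4 toy image, a hand-written 2x2 vertical-edge kernel, no padding, stride 1, and cross-correlation (the operation deep learning libraries commonly call convolution). All names are illustrative.

```python
def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    # Slide the kernel over the grid; each output depends on a local 2-D patch
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d(image, kernel)          # CNN view: still a 2-D grid
flat = [p for row in image for p in row]     # MLP view: one long vector

print(feature_map)
print(flat)
```

The feature map marks the column where the vertical edge sits, while in `flat` vertically adjacent pixels are four positions apart and nothing in the vector records that they were neighbors.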
