Which of the following is one of the three components of Machine Learning frameworks?
Experience
3 multiple choice options
Feldman and Ballard's (1982) 100-step constraint hypothesis suggested that neural networks must calculate mental events:
in a highly parallel manner
3 multiple choice options
What was the main discovery which led to renewed interest in neural networks in the 1980s?
Backpropagation Learning Algorithm (Rumelhart and McClelland)
3 multiple choice options
Biological neurons communicate chemically using:
Neurotransmitters
3 multiple choice options
Biological neurons communicate electrically using:
Action Potentials
3 multiple choice options
The input layer of a neural network utilizes a threshold activation function to calculate its output values.
False
1 multiple choice option
Artificial neural network architectures can typically be divided into two types/categories. Which of the following is one of those categories?
Recurrent
3 multiple choice options
Biological neurons are connected together via synapses. For artificial neurons, synapses are modeled by connection weights. Which of the statements below is true of connection weights?
They can take on negative values to model inhibitory interactions
3 multiple choice options
Artificial neural networks can be used to approximate any computable function (to an arbitrary degree of precision).
True
1 multiple choice option
In the equation net_j = w0 + Σ_{i=1}^{n} x_i w_ij, what does the term w0 represent?
Bias Weight
3 multiple choice options
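As a worked example of the net-input formula above, here is a minimal numpy sketch (the input and weight values are made up):

```python
import numpy as np

# net_j = w0 + sum_i x_i * w_ij  (bias weight plus weighted inputs)
x = np.array([1.0, 0.5, -1.0])   # input activations (illustrative values)
w = np.array([0.2, -0.4, 0.1])   # connection weights w_ij
w0 = 0.3                         # bias weight

net_j = w0 + np.dot(x, w)
```

The bias weight w0 shifts the net input independently of the inputs, which is why it is often treated as a weight from a constant input of 1.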
There are certain types of problems that can be solved using a single-layer neural network, but Hebbian Learning fails to find the correct connection weights for many of them.
True
1 multiple choice option
Which activation function is commonly used for classification problems?
Softmax
3 multiple choice options
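For reference, a minimal numpy sketch of the softmax function (the logit values are made up):

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability, then exponentiate and normalize
    e = np.exp(z - np.max(z))
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))  # outputs sum to 1
```

Softmax maps raw output scores to a probability distribution over classes, which is why it pairs naturally with classification losses.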
What kind of encoding is typically used to represent discrete data values for neural networks?
One-Hot
3 multiple choice options
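A minimal sketch of one-hot encoding (the class index and size are made up):

```python
import numpy as np

def one_hot(index, num_classes):
    # a vector of zeros with a single 1 at the class index
    v = np.zeros(num_classes)
    v[index] = 1.0
    return v

encoded = one_hot(2, 4)  # class 2 of 4
```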
Learning curves for neural networks may be obtained by plotting what values for each epoch of training?
Loss
3 multiple choice options
Single layer networks for binary classification problems can have either one or two output units.
True
1 multiple choice option
Binary data values are often recoded using which kind of representation?
Bipolar
3 multiple choice options
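The binary-to-bipolar recoding is a single affine map; a sketch with made-up data:

```python
import numpy as np

binary = np.array([0, 1, 1, 0])
bipolar = 2 * binary - 1   # maps {0, 1} -> {-1, +1}
```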
The Generalized Delta Rule (or Error Backpropagation) calculates updates for a connection weight by using the product between the delta term on the sending side and the activation term on the receiving side.
False
1 multiple choice option
The Error Backpropagation algorithm was developed utilizing which mathematical principle?
Chain Rule
3 multiple choice options
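The chain rule underlying backpropagation can be checked numerically on a toy composition (the functions f and g here are made up for illustration):

```python
# Chain rule: d/dx f(g(x)) = f'(g(x)) * g'(x)
def g(x): return 3 * x + 1
def f(u): return u ** 2

x = 2.0
u = g(x)                  # forward pass
analytic = 2 * u * 3      # f'(u) * g'(x)

# check against a central finite difference
h = 1e-6
numeric = (f(g(x + h)) - f(g(x - h))) / (2 * h)
```

Backpropagation applies exactly this factorization layer by layer, reusing the forward-pass values (here u) when computing gradients.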
The forward pass through a neural network performs a prediction
process.
True
1 multiple choice option
Inputs to a neural network which are set to zero can impact the learning process by:
Preventing local weight updates
3 multiple choice options
What major problem develops when using Backpropagation for neural networks with many stacked layers?
Vanishing Gradient
3 multiple choice options
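The vanishing gradient can be seen directly from the sigmoid derivative, which never exceeds 0.25; a sketch with an illustrative depth of 20 layers:

```python
def sigmoid_grad(a):
    # derivative of the logistic sigmoid, expressed via its output a
    return a * (1 - a)

# Backpropagation multiplies one such factor per stacked layer,
# so the product shrinks geometrically toward zero.
grad = 1.0
for _ in range(20):
    grad *= sigmoid_grad(0.5)  # 0.25, the sigmoid derivative's maximum
```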
What kind of hidden layer activation function cannot be used to solve non-linear problems?
Linear
3 multiple choice options
Stacking very many hidden layers will result in which learning phenomenon when using Error Backpropagation?
Degradation
3 multiple choice options
What kind of activation function is required by hidden layer units to perform non-linear transformations?
Non-linear
3 multiple choice options
Deep residual networks typically train up slower than wide networks.
False
1 multiple choice option
The number of parameters in a neural network influences which property of the network?
Capacity
3 multiple choice options
The Sum Squared Error function is often used when solving classification problems (e.g. the Iris problem) with neural networks.
False
1 multiple choice option
Which technique is used to correct the direction in which weight updates travel during optimization (so they head toward the minimum instead of curving around it)?
Momentum
3 multiple choice options
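A minimal sketch of the momentum update on a toy quadratic loss (hyperparameter values and the loss are made up):

```python
# Momentum keeps a running velocity, so successive updates that agree
# in direction reinforce each other instead of zig-zagging.
lr, beta = 0.1, 0.9
velocity = 0.0
w = 5.0

for _ in range(3):
    grad = 2 * w                          # gradient of the toy loss w**2
    velocity = beta * velocity - lr * grad
    w = w + velocity
```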
Which technique is used to correct both the direction and the speed of weight updates during optimization (heading toward the minimum instead of curving around it, and moving quickly across flat regions and slowly across steep ones)?
Adam
3 multiple choice options
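A sketch of the Adam update on the same kind of toy loss (the commonly cited default hyperparameters are used; the loss is made up):

```python
import math

# Adam combines a momentum-like first moment (direction) with a
# per-parameter second moment (speed scaling).
lr, b1, b2, eps = 0.001, 0.9, 0.999, 1e-8
m = v = 0.0
w = 5.0

for t in range(1, 4):
    grad = 2 * w                          # gradient of the toy loss w**2
    m = b1 * m + (1 - b1) * grad          # first moment: update direction
    v = b2 * v + (1 - b2) * grad ** 2     # second moment: step-size scale
    m_hat = m / (1 - b1 ** t)             # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
```

Note how the effective step is roughly lr regardless of the raw gradient magnitude, which is the "speed correction" the question refers to.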
Why are approximate solutions like momentum, Adagrad, and Adam used to help speed up the gradient descent process instead of the Hessian (i.e. the exact solution)?
The Hessian is too large to fit in memory
3 multiple choice options
What is the name of the neural network architecture developed in 1980 which was the first serious attempt at creating a successful convolution architecture?
Neocognitron
3 multiple choice options
Conv2d layers are used to process 3-dimensional input tensors.
True
1 multiple choice option
Residual (skip) connections cannot be used with convolution architectures.
False
1 multiple choice option
What is the primary goal of regularization methods?
Preventing Overfitting
3 multiple choice options
Which of the following is the most-used, task-agnostic form of regularization?
Dropout (DO)
3 multiple choice options
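A minimal sketch of (inverted) dropout as used at training time; the drop probability and input are made up:

```python
import numpy as np

def dropout(a, p, rng):
    # zero each unit with probability p; scale survivors by 1/(1-p)
    # ("inverted dropout") so the expected activation is unchanged
    mask = rng.random(a.shape) >= p
    return a * mask / (1 - p)

rng = np.random.default_rng(0)
a = np.ones(1000)
dropped = dropout(a, 0.5, rng)
```

Because each forward pass samples a different mask, units cannot co-adapt to specific partners, which is the regularizing effect.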
Besides convolution, what other strategy can be used to encourage a neural network to develop translation-invariant representations?
Data Augmentation
3 multiple choice options
Using a pre-trained convolution network (transfer learning) can have one of two possible outcomes on network performance: positive or negative.
False
1 multiple choice option
Which concept below is most closely related to the concept of checkpointing in deep learning?
Persistence
3 multiple choice options
Which of the following is the name of one of the pretrained convolution architectures that we explored in class?
ResNet
3 multiple choice options
What problem arises when a typical neural network must learn multiple tasks sequentially, one at a time?
Catastrophic Interference
3 multiple choice options
Unlike Feed-Forward networks, Recurrent networks experience the exploding gradient problem.
True
1 multiple choice option
Which of the following is not a type of gate found in Long Short-term Memory (LSTM) neural units?
Repeat Gate
3 multiple choice options
Image captioning is an example of which kind of task?
One-to-Many
3 multiple choice options
When applying feed-forward networks to time-series data/tasks/problems, we will typically expect poor performance when which of the following assumptions is not met?
Markov Assumption
3 multiple choice options
Which encoding method is appropriate for providing discrete input values (integers, letters, words, etc.) to neural networks?
Random Embeddings
3 multiple choice options
Our English-Portuguese Encoder-Decoder models were trained without using teacher forcing.
False
1 multiple choice option
What kind of neural units were commonly used in older attention-learning neural networks?
Radial Basis
3 multiple choice options
What property of sequential data allows bidirectional recurrent layers to potentially perform better than unidirectional recurrent layers?
Data from the future can often be used to predict data from the past
3 multiple choice options
We can use bidirectional recurrent layers in the decoder component of an Encoder-Decoder architecture.
False
1 multiple choice option
Which attention mechanism is most commonly used in deep learning architectures?
Self-Attention
3 multiple choice options
The self-attention mechanism used in Transformers calculates similarities between query embeddings and key embeddings using which function?
Dot Product
3 multiple choice options
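A minimal numpy sketch of scaled dot-product self-attention (the token count, embedding size, and random input are made up):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # query-key similarities via dot product, scaled by sqrt(d_k),
    # then a softmax-weighted sum over the value vectors
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))    # 4 tokens, embedding dimension 8
out = attention(X, X, X)       # self-attention: Q, K, V from the same input
```

In a real Transformer block, Q, K, and V come from three learned linear projections of X rather than X itself; this sketch omits those projections.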
Which is appropriate to use in transformer blocks for the decoder component of an encoder-decoder network?
Causal Masking
3 multiple choice options
Which of the following allows transformers to process longer sequences than those used during training?
Sinusoidal Position Encodings
3 multiple choice options
Generative Pretrained Transformers (GPTs) are pretrained using supervised learning approaches.
False
1 multiple choice option
The Vision Transformer architecture does not require the use of position encodings.
False
1 multiple choice option
What is the name of the contrastive learning framework that demonstrated how unsupervised learning could be as accurate as supervised learning?
SimCLR
3 multiple choice options
The initials GPT stand for:
Generative Pretrained Transformer
3 multiple choice options
Contrastive learning may be performed unsupervised or supervised.
True
1 multiple choice option
Contrastive learning performs best when using minimal data augmentation.
False
1 multiple choice option
What major issue were Diffusion Models developed to prevent?
Mode Collapse
3 multiple choice options
Reinforcement learning is used to solve tasks where feedback/reward from the supervisor is sparse in space, but not sparse in time.
False
1 multiple choice option
In reinforcement learning tasks, which function is an estimate of the total future worth of any particular state of the environment?
Value Function - V(s)
3 multiple choice options
What hyperparameter is used in reinforcement learning to ensure that rewards which occur in the short-term are worth more than rewards which occur in the long-term?
Discount Factor (γ)
3 multiple choice options
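The discount factor's effect is easiest to see in the discounted return; a sketch with a made-up reward sequence:

```python
# Discounted return: G = sum_t gamma**t * r_t, so a reward k steps in
# the future is scaled down by gamma**k.
gamma = 0.9
rewards = [1.0, 1.0, 1.0, 1.0]

G = sum(gamma ** t * r for t, r in enumerate(rewards))
```

With gamma < 1, identical rewards contribute less the further away they are, which is exactly the short-term preference the question describes.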
Reinforcement learning with neural networks typically requires strategies to prevent unstable learning caused by:
Catastrophic Interference
3 multiple choice options
Successful alignment of GPT models required a new strategy for learning which of the following functions?
Reward Function - r(s)
3 multiple choice options