Which of the following is one of the three components of Machine Learning frameworks?
Experience
3 multiple choice options
Feldman and Ballard's (1982) 100-step constraint hypothesis suggested that neural networks must calculate mental events:
in a highly parallel manner
3 multiple choice options
What was the main discovery which led to renewed interest in neural networks in the 1980s?
Backpropagation Learning Algorithm (Rumelhart and McClelland)
3 multiple choice options
Biological neurons communicate chemically using:
Neurotransmitters
3 multiple choice options
Biological neurons communicate electrically using:
Action Potentials
3 multiple choice options
The input layer of a neural network utilizes a threshold activation function to calculate its output values.
False
1 multiple choice option
Artificial neural network architectures can typically be divided into two types/categories. Which of the following is one of those categories?
Recurrent
3 multiple choice options
Biological neurons are connected together via synapses. For artificial neurons, synapses are modeled by connection weights. Which of the statements below is true of connection weights?
They can take on negative values to model inhibitory interactions
3 multiple choice options
Artificial neural networks can be used to approximate any computable function (to an arbitrary degree of precision).
True
1 multiple choice option
In the equation net_j = w0 + Σ_{i=1}^{n} x_i w_ij, what does the term w0 represent?
Bias Weight
3 multiple choice options
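As a worked example of the net-input formula above, here is a minimal numpy sketch (the input and weight values are made up):

```python
import numpy as np

# net_j = w0 + sum_i x_i * w_ij  (bias weight plus weighted inputs)
x = np.array([1.0, 0.5, -1.0])   # input activations (illustrative values)
w = np.array([0.2, -0.4, 0.1])   # connection weights w_ij
w0 = 0.3                         # bias weight

net_j = w0 + np.dot(x, w)
```

The bias weight w0 shifts the net input independently of the inputs, which is why it is often treated as a weight from a constant input of 1.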
There are certain types of problems that can be solved using a single-layer neural network, but Hebbian Learning fails to find the correct connection weights for many of them.
True
1 multiple choice option
Which activation function is commonly used for classification problems?
Softmax
3 multiple choice options
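For reference, a minimal numpy sketch of the softmax function (the logit values are made up):

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability, then exponentiate and normalize
    e = np.exp(z - np.max(z))
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))  # outputs sum to 1
```

Softmax maps raw output scores to a probability distribution over classes, which is why it pairs naturally with classification losses.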
What kind of encoding is typically used to represent discrete data values for neural networks?
One-Hot
3 multiple choice options
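A minimal sketch of one-hot encoding (the class index and size are made up):

```python
import numpy as np

def one_hot(index, num_classes):
    # a vector of zeros with a single 1 at the class index
    v = np.zeros(num_classes)
    v[index] = 1.0
    return v

encoded = one_hot(2, 4)  # class 2 of 4
```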
Learning curves for neural networks may be obtained by plotting what values for each epoch of training?
Loss
3 multiple choice options
Single layer networks for binary classification problems can have either one or two output units.
True
1 multiple choice option
Binary data values are often recoded using which kind of representation?
Bipolar
3 multiple choice options
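The binary-to-bipolar recoding is a single affine map; a sketch with made-up data:

```python
import numpy as np

binary = np.array([0, 1, 1, 0])
bipolar = 2 * binary - 1   # maps {0, 1} -> {-1, +1}
```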
The Generalized Delta Rule (or Error Backpropagation) calculates updates for a connection weight by using the product between the delta term on the sending side and the activation term on the receiving side.
False
1 multiple choice option
The Error Backpropagation algorithm was developed utilizing which mathematical principle?
Chain Rule
3 multiple choice options
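The chain rule underlying backpropagation can be checked numerically on a toy composition (the functions f and g here are made up for illustration):

```python
# Chain rule: d/dx f(g(x)) = f'(g(x)) * g'(x)
def g(x): return 3 * x + 1
def f(u): return u ** 2

x = 2.0
u = g(x)                  # forward pass
analytic = 2 * u * 3      # f'(u) * g'(x)

# check against a central finite difference
h = 1e-6
numeric = (f(g(x + h)) - f(g(x - h))) / (2 * h)
```

Backpropagation applies exactly this factorization layer by layer, reusing the forward-pass values (here u) when computing gradients.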
The forward pass through a neural network performs a prediction
process.
True
1 multiple choice option
Inputs to a neural network which are set to zero can impact the learning process by:
Preventing local weight updates
3 multiple choice options
What major problem develops when using Backpropagation for neural networks with many stacked layers?
Vanishing Gradient
3 multiple choice options
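The vanishing gradient can be seen directly from the sigmoid derivative, which never exceeds 0.25; a sketch with an illustrative depth of 20 layers:

```python
def sigmoid_grad(a):
    # derivative of the logistic sigmoid, expressed via its output a
    return a * (1 - a)

# Backpropagation multiplies one such factor per stacked layer,
# so the product shrinks geometrically toward zero.
grad = 1.0
for _ in range(20):
    grad *= sigmoid_grad(0.5)  # 0.25, the sigmoid derivative's maximum
```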
What kind of hidden layer activation function cannot be used to solve non-linear problems?
Linear
3 multiple choice options
Stacking very many hidden layers will result in which learning phenomenon when using Error Backpropagation?
Degradation
3 multiple choice options
What kind of activation function is required by hidden layer units to perform non-linear transformations?
Non-linear
3 multiple choice options
Deep residual networks typically train up slower than wide networks.
False
1 multiple choice option
The number of parameters in a neural network influences which property of the network?
Capacity
3 multiple choice options
The Sum Squared Error function is often used when solving classification problems (e.g. the Iris problem) with neural networks.
False
1 multiple choice option
Which technique is used to correct the direction in which weight updates travel during optimization (so they head toward the minimum instead of curving around it)?
Momentum
3 multiple choice options
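A minimal sketch of the momentum update on a toy quadratic loss (hyperparameter values and the loss are made up):

```python
# Momentum keeps a running velocity, so successive updates that agree
# in direction reinforce each other instead of zig-zagging.
lr, beta = 0.1, 0.9
velocity = 0.0
w = 5.0

for _ in range(3):
    grad = 2 * w                          # gradient of the toy loss w**2
    velocity = beta * velocity - lr * grad
    w = w + velocity
```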
Which technique is used to correct both the direction and the speed of weight updates during optimization (heading toward the minimum instead of curving around it, and moving quickly across flat regions and slowly across steep ones)?
Adam
3 multiple choice options
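A sketch of the Adam update on the same kind of toy loss (the commonly cited default hyperparameters are used; the loss is made up):

```python
import math

# Adam combines a momentum-like first moment (direction) with a
# per-parameter second moment (speed scaling).
lr, b1, b2, eps = 0.001, 0.9, 0.999, 1e-8
m = v = 0.0
w = 5.0

for t in range(1, 4):
    grad = 2 * w                          # gradient of the toy loss w**2
    m = b1 * m + (1 - b1) * grad          # first moment: update direction
    v = b2 * v + (1 - b2) * grad ** 2     # second moment: step-size scale
    m_hat = m / (1 - b1 ** t)             # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
```

Note how the effective step is roughly lr regardless of the raw gradient magnitude, which is the "speed correction" the question refers to.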
Why are approximate solutions like momentum, Adagrad, and Adam used to help speed up the gradient descent process instead of the Hessian (i.e. the exact solution)?
The Hessian is too large to fit in memory
3 multiple choice options
What is the name of the neural network architecture developed in 1980 which was the first serious attempt at creating a successful convolution architecture?
Neocognitron
3 multiple choice options
Conv2d layers are used to process 3-dimensional input tensors.
True
1 multiple choice option
Residual (skip) connections cannot be used with convolution architectures.
False
1 multiple choice option
What is the primary goal of regularization methods?
Preventing Overfitting
3 multiple choice options
Which of the following is the most-used, task-agnostic form of regularization?
Dropout (DO)
3 multiple choice options
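A minimal sketch of (inverted) dropout as used at training time; the drop probability and input are made up:

```python
import numpy as np

def dropout(a, p, rng):
    # zero each unit with probability p; scale survivors by 1/(1-p)
    # ("inverted dropout") so the expected activation is unchanged
    mask = rng.random(a.shape) >= p
    return a * mask / (1 - p)

rng = np.random.default_rng(0)
a = np.ones(1000)
dropped = dropout(a, 0.5, rng)
```

Because each forward pass samples a different mask, units cannot co-adapt to specific partners, which is the regularizing effect.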
Besides convolution, what other strategy can be used to encourage a neural network to develop translation-invariant representations?
Data Augmentation
3 multiple choice options
Using a pre-trained convolution network (transfer learning) can have one of two possible outcomes on network performance: positive or negative.
False
1 multiple choice option
Which concept below is most closely related to the concept of checkpointing in deep learning?
Persistence
3 multiple choice options
Which of the following is the name of one of the pretrained convolution architectures that we explored in class?
ResNet
3 multiple choice options
What problem arises when a typical neural network must learn multiple tasks sequentially, one at a time?
Catastrophic Interference
3 multiple choice options
Unlike Feed-Forward networks, Recurrent networks experience the exploding gradient problem.
True
1 multiple choice option
Which of the following is not a type of gate found in Long Short-term Memory (LSTM) neural units?
Repeat Gate
3 multiple choice options
Image captioning is an example of which kind of task?
One-to-Many
3 multiple choice options
When applying feed-forward networks to time-series data/tasks/problems, we will typically expect poor performance when which of the following assumptions is not met?
Markov Assumption
3 multiple choice options
Which encoding method is appropriate for providing discrete input values (integers, letters, words, etc.) to neural networks?
Random Embeddings
3 multiple choice options
Our English-Portuguese Encoder-Decoder models were trained without using teacher forcing.
False
1 multiple choice option
What kind of neural units were commonly used in older attention-learning neural networks?
Radial Basis
3 multiple choice options
What property of sequential data allows bidirectional recurrent layers to potentially perform better than unidirectional recurrent layers?
Data from the future can often be used to predict data from the past
3 multiple choice options
We can use bidirectional recurrent layers in the decoder component of an Encoder-Decoder architecture.
False
1 multiple choice option
Which attention mechanism is most commonly used in deep learning architectures?
Self-Attention
3 multiple choice options
The self-attention mechanism used in Transformers calculates similarities between query embeddings and key embeddings using which function?
Dot Product
3 multiple choice options
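A minimal numpy sketch of scaled dot-product self-attention (the token count, embedding size, and random input are made up):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # query-key similarities via dot product, scaled by sqrt(d_k),
    # then a softmax-weighted sum over the value vectors
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))    # 4 tokens, embedding dimension 8
out = attention(X, X, X)       # self-attention: Q, K, V from the same input
```

In a real Transformer block, Q, K, and V come from three learned linear projections of X rather than X itself; this sketch omits those projections.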
Which is appropriate to use in transformer blocks for the decoder component of an encoder-decoder network?
Causal Masking
3 multiple choice options
Which of the following allows transformers to process longer sequences than those used during training?
Sinusoidal Position Encodings
3 multiple choice options
Generative Pretrained Transformers (GPTs) are pretrained using supervised learning approaches.
False
1 multiple choice option
The Vision Transformer architecture does not require the use of position encodings.
False
1 multiple choice option
What is the name of the contrastive learning framework that demonstrated how unsupervised learning could be as accurate as supervised learning?
SimCLR
3 multiple choice options
The initials GPT stand for:
Generative Pretrained Transformer
3 multiple choice options
Contrastive learning may be performed unsupervised or supervised.
True
1 multiple choice option
Contrastive learning performs best when using minimal data augmentation.
False
1 multiple choice option
What major issue were Diffusion Models developed to prevent?
Mode Collapse
3 multiple choice options
Reinforcement learning is used to solve tasks where feedback/reward from the supervisor is sparse in space, but not sparse in time.
False
1 multiple choice option
In reinforcement learning tasks, which function is an estimate of the total future worth of any particular state of the environment?
Value Function - V(s)
3 multiple choice options
What hyperparameter is used in reinforcement learning to ensure that rewards which occur in the short-term are worth more than rewards which occur in the long-term?
Discount Factor (γ)
3 multiple choice options
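The discount factor's effect is easiest to see in the discounted return; a sketch with a made-up reward sequence:

```python
# Discounted return: G = sum_t gamma**t * r_t, so a reward k steps in
# the future is scaled down by gamma**k.
gamma = 0.9
rewards = [1.0, 1.0, 1.0, 1.0]

G = sum(gamma ** t * r for t, r in enumerate(rewards))
```

With gamma < 1, identical rewards contribute less the further away they are, which is exactly the short-term preference the question describes.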
Reinforcement learning with neural networks typically requires strategies to prevent unstable learning caused by:
Catastrophic Interference
3 multiple choice options
Successful alignment of GPT models required a new strategy for learning which of the following functions?
Reward Function - r(s)
3 multiple choice options