What are hidden layers, and why are they useful for performing complex tasks such as recognizing handwritten digits?
The hidden layers are the intermediate layers between the input and output layers of an artificial neural network. They abstract and process information, helping the network learn the complex patterns (such as the components of digits) needed for recognition.
How do activations in one layer influence activations in the next of a neural network?
The activations in one layer determine those of the next through weighted connections between neurons. The weighted sum of neuron activations from one layer is passed through a function (like sigmoid or ReLU) to activate neurons in the following layer.
Why assign weights to connections between neurons in the network?
Weights determine the influence one neuron has on another in the next layer. To "learn," a network adjusts these weights so that it recognizes certain features (such as the lines or loops in hand-drawn digits) and associates them with particular numbers.
What are similarities and differences between the sigmoid and ReLU activation functions in neural networks?
Sigmoid activation functions map input values to a range of 0 to 1 (1/(1+e^(-x))). Sigmoid was originally used in most artificial neural networks, but it later proved more computationally expensive than ReLU. ReLU outputs values from 0 to positive infinity (max(0, x)); because it leaves many neurons completely inactive (outputting exactly 0), it can reduce overfitting in some models and trains faster than sigmoid.
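To make the two formulas concrete, here is a minimal sketch in Python (NumPy assumed; the sample inputs are arbitrary):

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into the range (0, 1): 1 / (1 + e^(-x)).
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Passes positive inputs through unchanged and zeroes out negatives: max(0, x).
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 3.0])
print(sigmoid(x))  # approx [0.12, 0.5, 0.95], all within (0, 1)
print(relu(x))     # [0. 0. 3.], negatives clipped to 0
```

Note how ReLU leaves the negative input at exactly 0, which is the sparsity the card above refers to.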
How does a neural network function when recognizing images, such as numbers?
The network breaks the image into small pieces based on pixel activations, then tries to recognize the patterns those pieces form, and finally combines those patterns to recognize the full image.
What is the purpose of the weights in the grid to the neural network?
To determine which pixel pattern a neuron is picking up on
What does each "neuron" in a neural network represent?
A function that takes in the information from the neurons before it
What is the ReLU function, and how does it apply to modern-day machine learning?
f(x) = max(0, x); it is the activation function typically used in modern-day machine learning.
How does the sigmoid function work?
The sigmoid function compresses the weighted sum into an output between 0 and 1: large positive inputs map close to 1, and large negative inputs map close to 0.
What is a bias, and in what situations would someone want to use one?
The bias shows how high the weighted sum has to be before the neuron gets meaningfully active. Someone might want to use a bias if they are putting constraints on when a neuron will become active, and when a neuron will become inactive.
How does "learning" work when it comes to neural networks?
A neural network "learns" by adjusting its weights and biases until it finds values that let it solve whatever problem it is facing.
How do you calculate the amount of weights and biases that can be used to alter the way the neural network behaves?
Weights: (# neurons in layer 1 × # neurons in layer 2) + (# neurons in layer 2 × # neurons in layer 3) + ... + (# neurons in layer n-1 × # neurons in layer n). Biases: # neurons in layer 2 + # neurons in layer 3 + ... + # neurons in layer n.
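As a sanity check of that formula, here is a short Python sketch for the 784-16-16-10 digit-recognition network (layer sizes assumed from the MNIST example in the video):

```python
# Layer sizes: 784 input pixels, two hidden layers of 16, 10 output digits.
layers = [784, 16, 16, 10]

# Weights: one per connection between each pair of consecutive layers.
weights = sum(a * b for a, b in zip(layers, layers[1:]))
# Biases: one per neuron in every layer after the input layer.
biases = sum(layers[1:])

print(weights, biases, weights + biases)  # 12960 42 13002
```

This matches the roughly 13,000 adjustable parameters mentioned in a later card.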
What makes recognizing handwritten digits challenging for a computer?
There are a huge number of different pixel arrangements that we would consider, for example, a 3; with no single clearly correct pattern, it is difficult for the computer to tell which number an image is supposed to be. If one pixel differs from what it was taught, the computer could get the digit wrong.
What effect do "neurons" in one layer have on neurons in the next layer?
The following layer receives the activation levels from neurons via weighted sums, after which they undergo a squishing function to convert them into values between 0 and 1.
What role does the matrix-vector product have in a neural network's architecture?
The neural network's ability to quickly process large amounts of data is attributed to its ability to compute the weighted sums of activations between layers efficiently using the matrix-vector product.
What are the benefits of properly understanding the weights and biases in a neural network?
By looking at the values and trying to understand what the computer is really doing, you can debug the neural network more effectively than if you were unaware of what the weights and biases were doing. Examining the weights and biases can also reveal how the code works when it executes correctly but not in the way that was expected.
What are activations and what do they represent in the first layer of the video?
An activation is the number that is inside a neuron. It ranges from 0 to 1 and, in the first layer, represents the grayscale value of the pixel its neuron corresponds to where 0 is the darkest and 1 is the brightest.
What do the activations of the last layer of the video represent?
There are ten neurons in the last layer (each representing one of the digits 0 to 9) and, correspondingly, ten activations. The activations still range from 0 to 1, but whereas the first layer's activations represent grayscale values, these activations represent how strongly the network believes the image corresponds to each digit.
How are edges detected?
There is likely an edge present where bright pixels are surrounded by darker pixels. This can be detected with weighted sums, using positive weights on the central pixels and negative weights on the surrounding ones.
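A small sketch of that weighting scheme in Python; the 3x3 patch and the exact weight values are invented for illustration, not taken from the video:

```python
import numpy as np

# A bright vertical stripe on a dark background (pixel brightness in [0, 1]).
patch = np.array([[0.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0]])

# Positive weights on the central pixels, negative weights on the surround.
weights = np.array([[-1.0, 2.0, -1.0],
                    [-1.0, 2.0, -1.0],
                    [-1.0, 2.0, -1.0]])

# A large weighted sum signals bright pixels surrounded by darker ones.
print(np.sum(patch * weights))  # 6.0, a strong edge response
```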
What was the purpose of using the sigmoid function for this video?
This specific neural network was set up so that the activation values ranged from 0 to 1. However, the weighted sums themselves fell outside this range. The sigmoid function transformed the weighted sums so that they fell within the 0 to 1 range while still preserving their relative order, just scaled down.
What is the mathematical expression used to calculate the (n+1)-th layer's activations using matrices and vectors?
a^(n+1) = σ(W_n a^(n) + b_n), with σ representing the sigmoid function, W_n the weight matrix, b_n the biases, and a^(n) the previous layer's activations.
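That expression translates almost directly into code. A minimal sketch with NumPy, using random stand-ins for trained weights and biases and the 784-to-16 layer sizes from the digit example:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
a_n = rng.random(784)                 # previous layer's activations, in [0, 1)
W_n = rng.standard_normal((16, 784))  # weight matrix: 16 neurons x 784 inputs
b_n = rng.standard_normal(16)         # one bias per neuron in the next layer

# a^(n+1) = sigma(W_n a^(n) + b_n)
a_next = sigmoid(W_n @ a_n + b_n)
print(a_next.shape)  # (16,), one activation per neuron in the next layer
```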
Why is ReLU now the preferred function as opposed to the sigmoid function to provide a value for neural network layers? What are its advantages and relations to humans?
ReLU is easier to train than the sigmoid function. It resembles how biological neurons are either active or inactive, sending either a strong signal or no signal at all to downstream neurons. It also serves as a simplification: instead of the smooth range of values the sigmoid function provides, ReLU follows an "on or off" idea.
What do the weights and biases do to help the neural network learn?
Weights let the neural network judge how important each piece of information is, and biases ensure that a neuron fires only when the weighted inputs provide significant evidence that it should activate. These weights and biases are adjusted during the learning process to make the network more accurate.
Why is it more accurate to think of a neuron as a function as opposed to a thing that holds a number?
Each neuron takes in information from the neurons before it, applies a mathematical formula, and outputs a number. That output makes it seem like a mere "thing that holds a number," but the neuron is really computing that number and feeding it to the next layer or to the final output.
What role does the adjustment of weights play in improving the neural network's performance?
Adjusting the network's weights through backpropagation allows it to learn from its errors. Continuously changing the weights based on the error between its prediction and the actual label makes the model better.
How does increasing the number of layers affect the performance of a neural network?
More layers can allow a neural network to extract higher-level features at each layer, beyond the small curved segments analyzed in the original three layers of the digit-recognition model. However, larger networks need more data to prevent overfitting and demand more computational resources for training.
How do the neurons in the input layer of the neural network represent the image?
The neurons in the input layer of the neural network represent the image by holding numerical values between 0 and 1, corresponding to the grayscale intensity of each pixel in the grid.
What is the significance of the bias term added to the weighted sum before applying the sigmoid function?
The bias is added to the weighted sum to adjust the threshold at which a neuron becomes activated. Without the bias, a neuron might activate too easily or not at all. By adding a positive or negative bias, the network can control how sensitive the neuron is to its inputs.
What is the number inside a neuron called?
Activation
How do the "hidden layers" function and how do they work together?
When certain neurons are activated they activate other neurons in the next layer. They get activated when the activation number reaches a certain threshold based on the bias and weighted sum. They can reach this number when certain pixels line up in a similar way (curve, horizontal line, vertical line) and location (top, middle, bottom, left, right).
True or False, is the Sigmoid Function the preferred training function?
False, while the Sigmoid Function was originally the main function used, ReLU is preferred now as it is easier and faster to train on.
What is the value of using biases and weights?
By using biases and weights the neural network can more accurately "understand" what the data looks like. Biases keep a neuron from firing on weak evidence unless the network is confident, and weights help suppress features that do not co-occur with others (such as loops, which do not appear in digits like 1 and 7).
Why are sigmoid functions used in neural networks?
Sigmoid functions are used in neural networks because they squash input values to an output between 0 and 1. This output can be interpreted as the model's confidence in or the probability of a certain outcome. A threshold can also be applied to this prediction, allowing for a classification output. Outputs generated by sigmoid functions also determine the value of neurons in hidden layers and the output layer, allowing for further calculations of weighted sums.
What are the three main types of layers in a neural network, and what do their weights mean?
The three main types of layers in a neural network are:
- Input layer: receives the input and assigns each neuron a value, typically between 0 and 1, describing an attribute of the data. For example, each neuron in the input layer of an image-classifying network may correspond to a specific pixel in the inputted image, with its value describing that pixel's brightness.
- Hidden layers: perform computations by assigning weights to inputs and applying the sigmoid function to a weighted sum. The weights represent the importance of each connection and are adjusted during training to improve accuracy.
- Output layer: produces the final prediction. The weights here determine how the network produces an output, such as class probabilities in a classification task.
What are biases, and why are they used in sigmoid functions?
Biases in neural networks are additional parameters added to the weighted sum of the inputs to each neuron. A bias shifts the sigmoid function left or right, adjusting the output of the function, i.e., the value assigned to that neuron. This makes the model more adaptable, flexible, and thus more accurate.
How are the neurons in a neural network similar to the neurons in our brain, and how are they different?
Neurons in neural networks are similar to neurons in the brain in that both process information by receiving inputs, performing computations, and passing signals to other neurons. In both systems, the strength of the connection between neurons (weights in artificial neurons, synapses in biological neurons) determines how influential one neuron is to another. However, unlike biological neurons that fire once a certain threshold is reached, neurons in neural networks do not have a binary firing mechanism. Instead, they compute a weighted sum of their inputs, and this value is passed through an activation function like the sigmoid, which outputs a value between 0 and 1.
What type of neural network is good for image recognition?
Convolutional neural networks
What does the activation of a neuron tell you about it and the pixel it is associated with in this example of a neural network?
The activation of a neuron, which ranges from 0 to 1, reveals how "lit up" the pixel it is associated with is. This value has to do with the grayscale value of the given pixel.
Why are neural networks called neural networks?
Neural networks function in a similar way to the brain. They have "neurons" that fire when given a certain stimulus. In this case, when the neurons activate, they continue to activate neurons in the next layer of the network. This is similar to how certain groups of neurons in the brain cause other groups of neurons to fire.
What is the benefit of representing the activations and the biases as vectors and the weights as a matrix?
You can find the matrix-vector product of the weights and the activations more easily, because libraries have optimized matrix multiplication. This simplifies the whole process, and makes the function more comprehensible.
What does the activation of a neuron in a neural network represent?
A value between 0 and 1 that indicates the neuron's activity
What is the primary function of the hidden layers in a neural network?
Process and detect patterns like edges and loops from the input image
What is the role of weights and biases in determining how a neuron responds to inputs?
Weights and biases control how much influence each input has on the neuron's activation. The weights multiply each input, and the bias shifts the neuron's threshold for activation, allowing the neuron to detect specific patterns in the data and the network to respond to complex inputs.
What is the primary function of the neurons in the first layer of the neural network?
To hold and represent the grayscale values of each pixel in the input image
Why is there an additional constant in addition to each linear combination?
It builds in an inherent bias which can't be represented by weights alone.
Why are sigmoid functions no longer widely used in NNs?
For extreme input values their slope is nearly zero, which makes those neurons insensitive to changes in their inputs.
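You can see that flattening numerically: the sigmoid's derivative is sigma'(x) = sigma(x) * (1 - sigma(x)), which collapses toward zero for inputs far from 0. A quick sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_slope(x):
    # Derivative of the sigmoid: sigma(x) * (1 - sigma(x)).
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [0.0, 5.0, 10.0]:
    print(x, sigmoid_slope(x))  # 0.25, then ~0.0066, then ~0.000045
```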
What do we know about what each layer in a NN does in the context of processing an input?
Although we can hypothesize what each layer might do, we have no real idea what features it actually learns.
What mathematical operation is the calculating of weights very similar to?
It's a dot product.
Which of the following would most accurately represent the equation for the activation of a single neuron using a rectified linear activation function?
f(x) = max(0, ax + b), where f(x) represents the output of the neuron, max(...) represents the rectified linear function, a represents the weight of the neuron, and b represents the bias.
True or false: The bias of a neuron is a value associated with the connections between itself and other neurons.
False; the bias of a neuron applies to the neuron independently of its connections and represents the threshold that affects the ease of activation of the neuron.
If we suppose a neural network that takes an RGB image of size 400x400, and should output if the image is a cat, dog, or fish, what are the sizes of the first and last layers of the neural network?
The input layer will have 400x400x3 neurons; for each pixel there will be three values between 0 and 1 to represent the RGB value of the pixel and there will be 400^2 pixels. The output layer will have 3 neurons to represent the model's confidence in the image as a cat, dog or fish.
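A quick check of that arithmetic (image size and class count taken from the question):

```python
width, height, channels = 400, 400, 3  # 400x400 RGB image
input_neurons = width * height * channels
output_neurons = 3                     # confidences for cat, dog, fish
print(input_neurons, output_neurons)   # 480000 3
```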
True or false: A more negative neuron activation will generally represent a lower activation of that neuron.
False; a negative activation with a large magnitude still represents a strong signal and will likely affect neurons in the next layer.
What function helps activations be in the range of 0-1?
Sigmoid Function
How would you make a neuron activate only when its weighted sum is greater than a certain number n? What is this number called?
Subtract n from the weighted sum before applying the sigmoid function; n is called the bias.
What is "learning"?
Finding the right weights and biases
What is the simplified expression for the full transformation of activations from one layer to the next?
a^(1)= sigmoid(Wa^(0)+b)
Why did sigmoid get replaced with ReLU?
Sigmoid proved very difficult to train in deep networks, while ReLU worked well.
How do hidden layers contribute to solving complex tasks, like recognizing handwritten digits?
Hidden layers act as intermediaries between the input and output layers in a neural network; they process the data by extracting important features. This enables the network to recognize intricate patterns, such as the curves and distinguishable shapes in handwritten digits. For example, a 9 can be broken into a circle on top and a bottom curve shaped like a log graph.
What role do the weights in a neural network's grid play?
Weights help determine which pixel patterns are being detected by the network and which are most relevant. This reduces the amount of noise and faulty detections.
How does matrix-vector multiplication enhance a neural network's processing capabilities?
It enables efficient computation of weighted sums between layers. This allows the network to quickly process huge sets of data, which is essential for making accurate predictions in tasks like image recognition within a reasonable amount of time.
What is the advantage of using ReLU instead of the sigmoid function in modern machine learning?
ReLU simplifies the learning process by allowing faster training, and it avoids the computational expense of the sigmoid function. It activates neurons only when necessary, making the network more efficient; it is the same idea behind optimized games, where only the part of the scene the player can see is rendered.
What is a neural network and its purpose?
A neural network is a computational model inspired by the brain, and is used to recognize patterns and solve tasks like classification by learning from data.
How do neural networks learn from data?
Neural networks learn through training, adjusting weights between neurons using algorithms like backpropagation to minimize errors.
What role do hidden layers play in neural networks?
Hidden layers enable the network to learn complex features by processing and transforming inputs through multiple stages.
Why is backpropagation essential in training neural networks?
Backpropagation helps update the weights in the network, allowing it to learn from mistakes and improve accuracy over time.
What is a cost function, and how is it calculated and used to evaluate the overall performance of an artificial neural network?
A cost function measures the difference between predicted and actual outputs that a neural network can return. It aggregates the errors across all of the training examples to provide a measure of performance. By summing the "costs" for these examples, it provides a more stable and generally accurate measure of performance which can then help the model optimize its training process.
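A minimal sketch of that aggregation in Python; the toy prediction vectors below are invented for illustration:

```python
import numpy as np

def cost(predicted, expected):
    # Sum of squared differences for a single training example.
    return np.sum((predicted - expected) ** 2)

# Toy outputs for two training examples of a 3-class network.
examples = [
    (np.array([0.9, 0.05, 0.05]), np.array([1.0, 0.0, 0.0])),  # nearly right
    (np.array([0.2, 0.7, 0.1]), np.array([1.0, 0.0, 0.0])),    # confidently wrong
]

# Average the per-example costs into one overall performance number.
average_cost = sum(cost(p, e) for p, e in examples) / len(examples)
print(average_cost)  # the wrong example dominates the total
```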
How does backpropagation relate to the gradient descent algorithm in neural networks?
Backpropagation is a core part of neural network training because it lets the network learn from its errors, updating the neurons' weights and biases according to the optimization algorithm (in this case, gradient descent). The gradient indicates the direction in which each weight and bias should be adjusted to minimize the error (cost); backpropagation then applies those adjustments (in other words, it follows the direction of steepest descent toward a local minimum).
Why might a neural network classify random noise with just as high confidence as other data, and what does this imply about its learning process?
When a neural network classifies random noise or inputs with overconfidence in its predictions, it likely means that it has been exposed to a lack of diverse training scenarios, showing that it can memorize the patterns in the data without truly understanding them. This is also known as overfitting and means that the network is not able to express uncertainty well.
Why do artificial neurons have continuously ranging activations?
Continuous activations make the cost function's output smooth, which lets gradient descent take small steps toward a local minimum. As the cost function is repeatedly evaluated, the weights are continuously updated.
How do neural networks learn with randomly-labeled data compared to properly-labeled data? Why?
Neural networks learn properly-labeled data much faster than randomly-labeled data. With a structured data set, the optimization landscape is easier to descend, even though the local minima found are of roughly the same quality.
How does the neural network function discussed in the first video differ from the cost function that better explains how neural networks learn?
In the original function, the inputs were the pixels, and the parameters were the weights/biases. In the cost function, the inputs are the weights/biases, the output is the cost, and the parameters are examples the network is trained on.
What is the formula for the cost function in machine learning? Why is the function written this way?
a) (computer output - actual output); this function more severely punishes mistakes that the system makes
b) (computer output + actual output); this function rewards inaccuracies that the system makes
c) (computer output - actual output)^2; this function more severely punishes mistakes that the system makes
d) (computer output - actual output) * 2; this function severely punishes when the system does not have an answer
C.
What is the purpose of gradient descent when trying to "teach" a machine?
a) To get the weights and biases of the machine as low as possible
b) To get the cost function as low as possible in as little time as possible
c) To get the backpropagation of the machine to slow down to an extent that it can handle
d) To get the large language model (LLM) to respond to certain functions correctly
B.
Which of the following is NOT true about the way neural networks "learn"?
a) They perform better with structured data
b) They use gradient descent to calculate the most efficient methods of lowering the cost function
c) They learn to recognize edges and shapes on their own, based on the activation of each neuron
d) Shifting around the weights and biases of the neurons might produce different answers from the computer
C.
What do the values represented by the cost function say about the network?
The smaller the value is, the more accurate the network is in classifying the image to a number. The larger the value is, however, the less accurate it is and the more the network still has to fix.
How does finding minimums with slopes work?
Start at any point on the graph and determine the slope. If the slope is positive, you would need to shift your point to the left. Conversely, if the slope is negative, you would shift your point to the right. You would keep doing this until you get to the point where the slope is very close to 0 and that final point is the approximate location of the minimum. However, in the case where there are multiple minimums, depending on where you choose to start, you would get the location of a local minimum instead of the global minimum.
What are the steps you need to take to minimize a function with multiple inputs and one output?
(1) Calculate the gradient which is the direction of the steepest increase. (2) Step in the direction of the negative of the gradient which is the steepest decrease. (3) Repeat.
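Those three steps in code, applied to the toy function f(x, y) = x^2 + y^2 (the function and learning rate are illustrative choices, not from the video):

```python
import numpy as np

def grad(v):
    # Gradient of f(x, y) = x^2 + y^2, whose minimum sits at (0, 0).
    return 2.0 * v

v = np.array([3.0, -4.0])  # arbitrary starting point
learning_rate = 0.1

for _ in range(100):
    # Step against the gradient: the direction of steepest decrease.
    v = v - learning_rate * grad(v)

print(v)  # very close to [0, 0]
```

Because each step is proportional to the gradient, the steps automatically shrink as the slope flattens near the minimum, which is the improvement described in a later card.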
Why does the cost function help trend the neural network to the local minimum rather than the global minimum?
The cost function lets the network find the slope at its current point on the cost surface and use it to reach a local minimum. Finding the global minimum is much harder than finding a local one, especially with over 13,000 inputs to the cost function; and since the network does not need to be 100% accurate, the slightly lower accuracy found at a local minimum is acceptable.
What is gradient descent and how does it relate to why the neural network uses values between 0 and 1 rather than just being 0 or 1?
Gradient descent is the repeated slight alteration of a function's inputs using the negative of the function's gradient. This is why neurons take a range of values rather than just 0 or 1: the cost function can only calculate the descent accurately if activations vary smoothly over a range, not just true or false.
Why does the neural network in the video give a confident answer for a number even though the input to the function was random pixels at random brightnesses?
The neural network does this for multiple reasons. First, its neurons do not pick up on holistic patterns; they look at seemingly arbitrary pixels on the screen and piece together what number the image might be from those pixels alone. Second, the network's cost function makes it more rewarding to be confidently wrong than uncertainly right, since a confident answer gets at most a couple of output neurons wrong rather than all of them.
What is the purpose of a cost function in a neural network?
a) To initialize random weights and biases
b) To measure how poorly the network's output matches the expected result
c) To classify the input image
d) To compute the gradient of the network's performance
B
What happens when the slope of the cost function flattens as gradient descent approaches a minimum?
a) The network stops adjusting the weights
b) The step sizes increase
c) The step sizes become smaller, preventing overshooting
d) The cost function resets to a random value
C
How does the neural network adjust its weights and biases during training?
The neural network adjusts its weights and biases using the gradient descent algorithm. This is done by calculating the slope of the cost function and then adjusting the weights and biases in the direction that minimizes the cost.
What happens if the neural network is fed random noise rather than an actual digit?
The network was only ever taught to confidently categorize images as digits, even ones that don't follow identifiable patterns; so even though the input is meaningless, it confidently labels random noise as a digit.
Why are the initial weights and biases of the neural network determined at random?
The weights and biases are initially set randomly because the network starts with no prior knowledge; the training process then optimizes performance by adjusting them.
What role did the MNIST dataset play in the training of the neural network for recognizing handwritten digits?
It provided thousands of labeled handwritten digits so the neural network could train and adjust its weights and biases as necessary.
Why does it not matter that the values in the cost function may "land" in a local minimum instead of a global minimum?
The minima of the cost function happen to be very similar in quality, even if they are not exactly the same as the global minimum. So, in effect, landing in a local minimum will still get you a rather definitive overall output as the different costs/errors are not that different.
How can you improve the process of finding a local minimum?
If you adjust step sizes to be proportional to the slope of the curve as you descend, you lower the chances of missing a local minimum. This is because the steps decrease in size as the curve flattens out and you approach a local minimum.
What can you tell from the negative gradient vector?
The signs of the components tell you whether each component of the input vector should increase or decrease, and the magnitudes tell you by how much each component should change.
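A tiny invented example of reading such a vector (the three components are made up):

```python
import numpy as np

# Hypothetical negative gradient for three weights of a network.
neg_gradient = np.array([0.8, -0.02, 3.1])

# Sign says which way each weight should move; magnitude says how much it matters.
for i, g in enumerate(neg_gradient):
    direction = "increase" if g > 0 else "decrease"
    print(f"weight {i}: {direction} (relative importance {abs(g):.2f})")
```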
What is a cost function, and what does it do?
A cost function is a way for a machine to know whether the steps it is taking are "good" or "bad." For each training example, the cost function squares the difference between the predicted output and the actual output. It then adds these squared values together as a way to evaluate the machine's success.
Why do artificial neurons have continuously ranging activations, as opposed to acting in a binary way like the neurons in our brains do?
If artificial neurons behaved the way biological neurons did, it would be much more difficult, if not impossible, to find the local minima of functions, as artificial neurons rely on taking small steps towards n-dimensional valleys to find weights that optimize the neural network.
Why are many neural networks so confident in wrong predictions for inputs with unclear outputs?
Neural networks only know what they are trained on, so a neural network that is never trained on data with an unclear or nonexistent output would not know how to interpret that data. Additionally, although neural networks can perform the classifications that humans can, they do not necessarily view images or data in the same way we do, meaning that random noise may look like discernible data.
Why are neural networks so confident?
Because the optimization of its cost function encourages confidence.
What could the MNIST dataset have done to discourage blind confidence?
It could have included a NaN ("not a number") category, so that the network could be confident in its lack of confidence.
How can we better ensure the speed of finding a local minimum?
It can be helped with a well-tuned learning rate and a well-structured dataset.
Is it guaranteed that the minimum that you land in to be in the smallest minimum?
No.
What does the "Gradient" of a function tell you?
The direction of steepest ascent; its negative gives the direction of steepest descent.
Do all adjustments to weights matter the same amount?
No, some adjustments to weights matter more than others.
What is the cost function?
A function that measures how the network's output compares to the expected output.
Does a neural network learn to recognize shapes, patterns, and edges on its own?
No.