P2: Notes on Regression Techniques and Convolutional Neural Networks
Regression Techniques in Machine Learning
Concept of Regularization:
Regularization adds a penalty on large weights to the training objective, discouraging overly complex models and preventing overfitting.
Two common techniques are Ridge regression (L2 regularization) and Lasso regression (L1 regularization).
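In a standard formulation (with $\lambda$ controlling the regularization strength), the two penalized least-squares objectives are:

$$\text{Ridge (L2):} \quad \min_w \sum_i \left(y_i - x_i^\top w\right)^2 + \lambda \sum_j w_j^2$$

$$\text{Lasso (L1):} \quad \min_w \sum_i \left(y_i - x_i^\top w\right)^2 + \lambda \sum_j |w_j|$$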
Understanding Weights:
Given a weight vector $w = [23, 5, 4, 3, 2, 68]$, the entries 23 and 68 stand out as large.
The aim of regularization is to shrink such large weights toward smaller values.
Effect of Ridge Regression:
In Ridge regression, each weight is penalized by its square, so the largest weights incur the largest penalty and are shrunk the most.
Ridge shrinks weights smoothly toward zero, but it rarely sets any weight exactly to zero.
Effect of Lasso Regression:
In Lasso regression, the penalty is the sum of the absolute values of the weights.
This can shrink some weights exactly to zero, effectively performing feature selection by eliminating less important features.
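A minimal sketch (not from the notes) comparing the two behaviors, assuming scikit-learn is available and treating the weight vector above as the ground truth; the synthetic data and `alpha` values are illustrative choices:

```python
# Compare Ridge (L2) and Lasso (L1) coefficients on synthetic data
# generated from the weight vector in the notes. Data and alpha
# values are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
true_w = np.array([23.0, 5.0, 4.0, 3.0, 2.0, 68.0])
y = X @ true_w + rng.normal(scale=0.5, size=200)

ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=10.0).fit(X, y)

print("Ridge:", np.round(ridge.coef_, 2))  # all weights shrunk, none exactly zero
print("Lasso:", np.round(lasso.coef_, 2))  # smaller weights driven exactly to zero
```

Running this shows Ridge keeping all six coefficients (merely shrunk), while Lasso zeroes out the smaller ones, which is exactly the feature-selection behavior described above.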
Convolutional Operations in Neural Networks
Convolution Concept:
Convolution in neural networks focuses on specific local regions of the input rather than the complete input, which enables localized feature extraction.
Difference between Convolution and Fully Connected Networks:
Fully connected networks connect each neuron in one layer to all neurons in the next layer, providing a holistic overview.
Convolutional layers focus only on localized patches of the input, allowing neurons to specialize in specific features of the image.
Why Convolution?
Specializing in features allows better extraction of the relevant attributes in data such as images.
Each neuron is responsible for a distinct patch of the input instead of absorbing the entire input, which drastically cuts the number of weights (see the parameter-count sketch below).
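A back-of-the-envelope comparison makes the point; the layer sizes below (a 28×28 grayscale input, 100 output units/filters, 3×3 filters) are illustrative assumptions, not from the notes:

```python
# Parameter count: fully connected layer vs. convolutional layer.
# All sizes are hypothetical, chosen only to illustrate the gap.
input_pixels = 28 * 28       # 784-pixel image, flattened
hidden_units = 100           # width of the fully connected layer
fc_weights = input_pixels * hidden_units
print(f"Fully connected: {fc_weights} weights")   # 78400

num_filters = 100            # same number of output channels
filter_size = 3 * 3          # each neuron only sees a 3x3 patch
conv_weights = num_filters * filter_size
print(f"Convolutional:   {conv_weights} weights")  # 900, shared across positions
```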
Neural Network Layers and Features Extraction
Creating Feature Maps:
Convolutional layers generate multiple feature maps, one per filter (a set of learned weights), and each filter detects a distinct feature in the input data.
Filters act like lenses focusing on particular aspects of an input, generating different neuron activations based on the learned weights.
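A minimal sketch of this idea (shapes and filter values are illustrative assumptions): sliding two 3×3 filters over a 5×5 input produces two feature maps, one per filter.

```python
# Slide each 3x3 filter over a 5x5 input to produce one feature map
# per filter. Implemented as "valid" cross-correlation, the operation
# most CNN libraries call convolution.
import numpy as np

def convolve2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Activation = sum of element-wise products over one patch
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
filters = [
    np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], float),  # vertical edges
    np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1]], float),  # horizontal edges
]
feature_maps = [convolve2d(image, f) for f in filters]  # one 3x3 map per filter
```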
Pooling Operations:
Following convolutions, pooling operations downsample the feature maps, reducing spatial dimensionality while building a hierarchy of features.
Pooling types include max pooling and average pooling; both keep the key features while ignoring insignificant variations (noise).
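A sketch of both pooling types with 2×2 windows and stride 2 (the feature map values are illustrative assumptions):

```python
# 2x2 max and average pooling with stride 2 over a hypothetical
# 4x4 feature map.
import numpy as np

def pool2d(fmap, size=2, stride=2, mode="max"):
    oh = (fmap.shape[0] - size) // stride + 1
    ow = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = fmap[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 1, 1],
                 [0, 2, 9, 8],
                 [5, 1, 7, 2]], float)
print(pool2d(fmap, mode="max"))      # [[6. 2.] [5. 9.]]
print(pool2d(fmap, mode="average"))  # [[3.5 1. ] [2.  6.5]]
```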
Visualization and Calculation Example
Example of Convolution:
When a specific filter is convolved over an image, a particular activation value is derived, indicating the presence of corresponding features.
Example calculation:
For a $3\times3$ image patch $I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ and a $3\times3$ filter (here chosen, for illustration, to match the patch) $F = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$, the activation is the sum of the element-wise products, $\sum_{i,j} I_{ij} F_{ij} = 3$; a high value signals that the pattern encoded by the filter is present.
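A quick numeric check of this activation (the identity-like filter is an illustrative choice):

```python
# Element-wise product of patch and filter, then sum = activation.
import numpy as np

I = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])
F = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])
print(np.sum(I * F))  # 3
```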
Pooling operation illustration:
Max pooling takes the maximum value from each region of the feature map, so the strongest feature responses survive while noise is discarded.
Applied across distinct, non-overlapping regions of the input feature map, max pooling yields a compact summary of the important features, as in the example below.
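Concretely, using the same illustrative feature map as the pooling sketch above, $2\times2$ max pooling with stride 2 maps

$$\begin{bmatrix} 1 & 3 & 2 & 0 \\ 4 & 6 & 1 & 1 \\ 0 & 2 & 9 & 8 \\ 5 & 1 & 7 & 2 \end{bmatrix} \;\longrightarrow\; \begin{bmatrix} 6 & 2 \\ 5 & 9 \end{bmatrix}$$

where each output entry is the maximum of one non-overlapping $2\times2$ region.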