P2: Notes on Regression Techniques and Convolutional Neural Networks
Regression Techniques in Machine Learning
- Concept of Regularization:
- Regularization is used to penalize larger weights to prevent overfitting in machine learning models.
- Two common techniques are Ridge regression (L2 regularization) and Lasso regression (L1 regularization).
- Understanding Weights:
- Example weight vector: $w = [23, 5, 4, 3, 2, 68]$ contains some noticeably large entries (23 and 68).
- The aim of regularization is to shrink these large weights toward smaller values.
- Effect of Ridge Regression:
- Ridge regression adds the squared L2 norm of the weights, $\lambda \sum_j w_j^2$, to the loss, so large weights incur a disproportionately large penalty and are shrunk toward zero.
- Because the quadratic penalty's gradient fades as a weight approaches zero, Ridge shrinks weights smoothly but rarely sets them exactly to zero.
- Effect of Lasso Regression:
- Lasso regression adds the L1 norm of the weights, $\lambda \sum_j |w_j|$, to the loss, penalizing the absolute values of the weights.
- This can shrink some weights exactly to zero, effectively performing feature selection by eliminating less important features (see the sketch below).
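A minimal sketch of the contrast, assuming synthetic data and illustrative `alpha` values (scikit-learn's penalty-strength parameter); the exact coefficients depend on the data and on `alpha`:

```python
# Compare Ridge (L2) and Lasso (L1) shrinkage on synthetic data.
# The dataset, noise level, and alpha values are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))            # 100 samples, 6 features
true_w = np.array([23, 5, 4, 3, 2, 68])  # the weight vector from the notes
y = X @ true_w + rng.normal(scale=0.5, size=100)

ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=5.0).fit(X, y)

print("Ridge:", np.round(ridge.coef_, 2))  # all shrunk, none exactly zero
print("Lasso:", np.round(lasso.coef_, 2))  # small coefficients driven to zero
```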
Convolutional Operations in Neural Networks
- Convolution Concept:
- Convolution in Neural Networks focuses on specific areas of the input rather than the complete input, allowing for feature extraction.
- Difference between Convolution and Fully Connected Networks:
- Fully connected networks connect each neuron in one layer to all neurons in the next layer, providing a holistic overview.
- Convolutional layers focus only on localized patches of the input, allowing neurons to specialize in specific features of the image.
- Why Convolution?
- Specializing in features allows better extraction of necessary attributes in data (like images).
- Each neuron is responsible for a distinct patch of the input instead of absorbing the entire input (a minimal sketch follows this list).
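A sketch of the difference in connectivity, with an assumed 1-D input and 3-tap filter; note that neural-network "convolution" is implemented as cross-correlation (no kernel flip):

```python
# A fully connected output uses a weight for every input value;
# a convolutional output reuses a small filter over local windows.
# The input, weights, and filter below are assumed for illustration.
import numpy as np

x = np.arange(8, dtype=float)     # tiny 1-D input signal
k = np.array([1.0, 0.0, -1.0])    # 3-tap filter shared across positions

w_fc = np.ones(8)                 # fully connected: 8 weights, 1 output
fc_out = w_fc @ x                 # depends on *all* inputs

# "Convolution" as used in CNNs (cross-correlation, no kernel flip):
conv_out = np.array([k @ x[i:i + 3] for i in range(len(x) - 3 + 1)])
print(fc_out)    # one value computed from the whole input
print(conv_out)  # six values, each from its own 3-value local patch
```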
Neural Network Layers and Features Extraction
- Creating Feature Maps:
- Convolutional layers generate multiple feature maps based on different weights (filters) which detect unique features in the input data.
- Filters act like lenses focusing on particular aspects of an input, generating different neuron activations based on the learned weights.
- Pooling Operations:
- Following convolutions, pooling operations downsample the feature maps to provide a hierarchy of features, aiding in reducing dimensionality.
- Pooling types include max pooling and average pooling, which keep key features while ignoring insignificant variations (noise); both are sketched below.
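A minimal sketch of both pooling types on an assumed $4\times4$ feature map, using non-overlapping $2\times2$ windows (stride 2):

```python
# Max and average pooling over non-overlapping 2x2 windows (stride 2).
# The feature-map values are assumed for illustration.
import numpy as np

fmap = np.array([[1., 3., 2., 0.],
                 [4., 6., 1., 1.],
                 [0., 2., 5., 7.],
                 [1., 1., 3., 2.]])

def pool2x2(fm, reduce_fn):
    """Apply reduce_fn (e.g. np.max, np.mean) to each 2x2 window."""
    return np.array([[reduce_fn(fm[i:i + 2, j:j + 2])
                      for j in range(0, fm.shape[1], 2)]
                     for i in range(0, fm.shape[0], 2)])

print(pool2x2(fmap, np.max))   # [[6. 2.] [2. 7.]]   -> strongest response per window
print(pool2x2(fmap, np.mean))  # [[3.5 1.] [1. 4.25]] -> average response per window
```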
Visualization and Calculation Example
- Example of Convolution:
- When a specific filter is convolved over an image, a particular activation value is derived, indicating the presence of corresponding features.
- Example calculation:
- For a $3\times3$ image patch $I$ with rows $[1, 0, 0]$, $[0, 1, 0]$, $[0, 0, 1]$ (a diagonal pattern) and a filter $F$ of the same $3\times3$ shape, the activation is the sum of the elementwise products, $\sum_{i,j} I_{ij} F_{ij}$; the patch and filter must have matching dimensions.
- Pooling operation illustration:
- Max pooling, for instance, takes the maximum value from each distinct region of the feature map, so the critical features remain while insignificant variations (noise) are discarded; a worked sketch of the activation step follows.
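A worked sketch of the activation computation above; the filter values are assumed (the notes' filter did not match the patch shape), chosen here as a diagonal detector the same size as the patch:

```python
# Activation = sum of elementwise products of a patch and a same-shaped filter.
# The filter values are assumed: a diagonal detector matching the patch.
import numpy as np

I = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 1.]])   # image patch from the notes (diagonal pattern)
F = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 1.]])   # assumed filter tuned to that diagonal

activation = np.sum(I * F)     # 1+0+0 + 0+1+0 + 0+0+1 = 3.0
print(activation)              # high activation => the feature is present

# A mismatched pattern gives a weaker response:
F_off = np.array([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])
print(np.sum(I * F_off))       # 0.0 => the feature is absent
```

The subsequent max-pooling step over the resulting feature map works exactly as in the pooling sketch in the previous section.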