P3: Neural Network Architecture and Hidden Layers
Universal Approximation Theorem
- Overview: The Universal Approximation Theorem states that a feedforward neural network with a single hidden layer containing enough neurons can approximate any continuous function (on a compact input domain) to arbitrary accuracy; a small sketch of this idea follows below.
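As a concrete illustration of the theorem, the sketch below fits a single-hidden-layer network to a simple continuous function. It assumes TensorFlow/Keras; the target sin(x) and the 64-neuron width are arbitrary choices for illustration, not values from these notes.

```python
# Minimal sketch: a single hidden layer approximating a continuous function.
# Assumes TensorFlow 2.x; the target sin(x) and the 64-neuron width are
# arbitrary illustrative choices.
import numpy as np
import tensorflow as tf

x = np.linspace(-np.pi, np.pi, 1000).reshape(-1, 1).astype("float32")
y = np.sin(x)

approx_model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(64, activation="sigmoid"),  # single hidden layer
    tf.keras.layers.Dense(1),                         # linear output
])
approx_model.compile(optimizer="adam", loss="mse")
approx_model.fit(x, y, epochs=200, verbose=0)

print("max absolute error:",
      float(np.max(np.abs(approx_model.predict(x, verbose=0) - y))))
```

With enough hidden neurons and training, the maximum error over the sampled interval can be driven arbitrarily low, which is exactly what the theorem guarantees in principle.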
Function of One Hidden Layer
- Structural Composition:
- A neural network consists of an input layer, one or more hidden layers, and an output layer.
- Example Structure:
- Input Layer: Features from the dataset.
- Hidden Layer 1: 36 neurons with sigmoid activation function.
- Output Layer: 10 neurons using softmax activation (one per digit class in the classification task).
Practical Implementation
Code Example:
- TensorFlow code is used to define the model with:
- A fully connected (dense) hidden layer of 36 neurons.
- An input layer built from the MNIST dataset of handwritten digits (28x28 pixel images); a sketch of this model follows below.
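A hedged reconstruction of this model is sketched below. It assumes TensorFlow 2.x with the built-in MNIST loader; the optimizer, loss, and epoch count are illustrative assumptions rather than the exact settings from the original code.

```python
# Sketch of the model described above: flattened MNIST images feed a fully
# connected hidden layer of 36 sigmoid neurons, followed by a 10-way softmax.
# Assumes TensorFlow 2.x; optimizer, loss, and epochs are illustrative choices.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixel values to [0, 1]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),                        # 28x28 image -> 784 features
    tf.keras.layers.Dense(36, activation="sigmoid"),  # hidden layer (6x6 grid of neurons)
    tf.keras.layers.Dense(10, activation="softmax"),  # one output per digit class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
```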
Neuron Configuration:
- Hidden Layer: 36 neurons, which can be visualized as a 6x6 grid; the input images themselves are 28x28 pixels (784 features when flattened).
Activation of Neurons
- Activation Patterns:
- When an input is applied, each neuron in the hidden layer activates to a degree determined by the input features and the neuron's learned weights.
- Activation patterns change with the number of neurons and their arrangement in the network; a sketch for reading out these activations follows below.
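One way to read out these activation patterns is sketched below; it assumes the `model` and `x_test` objects from the earlier MNIST sketch, and the 6x6 reshape is purely a visualization convenience.

```python
# Sketch: extract the 36 hidden-neuron activations for one test image and
# arrange them as a 6x6 grid. Assumes `model` and `x_test` from the MNIST
# sketch above; the reshape is only for easier visual inspection.
import numpy as np
import tensorflow as tf

hidden_readout = tf.keras.Model(inputs=model.inputs,
                                outputs=model.layers[1].output)  # the Dense(36) layer

activations = hidden_readout.predict(x_test[:1], verbose=0)      # shape (1, 36)
grid = activations.reshape(6, 6)                                 # 6x6 activation pattern

np.set_printoptions(precision=2, suppress=True)
print(grid)
```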
Observations on Multiple Hidden Layers
- Single vs Multiple Hidden Layers:
- One Hidden Layer: The representation of a pattern (e.g., the digit '5') is compressed into a single layer, which makes it difficult to interpret how the model learns specific features.
- Two Hidden Layers: The representation is distributed across the two layers, showing a clearer structure.
- Configuration: 36 neurons (6x6) in the first hidden layer and 25 neurons (5x5) in the second.
- Interpretation of the model's learning progression improves, although the overall complexity remains high.
- Three and Four Hidden Layers:
- The representation is distributed further across layers (e.g., hidden layers of 36, 25, and 16 neurons, visualized as 6x6, 5x5, and 4x4 grids).
- This reduces the internal complexity of each layer and makes it easier to interpret how the network processes and learns to represent its inputs; a sketch of the three-hidden-layer variant follows this list.
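The deeper variant discussed above can be sketched as follows; the 36/25/16 widths mirror the 6x6, 5x5, and 4x4 grids from the notes, while the sigmoid activations, optimizer, and loss are assumptions carried over from the single-layer example.

```python
# Sketch: a deeper variant with three hidden layers of 36, 25, and 16 neurons
# (the 6x6, 5x5, and 4x4 grids mentioned above). Sigmoid activations and the
# training settings are assumptions carried over from the single-layer example.
import tensorflow as tf

deep_model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(36, activation="sigmoid"),  # first hidden layer  (6x6)
    tf.keras.layers.Dense(25, activation="sigmoid"),  # second hidden layer (5x5)
    tf.keras.layers.Dense(16, activation="sigmoid"),  # third hidden layer  (4x4)
    tf.keras.layers.Dense(10, activation="softmax"),  # one output per digit class
])
deep_model.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
deep_model.summary()
```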
Conclusion
- Layer Complexity:
- Increasing the number of hidden layers distributes the representation across more stages, making it easier to understand how the network learns patterns in the data.
- The complexity within each individual layer decreases, since additional layers provide greater capacity for building up increasingly abstract feature representations.