Overview: The Universal Approximation Theorem states that a feedforward neural network with a single hidden layer containing a sufficient number of neurons can approximate any continuous function (on a compact domain) to arbitrary accuracy.
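Stated a little more formally (a standard formulation of the theorem, paraphrased here rather than taken from the source): for any continuous function $f$ on a compact subset of $\mathbb{R}^n$ and any tolerance $\epsilon > 0$, there exist a width $N$, weights $w_i$, biases $b_i$, and output coefficients $\alpha_i$ such that
$$F(x) = \sum_{i=1}^{N} \alpha_i \, \sigma(w_i^\top x + b_i)$$
satisfies $|F(x) - f(x)| < \epsilon$ for all $x$ in that subset, where $\sigma$ is a suitable non-linear activation such as the sigmoid used in the example below.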
Function of One Hidden Layer
Structural Composition:
A neural network consists of an input layer, one or more hidden layers, and an output layer.
Example Structure:
Input Layer: Features from the dataset.
Hidden Layer 1: 36 neurons with a sigmoid activation function.
Output Layer: 10 neurons with softmax activation (one per class in the MNIST digit-classification example used below).
Practical Implementation
Code Example:
TensorFlow code is used to define the model with:
A fully connected (Dense) hidden layer of 36 neurons.
An input layer fed from the MNIST dataset of 28x28 handwritten digits (a hedged code sketch follows below).
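The exact code is not reproduced in these notes; the following is a minimal TensorFlow/Keras sketch of such a model, assuming the standard tf.keras.datasets.mnist loader, a 36-neuron sigmoid hidden layer, and a 10-class softmax output:

import tensorflow as tf

# Load MNIST (28x28 grayscale digits) and scale pixel values to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# One fully connected hidden layer of 36 sigmoid neurons,
# softmax output with one neuron per digit class.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),    # 784 input features
    tf.keras.layers.Dense(36, activation="sigmoid"),  # hidden layer
    tf.keras.layers.Dense(10, activation="softmax"),  # output layer
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))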
Neuron Configuration:
Hidden Layer: 36 neurons; each neuron's incoming weights can be viewed as a 28x28 grid matching the input image size, and the 36 neurons themselves can be tiled as a 6x6 grid for visualization (see the sketch below).
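As one way to make the 6x6 arrangement concrete (a hypothetical visualization, assuming the model sketched above plus matplotlib): each hidden neuron has 784 incoming weights, which can be reshaped back into a 28x28 image and tiled into a 6x6 grid.

import matplotlib.pyplot as plt

# The first Dense layer sits at index 1 (index 0 is the Flatten layer).
weights = model.layers[1].get_weights()[0]   # kernel of shape (784, 36)

fig, axes = plt.subplots(6, 6, figsize=(6, 6))
for i, ax in enumerate(axes.flat):
    ax.imshow(weights[:, i].reshape(28, 28), cmap="gray")  # one neuron's weights as an image
    ax.axis("off")
plt.show()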
Activation of Neurons
Activation Patterns:
When an input is applied, each neuron in the hidden layer produces an activation determined by its weighted combination of the input features.
Activation patterns change with the number of neurons and their arrangement in the network; a sketch for inspecting them follows below.
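One way to inspect these activation patterns (again a sketch, assuming the single-hidden-layer model defined earlier) is to build a second Keras model that exposes the hidden layer's output and evaluate it on a single test image:

import numpy as np

# Model whose output is the 36 hidden-layer activations rather than the class scores.
hidden_model = tf.keras.Model(inputs=model.input,
                              outputs=model.layers[1].output)

sample = x_test[:1]                                        # one 28x28 digit
activations = hidden_model(sample).numpy().reshape(6, 6)   # view the 36 values as a 6x6 grid
print(np.round(activations, 2))                            # values near 1 indicate strongly activated neurons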
Observations on Multiple Hidden Layers
Single vs Multiple Hidden Layers:
One Hidden Layer: the representation of a pattern (e.g., the digit '5') is compressed into a single layer, which makes it difficult to interpret which features the model has learned.
Two Hidden Layers: the representation is distributed across the layers, showing a clearer structure.
Configuration: 36 neurons (6x6) in the first hidden layer, 25 neurons (5x5) in the second.
Interpretation of the model's learning progression improves, although the overall complexity remains high.
Three and Four Hidden Layers:
The representation is distributed further across layers (e.g., 6x6, 5x5, and 4x4 grids of neurons).
Per-layer complexity is reduced, and it becomes easier to interpret how the network processes and learns to represent its inputs (a deeper-model sketch follows below).
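A deeper variant along these lines might look as follows (a sketch under the same assumptions and data setup as the earlier model; the layer widths 36, 25, and 16 correspond to the 6x6, 5x5, and 4x4 grids mentioned above):

# Three hidden layers whose widths match the 6x6, 5x5, and 4x4 grids.
deep_model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(36, activation="sigmoid"),  # 6x6
    tf.keras.layers.Dense(25, activation="sigmoid"),  # 5x5
    tf.keras.layers.Dense(16, activation="sigmoid"),  # 4x4
    tf.keras.layers.Dense(10, activation="softmax"),
])

deep_model.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
deep_model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))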
Conclusion
Layer Complexity:
Increasing the number of hidden layers spreads the representation across more stages, which makes the patterns the model learns from the data easier to follow.
The complexity within each individual layer decreases, because additional layers allow features to be abstracted step by step.