
P3: Neural Network Architecture and Hidden Layers

Universal Approximation Theorem

  • Overview: The Universal Approximation Theorem states that a feedforward neural network with a single hidden layer and a non-linear activation function can approximate any continuous function on a closed, bounded domain to arbitrary accuracy, provided the hidden layer has enough neurons. A minimal sketch of this idea follows.
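A minimal sketch of the theorem in practice, assuming TensorFlow/Keras is available: a single hidden layer (64 tanh units here, an illustrative width not taken from the source) is fit to the continuous function sin(x) on a fixed interval.

```python
import numpy as np
import tensorflow as tf

# Sample a continuous 1-D target function (sin) on a fixed interval.
x = np.linspace(-np.pi, np.pi, 1000).reshape(-1, 1).astype("float32")
y = np.sin(x)

# One hidden layer; its width (64 here) is the "enough neurons" knob.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(64, activation="tanh"),  # single hidden layer
    tf.keras.layers.Dense(1),                      # linear output
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=200, verbose=0)
print("final MSE:", model.evaluate(x, y, verbose=0))
```

Widening the hidden layer typically drives the final MSE lower, which is the practical reading of the theorem.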

Function of One Hidden Layer

  • Structural Composition:
    • A neural network consists of an input layer, one or more hidden layers, and an output layer.
    • Example Structure:
      • Input Layer: Features from the dataset (for MNIST, the flattened 28x28 image, i.e. 784 values).
      • Hidden Layer 1: 36 neurons with a sigmoid activation function.
      • Output Layer: Neurons with softmax activation, one per class (10 for MNIST digit classification).

Practical Implementation

  • Code Example:

    • TensorFlow (Keras) code is used to define the model with:
      • A fully connected hidden layer of 36 neurons.
      • An input layer fed with the MNIST dataset (handwritten digits), flattened to 784 values per image.
    • A hedged sketch of such a model is given after this list.
  • Neuron Configuration:

    • Hidden Layer: 36 neurons, which can be visualized as a 6x6 grid; the input itself is the 28x28 MNIST image (784 pixels when flattened).
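A minimal sketch of this configuration, assuming TensorFlow/Keras and the standard tf.keras.datasets.mnist loader; the 36-unit sigmoid hidden layer and the 10-way softmax output follow the structure described above, while the optimizer and epoch count are illustrative choices:

```python
import tensorflow as tf

# Load MNIST (28x28 grayscale digits) and flatten each image to 784 features.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(36, activation="sigmoid"),  # hidden layer, viewable as a 6x6 grid
    tf.keras.layers.Dense(10, activation="softmax"),  # one output per digit class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
```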

Activation of Neurons

  • Activation Patterns:
    • When an input is applied, each neuron in the hidden layer activates to a degree determined by the input features and the neuron's learned weights.
    • Activation patterns change with the number of neurons and their arrangement in the network; the sketch below shows one way to inspect them.
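A small sketch for inspecting these activations, assuming the model and x_test variables from the previous sketch (hypothetical names); the hidden layer's sigmoid activations for one test image are computed directly from its weights and arranged as a 6x6 grid:

```python
import numpy as np
import tensorflow as tf

# Assumes `model` and `x_test` from the previous sketch, where model.layers[0]
# is the 36-neuron sigmoid hidden layer.
w, b = model.layers[0].get_weights()                  # kernel (784, 36) and bias (36,)
activations = tf.sigmoid(x_test[:1] @ w + b).numpy()  # shape (1, 36), values in (0, 1)
print(np.round(activations.reshape(6, 6), 2))         # view as a 6x6 grid; values near 1 = strongly activated
```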

Observations on Multiple Hidden Layers

  • Single vs Multiple Hidden Layers:
    • One Hidden Layer: The representation of a pattern (e.g., the digit '5') is compressed into a single layer, which makes it hard to interpret how the model learns specific features.
    • Two Hidden Layers: The representation is distributed across layers, showing a clearer structure.
      • Configuration: 6x6 (36 neurons) in the first hidden layer, 5x5 (25 neurons) in the second.
      • Interpretation of the model's learning progression improves, although complexity remains high.
    • Three and Four Hidden Layers:
      • The representation is distributed further across layers (e.g., 6x6, 5x5, and 4x4 grids for three hidden layers).
      • This reduces the complexity within each layer and makes it easier to interpret how the network processes and learns to represent its inputs.
    • A sketch of these configurations is given after this list.
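A hedged sketch of these variants in TensorFlow/Keras; build_model is an illustrative helper (not from the source), and the layer widths follow the 6x6, 5x5, and 4x4 grids mentioned above:

```python
import tensorflow as tf

def build_model(hidden_sizes):
    """Build an MNIST classifier with the given stack of sigmoid hidden layers."""
    layers = [tf.keras.Input(shape=(784,))]
    for n in hidden_sizes:
        layers.append(tf.keras.layers.Dense(n, activation="sigmoid"))
    layers.append(tf.keras.layers.Dense(10, activation="softmax"))  # 10 digit classes
    return tf.keras.Sequential(layers)

one_hidden   = build_model([36])          # single 6x6 hidden layer
two_hidden   = build_model([36, 25])      # 6x6 then 5x5
three_hidden = build_model([36, 25, 16])  # 6x6, 5x5, 4x4
three_hidden.summary()
```

Deeper stacks trade one large, opaque representation for several smaller ones, each of which can be visualized as a small grid of activations.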

Conclusion

  • Layer Complexity:
    • Increasing the number of hidden layers distributes the representation across more stages, which makes it easier to understand the patterns the network learns from the data.
    • The complexity within each individual layer decreases, because additional layers provide successive levels of abstraction over the feature representations.