
Statistical Machine Learning Algorithm Design

Statistical Machine Learning Algorithm Design Framework (Recipe Box 1.1)

  • The framework is a useful guideline, not a strict set of rules.

  • Recipe boxes simplify complex ideas, making it easier to understand subtle and complicated issues later.

Step 1: Modeling the Environment
  • Objective: Define the mathematical model of the machine learning algorithm's operating environment.

  • Insights gained: This step helps determine the learning paradigm (supervised, unsupervised, reinforcement learning) and the structure of data.

    • Learning Paradigms: The kind of feedback available in the environment points to the paradigm. For example, identifying patterns without explicit instructions suggests unsupervised learning.

    • Pattern Structure: How data is represented as feature vectors, and which feature maps transform real-world data into those feature vectors.

    • Stationary Statistical Environment: Determining whether the probability of observing a given feature vector remains constant across learning trials.

      • Non-Stationary Environment Example (Stock Market): Statistical regularities in stock market data from 50 years ago are likely different from those in today's data, making the environment non-stationary. Data from consecutive, shorter periods (e.g., 5 to 10 years) might be more stable. The financial environment is often slowly shifting.

      • Non-Stationary Environment Example (Robot Learning): A robot learning to recognize images in its environment changes its statistical environment by actively moving and looking at different things. The environment is non-stationary because it changes as a function of the robot's current knowledge and actions.

  • This initial modeling provides rich information before even considering the specific machine learning algorithm.
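
A minimal sketch of two ideas from Step 1, written in Python under stated assumptions. The function names (feature_map, sample_environment, mean_shift), the record fields (price, volume, is_up), and the Gaussian drift model are all hypothetical and chosen only for illustration; they are not taken from the framework itself. The sketch shows a feature map turning a raw record into a feature vector, and a crude check that compares early and late learning trials to see whether the environment looks stationary.

```python
import random
import statistics


def feature_map(record):
    """Hypothetical feature map: convert a raw record (a dict) into a feature vector."""
    return [
        float(record["price"]),
        float(record["volume"]),
        1.0 if record["is_up"] else 0.0,
    ]


def sample_environment(n_trials, drift_per_trial, seed=0):
    """Draw one scalar feature per learning trial.

    drift_per_trial = 0.0 models a stationary environment. A nonzero value
    slowly shifts the mean of the distribution from trial to trial
    (an assumed, simplified model of a slowly shifting environment).
    """
    rng = random.Random(seed)
    return [rng.gauss(drift_per_trial * t, 1.0) for t in range(n_trials)]


def mean_shift(samples):
    """Mean of the later half of the trials minus the mean of the earlier half."""
    half = len(samples) // 2
    return statistics.fmean(samples[half:]) - statistics.fmean(samples[:half])


stationary = sample_environment(2000, drift_per_trial=0.0)
shifting = sample_environment(2000, drift_per_trial=0.002)

print(feature_map({"price": 101.5, "volume": 3000, "is_up": True}))
print("stationary shift:", round(mean_shift(stationary), 2))
print("shifting shift:  ", round(mean_shift(shifting), 2))
```

With these assumed settings, the shift for the stationary environment should be close to zero, while the slowly shifting one should show a shift near 2.0 (the drift of 0.002 per trial accumulated over the 1000 trials separating the two halves). A comparison of means is only one possible check; a change in variance or in the shape of the distribution would need a different statistic.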

Step 2: Specifying the Machine Learning Architecture
  • Simplification: Keep the architecture as simple as possible initially.

  • Circles and Arrows Notation:

    • Units (Circles): Represent computing functions with a real-valued