Statistical Machine Learning Algorithm Design
Statistical Machine Learning Algorithm Design Framework (Recipe Box 1.1)
The framework is a useful guideline, not a strict set of rules.
Recipe boxes present complex ideas in simplified form, which makes the subtle and complicated issues discussed later easier to understand.
Step 1: Modeling the Environment
Objective: Define the mathematical model of the machine learning algorithm's operating environment.
Insights gained: This step helps determine the learning paradigm (supervised, unsupervised, or reinforcement learning) and the structure of the data.
Learning Paradigms: If the algorithm must identify patterns without explicit instructions (for example, no desired response is supplied for each input), this suggests unsupervised learning.
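A minimal sketch of the paradigm distinction, assuming a simplified view in which the paradigm is read off from what the environment supplies with each observation: a desired response (supervised), an evaluative reward signal (reinforcement), or neither (unsupervised). The `Observation` class and the `infer_paradigm` helper are hypothetical, and real problems can mix these cases.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Observation:
    """One hypothetical environmental event seen by the learner."""
    features: Tuple[float, ...]
    desired_response: Optional[float] = None  # a "teacher" signal, if any
    reward: Optional[float] = None            # evaluative feedback, if any


def infer_paradigm(data: Sequence[Observation]) -> str:
    """Simplified heuristic: read the paradigm off what the data contain."""
    if any(o.reward is not None for o in data):
        return "reinforcement"
    if data and all(o.desired_response is not None for o in data):
        return "supervised"
    if all(o.desired_response is None for o in data):
        return "unsupervised"
    # Some, but not all, observations carry a desired response.
    raise ValueError("mixed data: only some observations have a desired response")


# Example: feature vectors with no desired responses and no rewards.
unlabeled = [Observation((0.2, 1.0)), Observation((0.9, 0.1))]
print(infer_paradigm(unlabeled))  # unsupervised
```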
Pattern Structure: How the data are represented, including feature vectors and feature maps (functions that transform real-world data into feature vectors).
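To illustrate what a feature map does, here is a small sketch that turns a raw, real-world record into a numeric feature vector. The weather record, its field names, and the `DAYS` vocabulary are hypothetical choices for illustration; the encodings shown (numeric passthrough, a binary indicator, one-hot coding for a categorical field) are common options, not the only ones.

```python
from typing import Dict, List, Union

# Hypothetical categorical vocabulary used for one-hot encoding.
DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def feature_map(event: Dict[str, Union[float, bool, str]]) -> List[float]:
    """Hypothetical feature map: raw weather record -> numeric feature vector."""
    temperature = float(event["temperature_c"])              # numeric, used as-is
    cloudy = 1.0 if event["cloudy"] else 0.0                 # binary indicator
    day = [1.0 if event["day"] == d else 0.0 for d in DAYS]  # one-hot encoding
    return [temperature, cloudy] + day


x = feature_map({"temperature_c": 21.5, "cloudy": True, "day": "wed"})
print(x)  # [21.5, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

Each distinct real-world event now corresponds to a point in a 9-dimensional feature space, which is the form most learning algorithms expect.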
Stationary Statistical Environment: Determine whether the probability of observing a given feature vector remains the same across different learning trials; if it does, the environment is stationary.
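One rough way to probe the stationarity question with data, assuming discrete feature vectors, is to estimate the probability of each feature vector separately within each trial and check that these empirical distributions agree. The sketch below uses total variation distance and a hypothetical tolerance `tol`; it is a heuristic check subject to sampling noise, not a formal statistical test.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Hashable, List, Sequence

Dist = Dict[Hashable, float]


def empirical_distribution(trial: Sequence[Hashable]) -> Dist:
    """Relative frequency of each (discrete) feature vector in one trial."""
    counts = Counter(trial)
    n = len(trial)
    return {x: c / n for x, c in counts.items()}


def total_variation(p: Dist, q: Dist) -> float:
    """Total variation distance between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def looks_stationary(trials: List[Sequence[Hashable]], tol: float = 0.1) -> bool:
    """True if every pair of trials has nearly the same empirical distribution."""
    dists = [empirical_distribution(t) for t in trials]
    return all(total_variation(p, q) <= tol for p, q in combinations(dists, 2))


# Feature vectors are tuples so they can be counted.
trial_a = [(0, 1)] * 5 + [(1, 0)] * 5
trial_b = [(1, 0)] * 5 + [(0, 1)] * 5   # same frequencies, different order
trial_c = [(0, 1)] * 9 + [(1, 0)] * 1   # frequencies have shifted
print(looks_stationary([trial_a, trial_b]))  # True
print(looks_stationary([trial_a, trial_c]))  # False
```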
Non-Stationary Environment Example (Stock Market): The statistical regularities in stock market data from 50 years ago are likely different from those of today, which makes the environment non-stationary. Data from shorter, consecutive periods (e.g., 5 to 10 years) may be more stable. The financial environment typically shifts slowly over time.
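The idea of comparing shorter periods can be sketched by splitting a long series into consecutive, non-overlapping windows and summarizing each one. The series below is synthetic (not real market data), and the 10-year window length is a hypothetical choice mirroring the example above; matching means and standard deviations across windows would not by themselves prove stationarity.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def window_stats(series: Sequence[float], window: int) -> List[Tuple[float, float]]:
    """Mean and population std of consecutive, non-overlapping windows."""
    stats = []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        stats.append((mean(chunk), pstdev(chunk)))
    return stats


# Synthetic (not real) yearly returns over 50 years: a slow upward drift
# plus an alternating +/-0.01 fluctuation.
returns = [0.02 + 0.001 * t + (0.01 if t % 2 == 0 else -0.01) for t in range(50)]

for i, (m, s) in enumerate(window_stats(returns, window=10)):
    print(f"years {10 * i:2d}-{10 * i + 9:2d}: mean={m:.4f}, std={s:.4f}")
```

Within each window the drift is small relative to the fluctuation, so the data look roughly stable, while the window means climb steadily across the 50 years: a slowly shifting environment.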
Non-Stationary Environment Example (Robot Learning): A robot learning to recognize images in its environment alters its own statistical environment by moving around and choosing what to look at. The environment is non-stationary because what the robot observes depends on its current knowledge and actions.
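A toy simulation makes the feedback loop explicit: the robot's knowledge determines where it looks, and where it looks determines what it observes. The scene, the threshold `k`, and the "look at the first object not yet learned" policy are all hypothetical simplifications; the point is only that the distribution of observations in early trials differs from that in later trials because of the learner's own behavior.

```python
from collections import Counter
from typing import List

# Hypothetical world: each region always shows the same object.
SCENE = {"left": "cup", "center": "chair", "right": "lamp"}


def choose_region(knowledge: Counter, k: int) -> str:
    """Look at the first region whose object has been seen fewer than k times."""
    for region, obj in SCENE.items():
        if knowledge[obj] < k:
            return region
    return "right"  # everything learned: keep looking right


def run_robot(steps: int, k: int = 3) -> List[str]:
    knowledge: Counter = Counter()
    observations = []
    for _ in range(steps):
        region = choose_region(knowledge, k)  # action depends on knowledge
        obj = SCENE[region]                   # observation depends on action
        knowledge[obj] += 1                   # learning updates knowledge
        observations.append(obj)
    return observations


print(run_robot(9))
# ['cup', 'cup', 'cup', 'chair', 'chair', 'chair', 'lamp', 'lamp', 'lamp']
```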
Even before a specific machine learning algorithm is considered, this initial modeling step provides a great deal of information about the problem.
Step 2: Specifying the Machine Learning Architecture
Simplification: Keep the architecture as simple as possible initially.
Circles and Arrows Notation:
Units (Circles): Represent computing functions with a real-valued