Flashcards covering key concepts from the lecture on the Statistical Machine Learning Algorithm Design Framework, including modeling, architecture, loss functions, learning algorithms, evaluation, and properties of functions and basis units.
Statistical Machine Learning Algorithm Design Framework
A useful guideline or recipe box for designing machine learning algorithms, consisting of several steps from modeling the environment to evaluation.
Recipe Box 1.1
A foundational, simplified set of basic principles or steps within the framework, intended to help understand more complex issues later.
Modeling the Environment (Step 1)
The first step in algorithm design; building a mathematical model of everything around the machine learning algorithm to gain insights into its type (e.g., supervised, unsupervised, reinforcement learning) and data structure.
Feature Maps
Transformations that take real-world data and convert it into feature vectors for machine learning algorithms.
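A minimal sketch of a feature map, assuming a hypothetical record with one numeric field and one categorical field (the field names and one-hot encoding are illustrative, not from the lecture):

```python
import numpy as np

def feature_map(record):
    """Map a hypothetical raw record to a real-valued feature vector."""
    colors = ["red", "green", "blue"]  # assumed category set
    one_hot = [1.0 if record["color"] == c else 0.0 for c in colors]
    return np.array([float(record["age"])] + one_hot)

print(feature_map({"age": 42, "color": "green"}))  # [42.  0.  1.  0.]
```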
Stationary Statistical Environment
An environment where the probability of observing a given pattern (e.g., a feature vector paired with a person's name as its label, in supervised learning) remains the same over time.
Nonstationary Statistical Environment
An environment where the statistical irregularities or probabilities of data patterns change over time, such as stock market predictions over decades or a robot's environment as it moves and learns.
Specifying the Machine Learning Architecture (Step 2)
The second step in algorithm design; defining the structure of the learning model, often using units (nodes), activity levels, and parameters (connections/weights).
Units
Little computing functions in a machine learning architecture, represented as circles, each with a real-valued state called its activity level.
Activity Level
The real-valued state or output of a unit (node) in a machine learning architecture.
Parameters / Connections
Typically represented by arrows in an architecture diagram, associated with different values (weights) that the algorithm learns to adjust.
Function Decomposition
A principle in machine learning where a complicated function is broken down into smaller, more manageable functions, a recurrent theme in architecture design.
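A minimal sketch of function decomposition: a one-hidden-layer architecture computes a complicated input-output function as a composition of two simpler stages (the sigmoidal hidden units, layer sizes, and random weights are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)  # input -> hidden parameters
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)  # hidden -> output parameters

def predict(s):
    h = 1.0 / (1.0 + np.exp(-(W1 @ s + b1)))  # hidden units' activity levels
    return W2 @ h + b2                        # output unit's activity level

print(predict(np.array([0.5, -1.0])))
```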
Start Simple Principle
A design guideline recommending to choose the simplest possible architecture that might work, to gain insights if it fails or easily compare against more complex designs if it succeeds.
Specifying the Loss Function (Step 3)
The third step in algorithm design; defining a mathematical function that quantifies the discrepancy between the algorithm's prediction (y double-dot) and the desired response (y), typically as a prediction error per data record.
Empirical Risk Function
The average loss, calculated by averaging the loss function over all training stimuli or data records.
Goal of Learning
To find a set of parameters (theta hat n) that minimizes the empirical risk function for all possible parameters within a defined parameter space.
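In symbols (a sketch; the per-record loss written here as L and the prediction written as y double-dot of (s_i; theta) are inferred from the cards above, not quoted from the lecture):

```latex
\hat{\ell}_n(\theta) = \frac{1}{n}\sum_{i=1}^{n}
  L\big(y_i,\ \ddot{y}(s_i;\theta)\big),
\qquad
\hat{\theta}_n = \operatorname*{arg\,min}_{\theta \in \Theta} \hat{\ell}_n(\theta)
```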
Designing a Learning Algorithm (Step 4)
The fourth step in algorithm design; typically using gradient descent or a similar optimization method to find the parameters that minimize the empirical risk function.
Gradient Descent
A common optimization algorithm used in machine learning to iteratively adjust parameters in the direction of the steepest decrease of the loss function.
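A minimal gradient descent sketch on a toy one-parameter risk (the learning rate, step count, and quadratic risk are illustrative assumptions):

```python
import numpy as np

def gradient_descent(grad, theta0, lr=0.1, steps=100):
    """Repeatedly step against the gradient, toward steepest decrease."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        theta = theta - lr * grad(theta)
    return theta

# Toy empirical risk l(theta) = (theta - 3)^2 has gradient 2 * (theta - 3).
print(gradient_descent(lambda t: 2 * (t - 3), theta0=[0.0]))  # approx. [3.]
```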
Designing/Downloading Evaluation Algorithms (Step 5)
The fifth step in algorithm design; creating methods to assess how well the learned parameters and the overall algorithm are performing.
Evaluation of Algorithm Behavior
Assessing how well a machine learning model captures statistical regularities from its data-generating process, recognizing that inductive learning machines are fundamentally data-dependent.
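A minimal held-out evaluation sketch; the mean squared error metric and the predict interface are assumptions for illustration, not the lecture's prescribed method:

```python
import numpy as np

def evaluate(predict, X_test, y_test):
    """Average squared error of a fitted model on data it never trained on,
    probing whether it captured the data-generating process's regularities
    rather than memorizing training records."""
    preds = np.array([predict(x) for x in X_test])
    return float(np.mean((preds - np.asarray(y_test)) ** 2))

print(evaluate(lambda x: 2 * x, X_test=[1.0, 2.0], y_test=[2.0, 4.5]))  # 0.125
```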
Argmin (theta hat n)
Notation indicating the argument (theta hat n) that minimizes a given function (ell hat n of theta) over a specified parameter space (capital Theta).
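Concretely, argmin returns the minimizing argument, not the minimum value; a toy sketch over a discretized parameter space (the quadratic risk is an illustrative assumption):

```python
import numpy as np

thetas = np.linspace(-5.0, 5.0, 1001)  # discretized parameter space Theta
risk = (thetas - 3.0) ** 2             # toy empirical risk at each theta
theta_hat = thetas[np.argmin(risk)]    # the argument achieving the minimum
print(theta_hat)  # 3.0
```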
Satisficing
Being content with a 'good enough' or heuristic solution rather than necessarily finding the globally optimal solution, especially in complicated learning problems with multiple good local minimizers.
Continuous Function (Quick Definition)
Informally, a function whose graph can be drawn without lifting the pencil from the paper, implying no sudden jumps or breaks.
Discontinuous Function
A function that has breaks or jumps, where one must lift the pencil while drawing it, for instance, a function with a single point having an anomalous value.
Properties of Continuous Functions
Guidelines for determining continuity: polynomials, exponentials, and logarithms (on the positive reals) are continuous; weighted sums, products, and compositions of continuous functions are continuous; f(x)/g(x) is continuous wherever g(x) is not zero.
Sigmoidal Function / Logistic Sigmoid
A common S-shaped mathematical function (e.g., 1 / (1 + e^-x)) widely used for hidden units in neural networks, known for being continuous and differentiable.
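A minimal sketch of the logistic sigmoid:

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: continuous, differentiable, S-shaped, range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Convenient for gradient-based learning: its derivative has the closed
# form sigmoid(x) * (1 - sigmoid(x)).
print(sigmoid(0.0))  # 0.5
```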
Rectified Linear Hidden Units (ReLU)
A type of hidden unit that outputs the input directly if it's positive, and zero otherwise (max(0, x)).
Rectified Linear Unit (ReLU) Properties
A function that is continuous but not differentiable at the point where the input is zero, which can pose problems for gradient-based learning if not appropriately handled.
Step Function
A discontinuous function that outputs one value (e.g., 1) if the input is above a threshold and another value (e.g., 0) if it is at or below the threshold.
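A minimal sketch contrasting the ReLU and step function cards above: ReLU is continuous (but not differentiable at zero), while the step function jumps:

```python
import numpy as np

def relu(x):
    """max(0, x): continuous everywhere, not differentiable at x = 0."""
    return np.maximum(0.0, x)

def step(x, threshold=0.0):
    """Discontinuous: output jumps from 0 to 1 as x crosses the threshold."""
    return np.where(x > threshold, 1.0, 0.0)

xs = np.array([-2.0, 0.0, 2.0])
print(relu(xs))  # [0. 0. 2.]
print(step(xs))  # [0. 0. 1.]
```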
Function Approximation
A mathematical concept where a complex or arbitrary function is represented as a weighted sum of simpler, basic functions (e.g., Fourier analysis with frequencies, eigenvector analysis, neural network hidden units).
McCulloch-Pitts Formal Neuron / Logical Threshold Unit (LTU)
An early model of a neuron, also called a logical threshold unit, that can implement basic logic gates (AND, OR, NOT) by adjusting its weights and threshold.
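A minimal LTU sketch; the particular weights and biases below are illustrative choices (many settings realize the same gates):

```python
import numpy as np

def ltu(s, w, b):
    """Logical threshold unit: fires (1) when the weighted sum exceeds 0."""
    return 1 if np.dot(w, s) + b > 0 else 0

AND = lambda p, q: ltu([p, q], w=[1, 1], b=-1.5)
OR  = lambda p, q: ltu([p, q], w=[1, 1], b=-0.5)
NOT = lambda p:    ltu([p],    w=[-1],   b=0.5)

print(AND(1, 1), AND(1, 0), OR(0, 1), NOT(1))  # 1 0 1 0
```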
Bias (Machine Learning) / Intercept (Statistics)
An additional parameter in a model (often represented as 'b' or part of a weight vector) that shifts the activation function, allowing better fitting of the data.
Softplus
A smooth approximation of the rectified linear unit (ReLU), defined as log(1 + e^x), which is differentiable everywhere and offers numerical stability advantages.
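A minimal softplus sketch, written in a numerically stable rearrangement (the naive np.log(1 + np.exp(x)) overflows for large x; this form is algebraically identical):

```python
import numpy as np

def softplus(x):
    """Smooth ReLU approximation log(1 + e^x), differentiable everywhere.

    Uses the identity log(1 + e^x) = max(x, 0) + log(1 + e^-|x|) to avoid
    overflow for large positive x.
    """
    return np.maximum(0.0, x) + np.log1p(np.exp(-np.abs(x)))

print(softplus(np.array([-100.0, 0.0, 100.0])))  # [~0.  0.693  ~100.]
```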
Radial Basis Function (RBF)
A type of basis function often visualized as 'Gaussian bumps,' used in neural networks to approximate complex nonlinear functions by taking weighted sums of these localized bumps.
Sigmoidal Hidden Unit
A hidden unit where its output is a sigmoidal function of the weighted sum of its inputs, often used to approximate logical functions.
Radial Basis Function Hidden Unit
A hidden unit whose output is based on a radial basis function (e.g., e^(-||s - vj||^2)), which peaks when the input 's' is close to its center 'vj'.
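A minimal radial basis function hidden unit sketch, following the card's formula e^(-||s - vj||^2):

```python
import numpy as np

def rbf_unit(s, v):
    """Gaussian bump: activity is 1 when input s sits at the center v and
    decays toward 0 as s moves away."""
    return float(np.exp(-np.sum((np.asarray(s) - np.asarray(v)) ** 2)))

center = np.array([1.0, 2.0])
print(rbf_unit([1.0, 2.0], center))  # 1.0    (input at the center)
print(rbf_unit([3.0, 2.0], center))  # ~0.018 (input far from the center)
```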