Flashcards covering key concepts from Module M12 on convergence and continuous functions, essential for understanding machine learning algorithms and gradient descent.
Module M12
The lecture module concerned with convergence concepts and continuous functions, serving as a stepping stone for designing and analyzing machine learning algorithms.
Empirical Risk Function
A function that represents the average error or loss over a dataset, which a large class of machine learning algorithms aims to minimize.
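A minimal numeric sketch of this idea; the linear model, the squared-error loss, and all names below are illustrative assumptions, not taken from the lecture:

```python
import numpy as np

def empirical_risk(theta, X, y):
    """Average squared-error loss of a linear model over the dataset.
    The model and loss are illustrative choices."""
    predictions = X @ theta
    return np.mean((predictions - y) ** 2)

# Tiny illustrative dataset: the risk is the average loss over all examples.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
print(empirical_risk(np.array([1.0, 2.0]), X, y))  # 0.0 at this minimizer
```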
Gradient Descent
An iterative optimization algorithm used to minimize a function (like the empirical risk function) by moving in the direction opposite to the gradient of the function.
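A minimal sketch of the update rule theta <- theta - eta * grad(theta), assuming a hand-supplied gradient function; the names and the toy objective are illustrative:

```python
import numpy as np

def gradient_descent(grad, theta0, learning_rate=0.1, steps=100):
    """Repeatedly step in the direction opposite to the gradient."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        theta = theta - learning_rate * grad(theta)
    return theta

# Minimizing f(theta) = ||theta||^2, whose gradient is 2 * theta;
# the iterates move towards the minimizer at the origin.
print(gradient_descent(lambda th: 2 * th, [3.0, -4.0]))
```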
Convergence to a Point (Vector Space)
A process where a sequence of parameter vectors (e.g., theta_t during learning) approaches and gets arbitrarily close to a single, specific limiting parameter vector (theta*) as time t approaches infinity.
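A toy illustration (the sequence and theta* are made up) showing the distance ||theta_t - theta*|| shrinking towards zero:

```python
import numpy as np

theta_star = np.array([1.0, 2.0])       # hypothetical limiting parameter vector
for t in [1, 5, 10, 50]:
    theta_t = theta_star + (0.5 ** t)   # illustrative sequence approaching theta_star
    print(t, np.linalg.norm(theta_t - theta_star))  # distance shrinks towards 0
```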
Convergence to a Set
A process where a sequence of parameter vectors approaches a predefined set of points (e.g., a set of global minimizers), not necessarily a single point.
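A toy illustration (the set S and the sequence are made up) of a sequence whose distance to the set shrinks to zero even though it does not converge to any single point:

```python
import numpy as np

S = np.array([-1.0, 1.0])   # hypothetical set of global minimizers
for t in [1, 5, 10, 50]:
    theta_t = (-1) ** t * (1 + 1 / t)         # alternates near -1 and near +1
    dist_to_S = np.min(np.abs(S - theta_t))   # distance to the nearest point of S
    print(t, dist_to_S)  # shrinks to 0, yet theta_t converges to no single point
```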
Bounded Sequence
A sequence of vectors or numbers in which the magnitude (the norm of a vector, or the absolute value of a number) of every element is less than or equal to some finite constant.
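A toy illustration: the sequence below is bounded by the constant C = 1 even though it never converges.

```python
# A bounded but non-convergent sequence: |x_t| <= 1 for every t,
# yet the terms keep oscillating between -1 and +1.
for t in range(6):
    x_t = (-1) ** t
    print(t, x_t, abs(x_t) <= 1.0)
```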
Continuous Function (Intuitive Definition)
A function whose graph can be drawn without lifting the pen from the paper, implying no breaks, jumps, or abrupt changes in value.
Rules for Identifying Continuous Functions
A set of guidelines stating that polynomials, exponentials, and logarithms (on positive reals) are continuous, and that weighted sums, products, compositions, and divisions (with non-zero denominator) of continuous functions are also continuous.
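An illustrative, made-up function built from these rules, and hence continuous on all of the reals:

```python
import numpy as np

# By the rules above this is continuous everywhere: a weighted sum of a
# polynomial, an exponential composed with a polynomial, and a logarithm
# applied to the strictly positive quantity 1 + x**2.
def f(x):
    return 3 * x**2 + np.exp(-x**2) + np.log(1 + x**2)

print(f(0.0), f(1.0))
```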
McCulloch-Pitts (Logical Threshold Unit)
A historical model of a neuron with a discontinuous step-like transfer function (e.g., 1 if input > threshold, 0 otherwise) that made it difficult to apply gradient-based learning algorithms.
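A minimal sketch of such a unit (the threshold value is an illustrative assumption); the output jumps abruptly at the threshold, leaving no useful gradient:

```python
def mcculloch_pitts(x, threshold=0.0):
    """Discontinuous step transfer: output jumps from 0 to 1 at the threshold."""
    return 1.0 if x > threshold else 0.0

# A tiny change in the input near the threshold flips the output abruptly,
# and the derivative is zero (or undefined) everywhere, blocking gradient descent.
print(mcculloch_pitts(-1e-9), mcculloch_pitts(1e-9))
```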
Logistic Sigmoid Function
A continuous and differentiable sigmoidal activation function (1/(1+e^(-x))) commonly used in neural networks as a smooth approximation to the McCulloch-Pitts threshold function, enabling gradient descent.
Hyperbolic Tangent Sigmoid Function
A continuous and differentiable sigmoidal activation function (tanh(x)) whose output ranges between -1 and +1, also used in neural networks to facilitate gradient-based learning.
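A short sketch of both smooth activations from the two cards above (the sample inputs are illustrative):

```python
import numpy as np

def logistic_sigmoid(x):
    """Smooth approximation to the hard threshold: values in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# tanh is the other sigmoidal choice; its values lie in (-1, 1).
for x in [-5.0, 0.0, 5.0]:
    print(x, logistic_sigmoid(x), np.tanh(x))
```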
Importance of Continuous Functions in ML
Crucial for gradient descent algorithms: smooth (continuous and differentiable) error functions have well-defined derivatives (no 'corners' or jumps), and small changes in parameters lead to predictably small changes in the error function, preventing catastrophic shifts in behaviour.
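A toy contrast (a single made-up training example) of how the error responds to a tiny parameter change with a step unit versus a logistic unit:

```python
import numpy as np

def step_error(theta):
    # Error of a threshold unit on one example: jumps discontinuously in theta.
    return (1.0 - (1.0 if theta > 0 else 0.0)) ** 2

def sigmoid_error(theta):
    # Same example with a logistic unit: changes smoothly in theta.
    return (1.0 - 1.0 / (1.0 + np.exp(-theta))) ** 2

# A tiny parameter change near theta = 0 causes an abrupt jump in the step
# error but only a negligible change in the sigmoid error.
print(step_error(-1e-6), step_error(1e-6))        # 1.0 -> 0.0 (abrupt)
print(sigmoid_error(-1e-6), sigmoid_error(1e-6))  # both about 0.25 (smooth)
```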