These vocabulary flashcards cover the principal concepts, algorithms, and statistical foundations presented across the five units of the Machine Learning lecture notes, providing a targeted review for exam preparation.
Machine Learning
Programming computers to improve their performance at a task T with experience E, as measured by a performance measure P.
Supervised Learning
Learning a function that maps inputs to outputs from labeled example pairs (x,y); includes classification and regression.
Unsupervised Learning
Discovering patterns or structure in unlabeled data; common tasks are clustering and association.
Reinforcement Learning
Learning to choose actions to maximize cumulative, often discounted, reward through interaction with an environment.
Version Space
The set of all hypotheses in H that are consistent with the observed training examples.
PAC Learning
Probably Approximately Correct framework that bounds the number of training examples needed for a learner to output, with probability at least 1 − δ, a hypothesis whose error is at most ε.
VC Dimension
A measure of the capacity of a hypothesis space; the largest number of points that can be shattered by hypotheses in H.
Decision Tree
A tree-structured model where internal nodes test features and leaves output a prediction.
ID3 Algorithm
Greedy decision-tree inducer that at each node selects the split with the highest information gain (reduction in entropy).
CART
Classification And Regression Trees; builds binary trees using Gini or SSE and supports pruning.
Random Forest
Ensemble of decision trees, each built on a bootstrap sample and restricted to a random subset of features at each split; their predictions are averaged/voted to reduce variance.
Bagging
Bootstrap Aggregating; trains learners on different bootstrap samples and averages/votes their outputs.
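A minimal Python sketch of the idea, using a toy 1-nearest-neighbour base learner purely for illustration (the names `base_fit` and `bagging_fit` are not from the notes):

```python
import numpy as np

def base_fit(X, y):
    """Illustrative base learner (1-nearest neighbour); any learner could be plugged in."""
    def predict(x):
        return y[np.argmin(np.linalg.norm(X - x, axis=1))]
    return predict

def bagging_fit(X, y, n_models=25, seed=0):
    """Train n_models learners on bootstrap samples and combine them by majority vote."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)        # bootstrap sample: n draws with replacement
        models.append(base_fit(X[idx], y[idx]))
    def predict(x):
        votes = [m(x) for m in models]          # aggregate the ensemble by voting
        return max(set(votes), key=votes.count)
    return predict
```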
Boosting
Sequentially builds learners that focus on mistakes of prior learners and combines them into a strong ensemble.
AdaBoost
Adaptive Boosting algorithm that re-weights training data and combines weak classifiers with weighted voting.
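A hedged sketch of the core re-weighting step for labels in {−1, +1}; the weak learner and its predictions are assumed to be supplied, and `adaboost_round` is an illustrative name:

```python
import numpy as np

def adaboost_round(weights, y_true, y_pred):
    """One re-weighting step of AdaBoost for labels in {-1, +1} (0 < weighted error < 1)."""
    miss = (y_pred != y_true)
    eps = weights[miss].sum()                                    # weighted error of the weak learner
    alpha = 0.5 * np.log((1 - eps) / eps)                        # voting weight of this weak learner
    weights = weights * np.exp(np.where(miss, alpha, -alpha))    # up-weight mistakes, down-weight hits
    return weights / weights.sum(), alpha                        # renormalise so the weights sum to 1
```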
Stacking (Stacked Generalization)
Ensemble method that learns a meta-model to combine predictions of several base models.
Linear Regression
Predicts a continuous target as a linear combination of input features plus noise.
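As a sketch of how such a model can be fitted, here is ordinary least squares via the normal equation in NumPy (the function name is illustrative, not from the notes):

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equation w = (X'X)^-1 X'y."""
    Xb = np.column_stack([np.ones(len(X)), X])    # prepend a column of ones for the intercept
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)   # w[0] is the intercept, w[1:] the coefficients
```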
Multiple Linear Regression
Linear regression with two or more independent variables.
Logistic Regression
Models the probability of a binary outcome via the logistic (sigmoid) function.
Perceptron
A linear binary classifier that outputs 1 if w·x > 0, else −1; trained with the perceptron learning rule.
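A minimal sketch of the perceptron learning rule for labels in {−1, +1} (`train_perceptron` is an illustrative name):

```python
import numpy as np

def train_perceptron(X, y, epochs=100, lr=1.0):
    """Perceptron learning rule for labels y in {-1, +1}."""
    Xb = np.column_stack([np.ones(len(X)), X])    # bias term folded into the weight vector
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            if yi * (w @ xi) <= 0:                # example is misclassified (or on the boundary)
                w += lr * yi * xi                 # w <- w + lr * y * x
    return w
```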
Multilayer Perceptron (MLP)
Feed-forward neural network with one or more hidden layers and nonlinear activation functions.
Activation Function
Non-linear function (e.g., sigmoid, tanh, ReLU) applied to a neuron's weighted input sum to produce its output, introducing non-linearity into the network.
Support Vector Machine (SVM)
Large-margin classifier that finds the hyperplane maximizing the margin between classes; uses kernels for non-linear data.
Kernel Function
Computes inner products in an implicit high-dimensional feature space without constructing the mapping explicitly, enabling kernelized algorithms (the kernel trick).
k-Nearest Neighbors (k-NN)
Instance-based learner that classifies a query by majority vote of its k closest training examples (or averages their targets for regression).
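A small illustrative sketch using Euclidean distance and majority vote (names are assumptions, not from the notes):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_query, k=3):
    """Classify a query by majority vote among its k nearest training examples."""
    X_train, y_train = np.asarray(X_train), np.asarray(y_train)
    dists = np.linalg.norm(X_train - x_query, axis=1)   # Euclidean distance to every training point
    nearest = np.argsort(dists)[:k]                     # indices of the k closest examples
    return Counter(y_train[nearest]).most_common(1)[0][0]
```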
K-means Clustering
Partitions data into k clusters by iteratively assigning each point to its closest centroid and then updating each centroid to the mean of its assigned points.
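A minimal sketch of Lloyd's assign-then-update iteration, assuming NumPy arrays and ignoring the empty-cluster case:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate the assignment and centroid-update steps until stable."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # k random points as initial centroids
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)                          # assign each point to its closest centroid
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):              # converged: centroids no longer move
            break
        centroids = new_centroids
    return labels, centroids
```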
K-Modes Clustering
Extension of k-means for categorical data using modes and dissimilarity measures.
Ensemble Learning
Combining multiple models to obtain better predictive performance than any constituent.
Information Gain
Reduction in entropy achieved by partitioning the data on an attribute.
Entropy (Shannon)
Measure of impurity/uncertainty in a data set; H(S) = −Σᵢ pᵢ log₂ pᵢ.
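A short sketch computing Shannon entropy and the information gain of an attribute, assuming labels and attribute values are plain Python lists (function names are illustrative):

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy H(S) = -sum_i p_i log2 p_i over the class proportions in `labels`."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(labels, attribute_values):
    """Entropy of the whole set minus the weighted entropy of the partitions induced by the attribute."""
    n = len(labels)
    remainder = sum(
        (attribute_values.count(v) / n)
        * entropy([l for l, a in zip(labels, attribute_values) if a == v])
        for v in set(attribute_values)
    )
    return entropy(labels) - remainder
```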
Gini Index
Impurity measure used by CART; Gini(S) = Σᵢ pᵢ(1 − pᵢ) = 1 − Σᵢ pᵢ² over classes.
Bias–Variance Trade-off
Decomposition of generalization error into bias (under-fit) and variance (over-fit) components.
Sample Error
Fraction of instances in an observed sample S that a hypothesis misclassifies.
True Error
Probability that a hypothesis misclassifies a random instance drawn from the distribution.
Confidence Interval
Range that, with specified probability, contains the true parameter (e.g., error rate).
Bootstrap Sample
Sample of size n drawn with replacement from an original dataset of n instances.
Gaussian Mixture Model (GMM)
Probabilistic model assuming data are generated from a mixture of several Gaussian distributions.
Expectation-Maximization (EM)
Iterative algorithm with E-step (compute expectations) and M-step (maximize) to learn latent-variable models like GMMs.
Epanechnikov Kernel
Quadratic kernel function used for kernel smoothing: k(u) = ¾(1 − u²) for |u| ≤ 1, and 0 otherwise.
KD-Tree
Space-partitioning data structure that accelerates nearest-neighbor search in k-dimensional space.
Q-Learning
Model-free reinforcement learning algorithm that learns the state–action value function Q(s,a) directly from experience, without a model of the environment.
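A minimal sketch of the tabular update Q(s,a) ← Q(s,a) + α[r + γ maxₐ′ Q(s′,a′) − Q(s,a)], with the table stored as a nested dict; the function name and toy state labels are assumptions:

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[s_next].values(), default=0.0)   # greedy value of the successor state
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

# Action values default to 0 for state-action pairs that have not been visited yet.
Q = defaultdict(lambda: defaultdict(float))
q_learning_update(Q, s="s0", a="right", r=1.0, s_next="s1")
```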
Temporal-Difference Learning
Updates value estimates based on difference between successive predictions, bridging Monte-Carlo and dynamic programming.
Genetic Algorithm (GA)
Evolutionary search method that evolves a population of candidate solutions using selection, crossover and mutation.
Chromosome (in GA)
Encoded representation (often a bit string) of a candidate solution in a genetic algorithm.
Fitness Function
Numerical measure that evaluates how well a candidate solution solves the problem.
Crossover Operator
GA operation that constructs offspring by exchanging substrings between two parent chromosomes.
Mutation Operator
GA operation that randomly alters genes (bits) in a chromosome to maintain diversity.
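Illustrative sketches of the two operators on bit-string chromosomes represented as Python lists of 0/1 bits (function names assumed, not from the notes):

```python
import random

def single_point_crossover(parent_a, parent_b, rng=random):
    """Exchange the tails of two bit-string chromosomes at a random crossover point."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:], parent_b[:point] + parent_a[point:]

def mutate(chromosome, rate=0.01, rng=random):
    """Flip each bit independently with probability `rate` to maintain diversity."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in chromosome]

offspring_a, offspring_b = single_point_crossover([1, 0, 1, 1, 0, 0], [0, 0, 0, 1, 1, 1])
mutant = mutate(offspring_a, rate=0.1)
```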
Genetic Programming (GP)
Evolutionary technique that evolves computer programs, often represented as syntax trees.
Baldwin Effect
Hypothesis that individual learning can indirectly speed evolution by smoothing the fitness landscape.
Lamarckian Evolution (computational)
Evolutionary model where learned traits are explicitly written back into an individual’s genotype.
Distance Measures
Quantitative metrics (e.g., Euclidean, Manhattan) that define similarity between instances.
Manhattan (City-Block) Distance
Sum of absolute differences across dimensions; L₁ norm.
Euclidean Distance
Square-root of summed squared differences across dimensions; L₂ norm.
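Both metrics in a few lines of NumPy, for concreteness:

```python
import numpy as np

def manhattan(a, b):
    """L1 norm: sum of absolute coordinate differences."""
    return float(np.abs(np.asarray(a) - np.asarray(b)).sum())

def euclidean(a, b):
    """L2 norm: square root of the sum of squared coordinate differences."""
    return float(np.sqrt(((np.asarray(a) - np.asarray(b)) ** 2).sum()))
```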
Kernel Smoother
Non-parametric regression that averages nearby observations weighted by a kernel function.
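A sketch of a Nadaraya–Watson-style smoother using the Epanechnikov kernel defined above; it assumes one-dimensional inputs and at least one training point within the bandwidth (names illustrative):

```python
import numpy as np

def epanechnikov(u):
    """k(u) = 3/4 * (1 - u^2) for |u| <= 1, and 0 otherwise."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)

def kernel_smooth(x_train, y_train, x_query, bandwidth=1.0):
    """Kernel-weighted average of the targets of nearby observations."""
    w = epanechnikov((x_query - np.asarray(x_train, dtype=float)) / bandwidth)
    return float((w * np.asarray(y_train)).sum() / w.sum())   # assumes some point lies within the bandwidth
```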
Learning Rate (α)
Step-size parameter controlling the magnitude of weight updates in many learning algorithms.
Exploration vs. Exploitation
RL dilemma: choosing between trying new actions to gather information or using known rewarding actions.
Discount Factor (γ)
Value in [0,1) that reduces future rewards in reinforcement learning’s cumulative return.
Markov Decision Process (MDP)
Framework for sequential decision making defined by states, actions, transition function and rewards.
Error-Correcting Output Codes
Technique that decomposes multi-class problems into multiple binary classifiers using a code matrix.
Fitness Proportionate Selection
GA selection strategy where the probability of choosing an individual is proportional to its fitness.
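A roulette-wheel sketch assuming non-negative fitness values (the name `roulette_select` is illustrative):

```python
import random

def roulette_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its (non-negative) fitness."""
    spin = rng.uniform(0, sum(fitnesses))
    cumulative = 0.0
    for individual, fitness in zip(population, fitnesses):
        cumulative += fitness
        if cumulative >= spin:
            return individual
    return population[-1]   # guard against floating-point round-off
```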
Tournament Selection
GA selection method that draws a small group of individuals at random and picks the fittest of them with a preset probability.
Schema (GA)
Template describing a subset of chromosomes with fixed positions and wildcards; used in schema theorem.
Schema Theorem
Statement that short, low-order, above-average fitness schemas receive exponentially increasing trials in GA.
Bias (in Estimators)
Difference between expected estimate and the true parameter value.
Variance (in Estimators)
Expected squared deviation of an estimator from its own mean; measures estimate fluctuation.