Vocabulary flashcards summarizing essential terms and definitions from the lecture notes on statistical learning, causal models, and illustrative examples.
Statistical learning
The field that infers properties of an unknown probability distribution from observed data, typically for prediction.
Causal inference
The study of identifying and quantifying cause-and-effect relationships, often involving multiple distributions produced by interventions.
Probability space
A mathematical model (Ω, ℱ, P) consisting of outcomes, events, and a probability measure for a random experiment.
Independent and identically distributed (i.i.d.)
An assumption that each sample is drawn independently from the same joint distribution.
Regression (conditional expectation)
The function f(x)=E[Y|X=x] giving the expected output value for a given input.
Binary classifier
A function that assigns each input x to the more likely class y∈{−1,+1} under P(Y|X=x).
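A minimal sketch in Python (NumPy assumed) of a Bayes classifier under a hypothetical posterior P(Y=+1|X=x); the sigmoid form is illustrative, not from the notes.

```python
import numpy as np

# Hypothetical posterior: P(Y = +1 | X = x) = sigmoid(3x).
def p_plus_given_x(x):
    return 1.0 / (1.0 + np.exp(-3.0 * x))

# The Bayes classifier assigns the more likely label in {-1, +1}.
def bayes_classifier(x):
    return np.where(p_plus_given_x(x) >= 0.5, 1, -1)

print(bayes_classifier(np.array([-1.0, 0.2, 2.0])))  # -> [-1  1  1]
```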
Joint distribution (P_{X,Y})
The probability law governing the simultaneous behavior of random variables X and Y.
Empirical distribution (P_n)
A discrete distribution that puts equal mass 1/n on each observed data point in a sample.
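A minimal sketch, assuming NumPy: since P_n puts mass 1/n on each observed point, drawing from it is uniform resampling with replacement (as in the bootstrap), and its CDF is the empirical CDF.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=10)  # an observed sample x_1, ..., x_n

# Drawing from P_n = uniform resampling of the observed points.
resample = rng.choice(data, size=5, replace=True)

# The empirical CDF at t is the fraction of observed points <= t.
def ecdf(t):
    return np.mean(data <= t)

print(resample)
print(ecdf(0.0))
```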
Inverse problem (statistics)
Estimating properties of an unobserved distribution from data generated by that distribution.
Function class / Hypothesis space
The set of candidate functions from which a learning algorithm selects its predictor.
Capacity (of a function class)
A measure of how rich or complex a hypothesis space is, controlling overfitting potential.
Vapnik–Chervonenkis (VC) dimension
A combinatorial capacity measure: the size of the largest set of points that the function class can shatter, i.e., realize every possible labeling of.
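A brute-force illustration (illustrative Python, not from the notes): threshold classifiers on the line shatter any single point but no pair, so their VC dimension is 1.

```python
import numpy as np

def shatters(classifiers, points):
    """True if the classifiers realize every +/-1 labeling of the points."""
    realized = {tuple(f(x) for x in points) for f in classifiers}
    return len(realized) == 2 ** len(points)

# Threshold classifiers on the line: f_t(x) = +1 if x > t else -1.
thresholds = np.linspace(-3.0, 3.0, 601)
classifiers = [lambda x, t=t: 1 if x > t else -1 for t in thresholds]

print(shatters(classifiers, [0.0]))       # True: a single point is shattered
print(shatters(classifiers, [0.0, 1.0]))  # False: labeling (+1, -1) is unrealizable
# Hence the VC dimension of the threshold class is 1.
```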
Expected risk (true risk)
The population loss R[f]=∫(1/2)|f(x)−y| dP_{X,Y}(x,y) measuring generalization error.
Empirical risk
The average loss on the training sample: R_emp^n[f] = (1/n) ∑_{i=1}^{n} (1/2)|f(x_i) − y_i|.
Empirical Risk Minimization (ERM)
The principle of choosing the hypothesis that minimizes empirical risk over the training data.
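A minimal ERM sketch in Python over the illustrative class of one-dimensional threshold classifiers; the helper names and toy data are assumptions, not from the notes.

```python
import numpy as np

def empirical_risk(f, x, y):
    # R_emp[f] = (1/n) * sum_i (1/2)|f(x_i) - y_i|  (0-1 loss for y in {-1,+1})
    return np.mean(0.5 * np.abs(f(x) - y))

def erm_threshold(x, y):
    """ERM over the class f_t(x) = +1 if x > t else -1:
    return the candidate with the smallest empirical risk."""
    candidates = np.concatenate(([x.min() - 1.0], np.sort(x)))
    fs = [lambda z, t=t: np.where(z > t, 1, -1) for t in candidates]
    risks = [empirical_risk(f, x, y) for f in fs]
    return fs[int(np.argmin(risks))], min(risks)

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 50)
y = np.where(x > 0.2, 1, -1)
f_hat, r_hat = erm_threshold(x, y)
print(r_hat)  # 0.0: this sample is separable by a threshold
```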
Consistency (of a learner)
The property that the risk of the learned function converges to the minimal achievable risk as n→∞.
Universal consistency
A guarantee that, for every fixed underlying distribution, the algorithm approaches Bayes-optimal risk with enough data.
Slow learning rates
The phenomenon that convergence to the optimal risk can be arbitrarily slow for some distributions, even for consistent algorithms; without further assumptions, no uniform rate is guaranteed.
Regularization
A technique that restricts or penalizes complex hypotheses to control capacity and improve generalization.
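One standard instance of regularization, sketched under the assumption of a least-squares setting: ridge regression penalizes ||w||² to limit capacity. Function names and data are illustrative, not from the notes.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Regularized least squares: minimize ||Xw - y||^2 + lam * ||w||^2.
    The penalty shrinks w toward zero, limiting effective capacity."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))
w_true = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
y = X @ w_true + 0.1 * rng.normal(size=30)

print(ridge_fit(X, y, lam=0.0)[:2])   # unregularized least squares
print(ridge_fit(X, y, lam=10.0)[:2])  # shrunk coefficients
```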
Bayesian prior
A probability distribution placed over hypotheses or parameters expressing a priori beliefs before seeing data.
Observational distribution
The joint distribution of variables obtained without intervening in the system.
Intervention
An external action that forces a variable to take specific values, potentially altering the joint distribution.
Structural Causal Model (SCM)
A collection of assignments X := f(PA_X, N_X), one per variable, defining each variable as a function of its parents PA_X in the causal graph and an independent noise term N_X.
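A minimal two-variable SCM sketch in Python, also showing how an intervention do(X := 1) replaces the assignment for X and thereby changes the distribution of Y; the concrete mechanisms are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Two-variable SCM:  X := N_X,   Y := 2*X + N_Y,  with independent noises.
def sample(do_x=None):
    n_x = rng.normal(size=n)
    n_y = rng.normal(size=n)
    # An intervention do(X := x) replaces the assignment for X.
    x = n_x if do_x is None else np.full(n, do_x)
    y = 2 * x + n_y
    return x, y

x_obs, y_obs = sample()            # observational distribution
x_int, y_int = sample(do_x=1.0)    # interventional distribution under do(X := 1)
print(y_obs.mean(), y_int.mean())  # E[Y] ~ 0  vs  E[Y | do(X := 1)] ~ 2
```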
Causal reasoning
Deriving implications (e.g., effects of interventions) from a known causal model.
Causal learning / Structure learning
Inferring aspects of the underlying causal graph or mechanisms from data (observational or interventional).
Reichenbach's common cause principle
If X and Y are statistically dependent, then either X causes Y, Y causes X, or there exists a variable Z that causally influences both; in the common-cause case, conditioning on Z renders X and Y independent.
Confounder
A variable that causally affects two or more variables, creating spurious associations between them.
Screening-off
The property that conditioning on a confounder Z makes its effects (e.g., X and Y) statistically independent: X ⫫ Y | Z.
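A quick simulation sketch (illustrative, NumPy assumed): a binary confounder Z induces marginal correlation between X and Y, which vanishes within each stratum of Z.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# Confounder Z causes both X and Y.
z = rng.integers(0, 2, size=n)            # binary common cause
x = z + 0.5 * rng.normal(size=n)
y = z + 0.5 * rng.normal(size=n)

print(np.corrcoef(x, y)[0, 1])            # ~0.5: marginal dependence
for v in (0, 1):
    m = z == v
    print(np.corrcoef(x[m], y[m])[0, 1])  # ~0 within each stratum: X indep Y | Z
```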
Correlation ≠ Causation
The principle that statistical dependence alone does not determine causal direction or presence.
Mechanism (in SCM)
The deterministic function linking a variable to its direct causes and noise term in an SCM.
Additive Noise Model (ANM)
A causal model where a child variable equals a function of its parent plus independent additive noise.
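A rough ANM sketch in Python: fit a regression in both directions and compare a crude dependence proxy between residuals and input. Real ANM methods use proper independence tests (e.g., HSIC); all names, the cubic mechanism, and the proxy here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000

# Additive noise model: cause X, effect Y := f(X) + N_Y with N_Y independent of X.
x = rng.uniform(-2, 2, size=n)
y = x ** 3 + rng.normal(scale=0.5, size=n)

def residuals(a, b, deg=5):
    # Polynomial regression of b on a; return the residuals.
    coef = np.polyfit(a, b, deg)
    return b - np.polyval(coef, a)

r_fwd = residuals(x, y)   # forward (causal) direction
r_bwd = residuals(y, x)   # backward (anticausal) direction

# Crude proxy for an independence test: correlation of squared residual with |input|.
print(np.corrcoef(r_fwd ** 2, np.abs(x))[0, 1])  # ~0: forward fit looks additive
print(np.corrcoef(r_bwd ** 2, np.abs(y))[0, 1])  # away from 0: backward model misfits
```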
Optical character recognition example
Illustration that an identical P_{X,Y} over images and labels can arise from different causal structures, yielding different effects under intervention.
Gene perturbation example
Scenario showing that deleting a gene (intervention) affects phenotype only if a causal, not merely correlated, relationship exists.