Flashcards for Statistical Modeling lecture.
Linear Regression Models
A model of the response as a linear function of the input attributes, based on the assumption of normally distributed residuals; useful in data mining.
Maximum Likelihood Estimation
Training models by estimating parameters that maximize the probability of observed data.
Bayesian Estimation
Training models within a probabilistic framework, incorporating prior beliefs.
Random Variable
Represents an object that can take one value from a set of possible values.
Domain of x (dom(x))
The set of possible values a random variable can take.
Probability Distribution
For a discrete random variable, it specifies the probability that each value in dom(x) occurs.
Pr(x = vi)
Probability that a single value vi ∈ dom(x) occurs.
Cumulative Distribution Function
Probability Pr(x ≤ vi) for each vi ∈ dom(x).
Joint Distribution
Specifies the probability of each combination of values of a set of random variables, combining their component distributions.
Independent Variables
Random variables with no relationship between them.
Dependent Variables
Random variables that are related to each other.
Variable Independence
Two random variables x1 and x2 where Pr(x1,x2) = Pr(x1)·Pr(x2).
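A minimal sketch (hypothetical joint table, numpy assumed) checking the factorization Pr(x1,x2) = Pr(x1)·Pr(x2):

```python
import numpy as np

# Hypothetical joint distribution of two binary variables x1 (rows) and x2 (columns).
joint = np.array([[0.24, 0.36],
                  [0.16, 0.24]])

p_x1 = joint.sum(axis=1)        # marginal Pr(x1)
p_x2 = joint.sum(axis=0)        # marginal Pr(x2)
product = np.outer(p_x1, p_x2)  # Pr(x1)·Pr(x2) for every pair of values

print(np.allclose(joint, product))  # True: for this table the variables are independent
```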
Pr(x1,x2)
The joint probability of x1 and x2, i.e. the probability that both take particular values.
Pr(x1|x2)
Probability of x1 given x2.
Conditional Independence
Two variables x1 and x2 are conditionally independent given x3 if Pr(x1,x2|x3) = Pr(x1|x3)·Pr(x2|x3).
Sample from a Population
A data set that is a subset drawn from a larger group (the population).
Representative Sample
A sample that accurately reflects the characteristics of the population it was drawn from.
Likelihood Pr(D|h)
Probability of observing data D, given some world where the hypothesis h is true.
Posterior Pr(h|D)
Probability that hypothesis h is true, given that we have observed data set D.
Random Trial
A procedure with a set of well-defined outcomes.
Binomial Distribution
Distribution of a random variable that counts the number of successes in a sequence of n independent trials.
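A minimal sketch (standard library only, numbers made up) of the binomial probability mass function Pr(X = k) = C(n, k)·p^k·(1 − p)^(n − k):

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Example: probability of 3 heads in 5 fair coin flips.
print(binomial_pmf(3, 5, 0.5))  # 0.3125
```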
Maximum Likelihood Estimation (MLE)
Compute the parameters of h so that the likelihood L(h|D) is maximized, i.e. argmax_h L(h|D).
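A minimal sketch of MLE for a binomial success probability, assuming made-up data; a coarse grid search over candidate parameters recovers the closed-form estimate successes/n:

```python
import math

# Hypothetical data: 7 successes out of 10 independent trials.
successes, n = 7, 10

def log_likelihood(p: float) -> float:
    # log L(p|D) for binomial data (the binomial coefficient is constant in p and is dropped).
    return successes * math.log(p) + (n - successes) * math.log(1 - p)

# argmax_p L(p|D) via a grid search over (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
p_mle = max(grid, key=log_likelihood)
print(p_mle)  # 0.7, matching the closed form successes / n
```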
Bayesian Estimation
Compute a model h of maximum posterior probability Pr(h|D), i.e. argmax_h Pr(h|D).
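A minimal sketch of Bayesian (MAP) estimation for the same binomial setting, assuming a Beta(a, b) prior on the success probability; the posterior mode then has a closed form:

```python
# Hypothetical data and prior: 7 successes out of 10 trials, Beta(2, 2) prior.
successes, n = 7, 10
a, b = 2.0, 2.0  # prior pseudo-counts encoding a belief that p is near 0.5

# The posterior is Beta(successes + a, n - successes + b); its mode is the MAP estimate
# argmax_h Pr(h|D).
p_map = (successes + a - 1) / (n + a + b - 2)
print(p_map)  # 0.666..., pulled from the MLE 0.7 towards the prior mean 0.5
```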
Prior Pr(h)
Initial probability that hypothesis h holds without having observed any data.
Naïve Bayes Classification
Simplified classification technique that applies Bayes Rule while assuming conditional independence of the attributes given the class.
Pr(v)
Probability of attribute value v.
Pr(v|u)
Probability of attribute value v given the value u of another attribute.
Bayes Rule of Conditional Probability
h: hypothesis that something will occur, d: observed evidence; Pr(h|d) = Pr(d|h)Pr(h) / Pr(d).
Laplace Estimation
Add 1 to the numerator and ℓ to the denominator of each likelihood estimate, where ℓ is the number of possible attribute values, to avoid zero probabilities.
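A minimal sketch of Naïve Bayes with Laplace-smoothed likelihoods on a tiny made-up categorical data set (attribute names and values are hypothetical):

```python
from collections import Counter

# Hypothetical training records: (outlook, windy, class).
data = [("sunny", "no", "yes"), ("sunny", "yes", "no"),
        ("rain",  "no", "yes"), ("rain",  "yes", "no"),
        ("sunny", "no", "yes")]

classes = Counter(rec[-1] for rec in data)
n = len(data)

def likelihood(attr_idx, value, cls, num_values):
    # Laplace estimation: add 1 to the count and ℓ = num_values to the denominator.
    count = sum(1 for rec in data if rec[attr_idx] == value and rec[-1] == cls)
    return (count + 1) / (classes[cls] + num_values)

def classify(outlook, windy):
    # Bayes Rule with conditional independence: Pr(h|d) ∝ Pr(h) · Π_i Pr(d_i|h).
    scores = {}
    for cls in classes:
        prior = classes[cls] / n
        scores[cls] = prior * likelihood(0, outlook, cls, 2) * likelihood(1, windy, cls, 2)
    return max(scores, key=scores.get)

print(classify("rain", "no"))  # "yes" for this toy data
```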
Probability Density Function
A continuous random variable is associated with a function whose integral over [a, b] gives the probability that a ≤ x ≤ b, for any values a, b ∈ dom(x).
Density Estimation
Given a data set, compute an estimate of an underlying probability density function.
Parametric Density Estimation
Assume a common distribution family (e.g. Gaussian) and estimate its parameters from the data sample.
Non-parametric Density Estimation
Fit a model to an arbitrary data distribution (e.g. kernel density estimation).
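A minimal sketch of kernel density estimation with a Gaussian kernel (bandwidth and sample values are made up):

```python
import numpy as np

# Hypothetical one-dimensional sample with two clusters.
sample = np.array([1.1, 1.4, 2.0, 2.2, 5.8, 6.1])
h = 0.5  # kernel bandwidth, a tuning parameter

def kde(x: float) -> float:
    # Average of Gaussian kernels centred on the sample points.
    kernels = np.exp(-0.5 * ((x - sample) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return float(kernels.mean())

print(kde(2.0))  # high density near the first cluster
print(kde(4.0))  # low density in the gap between the clusters
```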
Mahalanobis Distance
A generalization of Euclidean distance by accounting for correlations.
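A minimal sketch computing the Mahalanobis distance of a point from the mean of a sample (numpy assumed, data made up):

```python
import numpy as np

# Hypothetical two-dimensional data with strongly correlated attributes.
X = np.array([[1.0, 2.4], [2.0, 3.5], [3.0, 6.6], [4.0, 7.7], [5.0, 10.3]])
mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))  # inverse sample covariance matrix

def mahalanobis(x: np.ndarray) -> float:
    # sqrt((x - mu)^T Σ^{-1} (x - mu)): Euclidean distance corrected for correlations.
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

print(mahalanobis(np.array([3.0, 6.0])))  # small: the point follows the correlation pattern
print(mahalanobis(np.array([3.0, 2.0])))  # large: the point goes against the pattern
```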
Mean
Statistical measure in the sense that it minimises the sum of squared deviations f(y) = ∑_{i=1}^{m} (y − x_{i,j})².
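A short standard derivation (not part of the original card) showing why the minimiser of f is the mean: setting the derivative to zero gives

```latex
f'(y) = 2 \sum_{i=1}^{m} (y - x_{i,j}) = 0
\;\Longrightarrow\;
y = \frac{1}{m} \sum_{i=1}^{m} x_{i,j} = \bar{x}_j .
```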
Variance
The average squared deviation of the data values from the mean.
Standard Deviation
The square root of the variance; measures the spread of the data around the mean.
Covariance
Statistical measure of how two variables vary together, i.e. the average product of their deviations from their respective means.
Heterogeneous Data Sets
Data sets that capture several different underlying phenomena or represent multiple subpopulations.
Mixture Distribution
A probability distribution derived as a weighted combination of simpler component distributions.
Poisson Distribution
Models the number of events occurring in a fixed interval of time.
Mixture Model
Consists of multiple component models, each specified by its own parameters: f(x) = ∑_{k=1}^{K} π_k f_k(x; w_k).
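A minimal sketch evaluating a two-component mixture density f(x) = ∑_k π_k f_k(x; w_k) with Gaussian components (all parameter values are made up):

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

# Hypothetical two-component mixture; the weights π_k must sum to 1.
weights = [0.3, 0.7]
params = [(0.0, 1.0), (5.0, 2.0)]  # (mean, std) of each component f_k

def mixture_pdf(x):
    return sum(w * gaussian_pdf(x, m, s) for w, (m, s) in zip(weights, params))

print(mixture_pdf(0.0))  # dominated by the first component
print(mixture_pdf(5.0))  # dominated by the second component
```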
Mixture Models
Used for clustering by associating a parametric component model with each cluster.
Gaussian Mixture Model (GMM)
A mixture model with Gaussian components.
Expectation-Maximization (EM) Algorithm
Algorithm for fitting GMMs that alternates Expectation (E) and Maximization (M) steps until convergence.
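A minimal sketch of EM for a one-dimensional, two-component GMM; the synthetic data, the initialisation, and the fixed iteration count (instead of a convergence test) are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data drawn from two Gaussian subpopulations.
data = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(6.0, 1.5, 300)])

K = 2
pi = np.full(K, 1.0 / K)                 # mixing weights π_k
mu = np.array([data.min(), data.max()])  # crude initial means
sigma = np.full(K, data.std())           # initial standard deviations

def normal_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(50):
    # E-step: responsibility of each component for each data point.
    resp = np.stack([pi[k] * normal_pdf(data, mu[k], sigma[k]) for k in range(K)], axis=1)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate the parameters from the responsibility-weighted data.
    nk = resp.sum(axis=0)
    pi = nk / len(data)
    mu = (resp * data[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (data[:, None] - mu) ** 2).sum(axis=0) / nk)

print(pi, mu, sigma)  # should roughly recover the two generating components
```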