Flashcards for Statistical Modeling lecture.
Linear Regression Models
A model of the response as a linear function of the input attributes, based on the assumption of normally distributed residuals; useful in data mining.
Maximum Likelihood Estimation
Training models by estimating parameters that maximize the probability of observed data.
Bayesian Estimation
Training models within a probabilistic framework, incorporating prior beliefs.
Random Variable
Represents an object that can take one value from a set of possible values.
Domain of x (dom(x))
The set of possible values a random variable can take.
Probability Distribution
For a discrete random variable, it specifies the probability that each value in dom(x) occurs.
Pr(x = vi)
Probability that a single value vi ∈ dom(x) occurs.
Cumulative Distribution Function
Probability Pr(x ≤ vi) for each vi ∈ dom(x).
Joint Distribution
Specifies the probability of each combination of values of a set of random variables, combining their component distributions.
Independent Variables
Random variables with no relationship between them.
Dependent Variables
Random variables that are related to each other.
Variable Independence
Two random variables x1 and x2 where Pr(x1,x2) = Pr(x1)·Pr(x2).
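A minimal sketch (hypothetical joint table, numpy assumed) checking the factorization Pr(x1,x2) = Pr(x1)·Pr(x2):

```python
import numpy as np

# Hypothetical joint distribution of two binary variables x1 (rows) and x2 (columns).
joint = np.array([[0.24, 0.36],
                  [0.16, 0.24]])

p_x1 = joint.sum(axis=1)        # marginal Pr(x1)
p_x2 = joint.sum(axis=0)        # marginal Pr(x2)
product = np.outer(p_x1, p_x2)  # Pr(x1)·Pr(x2) for every pair of values

print(np.allclose(joint, product))  # True: for this table the variables are independent
```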
Pr(x1,x2)
The joint probability of x1 and x2, i.e. the probability that both take particular values.
Pr(x1|x2)
Probability of x1 given x2.
Conditional Independence
Two variables x1 and x2 are conditionally independent given x3 if Pr(x1,x2|x3) = Pr(x1|x3)·Pr(x2|x3).
Sample from a Population
A data set that is a subset drawn from a larger group (the population).
Representative Sample
A sample that accurately reflects the characteristics of the population it was drawn from.
Likelihood Pr(D|h)
Probability of observing data D, given some world where the hypothesis h is true.
Posterior Pr(h|D)
Probability that hypothesis h is true, given that we have observed data set D.
Random Trial
A procedure with a set of well-defined outcomes.
Binomial Distribution
Distribution of a random variable that counts the number of successes in a sequence of n independent trials.
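A minimal sketch (standard library only, numbers made up) of the binomial probability mass function Pr(X = k) = C(n, k)·p^k·(1 − p)^(n − k):

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Example: probability of 3 heads in 5 fair coin flips.
print(binomial_pmf(3, 5, 0.5))  # 0.3125
```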
Maximum Likelihood Estimation (MLE)
Compute the parameters of h so that the likelihood L(h|D) is maximized, i.e. argmax_h L(h|D).
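A minimal sketch of MLE for a binomial success probability, assuming made-up data; a coarse grid search over candidate parameters recovers the closed-form estimate successes/n:

```python
import math

# Hypothetical data: 7 successes out of 10 independent trials.
successes, n = 7, 10

def log_likelihood(p: float) -> float:
    # log L(p|D) for binomial data (the binomial coefficient is constant in p and is dropped).
    return successes * math.log(p) + (n - successes) * math.log(1 - p)

# argmax_p L(p|D) via a grid search over (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
p_mle = max(grid, key=log_likelihood)
print(p_mle)  # 0.7, matching the closed form successes / n
```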
Bayesian Estimation
Compute a model h of maximum posterior probability Pr(h|D), i.e. argmax_h Pr(h|D).
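A minimal sketch of Bayesian (MAP) estimation for the same binomial setting, assuming a Beta(a, b) prior on the success probability; the posterior mode then has a closed form:

```python
# Hypothetical data and prior: 7 successes out of 10 trials, Beta(2, 2) prior.
successes, n = 7, 10
a, b = 2.0, 2.0  # prior pseudo-counts encoding a belief that p is near 0.5

# The posterior is Beta(successes + a, n - successes + b); its mode is the MAP estimate
# argmax_h Pr(h|D).
p_map = (successes + a - 1) / (n + a + b - 2)
print(p_map)  # 0.666..., pulled from the MLE 0.7 towards the prior mean 0.5
```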
Prior Pr(h)
Initial probability that hypothesis h holds without having observed any data.
Naïve Bayes Classification
Simplified classification technique that applies Bayes Rule while assuming conditional independence of the attributes given the class.
Pr(v)
Probability of attribute value v.
Pr(v|u)
Probability of attribute value v given the value u of another attribute.
Bayes Rule of Conditional Probability
h: hypothesis that something will occur, d: observed evidence; Pr(h|d) = Pr(d|h)Pr(h) / Pr(d).
Laplace Estimation
Add 1 to the numerator and ℓ to the denominator of each likelihood estimate, where ℓ is the number of possible attribute values, to avoid zero probabilities.
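A minimal sketch of Naïve Bayes with Laplace-smoothed likelihoods on a tiny made-up categorical data set (attribute names and values are hypothetical):

```python
from collections import Counter

# Hypothetical training records: (outlook, windy, class).
data = [("sunny", "no", "yes"), ("sunny", "yes", "no"),
        ("rain",  "no", "yes"), ("rain",  "yes", "no"),
        ("sunny", "no", "yes")]

classes = Counter(rec[-1] for rec in data)
n = len(data)

def likelihood(attr_idx, value, cls, num_values):
    # Laplace estimation: add 1 to the count and ℓ = num_values to the denominator.
    count = sum(1 for rec in data if rec[attr_idx] == value and rec[-1] == cls)
    return (count + 1) / (classes[cls] + num_values)

def classify(outlook, windy):
    # Bayes Rule with conditional independence: Pr(h|d) ∝ Pr(h) · Π_i Pr(d_i|h).
    scores = {}
    for cls in classes:
        prior = classes[cls] / n
        scores[cls] = prior * likelihood(0, outlook, cls, 2) * likelihood(1, windy, cls, 2)
    return max(scores, key=scores.get)

print(classify("rain", "no"))  # "yes" for this toy data
```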
Probability Density Function
A continuous random variable is associated with a function whose integral over [a, b] gives the probability that a ≤ x ≤ b, for any values a, b ∈ dom(x).
Density Estimation
Given a data set, compute an estimate of an underlying probability density function.
Parametric Density Estimation
Assume a common distribution family (e.g. Gaussian) and estimate its parameters from the data sample.
Non-parametric Density Estimation
Fit a model to an arbitrary data distribution (e.g. kernel density estimation).
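A minimal sketch of kernel density estimation with a Gaussian kernel (bandwidth and sample values are made up):

```python
import numpy as np

# Hypothetical one-dimensional sample with two clusters.
sample = np.array([1.1, 1.4, 2.0, 2.2, 5.8, 6.1])
h = 0.5  # kernel bandwidth, a tuning parameter

def kde(x: float) -> float:
    # Average of Gaussian kernels centred on the sample points.
    kernels = np.exp(-0.5 * ((x - sample) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return float(kernels.mean())

print(kde(2.0))  # high density near the first cluster
print(kde(4.0))  # low density in the gap between the clusters
```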
Mahalanobis Distance
A generalization of Euclidean distance by accounting for correlations.
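A minimal sketch computing the Mahalanobis distance of a point from the mean of a sample (numpy assumed, data made up):

```python
import numpy as np

# Hypothetical two-dimensional data with strongly correlated attributes.
X = np.array([[1.0, 2.4], [2.0, 3.5], [3.0, 6.6], [4.0, 7.7], [5.0, 10.3]])
mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))  # inverse sample covariance matrix

def mahalanobis(x: np.ndarray) -> float:
    # sqrt((x - mu)^T Σ^{-1} (x - mu)): Euclidean distance corrected for correlations.
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

print(mahalanobis(np.array([3.0, 6.0])))  # small: the point follows the correlation pattern
print(mahalanobis(np.array([3.0, 2.0])))  # large: the point goes against the pattern
```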
Mean
Statistical measure in the sense that it minimises the sum of squared deviations f(y) = ∑_{i=1}^{m} (y − x_{i,j})².
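A short standard derivation (not part of the original card) showing why the minimiser of f is the mean: setting the derivative to zero gives

```latex
f'(y) = 2 \sum_{i=1}^{m} (y - x_{i,j}) = 0
\;\Longrightarrow\;
y = \frac{1}{m} \sum_{i=1}^{m} x_{i,j} = \bar{x}_j .
```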
Variance
The average squared deviation of the data values from the mean.
Standard Deviation
The square root of the variance; measures the spread of the data around the mean.
Covariance
Statistical measure of how two variables vary together, i.e. the average product of their deviations from their respective means.
Heterogeneous Data Sets
Data sets that capture several different underlying phenomena or represent multiple subpopulations.
Mixture Distribution
A probability distribution derived as a weighted combination of simpler component distributions.
Poisson Distribution
Models the number of events occurring in a fixed interval of time.
Mixture Model
Consists of multiple component models, each specified by its own parameters: f(x) = ∑_{k=1}^{K} π_k f_k(x; w_k).
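A minimal sketch evaluating a two-component mixture density f(x) = ∑_k π_k f_k(x; w_k) with Gaussian components (all parameter values are made up):

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

# Hypothetical two-component mixture; the weights π_k must sum to 1.
weights = [0.3, 0.7]
params = [(0.0, 1.0), (5.0, 2.0)]  # (mean, std) of each component f_k

def mixture_pdf(x):
    return sum(w * gaussian_pdf(x, m, s) for w, (m, s) in zip(weights, params))

print(mixture_pdf(0.0))  # dominated by the first component
print(mixture_pdf(5.0))  # dominated by the second component
```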
Mixture Models
Used for clustering by associating a parametric component model with each cluster.
Gaussian Mixture Model (GMM)
A mixture model with Gaussian components.
Expectation-Maximization (EM) Algorithm
Algorithm for fitting GMMs that alternates Expectation (E) and Maximization (M) steps until convergence.
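A minimal sketch of EM for a one-dimensional, two-component GMM; the synthetic data, the initialisation, and the fixed iteration count (instead of a convergence test) are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data drawn from two Gaussian subpopulations.
data = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(6.0, 1.5, 300)])

K = 2
pi = np.full(K, 1.0 / K)                 # mixing weights π_k
mu = np.array([data.min(), data.max()])  # crude initial means
sigma = np.full(K, data.std())           # initial standard deviations

def normal_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(50):
    # E-step: responsibility of each component for each data point.
    resp = np.stack([pi[k] * normal_pdf(data, mu[k], sigma[k]) for k in range(K)], axis=1)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate the parameters from the responsibility-weighted data.
    nk = resp.sum(axis=0)
    pi = nk / len(data)
    mu = (resp * data[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (data[:, None] - mu) ** 2).sum(axis=0) / nk)

print(pi, mu, sigma)  # should roughly recover the two generating components
```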