Dataset
A structured collection of data points, usually organised in tabular form. Each column represents an attribute, and each row holds one value for each attribute, with the values in a row linked in some way (e.g. belonging to the same subject).
Types of Data Attributes
Numerical (Quantitative) - can be measured/counted, e.g. height
Discrete - Takes a finite/countable number of values, e.g. number of children.
Continuous - Can (theoretically) take any value in an interval, e.g. height.
Categorical (Qualitative) - can’t be measured/counted, e.g. gender
Ordinal - Has a natural defined order, e.g. rankings
Nominal - Has no natural order, e.g. colours
Frequency Table
A table that displays the number of occurrences (frequency) of each category in a dataset.
Grouping Continuous Data
Split values into intervals (bins), calculate frequency within each bin, depict with a histogram.
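A minimal sketch (not from the notes; numpy assumed, the heights are made-up values) of binning continuous values and counting the frequency in each bin:

```python
# Sketch: bin continuous values into intervals and count frequencies.
import numpy as np

heights = np.array([1.62, 1.75, 1.81, 1.68, 1.90, 1.73, 1.77, 1.65])

# Split the range into 4 equal-width bins and count the frequency in each.
counts, bin_edges = np.histogram(heights, bins=4)
for lo, hi, c in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"[{lo:.2f}, {hi:.2f}): {c}")
```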
Mean
x̄ = (1/n) ∑ xᵢ
For real numbers k and a:
mean(x₁ + k, x₂ + k, …) = mean(x) + k
mean(ax₁, ax₂, …) = a · mean(x)
Median
The middle value of a sorted dataset (or the average of the 2 middle values when the number of values is even).
For real numbers k and a:
median(x₁ + k, x₂ + k, …) = median(x) + k
median(ax₁, ax₂, …) = a · median(x)
Outlier
A value in a dataset which is vastly different to the majority of data.
Variance
Average of the squared differences from the mean.
Var(x₁, x₂, …, xₙ) = (1/n) ∑(xᵢ - x̄)²
Alternatively, expressed as:
Var(x₁, x₂, …, xₙ) = mean(x₁², …, xₙ²) - x̄², aka 'the mean of the squares minus the square of the mean' (MSMSM).
For real numbers k and a:
var(x₁ + k, x₂ + k, …) = var(x)
var(ax₁, ax₂, …) = a² · var(x)
Standard Deviation
Amount of dispersion or spread in a dataset.
sd(x1, x2, …, xn) = sqrt(Var(x1, x2, …, xn))
For real numbers k and a:
sd(x₁ + k, x₂ + k, …) = sd(x)
sd(ax₁, ax₂, …) = |a| · sd(x)
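A minimal sketch (not from the notes; numpy assumed, the values are made up) checking the shift and scale rules for mean, variance and standard deviation numerically:

```python
# Sketch: verify the shift/scale rules (np.var/np.std use the divide-by-n convention, as above).
import numpy as np

x = np.array([2.0, 5.0, 7.0, 11.0])
k, a = 3.0, -2.0

assert np.isclose(np.mean(x + k), np.mean(x) + k)      # mean(x + k) = mean(x) + k
assert np.isclose(np.mean(a * x), a * np.mean(x))      # mean(ax)    = a * mean(x)
assert np.isclose(np.var(x + k), np.var(x))            # var(x + k)  = var(x)
assert np.isclose(np.var(a * x), a**2 * np.var(x))     # var(ax)     = a^2 * var(x)
assert np.isclose(np.std(a * x), abs(a) * np.std(x))   # sd(ax)      = |a| * sd(x)
```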
Upper/Lower Quartiles
Lower Quartile - Median of lower half of a dataset
Upper Quartile - Median of upper half of a dataset
i.e. for a dataset of 2n values, the LQ and UQ are each the median of n values; for 2n+1 values, the middle value is excluded from both halves.
Interquartile Range
Measure of dispersion less sensitive to outliers.
IQR = UQ - LQ, where:
UQ is upper quartile, median of top half
LQ is lower quartile, median of bottom half
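A minimal sketch (not from the notes; numpy assumed, the values are made up) of computing quartiles and the IQR. Note that numpy's default percentile method interpolates, so it may differ slightly from the 'median of each half' convention described above:

```python
# Sketch: lower/upper quartiles and interquartile range.
import numpy as np

x = np.array([1, 3, 4, 7, 8, 9, 12, 15])
lq, uq = np.percentile(x, [25, 75])
iqr = uq - lq
print(lq, uq, iqr)
```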
Sample Space (Ω)
The set of all possible outcomes of a random experiment.
Event
A subset of the sample space, i.e. a set of one or more outcomes from Ω.
The set of all events is denoted by F. If the result of the experiment is in E, then E is said to have 'occurred'.
Mutually Exclusive
Events that cannot occur simultaneously.
Ei ∩ Ej = ∅
For more than 2 events to all be mutually exclusive, this must hold for every distinct pair i and j.
Independent Events
One event doesn’t affect the outcome of another.
P(A ∩ B) = P(A)P(B)
Discrete Random Variable
A function defined on an experiment's sample space. If X: Ω → ℝ, X is a random variable.
If X's range is a finite or countable set, e.g. the integers, then X is a discrete random variable.
If g: ℝ → ℝ, then g(X) is also a discrete random variable, mapping each outcome ω to g(X(ω)).
P(X = x)
Shorthand for P({ω : X(ω) = x}), i.e. the probability that the random variable takes the specific value x.
Support (of discrete random variable)
The set of values of x for which P(X = x) > 0.
Probability Mass Function f_X(x)
f_X(x) = P(X = x) for all values x in the support. Plotted, it creates a 'bar chart'.
Cumulative Distribution Function F_X(x)
F_X(x) = P(X ≤ x).
For a discrete random variable X, F_X is a step function: it is constant between the values in the support and jumps up at each of them, creating the 'staircase'/'arrow' graph.
Continuous Random Variable
A random variable that can take any value within a range. A variable X: Ω → ℝ with the property that there is a density function f_X such that:
For all a and b with a ≤ b:
P(a ≤ X ≤ b) = ∫_a^b f_X(u) du
Density Function
A function f_X(u) that describes the probability distribution of a continuous random variable X. It is used to determine the probability of X falling within a certain range by integrating over that range.
f_X(x) ≥ 0 for all x ∈ ℝ
∫_-∞^∞ f_X(u) du = 1
P(X = x) = 0 for all x ∈ ℝ, i.e. the probability of any exact point is 0 (since integrating from a to a gives 0 area).
Uniform Distribution
A probability distribution where all outcomes are equally likely within a specified range.
For a continuous uniform distribution, this has density function:
f_X(x) = 1/(b - a) for a ≤ x ≤ b, 0 otherwise
So for a uniform distribution on [0, 1], this is f_X(x) = 1 for 0 ≤ x ≤ 1.
Cumulative Distribution Function
A function that describes the probability that a continuous random variable X takes a value ≤ a given constant.
F_X(x) = P(X ≤ x) = ∫_-∞^x f_X(u) du
e.g.,
P(a ≤ X ≤ b) = P(X ≤ b) - P(X ≤ a) = F_X(b) - F_X(a)
lim x→∞ F_X(x) = 1
lim x→-∞ F_X(x) = 0
F_X is non-decreasing
f_X = F_X′ (the density is the derivative of the CDF)
Exponential Distribution
A probability distribution that models the time until an event occurs.
X has the exponential distribution with parameter λ > 0 if its density function is:
f_X(x) = λe^(-λx) for x ≥ 0, 0 otherwise
and therefore its cumulative distribution function is:
F_X(x) = 1 - e^(-λx) for x ≥ 0, 0 otherwise
X is the time until something happens, e.g. P(X < 1) is the probability a TV fails within 1 year.
Normal Distribution
A probability distribution that is symmetric around the mean, μ.
A continuous random variable X has the normal distribution, i.e. X ~ N(μ, σ²), where σ² is the variance, if its density function is:
fx(x) = (1 / (σ√(2π))) * e^(-(x - μ)² / (2σ²))
When μ = 0 and σ = 1, it’s the Standard Normal Distribution.
Standard Normal Distribution
A form of the Normal Distribution where μ = 0 and σ = 1. A standard normal random variable is usually denoted by Z, i.e. Z ~ N(0, 1).
Its cumulative distribution function, F_Z(x), is denoted by Φ:
Φ(x) = P(Z ≤ x)
Functions of Random Variables Method
Since a function of a random variable (i.e. Y = g(X) for g: ℝ → ℝ) is also a random variable, this method is used to derive the distribution of the new variable.
Write the new variable's cumulative distribution function, F_Y(y) = P(Y ≤ y)
Rearrange the event inside the probability so it is expressed in terms of the known variable's CDF, F_X
Solve for the density function by differentiating
E.g.
f_X(x) = 2x on [0, 1], F_X(x) = x², Y = g(X) = X²
F_Y(y) = P(Y ≤ y) = P(X² ≤ y) = P(X ≤ √y) = F_X(√y) = y
f_Y(y) = 1 (differentiate), i.e. Y is uniform on [0, 1]
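A minimal simulation sketch (not from the notes; numpy assumed) checking the example above: sampling X with F_X(x) = x² and confirming Y = X² behaves like a Uniform(0, 1) variable.

```python
# Sketch: X has F_X(x) = x^2 on [0, 1]; check that Y = X^2 looks Uniform(0, 1).
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=100_000)
x = np.sqrt(u)            # inverse-transform sample, since F_X(x) = x^2
y = x**2                  # the transformed variable

# The empirical CDF of Y at a few points should be close to F_Y(y) = y.
for t in [0.1, 0.5, 0.9]:
    print(t, np.mean(y <= t))
```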
X ~ N(μ, σ²) and Y = aX + b, a ≠ 0.
Y ~ N(aμ + b, (aσ)²)
This result allows us to use Φ (the CDF of the Standard Normal Distribution) on any normally distributed random variable, because if we take a = 1/σ and b = -μ/σ, we get:
Y = X/σ - μ/σ = (X - μ)/σ
so Y ~ N(0, 1), i.e. Y has the standard normal distribution.
e.g.
X ~ N(5, 10)
Let Y = (X - 5)/√10. Now, Y ~ N(0, 1), and we can use the Standard Normal Distribution to find probabilities related to X through Z.
P(X ≤ 2) = P(Y ≤ (2 - 5)/√10) = Φ(-0.9487)
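A minimal sketch (not from the notes; scipy assumed) of the same standardisation, using scipy's standard normal CDF for Φ:

```python
# Sketch: P(X <= 2) for X ~ N(5, 10), via standardisation and the standard normal CDF.
import numpy as np
from scipy.stats import norm

mu, var = 5, 10
z = (2 - mu) / np.sqrt(var)   # about -0.9487
print(norm.cdf(z))            # P(X <= 2), roughly 0.171
```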
Generating random variables by converting from Uniform
If a special distribution isn’t available, we can convert the uniform distribution’s ‘amount of probability’ into a variable for the new distribution.
F is the cumulative distribution function of the unknown variable.
Take a uniform random variable U on [0, 1] and let X = F⁻¹(U). The inverse function takes the uniform 'probability level' and maps it to the point on the axis that has that cumulative probability under F.
P(X ≤ x) = P(F⁻¹(U) ≤ x) (by the definition of X)
= P(U ≤ F(x)) (F⁻¹(U) ≤ x exactly when U is at or below the CDF height at x)
= F(x) (because for a uniform random variable on [0, 1], P(U ≤ u) = u, e.g. P(U ≤ 0.5) = 0.5)
So P(X ≤ x) = F(x), i.e. X has CDF F.
In practice:
Invert the target cumulative distribution function to get F⁻¹
Generate a random number u from the uniform distribution on [0, 1]
Substitute it into the inverse: x = F⁻¹(u) is a sample from the target distribution
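A minimal sketch (not from the notes; numpy assumed) of these steps for the Exponential(λ) distribution, where F(x) = 1 - e^(-λx) and so F⁻¹(u) = -ln(1 - u)/λ:

```python
# Sketch: inverse-transform sampling from Exponential(lambda).
import numpy as np

rng = np.random.default_rng(1)
lam = 2.0
u = rng.uniform(size=100_000)     # step 2: uniform random numbers on [0, 1]
x = -np.log(1 - u) / lam          # step 3: substitute into the inverse CDF

print(x.mean())                   # should be close to 1/lambda = 0.5
```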
Jointly Distributed
Random variables defined on the same sample space, with a joint probability distribution defining their behaviour.
Joint Probability/Bivariate Mass Function
A probability mass function for jointly distributed discrete random variables.
f_X,Y(x, y) = P(X = x, Y = y)
Marginal Mass Function
The probability distribution of a single variable from a jointly-distributed pair of random variables.
e.g. f_X(x) = ∑_y f_X,Y(x, y) = P(X = x)
Joint Probability/Bivariate Density Function
A probability density function for jointly distributed continuous random variables.
A function f_X,Y(x, y): ℝ × ℝ → ℝ such that:
P(a ≤ X ≤ b and c ≤ Y ≤ d) = ∫_a^b ∫_c^d f_X,Y(x, y) dy dx
f_X,Y(x, y) ≥ 0 for all x, y ∈ ℝ
∫_-∞^∞ ∫_-∞^∞ f_X,Y(x, y) dy dx = 1
Marginal Density Function
The probability density function of a single variable from a jointly-distributed pair of random variables.
e.g. f_X(x) = ∫_-∞^∞ f_X,Y(x, y) dy
Independent Random Variables
X and Y are independent if, for every pair of sets A and B:
P(X ∈ A and Y ∈ B) = P(X ∈ A) · P(Y ∈ B)
This means that:
f_X,Y(x, y) = f_X(x) · f_Y(y)
For jointly-distributed random variables, if X and Y are independent and f_Y(y) > 0, then:
f_X|Y(x | y) = f_X(x)
Determining Probability of Joint Density Functions
To find the probability that (X, Y) lies in a region R,
P((X, Y) ∈ R) = ∫∫_R f_X,Y(x, y) dy dx
The integral of the joint density function over the region represents a volume. (X, Y) lying in the region R corresponds to the event occurring.
Example:
If R is the rectangle a ≤ x ≤ b, c ≤ y ≤ d, then P((X, Y) ∈ R) = ∫_a^b ∫_c^d f_X,Y(x, y) dy dx.
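A minimal sketch (not from the notes; scipy assumed, and f_X,Y(x, y) = x + y on [0, 1]² is just an assumed example density) of integrating a joint density over a rectangle numerically:

```python
# Sketch: P(0 <= X <= 0.5, 0 <= Y <= 0.5) for the example density f(x, y) = x + y on [0, 1]^2.
from scipy.integrate import dblquad

f = lambda y, x: x + y   # dblquad expects func(y, x)

total, _ = dblquad(f, 0, 1, lambda x: 0, lambda x: 1)      # integrates to 1: a valid density
prob, _ = dblquad(f, 0, 0.5, lambda x: 0, lambda x: 0.5)   # probability over the rectangle
print(total, prob)                                         # 1.0, 0.125
```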
Conditional Probability
The probability of one event restricted to the sample space of another.
P(A | B) = P(A ∩ B) / P(B)
Jointly Distributed Conditional Probability
Using the equation for conditional probability:
If the variables are discrete:
f_X|Y(x | y) = P(X = x | Y = y) = f_X,Y(x, y) / f_Y(y)
f_X|Y(x | y) ≥ 0 for all x
∑_x f_X|Y(x | y) = 1 (since the sample space is now restricted to Y = y)
If the variables are continuous:
f_X|Y(x | y) = f_X,Y(x, y) / f_Y(y)
f_X|Y(x | y) ≥ 0 for all x
∫_-∞^∞ f_X|Y(x | y) dx = 1
Bayes Theorem
P(A | B) = P(B|A)P(A) / P(B)
Expectation of Discrete Random Variable
E(X) = ∑_x x · f_X(x) = ∑_x x · P(X = x)
Expectation of Continuous Random Variable
E(X) = ∫_-∞^∞ u f_X(u) du
Law of the Unconscious Statistician
If X is a random variable and g is a function and Y = g(X):
If discrete:
E(Y) = ∑_x g(x) · f_X(x) = ∑_x g(x) · P(X = x)
If continuous:
E(Y) = ∫_-∞^∞ g(u) f_X(u) du
For multiple variables (jointly distributed):
If W = g(X, Y), then E(W) = ∫_-∞^∞ ∫_-∞^∞ g(x, y) f_X,Y(x, y) dx dy
Properties of Expectation of Random Variables
E(a₀ + a₁X₁ + … + aₙXₙ) = a₀ + a₁E(X₁) + … + aₙE(Xₙ)
i.e. E(X + Y) = E(X) + E(Y)
If X and Y are jointly-distributed and independent,
E(XY) = E(X)E(Y)
Variance of Random Variables
Var(X) = E[(X - E(X))²] = E(X²) - (E(X))², aka ESMSE:
the expectation of the square minus the square of the expectation.
For jointly-distributed, independent random variables X₁, …, Xₙ:
Var(a₀ + a₁X₁ + … + aₙXₙ) = a₁²Var(X₁) + … + aₙ²Var(Xₙ)
Distribution of Sums of Normally-distributed independent variables
If X and Y are both independent and normally distributed, i.e.:
X ~ N(μ₁, σ₁²) and Y ~ N(μ₂, σ₂²)
aX + bY ~ N(aμ₁ + bμ₂, a²σ₁² + b²σ₂²)
i.e. linear combinations of independent normal variables are also normally distributed.
Stochastically Dominates
If X and Y are random variables defined on the same sample space and X(ω) ≥ Y(ω) for every outcome ω, then X stochastically dominates Y.
E.g. X represents the number of heads in 3 coin tosses, while Y represents the number of heads only in the final toss.
Then E(X) ≥ E(Y).
Markov's Inequality
If X is a random variable and a > 0, then:
P(|X| ≥ a) ≤ E(|X|)/a
Markov’s inequality tells us that if the expectation of |X| is not large, then the probability that |X| is large is small.
Chebyshev’s Inequality
If X is a random variable and a > 0, then:
P(|X - E(X)| ≥ a) ≤ Var(X)/a²
Chebyshev's Inequality gives an upper bound on the probability that a variable is far away from its mean.
If we take a = kσ, this becomes:
P(|X - E(X)| ≥ kσ) ≤ 1/k².
The Weak Law of Large Numbers (Theorem 4.4.1)
If X₁, X₂, … are independent random variables with the same distribution, with E(Xᵢ) = μ and Var(Xᵢ) = σ², and Sₙ = X₁ + … + Xₙ, then for every ε > 0:
P(|Sₙ/n - μ| ≥ ε) ≤ σ²/(nε²)
so P(|Sₙ/n - μ| ≥ ε) → 0 as n → ∞.
Central Limit Theorem (Theorem 4.4.2)
If X₁, X₂, … are independent random variables with the same distribution, with E(Xᵢ) = μ and Var(Xᵢ) = σ² > 0,
and Sₙ = X₁ + X₂ + … + Xₙ and Zₙ = (Sₙ - nμ)/(σ√n), then:
lim n→∞ F_Zₙ(z) = Φ(z), and
E(Zₙ) = 0, Var(Zₙ) = 1
Using the Central Limit Theorem for large numbers of variables
In questions, if given a large number of independent random variables, e.g. a batch of components, approximate their sum (or mean) with a normal distribution.
Sₙ/n = X̄ ≈ N(μ, σ²/n), where μ and σ² are the mean and variance of each Xᵢ.
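A minimal simulation sketch (not from the notes; numpy/scipy assumed, the Exponential(1) choice is arbitrary) of the sample-mean approximation above:

```python
# Sketch: sample mean of n Exponential(1) variables (mean 1, variance 1) vs N(1, 1/n).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n, reps = 50, 100_000
means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

# Simulated P(sample mean <= 1.1) vs the CLT approximation.
print(np.mean(means <= 1.1))
print(norm.cdf(1.1, loc=1.0, scale=np.sqrt(1.0 / n)))
```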
Using the Central Limit Theorem for binomial and large n
Decompose the binomial into a sum of n independent Bernoulli(p) variables.
E(Xᵢ) = p and Var(Xᵢ) = p(1 - p), so for the sum, μ = np and σ² = np(1 - p).
Using the Central Limit Theorem, ∑Xᵢ ≈ N(np, np(1 - p)).
Continuity correction (since X is discrete): P(X = k) ≈ P(k - 0.5 ≤ X ≤ k + 0.5)
So for exam questions: identify the sum, compute its mean and variance (np and np(1 - p)), apply the continuity correction, standardise with (X - μ)/σ, and read probabilities off the standard normal.
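A minimal sketch (not from the notes; scipy assumed, and n, p, k are arbitrary example values) comparing the continuity-corrected normal approximation with the exact binomial probability:

```python
# Sketch: normal approximation to P(X = k) for X ~ Binomial(n, p), with continuity correction.
from math import sqrt
from scipy.stats import binom, norm

n, p, k = 100, 0.3, 28
mu, sigma = n * p, sqrt(n * p * (1 - p))

approx = norm.cdf((k + 0.5 - mu) / sigma) - norm.cdf((k - 0.5 - mu) / sigma)
exact = binom.pmf(k, n, p)
print(approx, exact)        # both roughly 0.08
```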
Covariance
A measure of how 2 variables change together.
Cov(X, Y) = E[(X - E(X))(Y - E(Y))]
If they are jointly distributed, then
Cov(X, Y) = E(XY) - E(X)E(Y)
If they are independent, then
Cov(X, Y) = 0, since E(XY) would equal E(X)E(Y)
Var(X + Y) for jointly-distributed variables
Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)
Pearson correlation coefficient
A normalised measure of the correlation between 2 variables.
ρ(X, Y) = Cov(X, Y) / √(Var(X)Var(Y))
If X and Y are independent, ρ(X, Y) = 0.
ρ = 1 iff P(Y = a + bX) = 1 for some b > 0
ρ = -1 iff P(Y = a - bX) = 1 for some b > 0
Sample correlation coefficient
For a given dataset (sample):
r_x,y = (mean(xy) - mean(x)·mean(y)) / (sd(x)·sd(y))
Linear Regression Model
With Y being the dependent variable and X being the independent variable, and n data points (Xᵢ, Yᵢ), the simple linear regression model describes the relationship between the attributes by the equation:
Yᵢ = α + βXᵢ + εᵢ
where εᵢ is the error term: independent, normally distributed with mean 0 and unknown variance.
α and β are found by minimising the sum of squared residuals, where a residual is the measured minus the predicted value, yᵢ - (α + βxᵢ). The resulting values are called the least squares estimates of α and β.
R² value (in linear regression)
Var(predicted values of y) / Var(observed values of Y)
Or (r_x,y)² for one independent variable.
Comes from:
ŷᵢ = α + βxᵢ (the predicted value)
yᵢ - ŷᵢ = residual.
So (1/n)∑(yᵢ - ȳ)² = (1/n)∑(yᵢ - ŷᵢ)² + (1/n)∑(ŷᵢ - ȳ)²
Or, Var(y) = Mean(squared errors) + Var(predicted values)
and R² is the proportion of the variance of the observed values of Y explained by the model.
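A minimal sketch (not from the notes; numpy assumed, the data values are made up) of fitting the least-squares line and computing R² both ways:

```python
# Sketch: least-squares fit y = alpha + beta*x and R^2 = Var(predicted) / Var(observed).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

beta, alpha = np.polyfit(x, y, deg=1)     # slope and intercept minimising the squared residuals
y_hat = alpha + beta * x

r_squared = np.var(y_hat) / np.var(y)     # Var(predicted) / Var(observed)
r = np.corrcoef(x, y)[0, 1]               # sample correlation coefficient
print(alpha, beta, r_squared, r**2)       # r_squared matches r**2
```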
Population Random Sample
A set of independent, identically distributed (IID) random variables, drawn from a larger population with unknown ‘population’ distribution.
Point Estimate of Mean and Variance
A single value estimate of a population parameter, such as the mean or proportion, derived from a sample. The function of the sample used to compute it is called the 'point estimator'.
For mean and variance:
E(X̅) = μ
Var(X̅) = σ² / n
Where μ and σ² are the mean and variance of the entire population.
This means we expect the sample mean to get closer to the true value (because it is Sum/n) and its variance to get closer to 0, as n → ∞.
S² - Sample Variance for Point Estimations
The expectation of the 'regular' sample variance (dividing by n) is ((n - 1)/n)·σ², i.e. it is biased, so to fix it we use S², which is the same as the variance but divides by n - 1 instead of n.
S² = (1/(n - 1))∑Xᵢ² - (n/(n - 1))X̄², and E(S²) = σ².
Confidence Interval with normal-distribution
A random interval (X_L, X_U) that contains the parameter with probability/confidence level (1 - α). This is a 100(1 - α)% confidence interval. The normal-based interval below is used when the variance is known.
To calculate:
[X̄ - σz_(α/2)/√n, X̄ + σz_(α/2)/√n]
where:
X̄ is the sample mean
σ is the (known) population standard deviation
z_(α/2) is the value where P(Z > z_(α/2)) = α/2 (given in the question)
n is the number of samples
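A minimal sketch (not from the notes; numpy/scipy assumed, the data and σ are made-up values) of a 95% confidence interval when σ is known:

```python
# Sketch: 100(1 - alpha)% confidence interval for the mean, with sigma known.
import numpy as np
from scipy.stats import norm

x = np.array([4.9, 5.3, 5.1, 4.7, 5.0, 5.4, 4.8, 5.2])
sigma = 0.25                           # assumed known population standard deviation
alpha = 0.05
z = norm.ppf(1 - alpha / 2)            # z_{alpha/2}, about 1.96

half_width = sigma * z / np.sqrt(len(x))
print(x.mean() - half_width, x.mean() + half_width)
```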
t-distribution
A family of distributions generalising the Normal distribution, with a parameter called the degrees of freedom, ν. It has heavier tails than the Normal.
As ν → ∞, the t-distribution → the normal distribution.
Confidence Interval with t-distribution
Used when variance is unknown.
For a 100(1-α)% confidence interval:
To calculate:
[X̄ - s·t_(n-1,α/2)/√n, X̄ + s·t_(n-1,α/2)/√n]
X̄ is the sample mean
s is the sample standard deviation (unbiased)
t_(n-1,α/2) is the value where P(T > t_(n-1,α/2)) = α/2, with T having n - 1 degrees of freedom (given in the question)
n is the number of samples
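A minimal sketch (not from the notes; numpy/scipy assumed, the data are made up) of the same interval when σ is unknown, using the t-distribution:

```python
# Sketch: 100(1 - alpha)% confidence interval for the mean, with sigma unknown.
import numpy as np
from scipy.stats import t

x = np.array([4.9, 5.3, 5.1, 4.7, 5.0, 5.4, 4.8, 5.2])
n = len(x)
s = x.std(ddof=1)                          # unbiased sample standard deviation
alpha = 0.05
t_crit = t.ppf(1 - alpha / 2, df=n - 1)    # t_{n-1, alpha/2}

half_width = s * t_crit / np.sqrt(n)
print(x.mean() - half_width, x.mean() + half_width)
```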
Conditional Expectation E(Y | A)
E(Y | A) = ∑_y y · P(Y = y | A)
e.g. E(die roll | odd) = (1 + 3 + 5)/3 = 3
For continuous:
E(Y | A) = ∫_-∞^∞ y f_(Y|A)(y) dy
Conditional Expectation on another variable E(Y | X)
E(Y | X = x) = ∑_y y · P(Y = y | X = x) = ∑_y y · f_(Y|X)(y | x)
e.g. Y is the die roll result, X is 0 if the roll is even, 1 if it is odd
A is the event that the roll is odd
E(Y | A) = E(Y | X = 1) = 3 and E(Y | Aᶜ) = E(Y | X = 0) = 4
So you could also conclude E(Y | X) = 4 - X (a random variable that is a function of X).
Conditional Expectation collapse
E(E(Y|X)) = E(Y)
On the right is a sum or integral over the possible values that Y can take. On the left is the expectation of the random variable E(Y | X), which is a function of X, so the outer expectation is a sum or integral over all the possible values that X can take.
Random k-vector X
A column vector with k jointly-distributed random variables as components.
Mean of random k-vector
μ = (μ₁, …, μₖ)ᵀ where μᵢ = E(Xᵢ)
Covariance Matrix
A k×k matrix Σ whose diagonal entries are Var(Xᵢ) and whose off-diagonal (i, j) entries are Cov(Xᵢ, Xⱼ).
It is symmetric, since covariance is symmetric.
Affine Transformation
For vectors/matrices,
Y = AX + c
A is an m×k matrix
X is a k×1 random vector
c is a column m-vector
Y is a random vector result.
Mean and Covariance of Affine Transformation
If Y = AX + c (an affine transformation):
Mean vector is Aμ + c
Covariance matrix is AΣAᵀ
(where Σ is the covariance matrix of X)
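A minimal simulation sketch (not from the notes; numpy assumed, and μ, Σ, A and c are made-up values) checking these formulas:

```python
# Sketch: mean and covariance of Y = AX + c, compared with A mu + c and A Sigma A^T.
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[1.0, 1.0],
              [0.0, 2.0]])
c = np.array([3.0, -1.0])

X = rng.multivariate_normal(mu, Sigma, size=200_000)   # rows are samples of X
Y = X @ A.T + c

print(Y.mean(axis=0), A @ mu + c)                      # close to A mu + c
print(np.cov(Y, rowvar=False), A @ Sigma @ A.T)        # close to A Sigma A^T
```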
Affine Transformation for Mean 0 and Variance 1
With random vector X, and assuming its covariance matrix is invertible:
P is the orthogonal matrix such that P⁻¹ΣP = D (D diagonal, from diagonalising Σ)
X″ = P⁻¹(X - μ)
Y = D^(-1/2) X″
i.e. Shift, Rotate, Rescale
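A minimal sketch (not from the notes; numpy assumed, μ and Σ are made-up values) of the shift / rotate / rescale steps, using an eigendecomposition of Σ for P and D:

```python
# Sketch: transform X so the result has mean 0 and identity covariance.
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
X = rng.multivariate_normal(mu, Sigma, size=200_000)

d, P = np.linalg.eigh(Sigma)        # Sigma = P D P^T with P orthogonal, D = diag(d)
X_rot = (X - mu) @ P                # shift, then rotate (P^-1 = P^T)
Y = X_rot / np.sqrt(d)              # rescale each component

print(Y.mean(axis=0))               # close to (0, 0)
print(np.cov(Y, rowvar=False))      # close to the identity matrix
```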
Multivariate Normal Distribution Properties
If X has the multivariate normal distribution:
Y = AX + c also has a multivariate normal distribution (when AΣAᵀ is invertible)
For each i, the marginal distribution of Xᵢ is normal with mean μᵢ and variance Σᵢᵢ.
If Xᵢ and Xⱼ are uncorrelated, they are also independent.