COMP2870 Probability & Statistics


75 Terms

1
New cards

Dataset

A structured collection of data points, usually organised in tabular form. Each column represents an attribute, and each row represents one observation, holding a value for each attribute; the values in a row are linked in some way (e.g. they describe the same subject).

2
New cards

Types of Data Attributes

  • Numerical (Quantitative) - can be measured/counted, e.g. height

    • Discrete - Takes a finite/countable number of values, e.g. number of children.

    • Continuous - Any value in an interval can (theoretically) be taken, e.g. height.

  • Categorical (Qualitative) - can’t be measured/counted, e.g. gender

    • Ordinal - Has a natural defined order, e.g. rankings

    • Nominal - Has no natural order, e.g. colours

3
New cards

Frequency Table

A table that displays the number of occurrences (frequency) of each category in a dataset.

4
New cards

Grouping Continuous Data

Split values into intervals (bins), calculate frequency within each bin, depict with a histogram.

5
New cards

Mean

x̄ = 1/n ∑xi

For real numbers k and a:

  • mean(x₁ + k, x₂ + k, …, xₙ + k) = mean(x) + k

  • mean(ax₁, ax₂, …, axₙ) = a · mean(x)
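
A quick numerical check of these shift and scale properties (a minimal sketch using NumPy; the data values are made up for illustration):

```python
import numpy as np

x = np.array([2.0, 5.0, 7.0, 10.0])    # illustrative data
k, a = 3.0, 2.0

print(np.mean(x + k), np.mean(x) + k)  # shifting every value by k shifts the mean by k
print(np.mean(a * x), a * np.mean(x))  # scaling every value by a scales the mean by a
```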

6
New cards

Median

The middle value of a sorted dataset (or the average of the two middle values when n is even).

For real numbers k and a:

  • median(x₁ + k, x₂ + k, …, xₙ + k) = median(x) + k

  • median(ax₁, ax₂, …, axₙ) = a · median(x)

7
New cards

Outlier

A value in a dataset which is vastly different from the majority of the data.

8
New cards

Variance

Average of the squared differences from the mean.

Var(x1, x2, …, xn) = 1/n ∑(xi - x̄)²

Alternatively, expressed as:

Var(x₁, x₂, …, xₙ) = mean(x₁², …, xₙ²) - x̄², aka 'the mean of the squares minus the square of the mean' (MSMSM).

For real numbers k and a:

  • var(x₁ + k, x₂ + k, …, xₙ + k) = var(x)

  • var(ax₁, ax₂, …, axₙ) = a² · var(x)
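
A minimal sketch (NumPy, made-up data) checking the MSMSM identity and the shift/scale properties (np.var uses the 1/n convention, matching the definition above):

```python
import numpy as np

x = np.array([1.0, 4.0, 6.0, 9.0])               # illustrative data
k, a = 3.0, 2.0

print(np.var(x), np.mean(x**2) - np.mean(x)**2)  # mean of squares minus square of mean
print(np.var(x + k))                             # unchanged by a shift
print(np.var(a * x), a**2 * np.var(x))           # scales by a²
```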

9
New cards

Standard Deviation

Amount of dispersion or spread in a dataset.

sd(x1, x2, …, xn) = sqrt(Var(x1, x2, …, xn))

For real numbers k and a:

  • sd(x₁ + k, x₂ + k, …, xₙ + k) = sd(x)

  • sd(ax₁, ax₂, …, axₙ) = |a| · sd(x)

10
New cards

Upper/Lower Quartiles

Lower Quartile - Median of lower half of a dataset

Upper Quartile - Median of upper half of a dataset

i.e. for a dataset of size 2n, the LQ and UQ are the medians of the lower and upper n values respectively; for size 2n + 1, the middle value is skipped.

11
New cards

Interquartile Range

A measure of dispersion that is less sensitive to outliers.

IQR = UQ - LQ, where:

UQ is upper quartile, median of top half

LQ is lower quartile, median of bottom half
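
A minimal sketch implementing the quartile rule from the previous card (medians of the halves, skipping the middle value when n is odd; note this differs from NumPy's default percentile interpolation):

```python
import numpy as np

def quartiles_iqr(data):
    # LQ/UQ are the medians of the lower/upper halves;
    # the middle value is skipped when n is odd.
    x = np.sort(np.asarray(data, dtype=float))
    half = len(x) // 2
    lq = np.median(x[:half])     # lower half
    uq = np.median(x[-half:])    # upper half
    return lq, uq, uq - lq       # IQR = UQ - LQ

print(quartiles_iqr([1, 3, 5, 7, 9, 11, 13]))  # (3.0, 11.0, 8.0): the middle value 7 is skipped
```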

12
New cards

Sample Space (Ω)

The set of all possible outcomes of a random experiment.

13
New cards

Event

A subset of the sample space, i.e. a set of one or more outcomes from Ω.

The set of all events is denoted by F. If the result of the experiment is in E, then E has 'occurred'.

14
New cards

Mutually Exclusive

Events that cannot occur simultaneously.

Ei ∩ Ej = ∅

For more than two events to all be mutually exclusive, this must hold for each distinct pair i and j.

15
New cards

Independent Events

The occurrence of one event doesn't affect the probability of another.

P(A ∩ B) = P(A)P(B)

16
New cards

Discrete Random Variable

A function defined on an experiment's sample space. If X : Ω → ℝ, X is a random variable.

If X's range is a finite or countable set, e.g. the integers, then X is a discrete random variable.

If g: ℝ → ℝ, then g(X) is also a discrete random variable, mapping each outcome ω to g(X(ω)).

17
New cards

P(X = x)

Shorthand for P({ω : X(ω) = x}), i.e. the probability that the random variable X takes the specific value x.

18
New cards

Support (of discrete random variable)

The set of values of x for which P(X = x) > 0.

19
New cards

Probability Mass Function fX(x)

fX(x) = P(X = x) for all values x in the support. Plotting it gives a 'bar chart'.

20
New cards

Cumulative Distribution Function FX(x)

FX(x) = P(X ≤ x).

For a discrete random variable X, FX is a step function: flat between the values in the support, jumping up at each one. This creates the 'arrow' (staircase) graph.
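
A minimal sketch (fair six-sided die) showing the pmf values and the cdf as a running sum that jumps at each support point:

```python
import numpy as np

support = np.arange(1, 7)      # outcomes of a fair die
pmf = np.full(6, 1/6)          # fX(x) = 1/6 for x = 1..6
cdf = np.cumsum(pmf)           # FX(x) = P(X <= x): flat between support points

for x, f, F in zip(support, pmf, cdf):
    print(x, round(f, 3), round(F, 3))
```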

21
New cards

Continuous Random Variable

A random variable that can take any value within a range: a variable X: Ω → ℝ with the property that there is a density function fX such that, for all a and b with a ≤ b:

P(a ≤ X ≤ b) = ∫_a^b fX(u) du

22
New cards

Density Function

A function fX(u) that describes the probability distribution of a continuous random variable X. It is used to determine the probability of X falling within a certain range by integrating over that range.

  • fX(x) ≥ 0 for all x ∈ ℝ

  • ∫_-∞^∞ fX(u) du = 1

  • P(X = x) = 0 for all x ∈ ℝ, i.e. the probability of any exact point is 0 (since integrating from a to a gives 0 area).

23
New cards

Uniform Distribution

A probability distribution where all outcomes are equally likely within a specified range.

For a continuous uniform distribution on [a, b], this has density function:
fX(x) = 1/(b - a) for a ≤ x ≤ b (and 0 otherwise)

So for a uniform distribution on [0, 1], this is fX(x) = 1 for 0 ≤ x ≤ 1.

24
New cards

Cumulative Distribution Function

A function that gives the probability that a continuous random variable X takes a value less than or equal to a constant x.

FX(x) = P(X ≤ x) = ∫_-∞^x fX(u) du

e.g.,

P(a ≤ X ≤ b) = P(X ≤ b) - P(X ≤ a) = FX(b) - FX(a)

  • lim x→∞ FX(x) = 1

  • lim x→-∞ FX(x) = 0

  • FX is increasing

  • fX = FX′, i.e. the density is the derivative of the CDF

25
New cards

Exponential Distribution

A probability distribution that models the time until an event occurs.

X has exponential distribution, with parameter λ > 0, if density function:

fX(x) = λe^(-λx) for x ≥ 0, 0 otherwise

and therefore cumulative distribution function:

FX(x) = 1 - e^(-λx) for x ≥ 0, 0 otherwise

X is the time until something happens, e.g. P(X < 1) is the probability a TV fails within 1 year.
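
A minimal sketch (the rate λ is made up for illustration) evaluating P(X < 1) from the closed-form CDF and with scipy.stats (whose expon uses scale = 1/λ):

```python
import math
from scipy import stats

lam = 0.5                                 # illustrative rate parameter
print(1 - math.exp(-lam * 1.0))           # FX(1) = 1 - e^(-λ·1)
print(stats.expon(scale=1/lam).cdf(1.0))  # same probability via scipy
```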

26
New cards

Normal Distribution

A probability distribution that is symmetric around the mean, μ.

A continuous random variable X has the normal distribution, i.e. X ~ N(μ, σ²), where σ² is the variance, if its density function is:

fx(x) = (1 / (σ√(2π))) * e^(-(x - μ)² / (2σ²))

When μ = 0 and σ = 1, it’s the Standard Normal Distribution.

27
New cards

Standard Normal Distribution

A form of the Normal Distribution where μ = 0 and σ = 1. A standard normal random variable is usually denoted Z, with Z ~ N(0, 1).

Its cumulative distribution function, FZ(x), is denoted by Φ:

Φ(x) = P(Z ≤ x)

28
New cards

Functions of Random Variables Method

Since a function of a random variable, i.e. Y = g(X) with g: ℝ → ℝ, is also a random variable, this is used to derive the distribution of the new variable.

  • Write the new variable's cumulative distribution function, FY(y) = P(g(X) ≤ y)

  • Rearrange it in terms of the known variable's CDF, FX

  • Differentiate to get the density function fY

E.g.

fX(x) = 2x and FX(x) = x² on [0, 1], and Y = g(X) = X²

FY(y) = P(Y ≤ y) = P(X² ≤ y) = P(X ≤ √y) = FX(√y) = (√y)² = y

fY(y) = 1 (by differentiating), so Y is uniform on [0, 1]
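
A minimal Monte Carlo check of this example: X is sampled with CDF x² on [0, 1] (via X = √U, see the inverse-transform card below), and Y = X² should then look uniform:

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=100_000)
x = np.sqrt(u)          # X has CDF x², i.e. density 2x on [0, 1]
y = x**2                # Y = X² should be Uniform[0, 1]

print(np.histogram(y, bins=5, range=(0, 1))[0] / len(y))  # each bin ≈ 0.2
```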

29
New cards

X ~ N(μ, σ²) and Y = aX + b, a ≠ 0.

Y ~ N(aμ + b, (aσ)²)

This result allows us to use Φ (the CDF of the Standard Normal Distribution) on any Normally Distributed random variable, because if we take a = 1/σ and b = -μ/σ, we get:

Y = X/σ - μ/σ = (X - μ)/σ

so Y ~ N(0, 1), i.e. Y is standard normal.

e.g.

X ~ N(5, 10)

Let Y = (X - 5)/√10. Now, Y ~ N(0, 1), and we can use the Standard Normal Distribution to find probabilities related to X through Z.

P(X ≤ 2) = P(Y ≤ (2 - 5)/√10) = Φ(−0.9487) ≈ 0.171
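
A minimal check of this standardisation with scipy.stats (note norm takes the standard deviation, not the variance, as its scale):

```python
from math import sqrt
from scipy import stats

# X ~ N(5, 10): mean 5, variance 10, so the scale is √10
print(stats.norm(loc=5, scale=sqrt(10)).cdf(2))  # P(X ≤ 2) directly
print(stats.norm.cdf((2 - 5) / sqrt(10)))        # Φ((2 − 5)/√10), the same value
```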

30
New cards

Generating random variables by converting from Uniform

If a sampler for a special distribution isn't available, we can convert the uniform distribution's 'amount of probability' into a sample from the new distribution.

Let F be the cumulative distribution function of the target variable.

Take a uniform random variable, U, on [0, 1], and let X = F⁻¹(U). The inverse CDF takes the 'level' of uniform probability and maps it to the point along the axis that reaches that probability under F.

P(X ≤ x) = P(F⁻¹(U) ≤ x) (by the definition of X)

= P(U ≤ F(x)) (applying the increasing function F to both sides)

= F(x) (because for a uniform random variable on [0, 1], P(U ≤ u) = u, e.g. P(U ≤ 0.5) = 0.5)

So P(X ≤ x) = F(x), i.e. X has CDF F.

In practice:

  • Take the target cumulative distribution function and invert it, i.e. find F⁻¹

  • Generate a random number u on the uniform distribution

  • Substitute it into the inverse: X = F⁻¹(u) is your random variable under the new distribution
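
A minimal sketch for the exponential distribution: F(x) = 1 - e^(-λx), so F⁻¹(u) = -ln(1 - u)/λ (the λ here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                        # illustrative rate
u = rng.uniform(size=100_000)    # step 2: uniform samples
x = -np.log(1 - u) / lam         # step 3: X = F⁻¹(U) is exponentially distributed

print(x.mean(), 1 / lam)         # sample mean ≈ theoretical mean 1/λ
```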

31
New cards

Jointly Distributed

Random variables defined on the same sample space, with a joint probability distribution defining their behaviour.

32
New cards

Joint Probability/Bivariate Mass Function

A probability mass function for jointly distributed discrete random variables.

fX,Y(x, y) = P(X = x, Y = y)

33
New cards

Marginal Mass Function

The probability distribution of a single variable from a jointly-distributed pair of random variables.

e.g. fX(x) = ∑_y fX,Y(x, y) = P(X = x)

34
New cards

Joint Probability/Bivariate Density Function

A probability density function for jointly distributed continuous random variables.

A function fX,Y(x, y) : ℝ × ℝ → ℝ with:

P(a ≤ X ≤ b and c ≤ Y ≤ d) = ∫_a^b ∫_c^d fX,Y(x, y) dy dx

  • fX,Y(x, y) ≥ 0 for all x, y in ℝ

  • ∫_-∞^∞ ∫_-∞^∞ fX,Y(x, y) dy dx = 1

35
New cards

Marginal Density Function

The probability density function of a single variable from a jointly-distributed pair of random variables.

e.g. fX(x) = ∫_-∞^∞ fX,Y(x, y) dy

36
New cards

Independent Random Variables

For every pair of sets A and B:

P(X ∈ A and Y ∈ B) = P(X ∈ A) · P(Y ∈ B)

This means that:

fX,Y(x, y) = fX(x) · fY(y)

For jointly-distributed random variables, if X and Y are independent:

If fY(y) > 0, fX|Y(x | y) = fX(x)

37
New cards

Determining Probability of Joint Density Functions

To find the probability that (X, Y) lies in a region R,

P((X, Y) ∈ R) = ∫∫_R fX,Y(x, y) dy dx, where R defines the region.

The integral of the joint density function represents a volume, i.e. integrating over a region. (X, Y) lying in the region R corresponds to the event occurring.

Example:

If R is defined by the limits a ≤ X ≤ b and c ≤ Y ≤ d, then P((X, Y) ∈ R) = ∫_a^b ∫_c^d fX,Y(x, y) dy dx.

38
New cards

Conditional Probability

The probability of one event restricted to the sample space of another.

P(A | B) = P(A ∩ B) / P(B)

39
New cards

Jointly Distributed Conditional Probability

Using the equation for conditional probability:

If the variables are discrete:

fX|Y(x | y) = P(X = x | Y = y) = fX,Y(x, y) / fY(y)

  • fX|Y(x | y) ≥ 0 for all x

  • ∑_x fX|Y(x | y) = 1 (since the sample space is now restricted by Y)

If the variables are continuous:

fX|Y(x | y) = fX,Y(x, y) / fY(y)

  • fX|Y(x | y) ≥ 0 for all x

  • ∫_-∞^∞ fX|Y(x | y) dx = 1

40
New cards

Bayes' Theorem

P(A | B) = P(B|A)P(A) / P(B)

41
New cards

Expectation of Discrete Random Variable

E(X) = ∑_x x · fX(x) = ∑_x x · P(X = x)

42
New cards

Expectation of Continuous Random Variable

E(X) = ∫_-∞^∞ u fX(u) du

43
New cards

Law of the Unconscious Statistician

If X is a random variable and g is a function and Y = g(X):

If discrete:

E(Y) = ∑_x g(x) · fX(x) = ∑_x g(x) · P(X = x)

If continuous:

E(Y) = ∫_-∞^∞ g(u) fX(u) du

For multiple variables (jointly distributed):

If W = g(X, Y), E(W) = ∫_-∞^∞ ∫_-∞^∞ g(x, y) fX,Y(x, y) dx dy
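
A minimal discrete check of LOTUS (fair die, g(x) = x²): E(g(X)) is computed directly from the pmf of X, without first finding the distribution of Y = g(X):

```python
import numpy as np

x = np.arange(1, 7)         # support of a fair die
f = np.full(6, 1/6)         # pmf fX(x)
print(np.sum(x**2 * f))     # E(X²) = ∑ g(x)·fX(x) = 91/6 ≈ 15.17
```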

44
New cards

Properties of Expectation of Random Variables

E(a₀ + a₁X₁ + … + aₙXₙ) = a₀ + a₁E(X₁) + … + aₙE(Xₙ)

i.e. E(X + Y) = E(X) + E(Y)

If X and Y are jointly-distributed and independent,

E(XY) = E(X)E(Y)

45
New cards

Variance of Random Variables

Var(X) = E[(X - E(X))²] = E(X²) - E(X)², aka ESMSE:

The expectation of the square minus the square of the expectation.

For jointly-distributed, independent random variables X₁, …, Xₙ:

Var(a₀ + a₁X₁ + … + aₙXₙ) = a₁²Var(X₁) + … + aₙ²Var(Xₙ)

46
New cards

Distribution of Sums of Normally-distributed independent variables

If X and Y are both independent and normally distributed, i.e.:

X ~ N(μ1, σ1²) and Y ~ N(μ2, σ2²)

aX + bY ~ N(aμ1 + bμ2, a²σ1² + b²σ2²)

i.e. linear combinations of independent normally-distributed variables are also normally distributed

47
New cards

Stochastically Dominates

If X and Y are random variables defined on the same sample space and, for every outcome ω, X(ω) ≥ Y(ω), then X stochastically dominates Y.

E.g. X represents the number of heads in 3 coin tosses, while Y represents the number of heads only in the final toss.

Then E(X) ≥ E(Y).

48
New cards

Markov's Inequality

If X is a random variable and a > 0, then:

P(|X| ≥ a) ≤ E(|X|)/a

Markov’s inequality tells us that if the expectation of |X| is not large, then the probability that |X| is large is small.

49
New cards

Chebyshev’s Inequality

If X is a random variable and a > 0, then:

P(|X - E(X)| ≥ a) ≤ Var(X)/a²

Chebyshev's Inequality gives an upper bound on the probability that a variable is far away from its mean.

If we take a = kσ, this becomes:

P(|X - E(X)| ≥ kσ) ≤ 1/k².

50
New cards

The Weak Law of Large Numbers (Theorem 4.4.1)

If X₁, X₂, … are independent random variables with the same distribution, E(Xi) = μ and Var(Xi) = σ², and Sn = X₁ + … + Xₙ, then for every ε > 0 (by Chebyshev's Inequality applied to Sn/n):

P(|Sn/n - μ| ≥ ε) ≤ σ²/(nε²)

and as n → ∞ this bound → 0, so:

P(|Sn/n - μ| ≥ ε) → 0

51
New cards

Central Limit Theorem (Theorem 4.4.2)

If X₁, X₂, … are independent random variables with the same distribution, and each E(Xi) = μ and Var(Xi) = σ² > 0.

With Sn = X₁ + X₂ + … + Xₙ and Zn = (Sn - nμ)/(σ√n):

lim n→∞ FZn(z) = Φ(z), and

E(Zn) = 0, Var(Zn) = 1

52
New cards

Using the Central Limit Theorem for large numbers of variables

In questions, if given a large number of independent identically-distributed random variables, e.g. a batch of components, approximate their sum or mean with a normal distribution:

  • Sn/n = X̅ ~ N(μ, σ²/n) approximately, where μ and σ² are the mean and variance of a single Xi.

53
New cards

Using the Central Limit Theorem for binomial and large n

  • Decompose the binomial into a sum of n independent variables with the Bernoulli(p) distribution

  • E(Xi) = p and Var(Xi) = p(1-p), so the sum has mean np and variance np(1-p)

  • By the Central Limit Theorem, ∑Xi ~ N(np, np(1-p)) approximately

  • Continuity Correction (since the binomial is discrete): P(X = k) ≈ P(k - 0.5 ≤ X ≤ k + 0.5)

So for exam questions: find the mean and variance (np and np(1-p)), standardise with (X - μ)/σ, apply the continuity correction, and evaluate with Φ, as in the sketch below.
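
A minimal numerical sketch (n, p and k are made up for illustration) comparing the exact binomial probability with the continuity-corrected normal approximation:

```python
from math import sqrt
from scipy import stats

n, p, k = 100, 0.3, 32                   # illustrative values
mu, sigma = n * p, sqrt(n * p * (1 - p))

exact = stats.binom(n, p).pmf(k)
approx = (stats.norm(mu, sigma).cdf(k + 0.5)
          - stats.norm(mu, sigma).cdf(k - 0.5))  # continuity correction
print(exact, approx)                     # close for large n
```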

54
New cards

Covariance

A measure of how 2 variables change together.

Cov(X, Y) = E[(X - E(X))(Y - E(Y))]

If they are jointly distributed, then
Cov(X, Y) = E(XY) - E(X)E(Y)

If they are independent, then

Cov(X, Y) = 0, since E(XY) would equal E(X)E(Y)

55
New cards

Var(X + Y) for jointly-distributed variables

Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)

56
New cards

Pearson correlation coefficient

A normalised measure of the correlation between 2 variables.

ρ(X, Y) = Cov(X, Y) / √(Var(X)Var(Y))

  • If X and Y are independent, ρ(X, Y) = 0.

  • ρ = 1 iff P(Y = a + bX) = 1 for some b > 0

  • ρ = -1 iff P(Y = a + bX) = 1 for some b < 0

57
New cards

Sample correlation coefficient

For a given dataset (sample):

r_x,y = (mean(xy) - mean(x)·mean(y)) / (sd(x)·sd(y))

58
New cards

Linear Regression Model

With Y being the dependent variable and X being the independent variable, and n data points (Xi, Yi), the simple linear regression model describes the relationship between the attributes by equation:

Yi = α + βXi + εi

where εi is the error term, which is independent, normally distributed with mean = 0 and unknown variance.

α and β are found by minimising the sum of squares of the residuals, i.e. the measured minus the predicted values, yi - (α + βXi). These minimisers are called the least squares estimates of α and β.
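
A minimal least-squares sketch with made-up data, using the closed-form estimates β̂ = (mean(xy) - mean(x)·mean(y)) / var(x) and α̂ = ȳ - β̂x̄:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

beta = (np.mean(x * y) - np.mean(x) * np.mean(y)) / np.var(x)
alpha = np.mean(y) - beta * np.mean(x)
print(alpha, beta)                         # least squares estimates of α and β
```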

59
New cards

R² value (in linear regression)

Var(predicted values of y) / Var(observed values of Y)

Or (rx, y)² for one independent variable.
Comes from:

ŷi = α + βxi (the predicted value)

yi - ŷi = residual.

So (1/n)∑(yi - ȳ)² = (1/n)∑(yi - ŷi)² + (1/n)∑(ŷi - ȳ)²

Or, Var(y) = Mean(squared errors) + Var(predicted values)

and R² is the proportion of variance of observed values of Y predicted by the model.
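
Continuing the regression sketch above (same made-up data, repeated so this runs on its own): R² computed as Var(ŷ)/Var(y) matches the squared sample correlation for a single predictor:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
beta = (np.mean(x * y) - np.mean(x) * np.mean(y)) / np.var(x)
alpha = np.mean(y) - beta * np.mean(x)

y_hat = alpha + beta * x
print(np.var(y_hat) / np.var(y))           # R²
print(np.corrcoef(x, y)[0, 1] ** 2)        # (r_x,y)², the same value
```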

60
New cards

Population Random Sample

A set of independent, identically distributed (IID) random variables, drawn from a larger population with unknown ‘population’ distribution.

61
New cards

Point Estimate of Mean and Variance

A single value estimate of a population parameter, such as the mean or proportion, derived from a sample. Its function is called its ‘point estimator’.

For mean and variance:

  • E(X̅) = μ

  • Var(X̅) = σ² / n

Where μ and σ² are the mean and variance of the entire population.

This means we expect the sample mean to get closer to the true value (since its variance σ²/n goes to 0) as n → ∞.

62
New cards

S² - Sample Variance for Point Estimations

The plain sample variance (dividing by n) has expectation ((n-1)/n)·σ², i.e. it is biased. To fix this we use S², which is the same as the variance but divides by n-1 instead of n. Then E(S²) = σ², and:

S² = (1/(n-1))∑Xᵢ² - (n/(n-1))X̅²

63
New cards

Confidence Interval with normal-distribution

A random interval (XL, XU) containing the parameter with probability (confidence level) 1-α. This is a 100(1-α)% confidence interval. Used when the variance is known.

To calculate:

[X̅ - (σzα/2)/√n, X̅ + (σzα/2)/√n]

where:

  • X̅ is the sample mean

  • σ is the standard deviation

  • zα/2 is the value where P(Z > zα/2) = α/2 (given in Q)

  • n is the number of samples
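
A minimal sketch of a 95% confidence interval with known σ (data and σ are made up; scipy's norm.ppf supplies zα/2 ≈ 1.96):

```python
import numpy as np
from scipy import stats

x = np.array([4.9, 5.3, 5.1, 4.7, 5.2, 5.0])  # illustrative sample
sigma = 0.2                                   # known population sd (assumed)
alpha = 0.05

z = stats.norm.ppf(1 - alpha / 2)             # zα/2 ≈ 1.96
half_width = sigma * z / np.sqrt(len(x))
print(x.mean() - half_width, x.mean() + half_width)  # 95% CI for μ
```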

64
New cards

t-distribution

A family of distributions generalising the Normal distribution, with a parameter ν called the degrees of freedom. It has heavier tails than the Normal.

As ν → ∞, the t-distribution → Normal.

65
New cards

Confidence Interval with t-distribution

Used when variance is unknown.

For a 100(1-α)% confidence interval:

To calculate:

[X̅ - s·tn-1,α/2/√n, X̅ + s·tn-1,α/2/√n]

  • X̅ is the sample mean

  • s is the sample standard deviation (unbiased, i.e. √S²)

  • tn-1,α/2 is the value where P(T > tn-1,α/2) = α/2 (given in Q)

  • n is the number of samples
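
A minimal sketch of the t-interval with unknown variance (same illustrative data as before; ddof=1 gives the unbiased sample sd):

```python
import numpy as np
from scipy import stats

x = np.array([4.9, 5.3, 5.1, 4.7, 5.2, 5.0])  # illustrative sample
alpha = 0.05
n = len(x)

s = x.std(ddof=1)                             # unbiased sample standard deviation
t = stats.t.ppf(1 - alpha / 2, df=n - 1)      # t_{n-1, α/2}
half_width = s * t / np.sqrt(n)
print(x.mean() - half_width, x.mean() + half_width)  # 95% CI for μ
```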

66
New cards

Conditional Expectation E(Y | A)

E(Y | A) = ∑_y y · P(Y = y | A)

e.g. E(die roll | roll is odd) = (1 + 3 + 5)/3 = 3

For continuous:

E(Y | A) = ∫_-∞^∞ y fY|A(y) dy

67
New cards

Conditional Expectation on another variable E(Y | X)

E(Y | X = x) = ∑_y y · P(Y = y | X = x) = ∑_y y · fY|X(y | x)

e.g. Y is a die roll result, X is 1 if it is odd and 0 if it is even.

A is the event of being odd.

E(Y | A) = E(Y | X = 1) = 3 and E(Y | Aᶜ) = E(Y | X = 0) = 4

So you could also conclude E(Y | X = x) = 4 - x, i.e. E(Y | X) = 4 - X

68
New cards

Conditional Expectation collapse

E(E(Y | X)) = E(Y)

On the right is a sum or integral over the possible values that Y can take.

On the left we have the expectation of the random variable E(Y | X), which is a function of X, so the outer expectation is a sum or integral over all the possible values that X can take.

69
New cards

Random k-vector X

A column vector with k jointly-distributed random variables as components.

70
New cards

Mean of random k-vector

μ = (μ₁, …, μₖ) where μᵢ = E(Xᵢ)

71
New cards

Covariance Matrix

A k×k matrix whose diagonal entries are Var(Xi) and whose off-diagonal (i, j) entries are Cov(Xi, Xj).

It is symmetric, since covariance is symmetric.

72
New cards

Affine Transformation

For vectors/matrices,

Y = AX + c

  • A is an m×k matrix

  • X is a k×1 random vector

  • c is a column m-vector

Y is the resulting random vector.

73
New cards

Mean and Covariance of Affine Transformation

If Y = AX + c (an affine transformation):

  • Mean vector is Aμ + c

  • Covariance matrix is AΣAᵀ

(Σ is the covariance matrix of X)
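
A minimal NumPy check with made-up μ, Σ, A and c: the empirical mean and covariance of Y = AX + c match Aμ + c and AΣAᵀ:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, 2.0])                    # illustrative mean vector
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])   # illustrative covariance matrix
A = np.array([[1.0, 1.0], [0.0, 2.0]])
c = np.array([3.0, -1.0])

X = rng.multivariate_normal(mu, Sigma, size=200_000)  # rows are samples of X
Y = X @ A.T + c                                       # apply Y = AX + c row-wise

print(Y.mean(axis=0), A @ mu + c)            # ≈ Aμ + c
print(np.cov(Y.T))                           # ≈ AΣAᵀ
print(A @ Sigma @ A.T)
```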

74
New cards

Affine Transformation for Mean 0 and Variance 1

With random vector X, and assuming its covariance matrix Σ is invertible:

  • Find the orthogonal matrix P with P⁻¹ΣP = D, where D is diagonal

  • X″ = P⁻¹(X - μ)

  • Y = D^(-1/2) X″, i.e. divide each component of X″ by the square root of the corresponding diagonal entry of D

i.e. Shift, Rotate, Rescale. Y then has mean 0 and identity covariance matrix.
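
A minimal whitening sketch (illustrative Σ): an eigendecomposition supplies P and D, and the transformed samples have approximately identity covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])   # illustrative covariance matrix

X = rng.multivariate_normal(mu, Sigma, size=200_000)
eigvals, P = np.linalg.eigh(Sigma)           # Σ = P D Pᵀ with P orthogonal
X2 = (X - mu) @ P                            # shift then rotate: Pᵀ(X - μ) = P⁻¹(X - μ)
Y = X2 / np.sqrt(eigvals)                    # rescale each component

print(np.cov(Y.T))                           # ≈ identity matrix
```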

75
New cards

Multivariate Normal Distribution Properties

If X has the multivariate normal distribution:

  • Y = AX + c also has the multivariate normal distribution (when AΣAᵀ is invertible)

  • For each i, marginal distribution of Xi is normal with mean μi and variance Σi,i.

  • If Xi and Xj are uncorrelated, they’re also independent.