Lecture 3 – Probability, Discrete RVs & Binomial Distribution (MATH1041)
Course & Chapter Context
Course: MATH1041 – Statistics for Life and Social Sciences (Term 2, 2025)
Chapter 3 focus: “Probability, Discrete Random Variables & the Binomial Distribution”
Four lecture blocks covered in the transcript:
• Lecture 1 – Probability
• Lecture 2 – Discrete Random Variables
• Lecture 3 – Means & Variances for Discrete Random Variables
• Lecture 4 – The Binomial Distribution & Other Probability Models
Quick Regression Recap (link-back to previous week)
Least-squares regression line: used to predict \hat y from x; evaluate with residual plots.
r = correlation coefficient (strength & direction); r^{2} = % of variation in y explained by x.
Example: r=-0.78, r^{2}=0.61 ⇒ 61% of y's variability explained by x.
Residual = y-\hat y; want a random scatter with constant spread (homoscedasticity). Heteroscedasticity ("trumpet" shape) violates the regression assumptions.
Why Study Probability?
Random assignment & random sampling ⇒ data viewed as outcomes of random phenomena.
Probability rules are the language for interpreting data produced by chance (forms basis for CI’s, hypothesis tests, ML, AI, etc.).
Fundamental Concepts & Vocabulary
Random Phenomenon / Experiment: individual outcome unpredictable, but long-run pattern exists.
Outcome (\omega): single possible result.
Sample Space (S): set of all theoretical outcomes (e.g. S={H,T} for one coin).
Event (A): any subset of S. Occurs if the outcome \in A.
Probability of outcome/event: long-run relative frequency.
Random vs Deterministic vs Haphazard
Deterministic: outcome known in advance (e.g. sunrise).
Random: unpredictable trial-by-trial, but regular long-run distribution.
Haphazard: too irregular to model (no stable pattern).
Assigning Probabilities
Long-run empirical frequency (repeat experiment many times).
Physical symmetry / equally-likely outcomes (p_i=1/L).
Subjective / expert judgement.
Five Core Probability Rules
Boundedness: 0\le P(A)\le1.
Covering Rule: P(S)=1\;\;(\sum p_i=1).
Additive (General): P(A\text{ or }B)=P(A)+P(B)-P(A\text{ and }B).
• If A,B disjoint ⇒ P(A\text{ or }B)=P(A)+P(B).
Complement: P(A^c)=1-P(A).
Multiplication (Independence): A \perp B iff P(A\cap B)=P(A)P(B) (\Leftrightarrow P(B|A)=P(B)).
Conditional Probability & Independence
Definition: P(B|A)=\dfrac{P(A\cap B)}{P(A)} (requires P(A)>0).
Think “A becomes the new sample space”.
Tree-diagram helps: P(A\cap B)=P(A)\,P(B|A).
Independent \ne Mutually Exclusive. Mutually exclusive events cannot co-occur (P(A\cap B)=0), whereas independent events can co-occur but their joint probability factorises: P(A\cap B)=P(A)P(B).
Example (Student Enrolments):
Table: 87 % full-time, 15 % international, 14 % both.
P(FT|Int)=\tfrac{0.14}{0.15}\approx0.93 vs P(FT)=0.87 ⇒ not equal ⇒ FT and Int are dependent.
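This dependence check is easy to verify numerically; a quick sketch (Python used for illustration, not the course's R; figures are the 87%/15%/14% from the notes):

```python
# Enrolment figures from the notes: 87% full-time, 15% international, 14% both.
p_ft, p_int, p_both = 0.87, 0.15, 0.14

# Conditional probability: P(FT | Int) = P(FT and Int) / P(Int).
p_ft_given_int = p_both / p_int
assert round(p_ft_given_int, 3) == 0.933   # ≈ 0.93, not the 0.87 of P(FT)
assert p_ft_given_int != p_ft              # conditioning changed the probability ⇒ dependent
```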
Random Variables (RV)
Formal: function X:S\to\mathbb R assigning a numerical value to each outcome.
Upper-case letter = RV; lower-case = realisation (observed value).
Discrete: countable list x_1,x_2,\dots (often counts).
Continuous: takes any value in an interval (measurements).
Probability Distribution of Discrete RV
Table/pmf p_X(x_k)=P(X=x_k) satisfying \sum_k p_X(x_k)=1 and p_X(x_k)\ge0.
Graph: probability mass plot (vertical spikes).
Examples
Spinner game payouts: x\in\{-4,2,6\}, p=(3/8,4/8,1/8).
Two dice sum Y: pmf with 11 possible sums (2–12).
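The 11-value pmf for the two-dice sum can be built by enumerating all 36 equally likely ordered rolls; an illustrative Python sketch (exact fractions avoid rounding):

```python
from collections import Counter
from fractions import Fraction

# Tally the sum Y = d1 + d2 over all 36 equally likely ordered rolls.
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
pmf = {y: Fraction(c, 36) for y, c in counts.items()}

assert len(pmf) == 11              # the 11 possible sums, 2 through 12
assert sum(pmf.values()) == 1      # covering rule: probabilities sum to 1
assert pmf[7] == Fraction(6, 36)   # 7 is the most likely sum
```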
Key Specialized Discrete Distributions
Uniform (discrete): P(X=r)=1/L for r=1,\dots,L .
Binomial B(n,p).
Poisson Pois(\lambda) (rare events / rate).
Hypergeometric HGeom(m,N-m,n) (sampling without replacement).
Geometric (failures before 1st success with prob. p).
Mean (Expectation) & Variance of Discrete RV
Mean (expected value): \mu_X=E[X]=\sum_{k}x_k\,p_k.
Variance: \sigma_X^{2}=E[(X-\mu_X)^{2}]=\sum_{k}(x_k-\mu_X)^{2}p_k.
Std-dev: \sigma_X=\sqrt{\sigma_X^{2}}.
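Applying these formulas to the spinner payouts from the earlier example gives a concrete check; a minimal Python sketch (illustrative, using the x ∈ {-4, 2, 6}, p = (3/8, 4/8, 1/8) table):

```python
# Spinner payouts: x in {-4, 2, 6} with probabilities 3/8, 4/8, 1/8.
xs = [-4, 2, 6]
ps = [3 / 8, 4 / 8, 1 / 8]

mu = sum(x * p for x, p in zip(xs, ps))               # E[X] = sum of x_k * p_k
var = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))  # Var(X) = sum of (x_k - mu)^2 * p_k
sd = var ** 0.5                                       # standard deviation
```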
Linear Combination Rules (always hold)
• For constants a,b: E[a+bX]=a+b\mu_X. • For two RVs: E[X\pm Y]=\mu_X\pm\mu_Y.
Variance Rules (need independence except scaling)
Scaling: Var(a+bX)=b^{2}Var(X).
Add/Sub independent: Var(X\pm Y)=Var(X)+Var(Y) (note “+” in both cases).
Std-dev does not add linearly.
Law of Large Numbers (LLN)
For i.i.d. X_1,\dots,X_n with mean \mu: the sample mean \bar X=\frac{1}{n}\sum X_i converges to \mu as n\to\infty.
Justifies interpreting probability & expectation as long-run averages.
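A tiny simulation makes the LLN visible; an illustrative Python sketch (the fair die with μ = 3.5 and the sample size 100,000 are arbitrary choices):

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible
# 100,000 fair-die rolls: the sample mean should sit very close to mu = 3.5.
rolls = [random.randint(1, 6) for _ in range(100_000)]
xbar = sum(rolls) / len(rolls)
assert abs(xbar - 3.5) < 0.05  # long-run average converges to the expectation
```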
The Binomial Distribution B(n,p)
Conditions (Bernoulli trials):
Fixed number n of trials.
Each trial results in Success/Failure.
Trials independent.
Constant success probability p.
Probability mass function:
\quad P(X=x)=\binom{n}{x}p^{x}(1-p)^{n-x},\quad x=0,1,\dots,n.
Key moments:
\quad E[X]=np,\qquad Var(X)=np(1-p),\qquad \sigma=\sqrt{np(1-p)}.
Counting Tool \binom{n}{r}
\binom{n}{r}=\frac{n!}{r!(n-r)!} = number of unordered subsets of size r.
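For hand-checking these coefficients, Python's standard library provides `math.comb` (the course's R equivalent is `choose`); a few sanity checks:

```python
import math

assert math.comb(5, 2) == 10                              # 5!/(2!*3!)
assert math.comb(5, 2) == math.comb(5, 3)                 # symmetry: C(n, r) = C(n, n-r)
assert sum(math.comb(5, r) for r in range(6)) == 2 ** 5   # a row of Pascal's triangle sums to 2^n
```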
Practical Examples
True/False test (n=5, p=0.5): P(\text{score}\ge3)=0.5 (by the symmetry of B(5,0.5)).
Inspector weighing 10 cereal boxes (p=0.2 defective): P(X\ge2)=1-[P(0)+P(1)]\approx0.624 (used complement rule).
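Both examples reduce to evaluating the binomial pmf; a minimal sketch (Python used for illustration; `binom_pmf` is a hypothetical helper name, playing the role of R's `dbinom`):

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): C(n, x) * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# True/False test: n = 5, p = 0.5, P(score >= 3).
p_pass = sum(binom_pmf(x, 5, 0.5) for x in range(3, 6))

# Cereal boxes: n = 10, p = 0.2 defective, P(X >= 2) via the complement rule.
p_at_least_2 = 1 - (binom_pmf(0, 10, 0.2) + binom_pmf(1, 10, 0.2))
```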
Other Discrete Models (Quick Reference)
Poisson(\lambda): count of rare independent events in fixed interval.
• pmf P(X=k)=e^{-\lambda}\,\lambda^{k}/k!; mean = var = \lambda.
Hypergeometric: without-replacement sampling; parameters (white m, black b, draws n).
• P(X=x)=\frac{\binom{m}{x}\binom{b}{n-x}}{\binom{m+b}{n}}.
Geometric(p): # failures before 1st success.
• pmf P(X=x)=(1-p)^{x}p; E[X]=\frac{1-p}{p},\;Var(X)=\frac{1-p}{p^{2}}.
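These pmfs are simple enough to code directly from the formulas above; an illustrative Python sketch (`pois_pmf` and `geom_pmf` are hypothetical helper names):

```python
from math import exp, factorial

def pois_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam): e^(-lam) * lam^k / k!."""
    return exp(-lam) * lam ** k / factorial(k)

def geom_pmf(x, p):
    """P(X = x) = (1-p)^x * p: x failures before the first success."""
    return (1 - p) ** x * p
```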
Worked-Example Gallery
Left-handed lunch group (n=3, p=0.12):
• P(X=2)=3\times0.12^{2}\times0.88\approx0.038.
• P(X\le2)=1-P(X=3)=1-0.12^{3}\approx0.998.
Social-media dual users (n=5, p=0.6):
• E[X]=3,\;\sigma^{2}=1.2,\;\sigma\approx1.10.
Seed-weighing with measurement error \sigma=10\,\text{mg}: averaging two weighings halves the variance ⇒ \sigma_{\bar X}=10/\sqrt{2}\approx7.07\,\text{mg}.
Double die roll:
• Single die Var = 35/12 ≈ 2.917; sum of two independent dice ⇒ Var = 35/6 ≈ 5.833.
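Exact fractions confirm both die-roll numbers; an illustrative Python sketch:

```python
from fractions import Fraction

# Single fair die: mu = 7/2, Var = 35/12 (about 2.917).
faces = range(1, 7)
mu = Fraction(sum(faces), 6)
var = sum((x - mu) ** 2 for x in faces) / 6

assert var == Fraction(35, 12)
assert 2 * var == Fraction(35, 6)   # variances of independent dice add (about 5.833)
```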
Modelling Checklist
Identify outcome, sample space, event(s).
Decide if equally-likely or assign probabilities empirically/symmetry.
Translate English to probability notation.
Test binomial conditions; if met, store n,p.
Compute the desired probabilities via pmf / complement / cumulative functions.
Interpret in context (units, risk, practical meaning).
Common Pitfalls & Ethical Notes
Multiplying probabilities requires independence (Sally Clark SIDS case – tragic misuse).
“Mean” could denote the population mean (\mu), the sample mean (\bar X, a random variable), or an observed mean (a number). Clarify which!
Risk \;\ne\; expected profit; variance matters in decisions.
Always plot data before fitting models (Anscombe’s, Greta’s regression caution).
Study & Exam Tips
Re-derive formulas once without notes – embeds memory.
Practise tree diagrams and calculator/R functions (dbinom, pbinom, choose, rnorm, etc.).
Build small simulations to verify analytical answers (R replicate loops).
Prepare a personal glossary (keywords slides pp 219-220).
Use “Seeing Theory” interactive site for visual intuition.
Key Equations (Quick Sheet)
P(A\cup B)=P(A)+P(B)-P(A\cap B)
P(A^c)=1-P(A)
P(B|A)=\frac{P(A\cap B)}{P(A)}
E[X]=\sum x_k p_k
Var(X)=E[(X-\mu)^2]
\binom{n}{r}=\frac{n!}{r!(n-r)!}
P_{\text{Bin}}(X=x)=\binom{n}{x}p^{x}(1-p)^{n-x}
E_{\text{Bin}}=np,\;Var_{\text{Bin}}=np(1-p)
P_{\text{Pois}}(X=k)=e^{-\lambda}\,\lambda^{k}/k!
Reflective Questions
Can I explain (without symbols) the difference between independent & mutually exclusive?
Given a problem, what clues signal a binomial vs hypergeometric vs Poisson model?
How does the Law of Large Numbers justify quality-control sampling?
In what way do variance rules change when RVs are dependent?
End of consolidated notes.