Lecture 3 – Probability, Discrete RVs & Binomial Distribution (MATH1041)

Course & Chapter Context
  • Course: MATH1041 – Statistics for Life and Social Sciences (Term 2, 2025)

  • Chapter 3 focus: “Probability, Discrete Random Variables & the Binomial Distribution”

  • Four lecture blocks covered in the transcript:

    • Lecture 1 – Probability

    • Lecture 2 – Discrete Random Variables

    • Lecture 3 – Means & Variances for Discrete Random Variables

    • Lecture 4 – The Binomial Distribution & Other Probability Models


Quick Regression Recap (link-back to previous week)
  • Least-squares regression line: used to predict \hat y from x; evaluate with residual plots.

  • r = correlation coefficient (strength & direction); r^{2} = % of variation in y explained by x.

    Example: r=-0.78\,; r^{2}=0.61 ⇒ 61 % of y’s variability explained by x.

  • Residual =y-\hat y; want random scatter (homoscedasticity). Heteroscedasticity (“trumpet” shape) violates assumptions.
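The r-to-r² arithmetic in the example above can be checked in one line (Python here as a stand-in for a calculator):

```python
# r = -0.78 from the example; r^2 is the proportion of variation explained
r = -0.78
print(round(r**2, 2))   # 0.61, i.e. 61% of y's variability explained by x
```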


Why Study Probability?
  • Random assignment & random sampling ⇒ data viewed as outcomes of random phenomena.

  • Probability rules are the language for interpreting data produced by chance (forms basis for CI’s, hypothesis tests, ML, AI, etc.).


Fundamental Concepts & Vocabulary
  • Random Phenomenon / Experiment: individual outcome unpredictable, but long-run pattern exists.

  • Outcome (\omega): single possible result.

  • Sample Space (S): set of all theoretical outcomes (e.g. S=\{H,T\} for one coin).

  • Event (A): any subset of S. A occurs if the outcome \in A.

  • Probability of outcome/event: long-run relative frequency.

Random vs Deterministic vs Haphazard
  • Deterministic: outcome known in advance (e.g. sunrise).

  • Random: unpredictable trial-by-trial, but regular long-run distribution.

  • Haphazard: too irregular to model (no stable pattern).

Assigning Probabilities
  1. Long-run empirical frequency (repeat experiment many times).

  2. Physical symmetry / equally-likely outcomes (p_i=1/L).

  3. Subjective / expert judgement.


Five Core Probability Rules
  1. Boundedness: 0\le P(A)\le1.

  2. Covering Rule: P(S)=1\;\;(\sum p_i=1).

  3. Additive (General): P(A\text{ or }B)=P(A)+P(B)-P(A\text{ and }B).

    • If A,B disjoint ⇒ P(A\text{ or }B)=P(A)+P(B).

  4. Complement: P(A^c)=1-P(A).

  5. Multiplication (Independence): A \perp B iff P(A\cap B)=P(A)P(B)\;(\Leftrightarrow P(B|A)=P(B)).
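These rules can be verified by brute-force enumeration over a small sample space. A minimal Python sketch using two fair dice (the events A and B are illustrative choices, not from the lecture):

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair dice: 36 equally likely outcomes
S = list(product(range(1, 7), repeat=2))

def P(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(sum(1 for w in S if event(w)), len(S))

def A(w): return w[0] == 6        # first die shows 6
def B(w): return sum(w) >= 10     # sum is at least 10

assert P(lambda w: True) == 1                            # covering rule
assert P(lambda w: A(w) or B(w)) == \
    P(A) + P(B) - P(lambda w: A(w) and B(w))             # general additive rule
assert P(lambda w: not A(w)) == 1 - P(A)                 # complement rule
```

Using exact `Fraction` arithmetic means the rules hold with equality, not just to rounding error.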


Conditional Probability & Independence
  • Definition: P(B|A)=\dfrac{P(A\cap B)}{P(A)} (requires P(A)>0).

  • Think “A becomes the new sample space”.

  • Tree-diagram helps: P(A\cap B)=P(A)\,P(B|A).

  • Independent \ne Mutually Exclusive. Mutually exclusive events cannot co-occur (P(A\cap B)=0), whereas independent events can co-occur but their joint probability factorises.

Example (Student Enrolments):

  • Table: 87 % full-time, 15 % international, 14 % both.

  • P(FT|Int)=\tfrac{0.14}{0.15}=0.93 vs P(FT)=0.87 ⇒ not equal ⇒ dependence.
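The enrolment figures above can be checked directly; a quick Python sketch:

```python
# Student-enrolment example: 87% full-time, 15% international, 14% both
p_ft, p_int, p_both = 0.87, 0.15, 0.14

# P(FT | Int) = P(FT and Int) / P(Int)
p_ft_given_int = p_both / p_int
print(round(p_ft_given_int, 2))   # 0.93, not equal to P(FT) = 0.87 -> dependent

# Independence would require P(FT and Int) = P(FT) * P(Int) = 0.1305, not 0.14
assert abs(p_ft * p_int - p_both) > 1e-6
```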


Random Variables (RV)
  • Formal: function X:S\to\mathbb R assigning a numerical value to each outcome.

  • Upper-case letter = RV; lower-case = realisation (observed value).

  • Discrete: countable list x_1,x_2,\dots (often counts).

  • Continuous: takes any value in an interval (measurements).

Probability Distribution of Discrete RV
  • Table/pmf p_X(x_k)=P(X=x_k) satisfying \sum_k p_X(x_k)=1 & p_X(x_k)\ge0.

  • Graph: probability mass plot (vertical spikes).

Examples

  • Spinner game payouts: x\in\{-4,2,6\},\;p=(3/8,4/8,1/8).

  • Two dice sum Y: pmf with 11 possible sums (2–12).
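The two-dice pmf can be built by enumeration; a short Python sketch:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# pmf of Y = sum of two fair dice, by counting over the 36 outcomes
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {y: Fraction(c, 36) for y, c in sorted(counts.items())}

assert len(pmf) == 11              # 11 possible sums, 2..12
assert sum(pmf.values()) == 1      # probabilities sum to 1
assert pmf[7] == Fraction(6, 36)   # 7 is the most likely sum
```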

Key Specialized Discrete Distributions
  • Uniform (discrete): P(X=r)=1/L for r=1,\dots,L .

  • Binomial B(n,p).

  • Poisson Pois(\lambda) (rare events / rate).

  • Hypergeometric HGeom(m,N-m,n) (sampling without replacement).

  • Geometric (failures before 1st success with prob. p).


Mean (Expectation) & Variance of Discrete RV
  • Mean (expected value): \mu_X=E[X]=\sum_{k}x_k p_k .

  • Variance: \sigma_X^{2}=E[(X-\mu_X)^{2}]=\sum_{k}(x_k-\mu_X)^{2}p_k .

  • Std-dev \sigma_X=\sqrt{\sigma_X^{2}} .
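Applying these formulas to the spinner pmf from earlier in the notes (a quick Python check; the course's calculator/R workflow would give the same numbers):

```python
# Spinner payouts x in {-4, 2, 6} with p = (3/8, 4/8, 1/8)
xs = [-4, 2, 6]
ps = [3/8, 4/8, 1/8]

mu = sum(x * p for x, p in zip(xs, ps))                # E[X]
var = sum((x - mu)**2 * p for x, p in zip(xs, ps))     # Var(X)

print(mu, var)   # 0.25 12.4375
```

So the expected payout per spin is 0.25, with a large spread (\sigma \approx 3.53).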

Linear Combination Rules (always hold)

  • For constants a,b: E[a+bX]=a+b\mu_X .

  • For two RVs: E[X\pm Y]=\mu_X\pm\mu_Y .

Variance Rules (need independence except scaling)
  • Scaling: Var(a+bX)=b^{2}Var(X).

  • Add/Sub independent: Var(X\pm Y)=Var(X)+Var(Y) (note “+” in both cases).

  • Std-dev does not add linearly.
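These rules can be verified exactly on a small pmf. A Python sketch using a fair die X and an independent copy Y (an illustrative check, not course code):

```python
from itertools import product

# Fair die: faces 1..6, each with probability 1/6
faces, p = range(1, 7), 1/6
mu = sum(x * p for x in faces)
var = sum((x - mu)**2 * p for x in faces)          # 35/12 ~ 2.917

# Scaling rule: Var(a + bX) = b^2 Var(X), e.g. a=3, b=-2
a, b = 3, -2
var_scaled = sum(((a + b*x) - (a + b*mu))**2 * p for x in faces)
assert abs(var_scaled - b**2 * var) < 1e-12

# For independent X, Y: both sum AND difference add the variances
for sign in (+1, -1):
    var_comb = sum(((x + sign*y) - (mu + sign*mu))**2 * p * p
                   for x, y in product(faces, repeat=2))
    assert abs(var_comb - 2 * var) < 1e-12
```

Note the "+" in both cases: subtracting an independent RV still *adds* its variance.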

Law of Large Numbers (LLN)
  • For i.i.d. X_1,\dots,X_n with mean \mu: sample mean \bar X = \frac{1}{n}\sum X_i converges to \mu as n\to\infty .

  • Justifies interpreting probability & expectation as long-run averages.
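A small simulation makes the LLN visible (Python here, in the spirit of the R simulation exercises mentioned in the study tips below):

```python
import random

random.seed(42)
mu = 3.5   # true mean of a fair die

# The running sample mean typically drifts toward mu as n grows
for n in (100, 10_000, 100_000):
    xbar = sum(random.randint(1, 6) for _ in range(n)) / n
    print(n, round(xbar, 3))
```

The gap |\bar X - \mu| shrinks on the order of 1/\sqrt{n}, which is why larger samples are more trustworthy.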


The Binomial Distribution B(n,p)

Conditions (Bernoulli trials):

  1. Fixed number n of trials.

  2. Each trial results in Success/Failure.

  3. Trials independent.

  4. Constant success probability p.

Probability mass function:

\quad P(X=x)=\binom{n}{x}p^{x}(1-p)^{n-x},\quad x=0,1,\dots,n.

Key moments:

\quad E[X]=np,\qquad Var(X)=np(1-p),\qquad \sigma=\sqrt{np(1-p)}.

Counting Tool \binom{n}{r}

\binom{n}{r}=\frac{n!}{r!(n-r)!} = number of unordered subsets of size r.
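The pmf and moments can be checked numerically. A minimal Python sketch using `math.comb` (a stand-in for R's `dbinom` and `choose` mentioned in the study tips):

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): C(n,x) p^x (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 5, 0.5
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

assert abs(sum(pmf) - 1) < 1e-12                                   # pmf sums to 1
mean = sum(x * q for x, q in enumerate(pmf))
var = sum(x**2 * q for x, q in enumerate(pmf)) - mean**2
assert abs(mean - n * p) < 1e-12                                   # E[X] = np
assert abs(var - n * p * (1 - p)) < 1e-12                          # Var = np(1-p)
```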

Practical Examples
  • True/False test (n=5, p=0.5): P(\text{score}\ge3)=0.5 (by symmetry when p=0.5).

  • Inspector weighing 10 cereal boxes (p=0.2 defective): P(X\ge2)=1-[P(0)+P(1)]\approx0.624 (used complement rule).
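Both calculations above can be reproduced directly (Python sketch, using the binomial pmf):

```python
from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

# True/False test: n=5, p=0.5 -> P(score >= 3) = 0.5 by symmetry
p_ge3 = sum(binom_pmf(x, 5, 0.5) for x in range(3, 6))
print(round(p_ge3, 4))   # 0.5

# Cereal boxes: n=10, p=0.2 -> complement rule avoids summing 9 terms
p_ge2 = 1 - (binom_pmf(0, 10, 0.2) + binom_pmf(1, 10, 0.2))
print(round(p_ge2, 3))   # 0.624
```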


Other Discrete Models (Quick Reference)
  • Poisson(\lambda): count of rare independent events in fixed interval.

    • pmf P(X=k)=e^{-\lambda}\,\lambda^{k}/k!; mean = var = \lambda.

  • Hypergeometric: without-replacement sampling; parameters (white m, black b, draws n).

    • P(X=x)=\frac{\binom{m}{x}\binom{b}{n-x}}{\binom{m+b}{n}}.

  • Geometric(p): # failures before 1st success.

    • pmf P(X=x)=(1-p)^{x}p; E[X]=\frac{1-p}{p},\;Var(X)=\frac{1-p}{p^{2}}.
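The stated moments of both models can be verified with truncated sums (a Python sketch; the truncation limits are illustrative choices large enough that the tails are negligible):

```python
from math import exp, factorial

def pois_pmf(k, lam):
    """P(X = k) for X ~ Pois(lambda)."""
    return exp(-lam) * lam**k / factorial(k)

def geom_pmf(x, p):
    """P(X = x) for X ~ Geometric(p): failures before the 1st success."""
    return (1 - p)**x * p

lam, p = 2.5, 0.3

mean_pois = sum(k * pois_pmf(k, lam) for k in range(100))
var_pois = sum(k**2 * pois_pmf(k, lam) for k in range(100)) - mean_pois**2
assert abs(mean_pois - lam) < 1e-9 and abs(var_pois - lam) < 1e-9   # mean = var = lambda

mean_geom = sum(x * geom_pmf(x, p) for x in range(2000))
assert abs(mean_geom - (1 - p) / p) < 1e-9                          # E[X] = (1-p)/p
```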


Worked-Example Gallery
  1. Left-handed lunch group (n=3, p=0.12):

    • P(X=2)=3\times0.12^{2}\times0.88\approx0.038.

    • P(X\le2)=1-P(3)=1-0.12^{3}\approx0.998.

  2. Social-media dual users (n=5, p=0.6):

    • E[X]=3,\;\sigma^{2}=1.2,\;\sigma\approx1.10.

  3. Seed-weighing with measurement error \sigma=10\,\text{mg}: averaging two independent weighings halves the variance ⇒ \sigma_{\bar X}=10/\sqrt{2}\approx7.07\,\text{mg}.

  4. Double die roll:

    • Single die Var=35/12\approx2.917; sum of two independent dice ⇒ Var=35/6\approx5.833.
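The gallery answers can be cross-checked in a few lines of Python:

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

# 1. Left-handed lunch group (n=3, p=0.12)
assert abs(binom_pmf(2, 3, 0.12) - 3 * 0.12**2 * 0.88) < 1e-12   # ~0.038
assert abs((1 - binom_pmf(3, 3, 0.12)) - (1 - 0.12**3)) < 1e-12  # ~0.998

# 2. Social-media dual users (n=5, p=0.6)
n, p = 5, 0.6
assert n * p == 3.0                                # E[X] = np
assert abs(n * p * (1 - p) - 1.2) < 1e-12          # Var = np(1-p)
print(round(sqrt(n * p * (1 - p)), 2))             # sigma ~ 1.1

# 3. Averaging two independent weighings with sigma = 10 mg
print(round(10 / sqrt(2), 2))                      # 7.07
```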


Modelling Checklist
  1. Identify outcome, sample space, event(s).

  2. Decide if equally-likely or assign probabilities empirically/symmetry.

  3. Translate English to probability notation.

  4. Test binomial conditions; if met, store n,p.

  5. Compute desired probabilities via pmf / complement / cumulative functions.

  6. Interpret in context (units, risk, practical meaning).


Common Pitfalls & Ethical Notes
  • Multiplying probabilities requires independence (Sally Clark SIDS case – tragic mis-use).

  • “Mean” could denote population mean (\mu), sample mean (\bar X RV), or observed mean (number). Clarify!

  • Risk \;\ne\; expected profit; variance matters in decisions.

  • Always plot data before fitting models (Anscombe’s, Greta’s regression caution).


Study & Exam Tips
  • Re-derive formulas once without notes – embeds memory.

  • Practise tree diagrams and calculator/R functions (dbinom, pbinom, choose, rnorm, etc.).

  • Build small simulations to verify analytical answers (R replicate loops).

  • Prepare a personal glossary (keywords slides pp 219-220).

  • Use “Seeing Theory” interactive site for visual intuition.


Key Equations (Quick Sheet)
  • P(A\cup B)=P(A)+P(B)-P(A\cap B)

  • P(A^c)=1-P(A)

  • P(B|A)=\frac{P(A\cap B)}{P(A)}

  • E[X]=\sum_k x_k p_k

  • Var(X)=E[(X-\mu)^2]

  • \binom{n}{r}=\frac{n!}{r!(n-r)!}

  • P_{\text{Bin}}(X=x)=\binom{n}{x}p^{x}(1-p)^{n-x}

  • E_{\text{Bin}}=np,\;Var_{\text{Bin}}=np(1-p)

  • P_{\text{Pois}}(X=k)=e^{-\lambda}\,\lambda^{k}/k!


Reflective Questions
  • Can I explain (without symbols) the difference between independent & mutually exclusive?

  • Given a problem, what clues signal a binomial vs hypergeometric vs Poisson model?

  • How does the Law of Large Numbers justify quality-control sampling?

  • In what way do variance rules change when RVs are dependent?


End of consolidated notes.