Chapter 4 Continuous Random Variables & Normal Distribution – Comprehensive Lecture Notes

Course Orientation

  • MATH1041 Statistics for Life and Social Sciences – Chapter 4 notes (Term 2 2025)
    • Overarching aim: introduce statistics as the science of collecting, analysing and interpreting data.
    • Chapter 4 focus: Continuous Random Variables & the Normal Distribution.
    • Four linked lectures:
      • L1 Continuous random variables & density curves.
      • L2 Normal distributions.
      • L3 Probabilities & quantiles from a normal distribution.
      • L4 Simulation examples.
    • Textbook alignment: Moore et al. (2021) Section 4.3 & Section 1.4.

Quick Revision of Discrete Random Variables (context from previous chapter)

  • Discrete rv’s: take countable values (finite or countably infinite list).
  • Binomial model recap (independent Bernoulli trials):
    • P(X=x)=\binom{n}{x}p^{x}(1-p)^{n-x},\;x=0,1,\dots,n
    • Mean \mu_X=np, variance \sigma_X^{2}=np(1-p).
  • Rules for linear combinations (any r.v.’s):
    • E(a+bX)=a+bE(X)\,,\qquad Var(a+bX)=b^{2}Var(X).
    • If independent, Var(X\pm Y)=Var(X)+Var(Y) (NOT true for SDs, nor for dependent rv’s).
  • Clarification of “mean” terminology:
    • True (population) mean \mu_X; sample mean \bar X (a rv); observed mean \bar x_n (a number).
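
The binomial recap above can be checked numerically. The following is a minimal sketch, assuming illustrative values n = 10 and p = 0.3 (not taken from the notes); it uses R's built-in dbinom to evaluate the pmf and confirms that the mean and variance agree with np and np(1-p).

```r
# Binomial(n, p) recap; n = 10 and p = 0.3 are illustrative assumptions
n <- 10
p <- 0.3
x <- 0:n

pmf <- dbinom(x, size = n, prob = p)    # P(X = x) for x = 0, ..., n

mean_x <- sum(x * pmf)                  # E(X), should match n * p
var_x  <- sum((x - mean_x)^2 * pmf)     # Var(X), should match n * p * (1 - p)

c(total = sum(pmf), mean = mean_x, variance = var_x)
```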

Lecture 1 – Continuous Random Variables & Density Curves

Definitions & Conceptual Bases
  • Uncountable set: cannot be put in a list (e.g. all reals in [0,1]).
  • Continuous rv: can take every value in an interval ⇒ range uncountable.
  • Practical criterion: before observing next value we cannot restrict it to a countable set.
  • Real-world recognition exercise (C vs D): rainfall, temperature, turtle weight → continuous; houses sold, football score → discrete; eye colour → categorical (not numeric rv).
  • Philosophical note: even height in a finite population is technically discrete but modelled as continuous for convenience.
Illustrative ‘truly continuous’ experiment
  • Roll a perfect sphere (diameter 2) with a black dot; observe height H\in[0,2] when ball stops. Every value plausible ⇒ H continuous.
Density Curves & Density Functions
  • Discrete pmf p(\cdot) ↔ continuous pdf f(\cdot).
  • Properties of a valid density:
    • f(x)\ge 0\,\forall x (no negative density).
    • Total area \int_{-\infty}^{\infty}f(x)dx=1.
  • Probability via area: P(a<X<b)=\int_{a}^{b}f(x)dx (integration not examinable; use geometry/R numerics).
  • Histogram ↔ smoothed density (kernel density example: UNSW travel times).
Examples & Exercises
  • Uniform[0,2] numbers:
    • Density height =\frac12 (rectangle), area = probability.
    • Results: P(0\le X\le1)=0.5, P(X=1/4)=0, general P(a\le X\le b)=\dfrac{b-a}{2} for 0\le a\le b\le 2.
  • Baby-smile times assumed Unif[0,23]: probability between 2 & 18 = (18-2)/23; conditional P(X\ge12\mid X\ge8)=\frac{11}{15}.
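
The uniform results above can be reproduced with R's punif (the CDF of a uniform distribution). This is a small sketch rather than the course's own code; the variable names are illustrative.

```r
# Unif[0, 2]: P(0 <= X <= 1) is the rectangle area, width 1 times height 1/2
p_unit <- punif(1, min = 0, max = 2) - punif(0, min = 0, max = 2)

# Baby-smile model Unif[0, 23]: P(2 < X < 18) = (18 - 2) / 23
p_smile <- punif(18, min = 0, max = 23) - punif(2, min = 0, max = 23)

# Conditional probability: P(X >= 12 | X >= 8) = P(X >= 12) / P(X >= 8)
p_cond <- punif(12, min = 0, max = 23, lower.tail = FALSE) /
  punif(8, min = 0, max = 23, lower.tail = FALSE)

c(p_unit = p_unit, p_smile = p_smile, p_cond = p_cond)
```

The conditional probability works because the event X >= 12 is contained in X >= 8, so the joint probability in the numerator is just P(X >= 12).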
Mean & Variance for Continuous rv’s (concept only)
  • E(X)=\int x f(x)dx, Var(X)=\int (x-\mu)^2 f(x)dx – parallels discrete sums.
  • Rules for means/variances identical to discrete case.
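
Since integration by hand is not examinable, the integrals for the mean and variance can be left to R's numerical integrator. The sketch below assumes the Unif[0,2] density from the earlier example and uses the built-in integrate function; for this density the results should be close to 1 (area), 1 (mean) and 1/3 (variance).

```r
# Density of Unif[0, 2], reused from the earlier uniform example
f <- function(x) dunif(x, min = 0, max = 2)

# Total area under the density (should be 1 for a valid density)
total_area <- integrate(f, lower = 0, upper = 2)$value

# E(X) = integral of x f(x) dx
mu <- integrate(function(x) x * f(x), lower = 0, upper = 2)$value

# Var(X) = integral of (x - mu)^2 f(x) dx
sigma2 <- integrate(function(x) (x - mu)^2 * f(x), lower = 0, upper = 2)$value

c(area = total_area, mean = mu, variance = sigma2)
```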

Lecture 2 – Normal Distributions

Motivation & Central Limit Theorem (CLT)
  • Many natural/aggregated measures are approx. normal (sums/averages of many small independent effects).
  • CLT preview: the mean of a large number of independent rv’s is approximately normally distributed, whatever the parent distribution (provided it has finite variance).
Anatomy of a Normal Curve
  • Parameters: mean \mu (centre), standard deviation \sigma (spread).
  • Formula (not required to memorise): f(x)=\dfrac{1}{\sigma\sqrt{2\pi}}\exp\bigl( -\tfrac{(x-\mu)^2}{2\sigma^{2}} \bigr).
  • Notation (course uses SD): X\sim N(\mu,\sigma); many texts use variance N(\mu,\sigma^{2}).
Normality Assumption & Assessment
  • Assess normality before using normal-based methods:
    • Histograms (quick but bin-width sensitive).
    • Normal Quantile (Q-Q) plot (best): plot ordered data vs theoretical quantiles q_i=\Phi^{-1}\left(\tfrac{i}{n+1}\right).
    • If data come from a normal distribution, points lie close to a straight line.
    • Systematic curves indicate skewness or heavy tails.
  • Boxplots nearly useless for normality.
  • Use a scale (excellent → hopeless) rather than yes/no.
68–95–99.7 Empirical Rule
  • For normal rv X\sim N(\mu,\sigma):
    • P(|X-\mu|<\sigma)\approx0.68.
    • P(|X-\mu|<2\sigma)\approx0.95.
    • P(|X-\mu|<3\sigma)\approx0.997.
  • IQ example: N(100,15). 68 % between 85 & 115; 95 % between 70 & 130; 99.7 % between 55 & 145.
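
The 68–95–99.7 figures are rounded values of exact normal probabilities. A short sketch using pnorm shows where they come from and checks the IQ intervals; the variable names are illustrative.

```r
# Probability of falling within k standard deviations of the mean, k = 1, 2, 3
k <- 1:3
within_k_sd <- pnorm(k) - pnorm(-k)
round(within_k_sd, 4)   # 0.6827 0.9545 0.9973

# Same calculation on the IQ scale, X ~ N(100, 15) with sd = 15
mu <- 100
sigma <- 15
iq_probs <- pnorm(mu + k * sigma, mean = mu, sd = sigma) -
  pnorm(mu - k * sigma, mean = mu, sd = sigma)
```

The IQ probabilities equal the standard normal ones because the intervals 85 to 115, 70 to 130 and 55 to 145 are exactly 1, 2 and 3 standard deviations either side of the mean.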
Other Continuous Models Introduced
  • Exponential Expo(\lambda): waiting time to first success; mean 1/\lambda; memoryless.
  • Weibull: generalises exponential by allowing hazard rate \propto t^{k-1}.
  • Gamma Gamma(a,\lambda): sum of a i.i.d. Expo(\lambda) waits (integer a); mean a/\lambda.
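
The memoryless property of the exponential says that P(X > s + t | X > s) = P(X > t). The sketch below checks this with pexp, assuming an illustrative rate of 0.5 and times s = 2, t = 3 (none of these values come from the notes), and also confirms the mean 1/\lambda numerically.

```r
# Illustrative assumptions: rate lambda = 0.5, elapsed time s = 2, extra time t = 3
rate <- 0.5
s <- 2
t <- 3

# P(X > s + t | X > s) = P(X > s + t) / P(X > s)
p_cond <- pexp(s + t, rate = rate, lower.tail = FALSE) /
  pexp(s, rate = rate, lower.tail = FALSE)

# P(X > t) for a fresh wait
p_fresh <- pexp(t, rate = rate, lower.tail = FALSE)

# Mean by numerical integration, should be close to 1 / rate
mean_x <- integrate(function(x) x * dexp(x, rate = rate),
                    lower = 0, upper = Inf)$value

c(p_cond = p_cond, p_fresh = p_fresh, mean = mean_x)
```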

Lecture 3 – Probabilities & Quantiles from a Normal Distribution

Standard Normal Z
  • Defined as N(0,1); key for all probability work.
  • Use R:
    • P(Z<z) ⇒ pnorm(z) (cumulative left-tail).
    • Reverse: find c s.t. P(Z<c)=p ⇒ qnorm(p).
Example Conversions
  • P(Z<1.4)=0.9192.
  • Two-sided interval: P(-1.39<Z<0.43)=pnorm(0.43)-pnorm(-1.39)=0.5841.
  • Right tail: P(Z>1.4)=1-pnorm(1.4)=0.0808 (or pnorm(1.4, lower.tail=FALSE)).
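
The three conversions above (left tail, interval, right tail) can be run directly. This is a minimal sketch with illustrative variable names.

```r
# Left tail: P(Z < 1.4)
p_left <- pnorm(1.4)

# Interval: P(-1.39 < Z < 0.43) as a difference of two left-tail areas
p_between <- pnorm(0.43) - pnorm(-1.39)

# Right tail: P(Z > 1.4), two equivalent ways
p_right_a <- 1 - pnorm(1.4)
p_right_b <- pnorm(1.4, lower.tail = FALSE)

round(c(left = p_left, between = p_between, right = p_right_b), 4)
```

For probabilities far out in the tail, lower.tail = FALSE is usually the safer choice, because subtracting from 1 can lose precision when the result is very small.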
Standardising Any Normal X\sim N(\mu,\sigma)
  • Transform Z=\dfrac{X-\mu}{\sigma} ⇒ Z\sim N(0,1).
  • Probability statement converts accordingly.
  • IQ example revisited: P(X>110)=P(Z>\tfrac{10}{15})\approx0.2525.
  • Birth-weight example: X\sim N(3500,600); P(2000<X<3000)=P(-2.5<Z<-0.833)=0.196.
Using R Arguments vs Standardisation
  • Direct: pnorm(upper,mean,sd)-pnorm(lower,mean,sd).
  • Via Z: transform bounds then use default pnorm.
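
Both routes give the same answer, which the following sketch illustrates for the IQ and birth-weight examples. The variable names are illustrative.

```r
# IQ example: X ~ N(100, 15), find P(X > 110)
mu <- 100
sigma <- 15
p_direct <- pnorm(110, mean = mu, sd = sigma, lower.tail = FALSE)  # direct
z <- (110 - mu) / sigma                                            # standardise
p_via_z <- pnorm(z, lower.tail = FALSE)                            # via Z

# Birth-weight example: X ~ N(3500, 600), find P(2000 < X < 3000)
mu_w <- 3500
sigma_w <- 600
bw_direct <- pnorm(3000, mean = mu_w, sd = sigma_w) -
  pnorm(2000, mean = mu_w, sd = sigma_w)
z_lo <- (2000 - mu_w) / sigma_w    # -2.5
z_hi <- (3000 - mu_w) / sigma_w    # about -0.833
bw_via_z <- pnorm(z_hi) - pnorm(z_lo)

round(c(iq = p_direct, birth_weight = bw_direct), 4)
```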
Additional Practice
  • Pregnancy length (266 ± 16 days) with 68–95–99.7 rule.
  • Find z-values for given cumulative/tail probabilities using qnorm.
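
For practice with quantiles, the sketch below uses qnorm in both directions. It assumes the pregnancy figure "266 ± 16 days" means a N(266, 16) model (mean 266, SD 16), which is an interpretation rather than something stated explicitly in the notes.

```r
# z-value with cumulative (left-tail) probability 0.975
z_975 <- qnorm(0.975)

# z-value with right-tail probability 0.05
z_top5 <- qnorm(0.05, lower.tail = FALSE)

# Assumed model: pregnancy length ~ N(266, 16)
# Exact central 95% interval from qnorm
preg_exact <- qnorm(c(0.025, 0.975), mean = 266, sd = 16)

# Approximate 95% interval from the 68-95-99.7 rule (mean +/- 2 SD)
preg_rule <- 266 + c(-2, 2) * 16

round(c(z_975 = z_975, z_top5 = z_top5), 3)
```

The rule-of-thumb interval (234 to 298 days) is slightly wider than the exact one because the rule rounds 1.96 up to 2.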
Inside the Normal Quantile Plot (mechanics)
  • For sample size n, theoretical quantiles: q_i=\Phi^{-1}\bigl(\tfrac{i}{n+1}\bigr).
  • Plot pairs (q_i,x_{(i)}); linearity ↔ normality.
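
The mechanics can be reproduced by hand. The sketch below builds the pairs (q_i, x_{(i)}) for a simulated sample; the sample itself (200 draws from N(50, 10)) and the seed are illustrative assumptions. Note that R's own qqnorm uses slightly different plotting positions (from ppoints) rather than i/(n+1), so its picture will differ a little in the tails.

```r
set.seed(1)                              # arbitrary seed, for reproducibility
x <- rnorm(200, mean = 50, sd = 10)      # illustrative sample, assumed normal
n <- length(x)

q <- qnorm((1:n) / (n + 1))              # theoretical quantiles q_i
x_sorted <- sort(x)                      # ordered data x_(i)

# Correlation of the plotted pairs: close to 1 when the points are nearly linear
r <- cor(q, x_sorted)

# Draw the plot only in an interactive session
if (interactive()) {
  plot(q, x_sorted, xlab = "Theoretical quantiles", ylab = "Ordered data")
  abline(lm(x_sorted ~ q))               # reference line fitted to the pairs
}
```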

Lecture 4 – Simulation Examples

Pseudo-Random Number Generation
  • Computers produce pseudo-random numbers via deterministic algorithms + seed.
  • Linear Congruential Generator (illustrated):
    • Modulus m=2^{31}-1; multiplier 48271.
    • Recurrence x_{j+1}=(48271 x_j)\;\text{mod}\; m ⇒ scale to [0,1].
  • Implemented as my.runif(); compared with R’s runif() (faster, higher quality).
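
The lecture's my.runif() is not reproduced in these notes, so the following is only an assumed sketch of how such a function could look, built from the modulus and multiplier given above. The argument names and the default seed are hypothetical.

```r
# Assumed sketch of an LCG-based uniform generator (not the lecture's own code)
my.runif <- function(n, seed = 1) {
  m <- 2^31 - 1        # modulus, held as a double so products do not overflow
  a <- 48271           # multiplier
  u <- numeric(n)
  state <- seed        # seed should be an integer between 1 and m - 1
  for (j in seq_len(n)) {
    state <- (a * state) %% m    # recurrence x_{j+1} = (a * x_j) mod m
    u[j] <- state / m            # scale to the unit interval
  }
  u
}

u <- my.runif(5, seed = 1)
u
```

Because the algorithm is deterministic, the same seed always gives the same sequence, which is what makes simulations reproducible. Doubles are used instead of R integers because a * state can exceed the largest 32-bit integer.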
From Uniform to Normal – Box–Muller
  • If u_1,u_2\sim Unif[0,1] independent:
    • z_1=\sqrt{-2\ln u_1}\cos(2\pi u_2),
    • z_2=\sqrt{-2\ln u_1}\sin(2\pi u_2) ⇒ i.i.d. N(0,1).
  • Encapsulated in my.rnorm(); validated by large-sample mean ≈ 0, variance ≈ 1 and histogram overlapped with dnorm.
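
As with my.runif(), the lecture's my.rnorm() is not shown here, so this is an assumed sketch of the Box–Muller transform. It draws its uniforms from R's built-in runif for simplicity; the seed and sample size are arbitrary.

```r
# Assumed sketch of a Box-Muller normal generator (not the lecture's own code)
my.rnorm <- function(n) {
  n_pairs <- ceiling(n / 2)          # each pair (u1, u2) yields two normals
  u1 <- runif(n_pairs)
  u2 <- runif(n_pairs)
  r <- sqrt(-2 * log(u1))            # radius term shared by z1 and z2
  z <- c(r * cos(2 * pi * u2),       # z1 values
         r * sin(2 * pi * u2))       # z2 values
  z[seq_len(n)]                      # drop the spare value when n is odd
}

set.seed(42)                         # arbitrary seed
z <- my.rnorm(1e5)
c(mean = mean(z), variance = var(z)) # should be close to 0 and 1

# Visual check against the N(0, 1) density, interactive sessions only
if (interactive()) {
  hist(z, prob = TRUE, breaks = 50)
  curve(dnorm(x), add = TRUE)
}
```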
R Function Naming Conventions
  • rname → random generation; dname → density; pname → CDF; qname → quantile.
  • Examples: runif, rnorm, rchisq, rt, rf.
  • Extra distributions accessible via package PoweR (40+ laws).
Simulation for Unknown/Complex Distributions
  • Core idea: if model can be simulated, any quantity (mean, SD, prob.) approximated by Monte Carlo:
    • Generate a large number M of draws x_1,\dots,x_M from the model.
    • Estimate E[g(X)]\approx\tfrac1M\sum_{i=1}^{M}g(x_i).
    • Probability P(A)=E[1_A(X)]\approx\text{mean}(\text{indicator}).
  • Demonstration:
    • If X\sim N(2,7), let Y=X^2. Monte Carlo with 100k draws yielded E(Y)\approx53, SD(Y)\approx75, P(Y>5)\approx0.759.
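
The demonstration can be sketched as follows. The seed is an arbitrary assumption, so the simulated figures will differ slightly from those quoted in the notes; the exact values computed at the end are included only as a sanity check, since this particular example happens to be tractable.

```r
set.seed(2025)                       # arbitrary seed
M <- 1e5                             # number of Monte Carlo draws
x <- rnorm(M, mean = 2, sd = 7)      # X ~ N(2, 7), second parameter is the SD
y <- x^2                             # Y = X^2

est <- c(mean = mean(y),             # approximates E(Y)
         sd   = sd(y),               # approximates SD(Y)
         prob = mean(y > 5))         # approximates P(Y > 5)
est

# Exact values for comparison
exact_mean <- 2^2 + 7^2              # E(X^2) = mu^2 + sigma^2
exact_prob <- pnorm(-sqrt(5), mean = 2, sd = 7) +
  pnorm(sqrt(5), mean = 2, sd = 7, lower.tail = FALSE)  # P(|X| > sqrt(5))
```

The expression mean(y > 5) works because the logical vector y > 5 is treated as 0/1, so its mean is the proportion of draws for which the event occurred.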
Ant-Colony Case Study (interaction probability)
  • Model: positions C_1:(X_1,Y_1)\sim N(0,30)\times N(0,30); C_2:(X_2,Y_2)\sim N(100,30)\times N(100,30); all coordinates independent.
  • Distance between random ants: D=\sqrt{(X_1-X_2)^2+(Y_1-Y_2)^2}.
  • Simulate M=10^5 pairs: mean(D<=70)=0.0291 ⇒ ≈2.9 % of pairs within 70 m (half-way) → “very little interaction”.
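
A minimal sketch of the ant-colony simulation, under the model stated above. The seed and variable names are assumptions, so the estimate will not match 0.0291 exactly but should be close to it.

```r
set.seed(1041)                        # arbitrary seed
M <- 1e5                              # number of simulated pairs of ants

# Colony 1 ant positions: both coordinates N(0, 30)
x1 <- rnorm(M, mean = 0, sd = 30)
y1 <- rnorm(M, mean = 0, sd = 30)

# Colony 2 ant positions: both coordinates N(100, 30)
x2 <- rnorm(M, mean = 100, sd = 30)
y2 <- rnorm(M, mean = 100, sd = 30)

# Euclidean distance between the two ants in each pair
d <- sqrt((x1 - x2)^2 + (y1 - y2)^2)

# Estimated probability that a random pair is within 70 m
p_close <- mean(d <= 70)
p_close
```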
Success Criteria Recap
  • Able to:
    • Recognise continuous vs discrete.
    • Interpret & sketch density curves.
    • Apply normality assumption checks (histogram, Q-Q).
    • Use pnorm, qnorm for probability & quantile tasks.
    • Standardise vs supply (mean,sd).
    • Generate pseudo-random numbers & use Monte Carlo for arbitrary probs.

Ethical, Philosophical & Practical Implications

  • Modelling choice (continuous vs discrete) often pragmatic; must be justified by analysis goal and measurement precision.
  • Over-reliance on normality dangerous; always assess – “statistics is not exact, think critically rather than follow recipes”.
  • Simulation allows risk-free exploration (e.g. nuclear test modelling) but quality hinges on generator quality and valid underlying assumptions.

Key R Syntax Cheat-Sheet

  • pnorm(z,mean=μ,sd=σ,lower.tail=FALSE) → P(X>z).
  • qnorm(p) → z_p such that P(Z<z_p)=p.
  • runif(n,min,max) | rnorm(n,mean,sd) → generate.
  • Probability via simulation: mean(expr) where expr is logical vector.
  • Histogram with density: hist(x,prob=TRUE); curve(dname(...),add=TRUE).
  • Q-Q plot: qqnorm(x); qqline(x).

Terminology (Keywords Slide)

  • probability model; density curve/function; normal curve; z-score; seed; pseudo-random number; quantile; lower.tail; Monte Carlo; memoryless property.