Lecture 4 Continuous Random Variables & Normal Distribution – Comprehensive Lecture Notes

Course Orientation

  • MATH1041 Statistics for Life and Social Sciences – Chapter 4 notes (Term 2 2025)
    • Overarching aim: introduce statistics as the science of collecting, analysing and interpreting data.
    • Chapter 4 focus: Continuous Random Variables & the Normal Distribution.
    • Four linked lectures:
      • L1 Continuous random variables & density curves.
      • L2 Normal distributions.
      • L3 Probabilities & quantiles from a normal distribution.
      • L4 Simulation examples.
    • Textbook alignment: Moore et al. (2021) Section 4.3 & Section 1.4.

Quick Revision of Discrete Random Variables (context from previous chapter)

  • Discrete rv’s: take countable values (finite or countably infinite list).
  • Binomial model recap (independent Bernoulli trials):
    • P(X=x)=\binom{n}{x}p^{x}(1-p)^{n-x},\;x=0,1,\dots,n
    • Mean \mu_X=np, variance \sigma_X^{2}=np(1-p).
  • Rules for linear combinations (any r.v.’s):
    • E(a+bX)=a+bE(X)\,,\qquad Var(a+bX)=b^{2}Var(X).
    • If X and Y are independent, Var(X\pm Y)=Var(X)+Var(Y) (does NOT hold for SD’s, nor for dependent rv’s).
  • Clarification of “mean” terminology:
    • True (population) mean \mu_X; sample mean \bar X (a rv); observed mean \bar x_n (a number).
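
A minimal R sketch of the binomial recap and the linear-combination rules. The values n = 10, p = 0.3, a = 2 and b = 3 are illustrative assumptions, not taken from the lecture.

```r
# Binomial recap: X ~ Bin(n, p) with illustrative values (assumed, not from the notes)
n <- 10
p <- 0.3
x <- 0:n

# pmf from the formula and from R's built-in dbinom()
pmf_formula <- choose(n, x) * p^x * (1 - p)^(n - x)
pmf_builtin <- dbinom(x, size = n, prob = p)

# Mean and variance computed directly from the pmf
mean_x <- sum(x * pmf_builtin)               # compare with n * p
var_x  <- sum((x - mean_x)^2 * pmf_builtin)  # compare with n * p * (1 - p)

# Linear transformation a + bX
a <- 2
b <- 3
mean_lin <- sum((a + b * x) * pmf_builtin)               # compare with a + b * E(X)
var_lin  <- sum((a + b * x - mean_lin)^2 * pmf_builtin)  # compare with b^2 * Var(X)
```

The shift a changes the mean but not the variance, while the scale b enters the variance squared.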

Lecture 1 – Continuous Random Variables & Density Curves

Definitions & Conceptual Bases

  • Uncountable set: cannot be put in a list (e.g. all reals in [0,1]).
  • Continuous rv: can take every value in an interval ⇒ range uncountable.
  • Practical criterion: before observing next value we cannot restrict it to a countable set.
  • Real-world recognition exercise (C vs D): rainfall, temperature, turtle weight → continuous; houses sold, football score → discrete; eye colour → categorical (not numeric rv).
  • Philosophical note: even height in a finite population is technically discrete but modelled as continuous for convenience.

Illustrative ‘truly continuous’ experiment

  • Roll a perfect sphere (diameter 2) with a black dot; observe height H\in[0,2] when ball stops. Every value plausible ⇒ H continuous.

Density Curves & Density Functions

  • Discrete pmf p(\cdot) ↔ continuous pdf f(\cdot).
  • Properties of a valid density:
    • f(x)\ge 0\,\forall x (no negative density).
    • Total area \int_{-\infty}^{\infty}f(x)dx=1.
  • Probability via area: P(a<X<b)=\int_{a}^{b}f(x)dx (integration not examinable; use geometry/R numerics).
  • Histogram ↔ smoothed density (kernel density example: UNSW travel times).

Examples & Exercises

  • Uniform[0,2] numbers:
    • Density height =\frac12 (rectangle), area = probability.
    • Results: P(0\le X\le1)=0.5, P(X=1/4)=0, general P(a\le X\le b)=\dfrac{b-a}{2}.
  • Baby‐smile times assumed Unif[0,23]: probability between 2 & 18 = (18-2)/23; conditional P(X\ge12\mid X\ge8)=\frac{11}{15}.
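
The uniform results above can be reproduced with R's punif(), which returns the area to the left of a value under the rectangle. This is a sketch only; the interval endpoints 0.5 and 1.7 in the general check are illustrative assumptions.

```r
# Uniform[0, 2]: probability = area of a rectangle of height 1/2
p_0_to_1 <- punif(1, min = 0, max = 2) - punif(0, min = 0, max = 2)

# General rule P(a <= X <= b) = (b - a) / 2, checked at illustrative endpoints
a <- 0.5
b <- 1.7
p_a_to_b <- punif(b, min = 0, max = 2) - punif(a, min = 0, max = 2)

# Baby-smile times, assumed Unif[0, 23]
p_2_to_18 <- punif(18, min = 0, max = 23) - punif(2, min = 0, max = 23)

# Conditional probability P(X >= 12 | X >= 8) = P(X >= 12) / P(X >= 8)
p_cond <- punif(12, min = 0, max = 23, lower.tail = FALSE) /
          punif(8,  min = 0, max = 23, lower.tail = FALSE)
```

Because P(X = c) = 0 for a continuous rv, it makes no difference whether the inequalities are strict.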

Mean & Variance for Continuous rv’s (concept only)

  • E(X)=\int x f(x)dx, Var(X)=\int (x-\mu)^2 f(x)dx – parallels discrete sums.
  • Rules for means/variances identical to discrete case.
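
Although integration by hand is not examinable, the integrals for the mean and variance can be approximated numerically. The sketch below uses R's integrate() on the Uniform[0,2] density from the earlier example; the helper name f is a hypothetical choice.

```r
# Density of Uniform[0, 2] (f is a hypothetical helper name)
f <- function(x) dunif(x, min = 0, max = 2)

# Total area under the density should be 1
total_area <- integrate(f, lower = 0, upper = 2)$value

# E(X) = integral of x * f(x)
mu <- integrate(function(x) x * f(x), lower = 0, upper = 2)$value

# Var(X) = integral of (x - mu)^2 * f(x)
sigma2 <- integrate(function(x) (x - mu)^2 * f(x), lower = 0, upper = 2)$value
```

For this density the exact answers are E(X) = 1 and Var(X) = 1/3, which the numerical values should match closely.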

Lecture 2 – Normal Distributions

Motivation & Central Limit Theorem (CLT)

  • Many natural/aggregated measures are approx. normal (sums/averages of many small independent effects).
  • CLT preview: the mean (or sum) of many independent rv’s with finite variance is approximately normal, irrespective of the parent distribution.

Anatomy of a Normal Curve

  • Parameters: mean \mu (centre), standard deviation \sigma (spread).
  • Formula (not required to memorise): f(x)=\dfrac{1}{\sigma\sqrt{2\pi}}\exp\bigl( -\tfrac{(x-\mu)^2}{2\sigma^{2}} \bigr).
  • Notation (course uses SD): X\sim N(\mu,\sigma); many texts use variance N(\mu,\sigma^{2}).
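
As a quick check of the density formula, the sketch below codes it directly and compares it with R's dnorm(), which is also parameterised by the SD. The function name normal_density and the IQ-style values mu = 100, sigma = 15 are assumptions for illustration.

```r
# Normal density written out from the formula (normal_density is a hypothetical name)
normal_density <- function(x, mu, sigma) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

mu <- 100
sigma <- 15
x <- seq(40, 160, by = 5)

# dnorm() takes the standard deviation, not the variance
formula_values <- normal_density(x, mu, sigma)
builtin_values <- dnorm(x, mean = mu, sd = sigma)

# Area under the curve over mu +/- 8 sigma (effectively the whole curve)
area <- integrate(function(t) normal_density(t, mu, sigma),
                  lower = mu - 8 * sigma, upper = mu + 8 * sigma)$value
```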

Normality Assumption & Assessment

  • Assess normality before using normal-based methods:
    • Histograms (quick but bin-width sensitive).
    • Normal quantile (Q-Q) plot (best): plot ordered data vs theoretical quantiles q_i=\Phi^{-1}\left(\tfrac{i}{n+1}\right).
      • If data come from a normal distribution, points ≈ straight line.
      • Systematic curvature indicates skewness or heavy tails.
    • Boxplots: nearly useless for assessing normality.
  • Use a scale (excellent → hopeless) rather than yes/no.

68–95–99.7 Empirical Rule

  • For normal rv X\sim N(\mu,\sigma):
    • P(|X-\mu|<\sigma)\approx0.68.
    • P(|X-\mu|<2\sigma)\approx0.95.
    • P(|X-\mu|<3\sigma)\approx0.997.
  • IQ example: N(100,15). 68 % between 85 & 115; 95 % between 70 & 130; 99.7 % between 55 & 145.
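
The three percentages in the rule are rounded normal probabilities, which can be confirmed with pnorm(). A short sketch, using the IQ parameters from the example above:

```r
# P(|Z| < k) for k = 1, 2, 3 standard deviations
k <- 1:3
prob_within <- pnorm(k) - pnorm(-k)
round(prob_within, 4)   # 0.6827 0.9545 0.9973

# IQ example: X ~ N(100, 15)
mu <- 100
sigma <- 15
intervals <- cbind(lower = mu - k * sigma, upper = mu + k * sigma)

# Same probabilities on the IQ scale, supplying mean and sd directly
iq_prob <- pnorm(intervals[, "upper"], mean = mu, sd = sigma) -
           pnorm(intervals[, "lower"], mean = mu, sd = sigma)
```

The probabilities depend only on the number of SDs from the mean, so iq_prob equals prob_within.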

Other Continuous Models Introduced

  • Exponential Expo(\lambda): waiting time to first success; mean 1/\lambda; memoryless.
  • Weibull: generalises exponential by allowing hazard rate \propto t^{k-1}.
  • Gamma Gamma(a,\lambda): for integer a, the sum of a i.i.d. Expo(\lambda) waits; mean a/\lambda.
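
A small sketch of two claims above: the memoryless property of the exponential, and the gamma as a sum of exponential waits. The rate lambda = 0.5, the times s = 2 and t = 3, the shape a = 3 and the seed are all illustrative assumptions.

```r
lambda <- 0.5   # illustrative rate, so the mean wait is 1 / lambda = 2

# Memoryless property: P(X > s + t | X > s) = P(X > t)
s <- 2
t <- 3
p_cond   <- pexp(s + t, rate = lambda, lower.tail = FALSE) /
            pexp(s,     rate = lambda, lower.tail = FALSE)
p_uncond <- pexp(t, rate = lambda, lower.tail = FALSE)

# Gamma(a, lambda) as the sum of a independent exponential waits (simulation check)
set.seed(1)
a <- 3
M <- 1e5
waits <- matrix(rexp(a * M, rate = lambda), ncol = a)
total_wait <- rowSums(waits)
mean_total <- mean(total_wait)   # compare with a / lambda
```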

Lecture 3 – Probabilities & Quantiles from a Normal Distribution

Standard Normal Z

  • Defined as N(0,1); key for all probability work.
  • Use R:
    • P(Z<z) ⇒ pnorm(z) (cumulative left-tail).
    • Reverse: find c s.t. P(Z<c)=p ⇒ qnorm(p).

Example Conversions

  • P(Z<1.4)=0.9192.
  • Two-sided interval: P(-1.39<Z<0.43)=pnorm(0.43)-pnorm(-1.39)=0.5841.
  • Right tail: P(Z>1.4)=1-pnorm(1.4)=0.0808 (or use lower.tail=FALSE).
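
The three conversions above correspond to the following R calls (a sketch; the variable names are hypothetical):

```r
# Left tail: P(Z < 1.4)
p_left <- pnorm(1.4)

# Interval: P(-1.39 < Z < 0.43) = area left of 0.43 minus area left of -1.39
p_mid <- pnorm(0.43) - pnorm(-1.39)

# Right tail: P(Z > 1.4), two equivalent ways
p_right_a <- 1 - pnorm(1.4)
p_right_b <- pnorm(1.4, lower.tail = FALSE)

round(c(p_left, p_mid, p_right_b), 4)   # 0.9192 0.5841 0.0808
```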

Standardising Any Normal X\sim N(\mu,\sigma)

  • Transform Z=\dfrac{X-\mu}{\sigma} ⇒ Z\sim N(0,1).
  • Probability statement converts accordingly.
  • IQ example revisited: P(X>110)=P(Z>\tfrac{10}{15})\approx0.2525.
  • Birth-weight example: X\sim N(3500,600); P(2000<X<3000)=P(-2.5<Z<-0.833)=0.196.

Using R Arguments vs Standardisation

  • Direct: pnorm(upper,mean,sd)-pnorm(lower,mean,sd).
  • Via Z: transform bounds then use default pnorm.
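
Both routes give the same answer. The sketch below applies them to the birth-weight and IQ examples from the previous section; variable names are hypothetical.

```r
# Birth weights: X ~ N(3500, 600), want P(2000 < X < 3000)
mu <- 3500
sigma <- 600

# Route 1: supply mean and sd directly
direct <- pnorm(3000, mean = mu, sd = sigma) - pnorm(2000, mean = mu, sd = sigma)

# Route 2: standardise the bounds, then use the default N(0, 1)
z_lower <- (2000 - mu) / sigma   # -2.5
z_upper <- (3000 - mu) / sigma   # about -0.833
via_z <- pnorm(z_upper) - pnorm(z_lower)

# IQ: X ~ N(100, 15), want P(X > 110)
iq_tail <- pnorm(110, mean = 100, sd = 15, lower.tail = FALSE)
```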

Additional Practice

  • Pregnancy length (mean 266 days, SD 16 days) with 68–95–99.7 rule.
  • Find z-values for given cumulative/tail probabilities using qnorm.
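
A sketch of both practice tasks. It assumes the pregnancy lengths are modelled as N(266, 16), that is, mean 266 days and SD 16 days; the probabilities 0.975 and 0.10 are illustrative choices.

```r
# z-value with 97.5% of the area to its left
z_975 <- qnorm(0.975)

# z-value with 10% of the area to its right (upper tail)
z_upper10 <- qnorm(0.10, lower.tail = FALSE)

# qnorm() and pnorm() are inverses of each other
round_trip <- pnorm(qnorm(0.9))

# Pregnancy length, assumed N(266, 16): middle 95% by the 68-95-99.7 rule
mu <- 266
sigma <- 16
rule_95 <- c(mu - 2 * sigma, mu + 2 * sigma)

# The same interval from exact quantiles (uses 1.96 rather than 2 SDs)
exact_95 <- qnorm(c(0.025, 0.975), mean = mu, sd = sigma)
```

The rule-of-thumb interval is slightly wider than the exact one because the rule rounds 1.96 up to 2.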

Inside the Normal Quantile Plot (mechanics)

  • For sample size n, theoretical quantiles: q_i=\Phi^{-1}\bigl(\tfrac{i}{n+1}\bigr).
  • Plot pairs (q_i,x_{(i)}); linearity ↔ normality.
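
The mechanics can be sketched by hand in a few lines. The simulated sample (n = 50 from N(10, 2)) and the seed are assumptions for illustration. Note that R's own qqnorm() uses slightly different plotting positions (see ppoints()), so its plot will not match this one point for point.

```r
set.seed(42)
x <- rnorm(50, mean = 10, sd = 2)   # illustrative sample (assumed, not course data)
n <- length(x)

# Theoretical quantiles q_i = qnorm(i / (n + 1)), i = 1, ..., n
q <- qnorm((1:n) / (n + 1))

# Ordered data x_(1) <= ... <= x_(n)
x_sorted <- sort(x)

# Correlation between the pairs is a rough numeric summary of linearity
r <- cor(q, x_sorted)

# To draw the plot by hand, or with R's built-in version:
# plot(q, x_sorted, xlab = "Theoretical quantiles", ylab = "Ordered data")
# qqnorm(x); qqline(x)
```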

Lecture 4 – Simulation Examples

Pseudo-Random Number Generation

  • Computers produce pseudo-random numbers via deterministic algorithms + seed.
  • Linear Congruential Generator (illustrated):
    • Modulus m=2^{31}-1; multiplier 48271.
    • Recurrence x_{j+1}=(48271\,x_j)\;\text{mod}\; m ⇒ scale to [0,1].
  • Implemented as my.runif(); compared with R’s runif() (faster, higher quality).
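
A minimal sketch of such a generator. The notes name my.runif() but do not give its code, so the argument list (n, seed), the default seed and the scaling by division by m are assumptions made here.

```r
# Sketch of a linear congruential generator (signature is an assumption)
my.runif <- function(n, seed = 12345) {
  m <- 2^31 - 1   # modulus
  a <- 48271      # multiplier
  x <- numeric(n)
  state <- seed   # seed should be an integer between 1 and m - 1
  for (j in seq_len(n)) {
    state <- (a * state) %% m   # x_{j+1} = (a * x_j) mod m
    x[j] <- state
  }
  x / m           # scale to the unit interval
}

u <- my.runif(10000, seed = 2025)
```

The same seed always reproduces the same sequence, which is what makes the numbers pseudo-random rather than random. The product a * state stays below 2^53, so R's double-precision arithmetic handles it exactly.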

From Uniform to Normal – Box–Muller

  • If u_1,u_2\sim Unif[0,1] independent:
    • z_1=\sqrt{-2\ln u_1}\cos(2\pi u_2),
    • z_2=\sqrt{-2\ln u_1}\sin(2\pi u_2) ⇒ i.i.d. N(0,1).
  • Encapsulated in my.rnorm(); validated by large-sample mean ≈ 0, variance ≈ 1 and histogram overlapped with dnorm.
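
A sketch of the Box–Muller transform in R. The notes name my.rnorm() without listing it, so this version (its signature, and the use of R's runif() as the uniform source) is an assumption.

```r
# Sketch of Box-Muller (signature is an assumption; uniforms come from runif())
my.rnorm <- function(n) {
  n_pairs <- ceiling(n / 2)          # each pair (u1, u2) yields two normals
  u1 <- runif(n_pairs)
  u2 <- runif(n_pairs)
  radius <- sqrt(-2 * log(u1))
  z <- c(radius * cos(2 * pi * u2),  # z_1 values
         radius * sin(2 * pi * u2))  # z_2 values
  z[seq_len(n)]                      # drop the spare value when n is odd
}

set.seed(1)
z <- my.rnorm(1e5)
sample_mean <- mean(z)   # should be near 0
sample_var  <- var(z)    # should be near 1

# Visual check against the N(0, 1) density:
# hist(z, prob = TRUE, breaks = 50); curve(dnorm(x), add = TRUE)
```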

R Function Naming Conventions

  • rname → random generation; dname → density; pname → CDF; qname → quantile.
  • Examples: runif, rnorm, rchisq, rt, rf.
  • Extra distributions accessible via package PoweR (40+ laws).
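
The four prefixes work the same way across base-R distributions. A brief sketch, with arbitrary illustrative argument values:

```r
# d: density height of N(0, 1) at 0, which is 1 / sqrt(2 * pi)
d_val <- dnorm(0)

# p: cumulative probability P(Z < 1.96)
p_val <- pnorm(1.96)

# q: quantile, the inverse of p, so this recovers 1.96
q_val <- qnorm(p_val)

# r: random generation (seed chosen arbitrarily)
set.seed(1)
r_val <- rnorm(5)

# The same prefixes apply to other families
q_unif <- qunif(0.25, min = 0, max = 2)   # first quartile of Unif[0, 2]
p_exp  <- pexp(2, rate = 1)               # P(X < 2) for Expo(1)
```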

Simulation for Unknown/Complex Distributions

  • Core idea: if model can be simulated, any quantity (mean, SD, prob.) approximated by Monte Carlo:
    • Generate a large number M of draws x_1,\dots,x_M from the model.
    • Estimate E[g(X)]\approx\tfrac1M\sum_{i=1}^{M}g(x_i).
    • Probability P(A)=E[1_A(X)]\approx\text{mean}(\text{indicator}).
  • Demonstration:
    • If X\sim N(2,7), let Y=X^2. Monte Carlo with 100k draws yielded E(Y)\approx53, SD(Y)\approx75, P(Y>5)\approx0.759.
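
The demonstration can be sketched as below. The seed is an arbitrary assumption, so the estimates will differ slightly from the lecture's figures in the later decimals.

```r
set.seed(1041)   # arbitrary seed, for reproducibility only
M <- 1e5

# Draw from X ~ N(2, 7) (second parameter is the SD) and transform
x <- rnorm(M, mean = 2, sd = 7)
y <- x^2

est_mean <- mean(y)       # estimates E(Y)
est_sd   <- sd(y)         # estimates SD(Y)
est_prob <- mean(y > 5)   # proportion of TRUEs estimates P(Y > 5)
```

Here the mean can be checked exactly, since E(X^2) = \mu^2 + \sigma^2 = 4 + 49 = 53, which gives some confidence in the simulated values for the harder quantities.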

Ant-Colony Case Study (interaction probability)

  • Model: positions C1:(X_1,Y_1)\sim N(0,30)\times N(0,30); C2:(X_2,Y_2)\sim N(100,30)\times N(100,30); independence.
  • Distance between random ants: D=\sqrt{(X_1-X_2)^2+(Y_1-Y_2)^2}.
  • Simulate M=10^5 pairs: mean(D<=70)=0.0291 ⇒ ≈2.9 % of pairs within 70 m (half-way) → “very little interaction”.
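
A sketch of the ant-colony simulation under the model stated above. The seed and variable names are assumptions, so the estimate will be close to, but not exactly, the lecture's 0.0291.

```r
set.seed(1041)   # arbitrary seed
M <- 1e5

# One random ant from each colony, coordinates independent, SD 30 m
x1 <- rnorm(M, mean = 0,   sd = 30)
y1 <- rnorm(M, mean = 0,   sd = 30)
x2 <- rnorm(M, mean = 100, sd = 30)
y2 <- rnorm(M, mean = 100, sd = 30)

# Euclidean distance between the two ants in each simulated pair
d <- sqrt((x1 - x2)^2 + (y1 - y2)^2)

# Proportion of pairs within 70 m estimates P(D <= 70)
p_close <- mean(d <= 70)
```

No closed-form answer is needed here: D is a function of four normal rv's, and the Monte Carlo proportion estimates the probability directly.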

Success Criteria Recap

  • Able to:
    • Recognise continuous vs discrete.
    • Interpret & sketch density curves.
    • Apply normality assumption checks (histogram, Q-Q).
    • Use pnorm, qnorm for probability & quantile tasks.
    • Standardise vs supply (mean,sd).
    • Generate pseudo-random numbers & use Monte Carlo for arbitrary probs.

Ethical, Philosophical & Practical Implications

  • Modelling choice (continuous vs discrete) often pragmatic; must be justified by analysis goal and measurement precision.
  • Over-reliance on normality dangerous; always assess – “statistics is not exact, think critically rather than follow recipes”.
  • Simulation allows risk-free exploration (e.g. nuclear test modelling) but quality hinges on generator quality and valid underlying assumptions.

Key R Syntax Cheat-Sheet

  • pnorm(z,mean=μ,sd=σ,lower.tail=FALSE) → P(X>z).
  • qnorm(p) → z_p such that P(Z<z_p)=p.
  • runif(n,min,max) | rnorm(n,mean,sd) → generate.
  • Probability via simulation: mean(expr) where expr is logical vector.
  • Histogram with density: hist(x,prob=TRUE); curve(dname(...),add=TRUE).
  • Q-Q plot: qqnorm(x); qqline(x).

Terminology (Keywords Slide)

  • probability model; density curve/function; normal curve; z-score; seed; pseudo-random number; quantile; lower.tail; Monte Carlo; memoryless property.