Lecture 4 Continuous Random Variables & Normal Distribution – Comprehensive Lecture Notes
Course Orientation
- MATH1041 Statistics for Life and Social Sciences – Chapter 4 notes (Term 2 2025)
- Overarching aim: introduce statistics as the science of collecting, analysing and interpreting data.
- Chapter 4 focus: Continuous Random Variables & the Normal Distribution.
- Four linked lectures:
- L1 Continuous random variables & density curves.
- L2 Normal distributions.
- L3 Probabilities & quantiles from a normal distribution.
- L4 Simulation examples.
- Textbook alignment: Moore et al. (2021) Section 4.3 & Section 1.4.
Quick Revision of Discrete Random Variables (context from previous chapter)
- Discrete rv’s: take countable values (finite or countably infinite list).
- Binomial model recap (independent Bernoulli trials):
- P(X=x)=\binom{n}{x}p^{x}(1-p)^{n-x},\;x=0,1,\dots,n
- Mean \mu_X=np, variance \sigma_X^{2}=np(1-p).
- Rules for linear combinations (any r.v.’s):
- E(a+bX)=a+bE(X)\,,\qquad Var(a+bX)=b^{2}Var(X).
- If X and Y are independent, Var(X\pm Y)=Var(X)+Var(Y) (SDs do NOT add this way, and the rule fails for dependent rv’s).
- Clarification of “mean” terminology:
- True (population) mean \mu_X; sample mean \bar{X} (a rv); observed mean \bar{x}_n (a number).
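The binomial recap and the linear rules can be checked numerically. The sketch below assumes illustrative values n = 10, p = 0.3, a = 2 and b = 5 (none of them from the notes) and computes the mean and variance straight from the pmf.

```r
# Illustrative values (assumed, not from the notes)
n <- 10
p <- 0.3
x <- 0:n

pmf <- dbinom(x, size = n, prob = p)   # P(X = x) for x = 0, ..., n

mu_X  <- sum(x * pmf)                  # mean from the definition
var_X <- sum((x - mu_X)^2 * pmf)       # variance from the definition

c(mean = mu_X, np = n * p)
c(variance = var_X, np1p = n * p * (1 - p))

# Linear rules: E(a + bX) = a + b E(X), Var(a + bX) = b^2 Var(X)
a <- 2
b <- 5
mean_lin <- sum((a + b * x) * pmf)
var_lin  <- sum((a + b * x - mean_lin)^2 * pmf)
c(mean_lin = mean_lin, rule = a + b * mu_X)
c(var_lin = var_lin, rule = b^2 * var_X)
```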
Lecture 1 – Continuous Random Variables & Density Curves
Definitions & Conceptual Bases
- Uncountable set: cannot be put in a list (e.g. all reals in [0,1]).
- Continuous rv: can take every value in an interval ⇒ range uncountable.
- Practical criterion: before observing next value we cannot restrict it to a countable set.
- Real-world recognition exercise (C vs D): rainfall, temperature, turtle weight → continuous; houses sold, football score → discrete; eye colour → categorical (not numeric rv).
- Philosophical note: even height in a finite population is technically discrete but modelled as continuous for convenience.
Illustrative ‘truly continuous’ experiment
- Roll a perfect sphere (diameter 2) with a black dot; observe the height H\in[0,2] of the dot when the ball stops. Every value is plausible ⇒ H is continuous.
Density Curves & Density Functions
- Discrete pmf p(\cdot) ↔ continuous pdf f(\cdot).
- Properties of a valid density:
- f(x)\ge 0\,\forall x (no negative density).
- Total area \int_{-\infty}^{\infty}f(x)dx=1.
- Probability via area: P(a<X<b)=\int_{a}^{b}f(x)dx (integration not examinable; use geometry/R numerics).
- Histogram ↔ smoothed density (kernel density example: UNSW travel times).
Examples & Exercises
- Uniform[0,2] numbers:
- Density height =\frac12 (rectangle), area = probability.
- Results: P(0\le X\le1)=0.5, P(X=1/4)=0, general P(a\le X\le b)=\dfrac{b-a}{2}.
- Baby‐smile times assumed Unif[0,23]: probability between 2 & 18 = (18-2)/23; conditional P(X\ge12\mid X\ge8)=\frac{11}{15}.
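The uniform results above can be reproduced with R's punif, which returns the area under the rectangle to the left of a point. This is only a sketch of one way to do the arithmetic; the variable names are illustrative.

```r
# Uniform[0, 2]: the density is a rectangle of height 1/2 on [0, 2]
dunif(1, min = 0, max = 2)                     # height of the rectangle
p_0_1 <- punif(1, 0, 2) - punif(0, 0, 2)       # P(0 <= X <= 1)

# Baby-smile times, assumed Unif[0, 23]
p_2_18 <- punif(18, 0, 23) - punif(2, 0, 23)   # (18 - 2) / 23

# Conditional probability: P(X >= 12 | X >= 8) = P(X >= 12) / P(X >= 8)
p_cond <- punif(12, 0, 23, lower.tail = FALSE) /
          punif(8,  0, 23, lower.tail = FALSE)

c(p_0_1 = p_0_1, p_2_18 = p_2_18, p_cond = p_cond)
```

Because P(X = c) = 0 for a continuous rv, it makes no difference whether the inequalities are strict.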
Mean & Variance for Continuous rv’s (concept only)
- E(X)=\int x f(x)dx, Var(X)=\int (x-\mu)^2 f(x)dx – parallels discrete sums.
- Rules for means/variances identical to discrete case.
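Since integration by hand is not examinable, the integrals can be handed to R's numerical integrator. The sketch below uses the Uniform[0,2] density as an illustrative choice; the function name f is an assumption for this example.

```r
# Density of Uniform[0, 2], written out by hand
f <- function(x) ifelse(x >= 0 & x <= 2, 1/2, 0)

# Total area under the density should be 1
area <- integrate(f, lower = 0, upper = 2)$value

# E(X) = integral of x f(x) over the support
mu <- integrate(function(x) x * f(x), lower = 0, upper = 2)$value

# Var(X) = integral of (x - mu)^2 f(x) over the support
sigma2 <- integrate(function(x) (x - mu)^2 * f(x), lower = 0, upper = 2)$value

c(area = area, mean = mu, variance = sigma2)
```

For this density the values should come out at 1, 1 and 1/3, which matches the geometric picture of a rectangle centred at 1.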
Lecture 2 – Normal Distributions
Motivation & Central Limit Theorem (CLT)
- Many natural/aggregated measures are approx. normal (sums/averages of many small independent effects).
- CLT preview: the mean of many independent rv’s from the same parent distribution (with finite variance) is approximately normal, whatever the shape of the parent.
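A small simulation can make the CLT preview concrete. The sketch below assumes a strongly right-skewed exponential parent with rate 1, samples of size 40 and 10,000 repetitions; all of these are illustrative choices, not values from the notes.

```r
set.seed(4)                               # arbitrary seed, for reproducibility
n <- 40                                   # observations per sample (assumed)
reps <- 1e4                               # number of simulated samples (assumed)

samples <- matrix(rexp(n * reps, rate = 1), nrow = reps)
xbar <- rowMeans(samples)                 # one sample mean per row

# The parent has mean 1 and SD 1, so the sample means should be centred
# near 1 with spread near 1 / sqrt(n)
c(mean = mean(xbar), sd = sd(xbar), theory_sd = 1 / sqrt(n))
```

A histogram of xbar (hist(xbar)) should look far more symmetric and bell-shaped than a histogram of the raw exponential draws.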
Anatomy of a Normal Curve
- Parameters: mean \mu (centre), standard deviation \sigma (spread).
- Formula (not required to memorise): f(x)=\dfrac{1}{\sigma\sqrt{2\pi}}\exp\bigl( -\tfrac{(x-\mu)^2}{2\sigma^{2}} \bigr).
- Notation (course uses SD): X\sim N(\mu,\sigma); many texts use variance N(\mu,\sigma^{2}).
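To see that the formula and R's dnorm agree, the density can be coded directly. This is a minimal sketch: normal_pdf is a hypothetical helper name, and the mean 100 and SD 15 are illustrative values.

```r
# Normal density written directly from the formula
normal_pdf <- function(x, mu, sigma) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- seq(55, 145, by = 15)                  # a few illustrative points
by_hand  <- normal_pdf(x, mu = 100, sigma = 15)
built_in <- dnorm(x, mean = 100, sd = 15)   # dnorm takes the SD, not the variance

max(abs(by_hand - built_in))                # should be essentially zero
```

Note that R's normal functions use the SD parameterisation, matching the course notation N(\mu,\sigma).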
Normality Assumption & Assessment
- Assess normality before using normal-based methods:
- Histograms (quick but bin-width sensitive).
- Normal Quantile (Q-Q) plot (best): plot ordered data vs theoretical quantiles q_i=\Phi^{-1}\left(\tfrac{i}{n+1}\right).
- If data come from normal, points≈straight line.
- Systematic curves indicate skewness or heavy tails.
- Boxplots nearly useless for normality.
- Use a scale (excellent → hopeless) rather than yes/no.
68–95–99.7 Empirical Rule
- For normal rv X\sim N(\mu,\sigma):
- P(|X-\mu|<\sigma)\approx0.68.
- P(|X-\mu|<2\sigma)\approx0.95.
- P(|X-\mu|<3\sigma)\approx0.997.
- IQ example: N(100,15). 68 % between 85 & 115; 95 % between 70 & 130; 99.7 % between 55 & 145.
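The three percentages in the rule are rounded values of standard normal areas, which can be checked with pnorm. The sketch below also tabulates the IQ intervals; the variable names are illustrative.

```r
k <- 1:3
exact <- pnorm(k) - pnorm(-k)      # P(|Z| < k) for k = 1, 2, 3
round(exact, 4)

# IQ example, N(100, 15): interval endpoints are mu -/+ k * sigma
iq <- data.frame(k = k,
                 lower = 100 - 15 * k,
                 upper = 100 + 15 * k,
                 prob = round(exact, 4))
iq
```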
Other Continuous Models Introduced
- Exponential Expo(\lambda): waiting time until the first event (first success); mean 1/\lambda; memoryless.
- Weibull: generalises exponential by allowing hazard rate \propto t^{k-1}.
- Gamma Gamma(a,\lambda): sum of a i.i.d. exponential waits; mean a/\lambda.
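Two of the claims above lend themselves to a quick numerical check: the memoryless property of the exponential, and the Gamma as a sum of exponential waits. The rate 0.5, the times 2 and 3, and the shape a = 4 are assumed for illustration only.

```r
lambda <- 0.5          # assumed rate
s0 <- 2                # time already waited (assumed)
t0 <- 3                # additional waiting time (assumed)

# Memoryless property: P(X > s0 + t0 | X > s0) equals P(X > t0)
lhs <- pexp(s0 + t0, rate = lambda, lower.tail = FALSE) /
       pexp(s0,      rate = lambda, lower.tail = FALSE)
rhs <- pexp(t0, rate = lambda, lower.tail = FALSE)
c(conditional = lhs, unconditional = rhs)

# Gamma(a, lambda) as the sum of a independent Expo(lambda) waits
set.seed(1)
a <- 4
waits <- matrix(rexp(a * 1e5, rate = lambda), ncol = a)
total <- rowSums(waits)
c(simulated_mean = mean(total), theory = a / lambda)
```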
Lecture 3 – Probabilities & Quantiles from a Normal Distribution
Standard Normal Z
- Defined as N(0,1); key for all probability work.
- Use R:
- P(Z<z) ⇒ pnorm(z) (cumulative left-tail).
- Reverse: find c s.t. P(Z<c)=p ⇒ qnorm(p).
Example Conversions
- P(Z<1.4)=0.9192.
- Two-sided interval: P(-1.39<Z<0.43)=pnorm(0.43)-pnorm(-1.39)=0.5841.
- Right tail: P(Z>1.4)=1-pnorm(1.4)=0.0808 (or pnorm(1.4, lower.tail=FALSE)).
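The three conversions can be typed into R as they stand. The sketch below stores each probability under an illustrative name and then uses qnorm to go back from a probability to a cut-off.

```r
p_left  <- pnorm(1.4)                       # P(Z < 1.4)
p_mid   <- pnorm(0.43) - pnorm(-1.39)       # P(-1.39 < Z < 0.43)
p_right <- pnorm(1.4, lower.tail = FALSE)   # P(Z > 1.4), same as 1 - pnorm(1.4)

round(c(left = p_left, middle = p_mid, right = p_right), 4)

# qnorm reverses pnorm: recover the cut-off from the probability
z_back <- qnorm(p_left)
z_back
```

Using lower.tail = FALSE rather than 1 - pnorm(...) avoids a subtraction and is slightly more accurate for very small tail probabilities.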
Standardising Any Normal X\sim N(\mu,\sigma)
- Transform Z=\dfrac{X-\mu}{\sigma} ⇒ Z\sim N(0,1).
- Probability statement converts accordingly.
- IQ example revisited: P(X>110)=P(Z>\tfrac{10}{15})\approx0.2525.
- Birth-weight example: X\sim N(3500,600); P(2000<X<3000)=P(-2.5<Z<-0.833)=0.196.
Using R Arguments vs Standardisation
- Direct: pnorm(upper,mean,sd)-pnorm(lower,mean,sd).
- Via Z: transform bounds then use default pnorm.
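Both routes give the same answer, as the following sketch shows for the birth-weight and IQ examples. The variable names are illustrative.

```r
mu <- 3500                        # birth-weight model N(3500, 600)
sigma <- 600
lower <- 2000
upper <- 3000

# Direct: pass mean and sd to pnorm
p_direct <- pnorm(upper, mean = mu, sd = sigma) -
            pnorm(lower, mean = mu, sd = sigma)

# Via Z: standardise the bounds, then use the default N(0, 1)
z_lower <- (lower - mu) / sigma   # -2.5
z_upper <- (upper - mu) / sigma   # about -0.833
p_via_z <- pnorm(z_upper) - pnorm(z_lower)

c(direct = p_direct, via_z = p_via_z)

# IQ example: P(X > 110) for N(100, 15)
p_iq <- pnorm(110, mean = 100, sd = 15, lower.tail = FALSE)
p_iq
```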
Additional Practice
- Pregnancy length (mean 266 days, SD 16 days) with 68–95–99.7 rule.
- Find z-values for given cumulative/tail probabilities using qnorm.
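A sketch of how these practice tasks might be set up in R follows. It reads 266 and 16 as the mean and SD of a normal model, and the probabilities 0.975, 0.05 and 0.90 are illustrative choices rather than values from the notes.

```r
mu <- 266                         # mean pregnancy length in days
sigma <- 16                       # SD in days

# 68-95-99.7 intervals: mu -/+ k * sigma for k = 1, 2, 3
k <- 1:3
intervals <- cbind(lower = mu - k * sigma, upper = mu + k * sigma)
intervals

# z-values for given probabilities (probabilities are illustrative)
z_975  <- qnorm(0.975)                      # P(Z < z) = 0.975
z_top5 <- qnorm(0.05, lower.tail = FALSE)   # P(Z > z) = 0.05

# Same idea on the original scale: 90th percentile of pregnancy length
q90 <- qnorm(0.90, mean = mu, sd = sigma)
c(z_975 = z_975, z_top5 = z_top5, q90 = q90)
```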
Inside the Normal Quantile Plot (mechanics)
- For sample size n, theoretical quantiles: q_i=\Phi^{-1}\bigl(\tfrac{i}{n+1}\bigr).
- Plot pairs (q_i,x_{(i)}); linearity ↔ normality.
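The mechanics can be reproduced by hand in a few lines. The sketch below assumes a simulated sample of size 50 from N(100, 15), purely for illustration, and builds the theoretical quantiles from the i/(n+1) formula.

```r
set.seed(42)                          # arbitrary seed
x <- rnorm(50, mean = 100, sd = 15)   # simulated sample (illustrative)
n <- length(x)

i <- seq_len(n)
q <- qnorm(i / (n + 1))               # theoretical quantiles q_i
x_sorted <- sort(x)                   # order statistics x_(i)

head(cbind(q, x_sorted))

# For normal data the pairs fall close to a straight line whose
# intercept is near mu and whose slope is near sigma
coef(lm(x_sorted ~ q))
cor(q, x_sorted)
```

Calling plot(q, x_sorted) draws the plot. R's own qqnorm(x) uses slightly different plotting positions (see ppoints), so its horizontal coordinates differ marginally from the i/(n+1) version.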
Lecture 4 – Simulation Examples
Pseudo-Random Number Generation
- Computers produce pseudo-random numbers via deterministic algorithms + seed.
- Linear Congruential Generator (illustrated):
- Modulus m=2^{31}-1; multiplier 48271.
- Recurrence x_{j+1}=(48271\,x_j)\;\text{mod}\; m ⇒ scale to [0,1].
- Implemented as my.runif(); compared with R’s runif() (faster, higher quality).
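The recurrence is short enough to sketch in full. The function below is not the course's my.runif(), whose code is not shown here; lcg_runif and its default seed are hypothetical, and scaling is done by dividing by m.

```r
# Minimal LCG sketch; lcg_runif is a hypothetical name for illustration
lcg_runif <- function(n, seed = 12345) {
  m <- 2^31 - 1                   # modulus
  a <- 48271                      # multiplier
  x <- numeric(n)
  state <- seed                   # should be an integer in 1, ..., m - 1
  for (j in seq_len(n)) {
    state <- (a * state) %% m     # x_{j+1} = (48271 * x_j) mod m
    x[j] <- state
  }
  x / m                           # scale to the unit interval
}

u <- lcg_runif(1e4)
c(mean = mean(u), var = var(u))   # Uniform[0, 1] has mean 1/2 and variance 1/12

# Same seed, same sequence: the generator is deterministic
identical(lcg_runif(5), lcg_runif(5))
```

The product a * state stays below 2^53, so ordinary double-precision arithmetic in R keeps every step exact.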
- Box-Muller transform: if u_1,u_2\sim Unif[0,1] are independent, then
- z_1=\sqrt{-2\ln u_1}\cos(2\pi u_2),
- z_2=\sqrt{-2\ln u_1}\sin(2\pi u_2) ⇒ i.i.d. N(0,1).
- Encapsulated in my.rnorm(); validated by large-sample mean ≈ 0, variance ≈ 1 and histogram overlaid with dnorm.
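The transform from two uniforms to two normals can be sketched as follows. This is not the course's my.rnorm(); box_muller_rnorm is a hypothetical name, and for brevity it draws its uniforms from R's runif rather than from a hand-written generator.

```r
# Sketch of the uniform-to-normal transform; hypothetical function name
box_muller_rnorm <- function(n) {
  pairs <- ceiling(n / 2)          # each pair (u1, u2) yields two normals
  u1 <- runif(pairs)               # R's runif does not return exactly 0
  u2 <- runif(pairs)
  r  <- sqrt(-2 * log(u1))
  z  <- c(r * cos(2 * pi * u2), r * sin(2 * pi * u2))
  z[seq_len(n)]                    # drop the spare value when n is odd
}

set.seed(2025)                     # arbitrary seed
z <- box_muller_rnorm(1e5)
c(mean = mean(z), var = var(z))    # should be close to 0 and 1
```

To repeat the visual check, hist(z, prob = TRUE) followed by curve(dnorm(x), add = TRUE) overlays the standard normal density on the histogram.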
R Function Naming Conventions
- rname → random generation; dname → density; pname → CDF; qname → quantile.
- Examples: runif, rnorm, rchisq, rt, rf.
- Extra distributions accessible via package PoweR (40+ laws).
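The four prefixes are easiest to remember by seeing them side by side for one family. The sketch below uses the normal and, as a second illustrative family, the exponential; the numeric arguments are arbitrary.

```r
d0 <- dnorm(0)            # density of N(0, 1) at 0, i.e. 1 / sqrt(2 * pi)
p1 <- pnorm(1.96)         # CDF: P(Z < 1.96)
q1 <- qnorm(0.975)        # quantile: z with P(Z < z) = 0.975
set.seed(1)
r3 <- rnorm(3)            # three random draws from N(0, 1)

# The same prefixes work for other families, e.g. the exponential
p <- pexp(2, rate = 1)    # P(X < 2)
q <- qexp(p, rate = 1)    # the quantile function undoes the CDF
c(p = p, q = q)
```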
Simulation for Unknown/Complex Distributions
- Core idea: if model can be simulated, any quantity (mean, SD, prob.) approximated by Monte Carlo:
- Generate a large number M of draws x_1,\dots,x_M from the model.
- Estimate E[g(X)]\approx\tfrac1M\sum_{i=1}^{M}g(x_i).
- Probability P(A)=E[1_A(X)]\approx\text{mean}(\text{indicator}).
- Demonstration:
- If X\sim N(2,7), let Y=X^2. Monte Carlo with 100k draws yielded E(Y)\approx53, SD(Y)\approx75, P(Y>5)\approx0.759.
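The demonstration can be reproduced in a few lines. The seed below is arbitrary, so the estimates will differ slightly from the quoted figures from run to run.

```r
set.seed(1041)                       # arbitrary seed, for reproducibility
M <- 1e5
x <- rnorm(M, mean = 2, sd = 7)      # X ~ N(2, 7), SD parameterisation
y <- x^2

est_mean <- mean(y)                  # approximates E(Y)
est_sd   <- sd(y)                    # approximates SD(Y)
est_prob <- mean(y > 5)              # proportion of draws with Y > 5

c(mean = est_mean, sd = est_sd, prob = est_prob)
```

As a sanity check on the first figure, E(X^2)=\mu^2+\sigma^2=4+49=53 exactly, so the Monte Carlo mean should land close to 53.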
Ant-Colony Case Study (interaction probability)
- Model: positions C1:(X_{1},Y_{1})\sim N(0,30)\times N(0,30); C2:(X_{2},Y_{2})\sim N(100,30)\times N(100,30); independence.
- Distance between random ants: D=\sqrt{(X_1-X_2)^2+(Y_1-Y_2)^2}.
- Simulate M=10^5 pairs: mean(D<=70)=0.0291 ⇒ ≈2.9 % of pairs within 70 m (half-way) → “very little interaction”.
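A sketch of how this simulation might be coded is given below. The seed and variable names are assumptions, so the estimate will not match 0.0291 digit for digit, but it should fall close to it.

```r
set.seed(1041)                         # arbitrary seed
M <- 1e5

# Colony 1 centred at (0, 0), colony 2 at (100, 100); SD 30 per coordinate
x1 <- rnorm(M, mean = 0,   sd = 30)
y1 <- rnorm(M, mean = 0,   sd = 30)
x2 <- rnorm(M, mean = 100, sd = 30)
y2 <- rnorm(M, mean = 100, sd = 30)

d <- sqrt((x1 - x2)^2 + (y1 - y2)^2)   # distance between each simulated pair

p_close <- mean(d <= 70)               # Monte Carlo estimate of P(D <= 70)
p_close
```

The comparison d <= 70 produces a logical vector, and mean() of a logical vector is the proportion of TRUE values, which is exactly the indicator-average estimate of a probability.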
Success Criteria Recap
- Able to:
- Recognise continuous vs discrete.
- Interpret & sketch density curves.
- Apply normality assumption checks (histogram, Q-Q).
- Use pnorm, qnorm for probability & quantile tasks.
- Standardise vs supply (mean,sd).
- Generate pseudo-random numbers & use Monte Carlo for arbitrary probs.
Ethical, Philosophical & Practical Implications
- Modelling choice (continuous vs discrete) often pragmatic; must be justified by analysis goal and measurement precision.
- Over-reliance on normality dangerous; always assess – “statistics is not exact, think critically rather than follow recipes”.
- Simulation allows risk-free exploration (e.g. nuclear test modelling) but quality hinges on generator quality and valid underlying assumptions.
Key R Syntax Cheat-Sheet
- pnorm(z,mean=μ,sd=σ,lower.tail=FALSE) → P(X>z).
- qnorm(p) → z_p such that P(Z<z_p)=p.
- runif(n,min,max) | rnorm(n,mean,sd) → generate.
- Probability via simulation: mean(expr) where expr is logical vector.
- Histogram with density: hist(x,prob=TRUE); curve(dname(...),add=TRUE).
- Q-Q plot: qqnorm(x); qqline(x).
Terminology (Keywords Slide)
- probability model; density curve/function; normal curve; z-score; seed; pseudo-random number; quantile; lower.tail; Monte Carlo; memoryless property.