Notes: Pearson Edexcel International A Level Statistics 1 — Comprehensive Study Notes
Chapter 1 — Mathematical Modelling
What is a mathematical model?
A simplification of a real-world situation used to make predictions and forecasts.
Aims to include main features of the real-world problem; may rely on assumptions.
Benefits: quick/easy to produce, cost-effective, enables predictions, helps understand the world, shows how changes in variables affect outcomes, simplifies complex situations.
Drawbacks: simplifications can cause errors, models may only work under certain conditions.
Real-world example framework:
Imagine scientists studying leopard populations in Sri Lanka over years; instead of counting every leopard, a model can be built to study trends and make forecasts.
The seven-stage modelling process (designing a model):
Stage 1: The recognition of a real-world problem
Stage 2: A mathematical model is devised
Stage 3: Model is used to make predictions about the real-world problem
Stage 4: Experimental data are collected from the real world
Stage 5: Comparisons are made against the devised model
Stage 6: Statistical concepts are used to test how well the model describes the real-world problem
Stage 7: Model is refined
If the predicted values differ from observed values, iterate stages 2–6 to refine the model.
Design considerations (readable, tractable models):
Assumptions are necessary to manage the model's complexity.
Assumptions should be acknowledged when analysing results.
Chapters emphasize three intertwined themes:
Mathematical argument, language and proof
Mathematical problem-solving (problem-solving cycle)
Transferable skills (data handling, communication, etc.)
Key takeaways from this chapter:
Models are powerful but imperfect; they are tools, not exact replicas of reality.
A model is judged by how well it predicts and how useful it is for understanding and decision-making.
Chapter 1 — Chapter Summary Points (Key Concepts)
Definition of a mathematical model:
A simplification of a real-world situation.
Used for predictions, forecasts and understanding.
Advantages of modelling:
Quick/cheap to produce; enables predictions; helps understand effects of changing variables.
Disadvantages of modelling:
Oversimplification can cause errors; models may only be valid under certain conditions.
Modelling stages (as above) and the iterative nature of model refinement.
Real-world relevance: models inform decisions in science, engineering, economics, climate studies, etc.
Ethical and practical implications:
Models influence policy and resource allocation; mis-specification can lead to poor decisions; transparency about assumptions is essential.
Chapter 2 — Measures of Location and Spread
Learning objectives:
Recognise data types (qualitative vs quantitative; discrete vs continuous).
Compute measures of central tendency: mean, median, mode.
Compute measures of location: quartiles, percentiles.
Compute measures of spread: range, interquartile range (IQR), interpercentile range; variance and standard deviation.
Understand data coding and its impact on statistics.
Types of data (with quick typology):
Qualitative (non-numerical) vs Quantitative (numerical)
Discrete data: takes specific values (e.g., counts)
Continuous data: can take any value within a range (e.g., measurements)
Data representation in grouped form:
Large data sets can be presented as frequency tables or grouped data with classes.
Class boundaries, class width, and midpoints are used to work with grouped data.
Measures of central tendency (basic formulas):
Mean (ungrouped): \bar{x} = \frac{\sum x}{n}
For data in a frequency table: \bar{x} = \frac{\sum fx}{\sum f}
Median and mode definitions (median = middle value; mode = most frequent value or modal class for grouped data).
When combining data sets: \bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}
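As a quick numerical check of the mean formulas above, here is a minimal Python sketch (the function names are illustrative, not from the text):

```python
def mean_ungrouped(xs):
    # x-bar = (sum of x) / n
    return sum(xs) / len(xs)

def mean_frequency_table(values, freqs):
    # x-bar = (sum of f*x) / (sum of f)
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

def combined_mean(n1, xbar1, n2, xbar2):
    # x-bar = (n1*xbar1 + n2*xbar2) / (n1 + n2)
    return (n1 * xbar1 + n2 * xbar2) / (n1 + n2)

print(mean_ungrouped([2, 4, 6]))                   # 4.0
print(mean_frequency_table([1, 2, 3], [5, 3, 2]))  # 17/10 = 1.7
print(combined_mean(10, 4.0, 30, 8.0))             # 280/40 = 7.0
```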
Measures of spread and location in data:
Range: largest − smallest value.
Interquartile range (IQR): \text{IQR} = Q_3 - Q_1
Interpercentile range: difference between two percentiles (e.g., P10 to P90).
Variance and standard deviation definitions (population forms):
\text{Variance} = \frac{S_{xx}}{n} = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \bar{x}^2
For grouped data: use class midpoints and frequencies to approximate
Standard deviation: \text{SD} = \left( \frac{\sum f(x - \bar{x})^2}{\sum f} \right)^{1/2}
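The variance and standard deviation formulas can be checked numerically; a small sketch with illustrative data (population forms, as in the notes):

```python
import math

def variance(xs):
    # Var = (sum of x^2)/n - xbar^2  (population form)
    n = len(xs)
    xbar = sum(xs) / n
    return sum(x * x for x in xs) / n - xbar ** 2

def sd_grouped(midpoints, freqs):
    # SD = sqrt( sum of f*(x - xbar)^2 / sum of f ), with x the class midpoints
    total = sum(freqs)
    xbar = sum(f * x for x, f in zip(midpoints, freqs)) / total
    var = sum(f * (x - xbar) ** 2 for x, f in zip(midpoints, freqs)) / total
    return math.sqrt(var)

print(variance([2, 4, 6]))                  # 56/3 - 16 = 8/3
print(sd_grouped([5, 15, 25], [2, 6, 2]))   # sqrt(400/10) = sqrt(40)
```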
Coding data (Section 2.6):
Coding transformation: y = \frac{x - a}{b}
Mean of coded data: \bar{y} = \frac{\bar{x} - a}{b}
Standard deviation of coded data: s_y = \frac{s_x}{|b|}
Uncoding: x = by + a
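The coding rules above can be verified directly: the mean follows the full transformation, while the spread is only affected by the scale factor. A sketch with hypothetical data:

```python
import statistics

a, b = 100, 10
xs = [110, 120, 130, 140]
ys = [(x - a) / b for x in xs]           # coded data: [1.0, 2.0, 3.0, 4.0]

xbar, ybar = statistics.mean(xs), statistics.mean(ys)
sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)

assert ybar == (xbar - a) / b                        # ybar = (xbar - a)/b
assert abs(sy - sx / abs(b)) < 1e-12                 # s_y = s_x / |b|
assert all(b * y + a == x for x, y in zip(xs, ys))   # uncoding: x = by + a
```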
Class boundaries and interpolation with grouped data:
When class intervals are contiguous (no gaps): class boundaries are the endpoints.
When gaps exist: use halfway points as class boundaries (e.g., 55-65 becomes 54.5 to 65.5).
Box plots and stem-and-leaf diagrams (preview for Chapter 3):
Box plots summarize quartiles, min/max, and possible outliers; stem-and-leaf shows distribution shape and quartiles.
Practical exercises and problem-solving cues:
Exercise questions progress in difficulty and include “interpolation” and “interquartile” concepts.
Chapter 2 — Key Formulas and Concepts (LaTeX versions)
Mean (ungrouped):
\bar{x} = \frac{\sum x}{n}
Mean (grouped):
\bar{x} = \frac{\sum fx}{\sum f}
Range, IQR, Interpercentile range:
Range: R = x_{\text{max}} - x_{\text{min}}
IQR: \text{IQR} = Q_3 - Q_1
Interpercentile range: \text{IPR}_{p,q} = P_q - P_p
Variance and standard deviation (population):
\operatorname{Var}(X) = E(X^2) - [E(X)]^2 = \frac{\sum x^2}{n} - \bar{x}^2
\operatorname{SD}(X) = \sqrt{\operatorname{Var}(X)}
Z-score (coding for standard normal):
Standardization: Z = \frac{X - \bar{x}}{\text{SD}}
For coding with a transformation, the effects follow the same linear rules as above.
Class boundaries and midpoints (grouped data):
Class midpoint: m_i = \frac{L_i + U_i}{2}
Class width: w_i = U_i - L_i
If class boundaries are not integers, use correct boundary values (e.g., 54.5, 65.5).
Chapter 3 — Representations of Data
Histograms (grouped continuous data):
Area of a bar is proportional to frequency; height (frequency density) is defined by
\text{Frequency density} = \frac{\text{frequency}}{\text{class width}}
When width varies, use area to compare frequencies.
A frequency polygon connects midpoints of histogram bars.
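The area rule can be sketched numerically. Below, the class limits and frequencies are hypothetical; the point is that density times width recovers frequency even when widths differ:

```python
# (lower boundary, upper boundary, frequency) for three classes of unequal width
classes = [(0, 10, 20), (10, 15, 30), (15, 35, 40)]

for lower, upper, freq in classes:
    width = upper - lower
    density = freq / width          # frequency density = frequency / class width
    print(f"{lower}-{upper}: width={width}, density={density}")
    # area of bar = density * width = frequency
    assert density * width == freq
```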
Box plots and outliers (definition):
An outlier is often identified as a value below Q_1 - 1.5 \times \text{IQR} or above Q_3 + 1.5 \times \text{IQR} (one common rule; the exam may specify a different multiplier k).
Box plot shows min, Q1, median, Q3, max, and optional outliers.
Skewness indicators:
Box plot shape and the relative order of quartiles (Q_1, Q_2, Q_3) reveal skewness.
A simple numeric skew index can be formed from quartiles (examples in exercises).
Stem-and-leaf diagrams:
A graphical method that preserves raw data and shows distribution shape; used to read quartiles directly.
Back-to-back stem-and-leaf diagrams compare two data sets.
Outlier handling and data cleaning:
Distinguish between genuine extreme values and errors/anomalies; justify removal with context.
Chapter 3 — Chapter Summary (Key Rules)
Mode, median, mean for data types:
Mode for categorical data or bimodal data; median for quantitative data; mean for quantitative data when no extreme values distort it.
For discrete data in a frequency table:
Mean: \bar{x} = \frac{\sum fx}{\sum f}
Median and quartiles can be found from the cumulative frequencies.
Interpolation for grouped data median/quartiles:
With grouped data, interpolate within the class containing the desired percentile.
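Linear interpolation within the class containing the target value can be sketched as follows (a minimal illustration with hypothetical classes, using the n/2 convention for grouped data):

```python
def grouped_median(classes):
    """classes: list of (lower, upper, frequency), contiguous boundaries."""
    n = sum(f for _, _, f in classes)
    target = n / 2
    cum = 0  # cumulative frequency before the current class
    for lower, upper, f in classes:
        if cum + f >= target:
            # linear interpolation within the median class:
            # median = L + (n/2 - F)/f * class width
            return lower + (target - cum) / f * (upper - lower)
        cum += f
    raise ValueError("empty data")

# classes 0-10 (f=4), 10-20 (f=8), 20-30 (f=8); n=20, n/2=10
# median class is 10-20: median = 10 + (10-4)/8 * 10 = 17.5
print(grouped_median([(0, 10, 4), (10, 20, 8), (20, 30, 8)]))
```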
Skewness and interpretation:
Skewness can be described qualitatively via box plots and quartile relationships.
Chapter 4 — Probability
Core vocabulary:
Experiment, event, sample space, probability, independent events, mutually exclusive events, conditional probability.
Venn diagrams and set notation:
Intersection: A \text{ and } B = A \cap B
Union: A \text{ or } B = A \cup B
Complement: A' = \text{not } A
n(A), n(B) denote counts in sets A and B.
Basic probability rules:
For mutually exclusive events: P(A \cup B) = P(A) + P(B)
For any events: P(A \cup B) = P(A) + P(B) - P(A \cap B)
Conditional probability: P(B|A) = \frac{P(A \cap B)}{P(A)}; independent if P(B|A) = P(B).
Tree diagrams:
Useful for sequential events; probabilities multiply along branches.
Two-way tables (contingency tables):
Organized counts to compute marginals, conditionals, and independence.
Common problems include: calculating P(A), P(B), P(A|B), P(B|A'), etc.
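A two-way table makes these calculations mechanical. The counts below are hypothetical, chosen only to illustrate marginals, a conditional probability, and the independence check:

```python
# counts: rows A / A', columns B / B'
table = {("A", "B"): 20, ("A", "B'"): 30, ("A'", "B"): 10, ("A'", "B'"): 40}
n = sum(table.values())  # 100

p_A = (table[("A", "B")] + table[("A", "B'")]) / n   # marginal P(A) = 50/100
p_B = (table[("A", "B")] + table[("A'", "B")]) / n   # marginal P(B) = 30/100
p_A_and_B = table[("A", "B")] / n                    # P(A and B)  = 20/100
p_B_given_A = p_A_and_B / p_A                        # P(B|A) = 0.2/0.5 = 0.4

print(p_B_given_A)
# independence check: P(A and B) = 0.20, but P(A)*P(B) = 0.15, so not independent
assert p_A_and_B != p_A * p_B
```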
Chapter 4 — Chapter Summary (Key Formulas)
Union/Intersection: P(A \cup B) = P(A) + P(B) - P(A \cap B)
Conditional probability: P(B|A) = \frac{P(A \cap B)}{P(A)}
Independence test through multiplication: if independent, P(A \cap B) = P(A)P(B) and equivalently, P(B|A) = P(B).
Tree diagrams and Bayes-style reasoning are used to compute conditional probabilities.
Chapter 5 — Correlation and Regression
Bivariate data and scatter diagrams:
Use to determine if a linear relationship exists between two variables (explanatory x and response y).
Positive vs negative vs no correlation; linearity is assumed for regression.
Product moment correlation coefficient (PMCC):
Define summary statistics: S_{xx} = \,\sum (x - \bar{x})^2 = \sum x^2 - (\sum x)^2/n
S_{yy} = \sum (y - \bar{y})^2 = \sum y^2 - (\sum y)^2/n
S_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - (\sum x)(\sum y)/n
PMCC: r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}
Interpretation: r from -1 to 1; closer to ±1 means stronger linear relationship.
Linear regression line (least squares):
Regression line of y on x: y = a + b x
Gradient (slope): b = \frac{S_{xy}}{S_{xx}}
Intercept: a = \bar{y} - b \bar{x}
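The least-squares and PMCC formulas can be computed straight from the summary sums; a sketch with illustrative, perfectly linear data (y = 2x + 1, so we expect b = 2, a = 1, r = 1):

```python
import math

def regression_and_pmcc(xs, ys):
    # S_xx, S_yy, S_xy from raw sums, then b, a, r as defined above
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs) - sum_x ** 2 / n
    syy = sum(y * y for y in ys) - sum_y ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    b = sxy / sxx                      # gradient
    a = sum_y / n - b * sum_x / n      # intercept: a = ybar - b*xbar
    r = sxy / math.sqrt(sxx * syy)     # PMCC
    return a, b, r

a, b, r = regression_and_pmcc([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b, r)   # 1.0 2.0 1.0
```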
Use of coding to simplify regression: a linear coding such as u = (x - a)/b simplifies arithmetic; r is invariant under linear coding.
Interpolation vs extrapolation:
Interpolation within the data range is more reliable; extrapolation beyond the data range is less reliable.
Residuals and model checking:
Residual for a data point is the difference between observed and predicted value. A good model has residuals randomly scattered around 0.
Summary idea from the PMCC and regression:
A strong positive PMCC suggests a positive linear relationship and a reliable regression line (subject to data being appropriate for a linear model).
Chapter 5 — Chapter Summary (Key Formulas)
Regression line: y = a + bx, \text{ where } b = \frac{S_{xy}}{S_{xx}},\ a = \bar{y} - b\bar{x}
PMCC: r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}
Sums in compact form (useful with raw or summarized data):
S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}
S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}
S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}
Expectations and residuals: explain connection between observed and predicted values; residuals should be randomly distributed about 0 for a good linear model.
Chapter 6 — Discrete Random Variables
Random variable basics:
Random variable X maps outcomes to numerical values; discrete means X takes a countable set of values.
Probability distribution of X: P(X=x) for each possible x; total probability sums to 1.
Expected value and variance for discrete X:
Expected value (mean): E(X) = \sum x P(X=x)
For a simple table: if values are x_i with probabilities p_i, then E(X) = \sum_i x_i p_i
Variance: \operatorname{Var}(X) = E(X^2) - [E(X)]^2,\qquad E(X^2) = \sum x_i^2 P(X=x_i)
Cumulative distribution function (CDF): F(x) = P(X \le x); obtained by summing probabilities up to x.
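Given a probability distribution as a table, E(X), Var(X), and the CDF follow directly from the definitions above; a sketch with a hypothetical distribution:

```python
dist = {1: 0.1, 2: 0.3, 3: 0.4, 4: 0.2}   # P(X = x); probabilities sum to 1

ex = sum(x * p for x, p in dist.items())        # E(X) = sum of x*P(X=x)
ex2 = sum(x * x * p for x, p in dist.items())   # E(X^2)
var = ex2 - ex ** 2                             # Var(X) = E(X^2) - [E(X)]^2

def cdf(t):
    # F(t) = P(X <= t), summing probabilities up to t
    return sum(p for x, p in dist.items() if x <= t)

print(ex, var, cdf(2))   # E(X) = 2.7, Var(X) = 0.81, F(2) = 0.4
```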
Discrete uniform distribution:
Values 1,2,…,n equally likely; E(X) = \frac{n+1}{2},\ \operatorname{Var}(X) = \frac{(n+1)(n-1)}{12}
Transformations (linear coding) and their effect on means and variances:
If you code X with Y = aX + b, then
E(Y) = aE(X) + b,
\operatorname{Var}(Y) = a^2 \operatorname{Var}(X)
Examples and techniques appear throughout exercises (e.g., using summary data with S_{xx}, S_{xy}, etc.).
Chapter 7 — The Normal Distribution
Normal distribution basics:
Continuous distribution with bell-shaped, symmetric curve; mean μ and variance σ^2.
Probability statements are about ranges, not exact values: e.g., P(a < X < b).
Key property: 68% within μ ± σ; 95% within μ ± 2σ; 99.7% within μ ± 3σ.
Standard normal distribution:
Z ~ N(0,1) with z-score: Z = \dfrac{X - \mu}{\sigma}
If X ~ N(μ, σ^2), then X = μ + σ Z.
Probability tables and z-values:
Use the standard normal table (Φ) to find P(Z < z) for given z, or to find z for a given probability.
Inverse normal: find z such that P(Z < z) = p.
Examples of standardization and probability calculation:
If X ~ N(μ, σ^2) and z = (x - μ)/σ, then P(X < x) = P(Z < z).
Use the table to determine probabilities or z-values; for probabilities outside the table, use 1 − Φ(z) and symmetry.
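In place of printed tables, Φ can be evaluated with the error function; a sketch of standardization (the N(100, 15²) example is hypothetical):

```python
import math

def phi(z):
    # Phi(z) = P(Z < z) for Z ~ N(0, 1), via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_cdf(x, mu, sigma):
    # standardize: P(X < x) = P(Z < (x - mu)/sigma)
    return phi((x - mu) / sigma)

print(phi(0))                              # 0.5 by symmetry
print(phi(1.96))                           # approx 0.975
print(normal_cdf(130, mu=100, sigma=15))   # P(X < 130) = Phi(2), approx 0.977
```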
Solving mean/variance from percentiles:
Given percentile information, solve for μ and σ by using z-values from the standard normal table and equations like
\frac{x - μ}{σ} = z_p
Applications and problem-solving examples across the chapter illustrate:
Finding P(X > a), P(X < a), P(X ∈ [a,b]), and corresponding z-values.
Using inverse normal to determine μ and σ from percentile data.
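Two percentile statements give two linear equations in μ and σ; a sketch with hypothetical values (P(X < 20) = 0.25 and P(X < 40) = 0.75, whose z-values from tables are approximately ∓0.6745):

```python
# z-values for the 25th and 75th percentiles of N(0, 1), from tables
z1, z2 = -0.6745, 0.6745
x1, x2 = 20.0, 40.0

# each statement gives x = mu + sigma*z; solve the pair simultaneously:
sigma = (x2 - x1) / (z2 - z1)
mu = x1 - sigma * z1

print(mu, sigma)   # symmetric percentiles put mu midway: mu = 30.0
```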
Chapter 7 — Chapter Summary (Key Formulas)
Standardization: Z = \frac{X - μ}{σ},\, Z \sim N(0,1)
Percentile and z-value relationships via Φ(z): P(Z < z) = Φ(z); for P(Z > z), use 1 - Φ(z).
Within: P(μ-σ \le X \le μ+σ) ≈ 0.68;\ P(μ-2σ \le X \le μ+2σ) ≈ 0.95;\ P(μ-3σ \le X \le μ+3σ) ≈ 0.997.
Inverse normal calculations:
If P(Z < z) = p, find z from the standard normal table or inverse function.
Practical tasks include estimating μ and σ from percentile data, and using z-tables to evaluate probabilities for X ~ N(μ, σ^2).
General Notes: Formulae Quick Reference (LaTeX)
Mean (ungrouped): \bar{x} = \frac{\sum x}{n}
Mean (grouped): \bar{x} = \frac{\sum f x}{\sum f}
Class width and midpoints (grouped data):
w = U - L, midpoint m = \frac{L+U}{2}
Variance (population): \operatorname{Var}(X) = E(X^2) - [E(X)]^2 = \frac{\sum x^2}{n} - \bar{x}^2
Standard deviation: \operatorname{SD}(X) = \sqrt{\operatorname{Var}(X)}
For grouped data (variance):\operatorname{Var}(X) = \frac{\sum f (x-\bar{x})^2}{\sum f} \quad\text{(approx.)}
Coding (linear): if y = a x + b, then
E(Y) = a E(X) + b,\quad \operatorname{Var}(Y) = a^2 \operatorname{Var}(X)
Regression line (y on x): y = a + b x,\quad b = \frac{S_{xy}}{S_{xx}},\ a = \bar{y} - b \bar{x}
PMCC: r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}
Discrete expected value and variance:
E(X) = \sum x P(X=x),\quad \operatorname{Var}(X) = E(X^2) - [E(X)]^2
Normal standardization: Z = \frac{X - \mu}{\sigma},\ Z \sim N(0,1)
For a vertical bar of a histogram: area = frequency; height = density; density = freq / width.
Outliers (common rule): below Q_1 - 1.5 \cdot \text{IQR} or above Q_3 + 1.5 \cdot \text{IQR}
Quick Visual Reference (What to Study for Exams)
Modelling: 7-stage cycle, advantages/disadvantages, and refinement.
Data types: qualitative vs quantitative; discrete vs continuous; when to use median vs mean; interpreting IQR and percentiles.
Graphical summaries: histograms, box plots, stem-and-leaf; how to read class boundaries and interpolate within classes.
Probability toolkit: Venn diagrams; set notation; independence; mutual exclusivity; conditional probability; tree diagrams.
Regression focus: how to compute regression line and r from summarized data; interpreting slope and intercept; checking residuals.
Normal distribution mastery: standardization, Φ-table usage, z-values, and real-data reasoning about when to apply Normal models.