Notes: Pearson Edexcel International A Level Statistics 1 — Comprehensive Study Notes
Chapter 1 — Mathematical Modelling
What is a mathematical model?
A simplification of a real-world situation used to make predictions and forecasts.
Aims to include main features of the real-world problem; may rely on assumptions.
Benefits: quick/easy to produce, cost-effective, enables predictions, helps understand the world, shows how changes in variables affect outcomes, simplifies complex situations.
Drawbacks: simplifications can cause errors, models may only work under certain conditions.
Real-world example framework:
Imagine scientists studying leopard populations in Sri Lanka over years; instead of counting every leopard, a model can be built to study trends and make forecasts.
The seven-stage modelling process (designing a model):
Stage 1: The recognition of a real-world problem
Stage 2: A mathematical model is devised
Stage 3: Model is used to make predictions about the real-world problem
Stage 4: Experimental data are collected from the real world
Stage 5: Comparisons are made against the devised model
Stage 6: Statistical concepts are used to test how well the model describes the real-world problem
Stage 7: Model is refined
If the predicted values differ from observed values, iterate stages 2–6 to refine the model.
Design considerations (readable, tractable models):
Assumptions are necessary to manage the model's complexity.
Assumptions should be acknowledged when analysing results.
Chapters emphasize three intertwined themes:
Mathematical argument, language and proof
Mathematical problem-solving (problem-solving cycle)
Transferable skills (data handling, communication, etc.)
Key takeaways from this chapter:
Models are powerful but imperfect; they are tools, not exact replicas of reality.
A model is judged by how well it predicts and how useful it is for understanding and decision-making.
Chapter 1 — Chapter Summary Points (Key Concepts)
Definition of a mathematical model:
A simplification of a real-world situation.
Used for predictions, forecasts and understanding.
Advantages of modelling:
Quick/cheap to produce; enables predictions; helps understand effects of changing variables.
Disadvantages of modelling:
Oversimplification can cause errors; models may only be valid under certain conditions.
Modelling stages (as above) and the iterative nature of model refinement.
Real-world relevance: models inform decisions in science, engineering, economics, climate studies, etc.
Ethical and practical implications:
Models influence policy and resource allocation; mis-specification can lead to poor decisions; transparency about assumptions is essential.
Chapter 2 — Measures of Location and Spread
Learning objectives:
Recognise data types (qualitative vs quantitative; discrete vs continuous).
Compute measures of central tendency: mean, median, mode.
Compute measures of location: quartiles, percentiles.
Compute measures of spread: range, interquartile range (IQR), interpercentile range; variance and standard deviation.
Understand data coding and its impact on statistics.
Types of data (with quick typology):
Qualitative (non-numerical) vs Quantitative (numerical)
Discrete data: takes specific values (e.g., counts)
Continuous data: can take any value within a range (e.g., measurements)
Data representation in grouped form:
Large data sets can be presented as frequency tables or grouped data with classes.
Class boundaries, class width, and midpoints are used to work with grouped data.
Measures of central tendency (basic formulas):
Mean (ungrouped): \bar{x} = \frac{\sum x}{n}
For data in a frequency table: \bar{x} = \frac{\sum fx}{\sum f}
Median and mode definitions (median = middle value; mode = most frequent value or modal class for grouped data).
When combining data sets: \bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}
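As a quick numerical check of the mean formulas above, here is a minimal Python sketch (the function names are illustrative, not from the text):

```python
def mean_ungrouped(xs):
    # x-bar = (sum of x) / n
    return sum(xs) / len(xs)

def mean_frequency_table(values, freqs):
    # x-bar = (sum of f*x) / (sum of f)
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

def combined_mean(n1, xbar1, n2, xbar2):
    # x-bar = (n1*xbar1 + n2*xbar2) / (n1 + n2)
    return (n1 * xbar1 + n2 * xbar2) / (n1 + n2)

print(mean_ungrouped([2, 4, 6]))                   # 4.0
print(mean_frequency_table([1, 2, 3], [5, 3, 2]))  # 17/10 = 1.7
print(combined_mean(10, 4.0, 30, 8.0))             # 280/40 = 7.0
```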
Measures of spread and location in data:
Range: largest − smallest value.
Interquartile range (IQR): \text{IQR} = Q_3 - Q_1
Interpercentile range: difference between two percentiles (e.g., P10 to P90).
Variance and standard deviation definitions (population forms):
\text{Variance} = \frac{S_{xx}}{n} = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \bar{x}^2
For grouped data: use class midpoints and frequencies to approximate
Standard deviation: \text{SD} = \left( \frac{\sum f(x - \bar{x})^2}{\sum f} \right)^{1/2}
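The variance and standard deviation formulas can be checked numerically; a small sketch with illustrative data (population forms, as in the notes):

```python
import math

def variance(xs):
    # Var = (sum of x^2)/n - xbar^2  (population form)
    n = len(xs)
    xbar = sum(xs) / n
    return sum(x * x for x in xs) / n - xbar ** 2

def sd_grouped(midpoints, freqs):
    # SD = sqrt( sum of f*(x - xbar)^2 / sum of f ), with x the class midpoints
    total = sum(freqs)
    xbar = sum(f * x for x, f in zip(midpoints, freqs)) / total
    var = sum(f * (x - xbar) ** 2 for x, f in zip(midpoints, freqs)) / total
    return math.sqrt(var)

print(variance([2, 4, 6]))                  # 56/3 - 16 = 8/3
print(sd_grouped([5, 15, 25], [2, 6, 2]))   # sqrt(400/10) = sqrt(40)
```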
Coding data (Section 2.6):
Coding transformation: y = \frac{x - a}{b}
Mean of coded data: \bar{y} = \frac{\bar{x} - a}{b}
Standard deviation of coded data: s_y = \frac{s_x}{|b|}
Uncoding: x = by + a
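The coding rules above can be verified directly: the mean follows the full transformation, while the spread is only affected by the scale factor. A sketch with hypothetical data:

```python
import statistics

a, b = 100, 10
xs = [110, 120, 130, 140]
ys = [(x - a) / b for x in xs]           # coded data: [1.0, 2.0, 3.0, 4.0]

xbar, ybar = statistics.mean(xs), statistics.mean(ys)
sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)

assert ybar == (xbar - a) / b                        # ybar = (xbar - a)/b
assert abs(sy - sx / abs(b)) < 1e-12                 # s_y = s_x / |b|
assert all(b * y + a == x for x, y in zip(xs, ys))   # uncoding: x = by + a
```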
Class boundaries and interpolation with grouped data:
When class intervals are contiguous (no gaps): class boundaries are the endpoints.
When gaps exist: use halfway points as class boundaries (e.g., 55-65 becomes 54.5 to 65.5).
Box plots and stem-and-leaf diagrams (preview for Chapter 3):
Box plots summarize quartiles, min/max, and possible outliers; stem-and-leaf shows distribution shape and quartiles.
Practical exercises and problem-solving cues:
Exercise questions progress in difficulty and include “interpolation” and “interquartile” concepts.
Chapter 2 — Key Formulas and Concepts (LaTeX versions)
Mean (ungrouped):
\bar{x} = \frac{\sum x}{n}
Mean (grouped):
\bar{x} = \frac{\sum fx}{\sum f}
Range, IQR, Interpercentile range:
Range: R = x_{\text{max}} - x_{\text{min}}
IQR: \text{IQR} = Q_3 - Q_1
Interpercentile range: \text{IPR}_{p,q} = P_q - P_p
Variance and standard deviation (population):
\operatorname{Var}(X) = E(X^2) - [E(X)]^2 = \frac{\sum x^2}{n} - \bar{x}^2
\operatorname{SD}(X) = \sqrt{\operatorname{Var}(X)}
Z-score (coding for standard normal):
Standardization: Z = \frac{X - \bar{x}}{\text{SD}}
For coding with a transformation, the effects follow the same linear rules as above.
Class boundaries and midpoints (grouped data):
Class midpoint: m_i = \frac{L_i + U_i}{2}
Class width: w_i = U_i - L_i
If class boundaries are not integers, use correct boundary values (e.g., 54.5, 65.5).
Chapter 3 — Representations of Data
Histograms (grouped continuous data):
Area of a bar is proportional to frequency; height (frequency density) is defined by
\text{Frequency density} = \frac{\text{frequency}}{\text{class width}}
When width varies, use area to compare frequencies.
A frequency polygon connects midpoints of histogram bars.
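The area rule can be sketched numerically. Below, the class limits and frequencies are hypothetical; the point is that density times width recovers frequency even when widths differ:

```python
# (lower boundary, upper boundary, frequency) for three classes of unequal width
classes = [(0, 10, 20), (10, 15, 30), (15, 35, 40)]

for lower, upper, freq in classes:
    width = upper - lower
    density = freq / width          # frequency density = frequency / class width
    print(f"{lower}-{upper}: width={width}, density={density}")
    # area of bar = density * width = frequency
    assert density * width == freq
```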
Box plots and outliers (definition):
An outlier is often identified as a value below Q_1 - 1.5 \times \text{IQR} or above Q_3 + 1.5 \times \text{IQR} (one common rule; the exam may specify a different multiplier k).
Box plot shows min, Q1, median, Q3, max, and optional outliers.
Skewness indicators:
Box plot shape and the relative order of quartiles (Q_1, Q_2, Q_3) reveal skewness.
A simple numeric skew index can be formed from quartiles (examples in exercises).
Stem-and-leaf diagrams:
A graphical method that preserves raw data and shows distribution shape; used to read quartiles directly.
Back-to-back stem-and-leaf diagrams compare two data sets.
Outlier handling and data cleaning:
Distinguish between genuine extreme values and errors/anomalies; justify removal with context.
Chapter 3 — Chapter Summary (Key Rules)
Mode, median, mean for data types:
Mode for categorical data or bimodal data; median for quantitative data; mean for quantitative data when no extreme values distort it.
For discrete data in a frequency table:
Mean: \bar{x} = \frac{\sum fx}{\sum f}
Median and quartiles can be found from the cumulative frequencies.
Interpolation for grouped data median/quartiles:
With grouped data, interpolate within the class containing the desired percentile.
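Linear interpolation within the class containing the target value can be sketched as follows (a minimal illustration with hypothetical classes, using the n/2 convention for grouped data):

```python
def grouped_median(classes):
    """classes: list of (lower, upper, frequency), contiguous boundaries."""
    n = sum(f for _, _, f in classes)
    target = n / 2
    cum = 0  # cumulative frequency before the current class
    for lower, upper, f in classes:
        if cum + f >= target:
            # linear interpolation within the median class:
            # median = L + (n/2 - F)/f * class width
            return lower + (target - cum) / f * (upper - lower)
        cum += f
    raise ValueError("empty data")

# classes 0-10 (f=4), 10-20 (f=8), 20-30 (f=8); n=20, n/2=10
# median class is 10-20: median = 10 + (10-4)/8 * 10 = 17.5
print(grouped_median([(0, 10, 4), (10, 20, 8), (20, 30, 8)]))
```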
Skewness and interpretation:
Skewness can be described qualitatively via box plots and quartile relationships.
Chapter 4 — Probability
Core vocabulary:
Experiment, event, sample space, probability, independent events, mutually exclusive events, conditional probability.
Venn diagrams and set notation:
Intersection: A \text{ and } B = A \cap B
Union: A \text{ or } B = A \cup B
Complement: A' = \text{not } A
n(A), n(B) denote counts in sets A and B.
Basic probability rules:
For mutually exclusive events: P(A \cup B) = P(A) + P(B)
For any events: P(A \cup B) = P(A) + P(B) - P(A \cap B)
Conditional probability: P(B|A) = \frac{P(A \cap B)}{P(A)}; independent if P(B|A) = P(B).
Tree diagrams:
Useful for sequential events; probabilities multiply along branches.
Two-way tables (contingency tables):
Organized counts to compute marginals, conditionals, and independence.
Common problems include: calculating P(A), P(B), P(A|B), P(B|A'), etc.
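A two-way table makes these calculations mechanical. The counts below are hypothetical, chosen only to illustrate marginals, a conditional probability, and the independence check:

```python
# counts: rows A / A', columns B / B'
table = {("A", "B"): 20, ("A", "B'"): 30, ("A'", "B"): 10, ("A'", "B'"): 40}
n = sum(table.values())  # 100

p_A = (table[("A", "B")] + table[("A", "B'")]) / n   # marginal P(A) = 50/100
p_B = (table[("A", "B")] + table[("A'", "B")]) / n   # marginal P(B) = 30/100
p_A_and_B = table[("A", "B")] / n                    # P(A and B)  = 20/100
p_B_given_A = p_A_and_B / p_A                        # P(B|A) = 0.2/0.5 = 0.4

print(p_B_given_A)
# independence check: P(A and B) = 0.20, but P(A)*P(B) = 0.15, so not independent
assert p_A_and_B != p_A * p_B
```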
Chapter 4 — Chapter Summary (Key Formulas)
Union/Intersection: P(A \cup B) = P(A) + P(B) - P(A \cap B)
Conditional probability: P(B|A) = \frac{P(A \cap B)}{P(A)}
Independence test through multiplication: if independent, P(A \cap B) = P(A)P(B) and equivalently, P(B|A) = P(B).
Tree diagrams and Bayes-style reasoning are used to compute conditional probabilities.
Chapter 5 — Correlation and Regression
Bivariate data and scatter diagrams:
Use to determine if a linear relationship exists between two variables (explanatory x and response y).
Positive vs negative vs no correlation; linearity is assumed for regression.
Product moment correlation coefficient (PMCC):
Define summary statistics: S_{xx} = \,\sum (x - \bar{x})^2 = \sum x^2 - (\sum x)^2/n
S_{yy} = \sum (y - \bar{y})^2 = \sum y^2 - (\sum y)^2/n
S_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - (\sum x)(\sum y)/n
PMCC: r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}
Interpretation: r from -1 to 1; closer to ±1 means stronger linear relationship.
Linear regression line (least squares):
Regression line of y on x: y = a + b x
Gradient (slope): b = \frac{S_{xy}}{S_{xx}}
Intercept: a = \bar{y} - b \bar{x}
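The least-squares and PMCC formulas can be computed straight from the summary sums; a sketch with illustrative, perfectly linear data (y = 2x + 1, so we expect b = 2, a = 1, r = 1):

```python
import math

def regression_and_pmcc(xs, ys):
    # S_xx, S_yy, S_xy from raw sums, then b, a, r as defined above
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs) - sum_x ** 2 / n
    syy = sum(y * y for y in ys) - sum_y ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    b = sxy / sxx                      # gradient
    a = sum_y / n - b * sum_x / n      # intercept: a = ybar - b*xbar
    r = sxy / math.sqrt(sxx * syy)     # PMCC
    return a, b, r

a, b, r = regression_and_pmcc([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b, r)   # 1.0 2.0 1.0
```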
Use of coding to simplify regression: a linear coding such as u = (x - a)/b simplifies arithmetic; r is invariant under linear coding.
Interpolation vs extrapolation:
Interpolation within the data range is more reliable; extrapolation beyond the data range is less reliable.
Residuals and model checking:
Residual for a data point is the difference between observed and predicted value. A good model has residuals randomly scattered around 0.
Summary idea from the PMCC and regression:
A strong positive PMCC suggests a positive linear relationship and a reliable regression line (subject to data being appropriate for a linear model).
Chapter 5 — Chapter Summary (Key Formulas)
Regression line: y = a + bx, \text{ where } b = \frac{S_{xy}}{S_{xx}},\ a = \bar{y} - b\bar{x}
PMCC: r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}
Sums in compact form (useful with raw or summarized data):
S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}
S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}
S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}
Expectations and residuals: explain connection between observed and predicted values; residuals should be randomly distributed about 0 for a good linear model.
Chapter 6 — Discrete Random Variables
Random variable basics:
Random variable X maps outcomes to numerical values; discrete means X takes a countable set of values.
Probability distribution of X: P(X=x) for each possible x; total probability sums to 1.
Expected value and variance for discrete X:
Expected value (mean): E(X) = \sum x P(X=x)
For a simple table: if values are x_i with probabilities p_i, then E(X) = \sum_i x_i p_i
Variance: \operatorname{Var}(X) = E(X^2) - [E(X)]^2,\qquad E(X^2) = \sum x_i^2 P(X=x_i)
Cumulative distribution function (CDF): F(x) = P(X \le x); obtained by summing probabilities up to x.
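Given a probability distribution as a table, E(X), Var(X), and the CDF follow directly from the definitions above; a sketch with a hypothetical distribution:

```python
dist = {1: 0.1, 2: 0.3, 3: 0.4, 4: 0.2}   # P(X = x); probabilities sum to 1

ex = sum(x * p for x, p in dist.items())        # E(X) = sum of x*P(X=x)
ex2 = sum(x * x * p for x, p in dist.items())   # E(X^2)
var = ex2 - ex ** 2                             # Var(X) = E(X^2) - [E(X)]^2

def cdf(t):
    # F(t) = P(X <= t), summing probabilities up to t
    return sum(p for x, p in dist.items() if x <= t)

print(ex, var, cdf(2))   # E(X) = 2.7, Var(X) = 0.81, F(2) = 0.4
```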
Discrete uniform distribution:
Values 1,2,…,n equally likely; E(X) = \frac{n+1}{2},\ \operatorname{Var}(X) = \frac{(n+1)(n-1)}{12}
Transformations (linear coding) and their effect on means and variances:
If you code X with Y = aX + b, then
E(Y) = aE(X) + b,
\operatorname{Var}(Y) = a^2 \operatorname{Var}(X)
Examples and techniques appear throughout exercises (e.g., using summary data with S_{xx}, S_{xy}, etc.).
Chapter 7 — The Normal Distribution
Normal distribution basics:
Continuous distribution with bell-shaped, symmetric curve; mean μ and variance σ^2.
Probability statements are about ranges, not exact values: e.g., P(a < X < b).
Key property: 68% within μ ± σ; 95% within μ ± 2σ; 99.7% within μ ± 3σ.
Standard normal distribution:
Z ~ N(0,1) with z-score: Z = \dfrac{X - \mu}{\sigma}
If X ~ N(μ, σ^2), then X = μ + σ Z.
Probability tables and z-values:
Use the standard normal table (Φ) to find P(Z < z) for given z, or to find z for a given probability.
Inverse normal: find z such that P(Z < z) = p.
Examples of standardization and probability calculation:
If X ~ N(μ, σ^2) and z = (x - μ)/σ, then P(X < x) = P(Z < z).
Use the table to determine probabilities or z-values; for probabilities outside the table, use 1 − Φ(z) and symmetry.
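In place of printed tables, Φ can be evaluated with the error function; a sketch of standardization (the N(100, 15²) example is hypothetical):

```python
import math

def phi(z):
    # Phi(z) = P(Z < z) for Z ~ N(0, 1), via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_cdf(x, mu, sigma):
    # standardize: P(X < x) = P(Z < (x - mu)/sigma)
    return phi((x - mu) / sigma)

print(phi(0))                              # 0.5 by symmetry
print(phi(1.96))                           # approx 0.975
print(normal_cdf(130, mu=100, sigma=15))   # P(X < 130) = Phi(2), approx 0.977
```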
Solving mean/variance from percentiles:
Given percentile information, solve for μ and σ by using z-values from the standard normal table and equations like
\frac{x - μ}{σ} = z_p
Applications and problem-solving examples across the chapter illustrate:
Finding P(X > a), P(X < a), P(X ∈ [a,b]), and corresponding z-values.
Using inverse normal to determine μ and σ from percentile data.
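Two percentile statements give two linear equations in μ and σ; a sketch with hypothetical values (P(X < 20) = 0.25 and P(X < 40) = 0.75, whose z-values from tables are approximately ∓0.6745):

```python
# z-values for the 25th and 75th percentiles of N(0, 1), from tables
z1, z2 = -0.6745, 0.6745
x1, x2 = 20.0, 40.0

# each statement gives x = mu + sigma*z; solve the pair simultaneously:
sigma = (x2 - x1) / (z2 - z1)
mu = x1 - sigma * z1

print(mu, sigma)   # symmetric percentiles put mu midway: mu = 30.0
```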
Chapter 7 — Chapter Summary (Key Formulas)
Standardization: Z = \frac{X - μ}{σ},\, Z \sim N(0,1)
Percentile and z-value relationships via Φ(z): P(Z < z) = Φ(z); for P(Z > z), use 1 - Φ(z).
Within: P(μ-σ \le X \le μ+σ) ≈ 0.68;\ P(μ-2σ \le X \le μ+2σ) ≈ 0.95;\ P(μ-3σ \le X \le μ+3σ) ≈ 0.997.
Inverse normal calculations:
If P(Z < z) = p, find z from the standard normal table or inverse function.
Practical tasks include estimating μ and σ from percentile data, and using z-tables to evaluate probabilities for X ~ N(μ, σ^2).
General Notes: Formulae Quick Reference (LaTeX)
Mean (ungrouped): \bar{x} = \frac{\sum x}{n}
Mean (grouped): \bar{x} = \frac{\sum f x}{\sum f}
Class width and midpoints (grouped data):
w = U - L, midpoint m = \frac{L+U}{2}
Variance (population): \operatorname{Var}(X) = E(X^2) - [E(X)]^2 = \frac{\sum x^2}{n} - \bar{x}^2
Standard deviation: \operatorname{SD}(X) = \sqrt{\operatorname{Var}(X)}
For grouped data (variance):\operatorname{Var}(X) = \frac{\sum f (x-\bar{x})^2}{\sum f} \quad\text{(approx.)}
Coding (linear): if y = a x + b, then
E(Y) = a E(X) + b,\quad \operatorname{Var}(Y) = a^2 \operatorname{Var}(X)
Regression line (y on x): y = a + b x,\quad b = \frac{S_{xy}}{S_{xx}},\ a = \bar{y} - b \bar{x}
PMCC: r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}
Discrete expected value and variance:
E(X) = \sum x P(X=x),\quad \operatorname{Var}(X) = E(X^2) - [E(X)]^2
Normal standardization: Z = \frac{X - \mu}{\sigma},\ Z \sim N(0,1)
For a vertical bar of a histogram: area = frequency; height = density; density = freq / width.
Outliers (common rule): below Q_1 - 1.5 \cdot \text{IQR} or above Q_3 + 1.5 \cdot \text{IQR}
Quick Visual Reference (What to Study for Exams)
Modelling: 7-stage cycle, advantages/disadvantages, and refinement.
Data types: qualitative vs quantitative; discrete vs continuous; when to use median vs mean; interpreting IQR and percentiles.
Graphical summaries: histograms, box plots, stem-and-leaf; how to read class boundaries and interpolate within classes.
Probability toolkit: Venn diagrams; set notation; independence; mutual exclusivity; conditional probability; tree diagrams.
Regression focus: how to compute regression line and r from summarized data; interpreting slope and intercept; checking residuals.
Normal distribution mastery: standardization, Φ-table usage, z-values, and real-data reasoning about when to apply Normal models.