GCSE Statistics – Vocabulary You Must Know
Data and Sampling
Hypothesis
A statement which may or may not be true.
A statistical investigation is used to see if there is evidence to support the hypothesis.
Population
All items/people being investigated (e.g. all students in Year 10, all fireworks made by a factory).
Sample Frame
A list of all members of the population (could be a register or database).
Sampling Methods
Random Sample
All items in the population have an equal chance of being selected for the sample.
Stratified (Random) Sampling
The population is divided into strata (e.g. gender/school year), and the number sampled from each stratum is in proportion to that stratum's share of the population.
Members are chosen randomly from each stratum.
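The proportional allocation described above can be sketched in Python. This is a minimal illustration rather than a prescribed method: the function name `stratified_sample`, the year-group sizes and the rounding rule are all assumptions made for the example.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """strata maps each stratum name to a list of its members."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Each stratum's share of the sample matches its share of the population
        k = round(sample_size * len(members) / population_size)
        # Members are chosen at random within the stratum
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical school: 120 pupils in Year 10 and 80 in Year 11
year_groups = {
    "Year 10": [f"Y10-{i}" for i in range(120)],
    "Year 11": [f"Y11-{i}" for i in range(80)],
}
chosen = stratified_sample(year_groups, sample_size=20, seed=1)
print({name: len(members) for name, members in chosen.items()})
# {'Year 10': 12, 'Year 11': 8}
```

With less convenient numbers the rounded stratum sizes may not add up exactly to the intended sample size, so one stratum may need adjusting by hand.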
Judgment Sampling
Non-random sampling in which the researcher uses their own judgment to select the sample according to specific criteria (e.g. first 20 items/people).
Cluster Sampling
Non-random sampling using all members from randomly chosen clusters (e.g. all pupils in 3 randomly chosen tutor groups).
Quota Sampling
Non-random sampling where an interviewer selects a pre-determined number of people from different age-groups/genders.
Systematic Sampling
Non-random sampling that selects items at fixed intervals from a random start point (e.g. every 10th name on a list).
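As a rough sketch of systematic sampling, the Python below picks a random start within the first interval and then takes every item at a fixed interval after it. The function name `systematic_sample`, the numbered sample frame and the interval of 10 are assumptions for illustration.

```python
import random

def systematic_sample(sample_frame, interval, seed=None):
    rng = random.Random(seed)
    # Random start point somewhere in the first interval
    start = rng.randrange(interval)
    # Then every `interval`-th item from that start point
    return sample_frame[start::interval]

# Hypothetical sample frame: items numbered 1 to 100
frame = list(range(1, 101))
picked = systematic_sample(frame, interval=10, seed=3)
print(picked)
```

With 100 items and an interval of 10, the sample always contains 10 items, whichever start point is drawn.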
Data Cleaning
Cleaning Data
Improving reliability and usability of data for statistical software.
May involve dealing with outliers, missing data, and standardising formats/units.
Anomaly
A value that does not fit the rest of the data (e.g. far from the line of best fit).
Outlier
A suspiciously high or low value.
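One convention often used to decide whether a value is suspiciously high or low is to flag anything more than 1.5 × IQR below the lower quartile or above the upper quartile. The sketch below assumes that convention and takes the quartiles as given; the function names and the data are hypothetical.

```python
def outlier_bounds(lower_quartile, upper_quartile):
    iqr = upper_quartile - lower_quartile
    # Values outside these two bounds are treated as outliers
    return lower_quartile - 1.5 * iqr, upper_quartile + 1.5 * iqr

def find_outliers(values, lower_quartile, upper_quartile):
    low, high = outlier_bounds(lower_quartile, upper_quartile)
    return [v for v in values if v < low or v > high]

# Hypothetical reaction times with quartiles already worked out
times = [12, 14, 15, 15, 16, 17, 18, 19, 41]
print(outlier_bounds(14.5, 18.5))        # (8.5, 24.5)
print(find_outliers(times, 14.5, 18.5))  # [41]
```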
Variables
Variables
Values being investigated that vary among members of the population; can be discrete, continuous, qualitative, etc.
Multivariate Problems
Investigating more than one linked variable (e.g. driving test performance by gender and time of day).
Categorical Data
Data that fits into clearly defined categories (e.g. gender, voting intention).
Ordinal Data
Data indicating a rank order (e.g. positions in a race).
Distribution
Distribution
The set of values and their frequencies or probabilities.
Extraneous Variables
Variables not being investigated that could affect the outcome (e.g. time of day when comparing reaction times).
Control Groups & Matched Pairs
Control Group
A group that does not receive the treatment being tested, used alongside a test group for comparison.
Matched Pairs
Used to make two groups as similar as possible, reducing extraneous variable effects.
Types of Questions
Closed Questions
Require a choice from stated answers (easy to analyse and graph).
Open Questions
Have no restrictions on answers (harder to analyse, often best avoided).
Pilot Survey / Pre-Test
Testing a questionnaire on a small scale to check for necessary changes.
Additional Concepts
Random Response Technique
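Used to get more reliable answers to sensitive questions by adding an element of chance (e.g. using dice), so that no individual answer reveals the truth about that person but the overall proportion can still be estimated.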
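To show how the element of chance is removed again at the analysis stage, here is a sketch under one assumed design: each person rolls a die in private, answers "yes" regardless if it shows a 6, and answers truthfully otherwise. The design, the function name and the survey numbers are all hypothetical.

```python
def estimate_true_proportion(yes_count, respondents, forced_yes_probability):
    observed_yes = yes_count / respondents
    # P(yes) = forced + (1 - forced) * true proportion, rearranged for the true proportion
    return (observed_yes - forced_yes_probability) / (1 - forced_yes_probability)

# Hypothetical survey: 60 "yes" answers from 200 people, forced "yes" on a 6
estimate = estimate_true_proportion(60, 200, 1 / 6)
print(round(estimate, 2))  # 0.16
```

Here 30% of answers were "yes", but about a sixth of all answers were forced, so the estimated true proportion is lower, at roughly 16%.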
Reliability & Validity
Reliability
The extent to which repeating a process leads to similar results.
Validity
The extent to which a process measures what it intends to.
Displaying and Comparing Data
Choropleth Map
A map in which regions are shaded according to the value of a variable; darker shading represents higher numbers.
Central Tendency
A measure of the average or typical value of a data set (e.g. mean, median, mode).
Dispersion
A measure of how spread out the data are (e.g. range, IQR, standard deviation, variance).
Variance
The square of the standard deviation.
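The link between the two measures can be checked with Python's standard `statistics` module. This sketch treats the data as a whole population (hence `pstdev` and `pvariance`); the data values are made up for illustration.

```python
import statistics

# Hypothetical data set with mean 5
data = [2, 4, 4, 4, 5, 5, 7, 9]

std_dev = statistics.pstdev(data)      # population standard deviation
variance = statistics.pvariance(data)  # population variance

print(std_dev, variance)
# The variance equals the standard deviation squared
print(abs(std_dev ** 2 - variance) < 1e-9)  # True
```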
Seasonal and Time Series Analysis
Trend
Long-term changes over time; described as rising, falling, or level.
Seasonal Variation
Patterns that repeat at regular intervals.
Mean Seasonal Effect
The average of the differences between the actual values and the trend line for the same point in each cycle (e.g. every first quarter).
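A short sketch of the calculation, assuming the trend-line values have already been read off a graph: subtract the trend value from the actual value for the same quarter in each year, then average the differences. The function name and the sales figures are hypothetical.

```python
def mean_seasonal_effect(actual_values, trend_values):
    # Difference between each actual value and the trend line at the same time
    differences = [a - t for a, t in zip(actual_values, trend_values)]
    return sum(differences) / len(differences)

# Hypothetical first-quarter sales over three years, with trend-line readings
q1_actual = [120, 130, 143]
q1_trend = [110, 122, 131]

print(mean_seasonal_effect(q1_actual, q1_trend))  # 10.0
```

A positive result means that quarter tends to sit above the trend line; adding it to a future trend value gives a seasonally adjusted prediction.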
Index Numbers & Economic Measures
Index Number
New value as a percentage of the value in a base year; stated without “%.”
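As a worked example of the definition above, the sketch below divides the new value by the base-year value and multiplies by 100. The function name and the prices are made up for illustration.

```python
def index_number(new_value, base_value):
    # New value as a percentage of the base-year value
    return 100 * new_value / base_value

# Hypothetical price: 40 in the base year, 46 now
print(index_number(46, 40))  # 115.0
```

An index number of 115 means the value has risen by 15% since the base year; the base year itself always has index 100.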
RPI / CPI
Retail Prices Index / Consumer Prices Index: measures of inflation.
GDP
Gross Domestic Product: the total value of goods and services produced by a country in a given period.
Standardised Rates
Rates (e.g. birth or death rates) adjusted to a standard population so that areas with different age distributions can be compared fairly.
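One common way to standardise a rate is to work out the crude rate per 1000 in each age group, scale it by that age group's share of a standard population of 1000, and add the results. The sketch below assumes that approach; the function name, the age groups and every number in it are hypothetical.

```python
def standardised_rate(age_groups):
    """Each entry is (deaths, population, standard_population_per_1000)."""
    total = 0.0
    for deaths, population, standard_population in age_groups:
        crude_rate_per_1000 = deaths / population * 1000
        # Weight the crude rate by the age group's share of the standard population
        total += crude_rate_per_1000 / 1000 * standard_population
    return total

# Hypothetical town: (deaths, population, standard population out of 1000)
town = [
    (10, 5000, 500),  # ages 0-39
    (30, 3000, 350),  # ages 40-69
    (80, 2000, 150),  # ages 70+
]
print(f"{standardised_rate(town):.1f} per 1000")  # 10.5 per 1000
```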
Probability Concepts
Sample Space & Events
Sample Space Diagram
Represents all possible outcomes (e.g. table for dice rolls).
Mutually Exclusive Events
Events that cannot happen together, so P(A or B) = P(A) + P(B).
Exhaustive Events
A set of events that between them cover all possible outcomes; if the events are also mutually exclusive, their probabilities sum to 1.
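The two ideas above can be checked on a small sample space. This sketch assumes a fair six-sided die and uses exact fractions; the events A, B and C and the helper `prob` are chosen purely for illustration.

```python
from fractions import Fraction

# Sample space for one roll of a fair die
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    # Equally likely outcomes: favourable outcomes over total outcomes
    return Fraction(len(event & sample_space), len(sample_space))

A = {1, 2}
B = {5, 6}
C = {3, 4}

# A and B share no outcomes, so they are mutually exclusive
print(A & B == set())                    # True
print(prob(A | B) == prob(A) + prob(B))  # True

# A, B and C together cover every outcome, so they are exhaustive
print(A | B | C == sample_space)         # True
print(prob(A) + prob(B) + prob(C))       # 1
```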
Frequency and Conditional Probability
Relative Frequency
An estimate of probability from an experiment: the number of times the event occurs divided by the total number of trials.
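As a small worked example, relative frequency is the number of times an event was observed divided by the number of trials. The function name and the coin-toss counts below are hypothetical.

```python
from fractions import Fraction

def relative_frequency(occurrences, trials):
    # Observed occurrences divided by the total number of trials
    return Fraction(occurrences, trials)

# Hypothetical experiment: 37 heads in 100 tosses of a biased coin
rf = relative_frequency(37, 100)
print(float(rf))  # 0.37
```

The more trials that are carried out, the more reliable this estimate of the probability becomes.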
Absolute Risk
Probability of an event occurring.
Relative Risk
How many times more likely an event is in one group than in another: the risk for the group divided by the risk for the comparison group.
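A sketch of how the two risk measures relate, assuming two groups of equal size. The function names, the groups and the counts are invented for illustration.

```python
from fractions import Fraction

def absolute_risk(cases, group_size):
    # Probability of the event within one group
    return Fraction(cases, group_size)

def relative_risk(risk_group, risk_comparison):
    # Risk in the group of interest divided by risk in the comparison group
    return risk_group / risk_comparison

# Hypothetical study: 30 of 200 in group 1 affected, 10 of 200 in group 2
risk_1 = absolute_risk(30, 200)
risk_2 = absolute_risk(10, 200)

print(float(risk_1), float(risk_2))   # 0.15 0.05
print(relative_risk(risk_1, risk_2))  # 3
```

A relative risk of 3 means the event is three times as likely in group 1 as in group 2, even though the absolute risk in both groups is fairly small.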
Distributions
B(n, p)
Binomial distribution with n trials and probability of success p on each trial.
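For a variable following B(n, p), the probability of exactly k successes in n independent trials is C(n, k) × p^k × (1 − p)^(n − k). The sketch below evaluates that formula; the function name and the coin example are assumptions for illustration.

```python
from math import comb

def binomial_probability(n, k, p):
    # Number of ways to choose which k trials succeed, times the
    # probability of k successes and (n - k) failures
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical example: exactly 2 heads in 5 tosses of a fair coin
print(binomial_probability(5, 2, 0.5))  # 0.3125
```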
N(μ, σ²)
Normal distribution with mean μ and variance σ² (standard deviation σ).
Key properties include symmetry and peak at the mean.
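These properties can be explored with `NormalDist` from Python's standard `statistics` module. Note that it takes the standard deviation, not the variance, so N(100, 15²) is written with `sigma=15`. The test-score setting and the numbers are hypothetical.

```python
from statistics import NormalDist

# Hypothetical test scores: mean 100, standard deviation 15, i.e. N(100, 15²)
scores = NormalDist(mu=100, sigma=15)

# Half of the distribution lies below the mean
print(scores.cdf(100))  # 0.5

# Symmetry: values equally far either side of the mean are equally likely
print(abs(scores.pdf(85) - scores.pdf(115)) < 1e-12)  # True

# The curve peaks at the mean
print(scores.pdf(100) > scores.pdf(99) and scores.pdf(100) > scores.pdf(101))  # True
```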