GCSE Statistics vocabulary
Data and Sampling
Hypothesis
A statement which may or may not be true.
A statistical investigation is carried out to test whether the data support it.
Population
All items/people under investigation (e.g., all students in Year 10).
Sample Frame
A comprehensive list of all members of the population (could also be a database or register).
Random Sample
Every item in the population has an equal chance of being included in the sample.
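A minimal Python sketch of a simple random sample, assuming the sampling frame is available as a list. The helper name `simple_random_sample` and the student list are hypothetical. `random.Random.sample` draws without replacement, so every member has the same chance of being chosen.

```python
import random

def simple_random_sample(frame, k, seed=None):
    """Return k members chosen from the sampling frame without replacement.

    Every member of the frame has the same chance of being chosen.
    """
    rng = random.Random(seed)  # separate generator so a seed makes results repeatable
    return rng.sample(frame, k)

# Hypothetical sampling frame of 30 Year 10 students
frame = [f"Student {i}" for i in range(1, 31)]
print(simple_random_sample(frame, 5, seed=1))
```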
Stratified (Random) Sampling
Population divided into strata (e.g., gender, school year).
Proportions in the sample match those in the population.
Members in each stratum chosen randomly.
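The steps above can be sketched in Python, assuming proportional allocation (each stratum's share of the sample equals its share of the population) and rounding to whole numbers. The helper `stratified_sample` and the year-group sizes are hypothetical; with some sizes, rounding can make the total differ slightly from the target.

```python
import random

def stratified_sample(strata, total_sample, seed=None):
    """Proportional stratified sample.

    strata maps each stratum name to a list of its members. Each stratum
    contributes (stratum size / population size) x total_sample members,
    rounded to the nearest whole number, chosen at random within the stratum.
    """
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(len(members) / population_size * total_sample)
        sample[name] = rng.sample(members, k)  # random choice within the stratum
    return sample

# Hypothetical year groups: 120 in Year 7, 80 in Year 8, 100 in Year 9
strata = {
    "Year 7": [f"Y7-{i}" for i in range(120)],
    "Year 8": [f"Y8-{i}" for i in range(80)],
    "Year 9": [f"Y9-{i}" for i in range(100)],
}
sample = stratified_sample(strata, 30, seed=2)
print({name: len(chosen) for name, chosen in sample.items()})
# prints {'Year 7': 12, 'Year 8': 8, 'Year 9': 10}
```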
Judgement Sampling
Non-random sampling in which the researcher uses their own judgement or specific criteria to choose the sample (e.g., selecting the first 20 items/people).
Cluster Sampling
Non-random sampling using all members from randomly selected clusters (e.g., all pupils in 3 random tutor groups).
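A short Python sketch of the idea: pick some clusters at random, then include every member of each chosen cluster. The helper `cluster_sample` and the tutor-group data are hypothetical.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Pick num_clusters clusters at random and include every member of each."""
    rng = random.Random(seed)
    # Sort the cluster names so that a given seed always gives the same picks
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical tutor groups 10A to 10G, each with 25 pupils
tutor_groups = {f"10{letter}": [f"10{letter}-{i}" for i in range(25)] for letter in "ABCDEFG"}
sample = cluster_sample(tutor_groups, 3, seed=5)
print(sorted(sample))
```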
Quota Sampling
Non-random sampling in which an interviewer selects a predetermined number of people from each category (e.g., age, gender).
Systematic Sampling
Non-random sampling starting from a random point and selecting at fixed intervals.
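A minimal Python sketch of systematic sampling, assuming the interval is the frame size divided by the sample size (rounded down) and the frame has at least k members. The helper `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Choose k members: a random start, then every n-th member of the frame.

    The interval n is len(frame) // k.
    """
    interval = len(frame) // k
    start = random.Random(seed).randrange(interval)  # random start within the first interval
    return [frame[start + i * interval] for i in range(k)]

frame = list(range(1, 101))  # hypothetical frame numbered 1 to 100
print(systematic_sample(frame, 10, seed=3))
```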
Cleaning Data
Cleaning may be required to enhance reliability and usability.
Tasks may include:
Addressing outliers or missing data.
Standardising formats and units.
Removing unnecessary symbols.
Anomaly
A value that does not fit with the rest of the data (e.g., far from line of best fit).
Outlier
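A value that is unusually high or low compared with the rest of the data. Common rules identify an outlier as a value:
more than 3 × s.d. from the mean (outside mean ± 3 × s.d.).
more than 1.5 × IQR above the upper quartile or below the lower quartile.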
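Both outlier rules can be sketched in Python. This assumes the population standard deviation (dividing by n) and quartiles from `statistics.quantiles`, whose default "exclusive" method interpolates at the (n + 1)/4 positions; other quartile conventions can give slightly different fences. The helper names and the data are hypothetical.

```python
import statistics

def outliers_sd(data):
    """Values outside mean ± 3 × standard deviation (population s.d., dividing by n)."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return [x for x in data if x < mean - 3 * sd or x > mean + 3 * sd]

def outliers_iqr(data):
    """Values more than 1.5 × IQR below the lower quartile or above the upper quartile."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile convention is an assumption
    iqr = q3 - q1
    return [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

data = [12, 13, 13, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21, 60]
print(outliers_sd(data))   # [60]
print(outliers_iqr(data))  # [60]
```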
Variables
Variables
Characteristics or quantities being investigated, whose values can vary across the population.
Types include: discrete, continuous, qualitative.
Multivariate Problems
Problems in which more than one linked variable is analysed (e.g., driving test performance by gender and by time).
Categorical Data
Data that can be sorted into distinct categories (e.g., gender).
Ordinal Data
Data reflecting a rank order (e.g., race positions).
Distribution
A set of values of a variable alongside their frequencies or probabilities.
Extraneous Variables
Variables not under investigation that may influence results.
Efforts are made to limit their effect (e.g., keeping the time of day the same when comparing reaction times).
Control Groups & Matched Pairs
Control Group: a group used alongside a test group for comparison (e.g., the test group receives a new drug while the control group receives a placebo).
Matched Pairs: pairing members of two groups who are similar in key characteristics, to reduce the effect of extraneous variables.
Question Types
Closed Questions
Require a choice from stated answers, facilitating easier analysis.
Open Questions
No restrictions on answers; harder to analyse (generally best avoided).
Pilot Survey / Pre-test
Trial questionnaire on a small scale to assess:
Clarity of questions.
Sufficiency of data collected.
Response rates.
Coverage of response options.
Random Response
Method used to estimate answers to sensitive questions.
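Involves an element of randomness so that respondents feel able to answer honestly, giving more reliable responses (e.g., rolling a dice to decide whether a person answers the sensitive question truthfully or simply answers "Yes").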
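A sketch of how a random response survey can be turned into an estimate, assuming one common design: each person privately rolls a fair dice, says "Yes" on a 6, and otherwise answers truthfully. Then P(Yes) = 1/6 + (5/6)p, which can be rearranged for p. The helper `estimate_true_proportion` and the survey counts are hypothetical.

```python
def estimate_true_proportion(yes_count, total, p_forced_yes=1/6):
    """Estimate the proportion p who would truthfully answer "Yes".

    Assumed design: with probability p_forced_yes a respondent must say "Yes"
    (e.g. they rolled a 6); otherwise they answer truthfully, so
    P(Yes) = p_forced_yes + (1 - p_forced_yes) * p.
    """
    observed = yes_count / total
    return (observed - p_forced_yes) / (1 - p_forced_yes)

# Hypothetical survey: 30 "Yes" answers from 60 people
print(estimate_true_proportion(30, 60))  # about 0.4
```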
Reliability & Validity
Reliability:
Consistency of results when the investigation is repeated (e.g., a small sample may give unreliable results).
Validity:
The degree to which a process measures what it is intended to measure (e.g., surveying Year 7 students to find out Year 10 students' opinions on food would have poor validity).
Displaying & Comparing Data
Choropleth Map:
A map that uses shading to show values for different areas; darker shading indicates higher values.
Frequency Density:
The height of each bar on a histogram: frequency density = frequency ÷ class width, so the area of each bar equals the frequency.
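A small Python sketch of the frequency density calculation for grouped data. The helper name `frequency_densities` and the class boundaries are hypothetical.

```python
def frequency_densities(classes):
    """classes: list of (lower bound, upper bound, frequency) tuples.

    Returns frequency density = frequency / class width for each class,
    i.e. the bar height on a histogram.
    """
    return [freq / (upper - lower) for lower, upper, freq in classes]

# Hypothetical grouped times in minutes: 0-10, 10-15 and 15-30
classes = [(0, 10, 20), (10, 15, 15), (15, 30, 30)]
print(frequency_densities(classes))  # [2.0, 3.0, 2.0]
```

Multiplying each density by its class width gives back the frequency, which is why bar area represents frequency.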
Central Tendency
Refers to averages (mean, median, mode).
Dispersion
Indicates spread of data (range, IQR, standard deviation, variance).
Variance: square of standard deviation.
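A minimal Python sketch of variance and standard deviation, assuming the divide-by-n (population) formula; `population_variance` is a hypothetical helper, checked against the standard library's `statistics.pvariance`.

```python
import math
import statistics

def population_variance(data):
    """Mean of the squared deviations from the mean (divides by n)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)

data = [2, 4, 4, 4, 5, 5, 7, 9]
var = population_variance(data)
sd = math.sqrt(var)  # standard deviation = square root of the variance
print(var, sd)  # 4.0 2.0
print(math.isclose(var, statistics.pvariance(data)))  # True
```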
Interpercentile Range (IPR) & Interdecile Range
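The range covering a central part of a distribution. The interdecile range is the 9th decile minus the 1st decile (the middle 80% of the data); an interpercentile range can cover other central parts, e.g., the 80th percentile minus the 20th percentile (the middle 60%).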
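A Python sketch of an interpercentile range, assuming percentiles come from `statistics.quantiles(data, n=100)` with its default "exclusive" interpolation; textbooks sometimes use other conventions, which can shift the answer slightly. The helper name `interpercentile_range` is hypothetical.

```python
import statistics

def interpercentile_range(data, lower, upper):
    """upper-th percentile minus lower-th percentile (whole numbers from 1 to 99)."""
    cuts = statistics.quantiles(data, n=100)  # cuts[k - 1] is the k-th percentile
    return cuts[upper - 1] - cuts[lower - 1]

data = list(range(1, 101))
print(interpercentile_range(data, 10, 90))  # interdecile range (middle 80%)
print(interpercentile_range(data, 20, 80))  # middle 60%
```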
Standardised Score
Shows how many standard deviations a value lies above or below the mean: standardised score = (value - mean) ÷ standard deviation; used to compare values from different distributions.
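A short Python sketch comparing two results from different distributions with standardised scores. The helper `standardised_score` and the class means and standard deviations are hypothetical.

```python
def standardised_score(value, mean, sd):
    """Number of standard deviations the value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Hypothetical marks: which result is better relative to its class?
maths = standardised_score(70, mean=60, sd=5)     # 2.0
english = standardised_score(80, mean=75, sd=10)  # 0.5
print(maths > english)  # True
```

The lower raw mark (70) is the better performance here, because it is further above its own mean in standard-deviation terms.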
Scatter Diagrams
Bivariate Data: paired data (e.g., scores in Maths and English).
Association
Relationship between two variables.
Correlation
Relationship between two numerical variables.
Explanatory & Response Variables
Explanatory (Independent) goes on the x-axis.
Response (Dependent) goes on the y-axis.
Regression Line / Regression Equation
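A line of best fit, y = a + bx, usually calculated using statistical software; the gradient b gives the rate of change of the response variable for each unit increase in the explanatory variable.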
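A Python sketch of how software typically finds the regression line, assuming the least-squares method (gradient b = Sxy ÷ Sxx, intercept a = mean of y - b × mean of x). The helper `regression_line` and the revision data are hypothetical.

```python
def regression_line(x, y):
    """Least-squares line y = a + b x.

    b = Sxy / Sxx (the gradient) and a = mean(y) - b * mean(x).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data: hours of revision (x) and test mark (y)
hours = [1, 2, 3, 4]
marks = [3, 5, 7, 9]
a, b = regression_line(hours, marks)
print(f"y = {a} + {b}x")  # y = 1.0 + 2.0x
```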
Causation
When a change in one variable directly causes a change in the other.
Spurious Correlation
Indicates correlation without causation (e.g., both may increase due to a third factor).
Spearman’s Rank Correlation Coefficient (SRCC)
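Measures how closely the rankings of two variables agree: r_s = 1 - 6Σd² ÷ (n(n² - 1)), where d is the difference between each pair of ranks and n is the number of pairs; values range from -1 (perfect negative) to +1 (perfect positive).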
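A Python sketch of Spearman's rank correlation coefficient using the Σd² formula. It assumes there are no tied values (the formula is exact only without ties); the helpers `ranks` and `spearman` and the judges' scores are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes no tied values."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch does not handle tied values")
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """r_s = 1 - 6 * sum(d^2) / (n(n^2 - 1)), where d is the difference in ranks."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical scores from two judges for five contestants
judge_a = [12, 15, 18, 20, 25]
judge_b = [14, 13, 19, 17, 22]
print(spearman(judge_a, judge_b))  # about 0.8
```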
Pearson’s Product Moment Correlation Coefficient (PMCC)
A measure of the strength of the linear relationship between two numerical variables, from -1 to +1; it does not need to be calculated in the exam.
Time Series & Changes Over Time
Trend
The general long-term direction of a time series, described as rising, falling or level, ignoring short-term fluctuations.
Seasonal Variation
Patterns that repeat regularly (e.g., sales peaks).
Mean Seasonal Effect: the mean of the differences between the actual values and the trend line values for a particular season (e.g., every first quarter).
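A Python sketch of the mean seasonal effect, assuming the trend-line values have already been read from a graph (e.g., a line drawn through moving averages). The helper `mean_seasonal_effect` and the sales figures are hypothetical.

```python
def mean_seasonal_effect(actual, trend):
    """Mean of (actual - trend) for the same season in different years."""
    differences = [a - t for a, t in zip(actual, trend)]
    return sum(differences) / len(differences)

# Hypothetical quarter-1 sales over three years, with trend-line readings for those quarters
q1_actual = [120, 130, 128]
q1_trend = [110, 118, 120]
effect = mean_seasonal_effect(q1_actual, q1_trend)
print(effect)  # 10.0

# Estimate for a future quarter 1: trend-line reading + mean seasonal effect
print(125 + effect)  # 135.0
```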
Index Number & Base Year
Compares values with a base year as percentages: index = (value ÷ base-year value) × 100.
The index for the base year is always 100.
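A minimal Python sketch of index numbers relative to a base year. The helper `index_numbers` and the bread prices are hypothetical.

```python
def index_numbers(values, base_value):
    """Index = value / base-year value x 100, so the base year has index 100."""
    return [value / base_value * 100 for value in values]

# Hypothetical price of a loaf in three successive years (the first year is the base year)
prices = [1.20, 1.26, 1.50]
print([round(i, 1) for i in index_numbers(prices, prices[0])])  # [100.0, 105.0, 125.0]
```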
Weighted Index
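A weighted mean of the index numbers of several items, where the weights reflect each item's importance (e.g., the amount spent on it); this reflects the overall change more accurately than a simple average.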
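A short Python sketch of a weighted index as a weighted mean. The helper `weighted_index`, the items and the weights are hypothetical.

```python
def weighted_index(indices, weights):
    """Weighted mean of index numbers: sum(index x weight) / sum(weights)."""
    return sum(i * w for i, w in zip(indices, weights)) / sum(weights)

# Hypothetical household spending: food (index 110, weight 3), fuel (index 120, weight 1)
print(weighted_index([110, 120], [3, 1]))  # 112.5
```

A simple average of the two indices would give 115; the weighting pulls the result towards food, the item with more spending.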
RPI & CPI
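RPI (Retail Prices Index): a weighted index of the prices of goods and services bought by typical households; a measure of inflation.
CPI (Consumer Prices Index): a similar measure of inflation that excludes mortgage interest payments.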
GDP
Gross Domestic Product: the total value of goods and services produced by a country in a year; used to indicate economic growth or recession.
Chain Base Index Number
An index number that uses the previous year as its base. The geometric mean of the chain base index numbers, written as multipliers, gives the average annual percentage change.
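A Python sketch of chain base index numbers and the average annual percentage change via the geometric mean (`statistics.geometric_mean`). The helpers `chain_base_indices` and `average_annual_change` and the values are hypothetical.

```python
from statistics import geometric_mean

def chain_base_indices(values):
    """Each year's value as a percentage of the previous year's value."""
    return [cur / prev * 100 for prev, cur in zip(values, values[1:])]

def average_annual_change(chain_indices):
    """Average % change per year: geometric mean of the multipliers, minus 1, times 100."""
    multipliers = [i / 100 for i in chain_indices]
    return (geometric_mean(multipliers) - 1) * 100

# Hypothetical values in three successive years
values = [200, 220, 231]
chain = chain_base_indices(values)  # about [110.0, 105.0]
print(round(average_annual_change(chain), 2))  # 7.47
```

Using the geometric mean rather than the ordinary mean means that applying the average change every year reproduces the overall change from 200 to 231.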
Distribution of Sample Means
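Different samples from the same population give different estimates of the mean. These sample means form a distribution that is less spread out than the original values, and the spread gets smaller as the sample size increases.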
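A small simulation in Python illustrating the point, assuming a hypothetical population of 1000 uniformly distributed values and 500 repeated samples of each size. The helper `sample_means` is hypothetical; the seed only makes the run repeatable.

```python
import random
import statistics

rng = random.Random(4)  # fixed seed so the run is repeatable
# Hypothetical population of 1000 values between 0 and 100
population = [rng.uniform(0, 100) for _ in range(1000)]

def sample_means(population, sample_size, repeats, rng):
    """Means of repeated random samples (drawn without replacement) from the population."""
    return [statistics.mean(rng.sample(population, sample_size)) for _ in range(repeats)]

means_10 = sample_means(population, 10, 500, rng)
means_40 = sample_means(population, 40, 500, rng)
print(round(statistics.pstdev(population), 1))  # spread of the original values
print(round(statistics.pstdev(means_10), 1))    # noticeably smaller
print(round(statistics.pstdev(means_40), 1))    # smaller still
```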
Quality Assurance & Control Charts
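Used to monitor the quality of production: sample values (e.g., sample means) are plotted on a chart with warning lines and action lines; a value outside the action lines indicates that the process should be stopped and adjusted.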
Crude & Standardised Rates
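Crude Rate: the number of births, deaths, unemployed people, etc. per thousand of the population: (number ÷ total population) × 1000.
Standardised Rate: a rate adjusted for differences in age distribution between populations (using a standard population), so that different areas can be compared fairly.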
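A Python sketch of both rates. For the standardised rate it assumes direct standardisation: each age group's rate (per 1000) is weighted by the standard population's percentage in that age group. The helpers `crude_rate` and `standardised_rate` and all figures are hypothetical.

```python
def crude_rate(events, population):
    """Number of events (births, deaths, ...) per 1000 of the population."""
    return events * 1000 / population

def standardised_rate(group_rates, standard_percentages):
    """Sum of (age-group rate per 1000 x standard population %) / 100."""
    return sum(r * p for r, p in zip(group_rates, standard_percentages)) / 100

# Hypothetical town: 45 deaths in a population of 9000
print(crude_rate(45, 9000))  # 5.0

# Hypothetical death rates per 1000 for ages 0-39, 40-69 and 70+,
# with a standard population split 30%, 50%, 20%
print(standardised_rate([2, 10, 50], [30, 50, 20]))  # 15.6
```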
Probability
Sample Space Diagram: shows all possible outcomes of an experiment, typically in a table.
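A short Python sketch that lists the sample space for rolling two fair six-sided dice (the same 36 outcomes a 6 × 6 table would show) and finds a probability from it, using exact fractions.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two fair six-sided dice: 36 equally likely outcomes
sample_space = list(product(range(1, 7), repeat=2))
sevens = [outcome for outcome in sample_space if sum(outcome) == 7]
print(len(sample_space), Fraction(len(sevens), len(sample_space)))  # 36 1/6
```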
Mutually Exclusive Events
Events that cannot occur at the same time; for these, P(A or B) = P(A) + P(B).
Exhaustive Outcomes
A set of outcomes that includes every possible outcome; if the outcomes are also mutually exclusive, their probabilities sum to 1.
Relative Frequency & Conditional Probability
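Relative frequency: the number of times an outcome occurs divided by the total number of trials; it gives an experimental estimate of probability.
Conditional probability: the probability of an event given that another event has already occurred.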
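A Python sketch using a hypothetical two-way table of students by year group and lunch choice: the relative frequency of hot lunches overall, and the conditional probability of a hot lunch given that the student is in Year 10.

```python
from fractions import Fraction

# Hypothetical two-way table: counts of students by year group and lunch choice
counts = {
    ("Year 10", "hot"): 30, ("Year 10", "packed"): 20,
    ("Year 11", "hot"): 15, ("Year 11", "packed"): 35,
}
total = sum(counts.values())

# Relative frequency of "hot": number of hot lunches / number of students
p_hot = Fraction(counts[("Year 10", "hot")] + counts[("Year 11", "hot")], total)

# Conditional probability P(hot | Year 10) = n(Year 10 and hot) / n(Year 10)
year10 = counts[("Year 10", "hot")] + counts[("Year 10", "packed")]
p_hot_given_y10 = Fraction(counts[("Year 10", "hot")], year10)
print(p_hot, p_hot_given_y10)  # 9/20 3/5
```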
Independence
Two events are independent if the occurrence of one does not affect the probability of the other; then P(A and B) = P(A) × P(B).
Absolute vs. Relative Risk
Absolute Risk: the probability that an event happens (e.g., being late for work).
Relative Risk: compares the risk of an event in one group with the risk in another group; relative risk = risk in one group ÷ risk in the other group.
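A Python sketch of absolute and relative risk for two groups. The helper `relative_risk` and the counts are hypothetical; the ratio is rearranged so that whole-number counts divide exactly.

```python
def relative_risk(events_a, total_a, events_b, total_b):
    """Risk in group A divided by risk in group B.

    Equivalent to (events_a / total_a) / (events_b / total_b), rearranged
    so that whole-number inputs avoid extra rounding.
    """
    return (events_a * total_b) / (total_a * events_b)

# Hypothetical: 30 of 1000 people in group A and 10 of 1000 in group B are late for work
print(30 / 1000, 10 / 1000)                  # absolute risks: 0.03 0.01
print(relative_risk(30, 1000, 10, 1000))     # 3.0, so group A's risk is three times group B's
```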
Binomial Distribution B(n, p)
Binomial distribution conditions:
Two possible outcomes for each trial (success/failure).
A fixed number of trials, n.
The trials are independent.
The probability of success, p, is the same for every trial.
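A Python sketch of binomial probabilities using P(X = r) = C(n, r) p^r (1 - p)^(n - r), with `math.comb` for the number of combinations. The helper name `binomial_probability` is hypothetical.

```python
from math import comb

def binomial_probability(n, p, r):
    """P(X = r) for X ~ B(n, p): C(n, r) * p^r * (1 - p)^(n - r)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

# Probability of exactly 1 head in 3 tosses of a fair coin
print(binomial_probability(3, 0.5, 1))  # 0.375
```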
Normal Distribution N(μ, σ²)
Written N(μ, σ²), where μ is the mean and σ² is the variance (σ is the standard deviation); the distribution is symmetrical about the mean and bell-shaped.
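A Python sketch using the standard library's `statistics.NormalDist` to find the proportion of a normal distribution lying within 1 and 2 standard deviations of the mean. The heights distribution is hypothetical; note that `NormalDist` takes the standard deviation σ, not the variance σ².

```python
from statistics import NormalDist

# Hypothetical heights: mean 170 cm, standard deviation 10 cm, i.e. N(170, 10²)
heights = NormalDist(mu=170, sigma=10)

within_1_sd = heights.cdf(180) - heights.cdf(160)
within_2_sd = heights.cdf(190) - heights.cdf(150)
print(round(within_1_sd, 3), round(within_2_sd, 3))  # 0.683 0.954
```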