GCSE Statistics – Key Vocabulary and Concepts
1. Data and Sampling
Hypothesis:
A statement that can be either true or false.
It is tested by a statistical investigation, which collects evidence to support or reject it.
Population:
The complete set of items or people under investigation.
Examples: All students in Year 10; all fireworks made by a factory.
Sampling Frame:
A complete list of all members of the population from which a sample is chosen (e.g., a school register or a database).
2. Sampling Techniques
Random Sample:
Each item in the population has an equal chance of being selected.
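A minimal Python sketch, assuming the sampling frame is a plain list of hypothetical student IDs: `random.sample` draws without replacement, so every member has the same chance of being chosen and nobody is picked twice.

```python
import random

# Hypothetical sampling frame: 200 student IDs
frame = [f"S{i:03d}" for i in range(1, 201)]

random.seed(1)                     # fixed seed so the example is repeatable
sample = random.sample(frame, 20)  # 20 members, each equally likely, no repeats
print(sample)
```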
Stratified (Random) Sampling:
Population is divided into strata (e.g., gender), and samples are drawn to match the population proportions. Members in each stratum are chosen at random.
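A minimal sketch of proportional allocation in Python, assuming the strata are stored in a dictionary of hypothetical member lists (the function name `stratified_sample` is illustrative): each stratum gets (stratum size ÷ population size) × sample size places, filled at random from within that stratum.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Proportional stratified sample.

    strata: dict mapping stratum name -> list of members (hypothetical format)
    """
    rng = random.Random(seed)
    population_size = sum(len(m) for m in strata.values())
    result = {}
    for name, members in strata.items():
        # Places for this stratum = stratum size / population size x sample size
        k = round(len(members) / population_size * sample_size)
        result[name] = rng.sample(members, k)  # random choice within the stratum
    return result

strata = {"boys": [f"B{i}" for i in range(120)],
          "girls": [f"G{i}" for i in range(80)]}
sample = stratified_sample(strata, 50, seed=0)
print({name: len(chosen) for name, chosen in sample.items()})  # {'boys': 30, 'girls': 20}
```

Because each share is rounded, the strata may not add up exactly to the requested sample size for other inputs; in that case one stratum is adjusted by hand.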
Judgement Sampling:
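Non-random sampling in which the researcher uses their own knowledge or judgement to choose members they think are representative. (Simply taking the first 20 people available is opportunity, or convenience, sampling.)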
Cluster Sampling:
The population is divided into clusters (e.g., tutor groups); some clusters are chosen at random and every member of each chosen cluster is included in the sample.
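A short Python sketch, assuming six hypothetical tutor groups of 25 pupils each: two clusters are picked at random, then everyone in those clusters goes into the sample.

```python
import random

# Hypothetical tutor groups (clusters) 10A..10F, each with 25 pupils
tutor_groups = {f"10{c}": [f"10{c}-{i}" for i in range(1, 26)] for c in "ABCDEF"}

rng = random.Random(3)
chosen_groups = rng.sample(sorted(tutor_groups), 2)  # pick 2 clusters at random
# Every pupil in each chosen cluster is included
sample = [pupil for g in chosen_groups for pupil in tutor_groups[g]]
print(chosen_groups, len(sample))  # 2 groups, 50 pupils
```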
Quota Sampling:
Non-random sampling in which the interviewer chooses people until a set number (quota) with given characteristics (e.g., age, gender) has been reached.
Systematic Sampling:
A sampling method that selects items at fixed intervals (every kth item) from a list, starting from a randomly chosen point.
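A minimal Python sketch, assuming the list length is divided by the sample size to get the interval k (the helper `systematic_sample` is hypothetical): a random start is chosen among the first k items, then every kth item is taken.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Take every k-th member, where k = len(frame) // n,
    starting from a random position among the first k."""
    k = len(frame) // n                       # sampling interval
    start = random.Random(seed).randrange(k)  # random start in 0..k-1
    return frame[start::k][:n]

frame = list(range(1, 101))  # hypothetical list of 100 numbered items
s = systematic_sample(frame, 10, seed=4)
print(s)
```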
3. Data Preparation
Cleaning Data:
Improving the reliability and usability of data by dealing with outliers and missing values and by putting data into consistent formats.
Anomaly:
A value that does not fit well with the rest of the data (e.g., far from the trend shown in scatter diagrams).
Outlier:
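An extremely high or low value. Common rules: a value more than 1.5 × IQR below the lower quartile or above the upper quartile, or a value more than 3 standard deviations from the mean (outside mean ± 3 × s.d.).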
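Both outlier rules can be sketched in Python as below. This assumes Python's default quartile method in `statistics.quantiles` and the population standard deviation (`statistics.pstdev`); textbook quartile conventions can give slightly different fences, and the data set is hypothetical.

```python
import statistics

def outliers_iqr(data):
    """Values more than 1.5 x IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

def outliers_sd(data):
    """Values more than 3 standard deviations from the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return [x for x in data if abs(x - mean) > 3 * sd]

data = [12, 14, 15, 15, 16, 17, 18, 19, 20, 45]
print(outliers_iqr(data))  # [45]
print(outliers_sd(data))   # []
```

The two rules need not agree: here 45 lies above the IQR fence but is about 2.9 standard deviations from the mean, so the 3 s.d. rule does not flag it.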
4. Types of Variables
Variables:
The quantities or characteristics being investigated, whose values vary between members of the population; they can be quantitative (discrete or continuous) or qualitative.
Multivariate Problems:
Involve more than one linked variable (e.g., driving test performance by gender and time of day).
Categorical Data:
Fits into clear categories (e.g., gender, car make).
Ordinal Data:
Indicates rank order (e.g., positions in a race).
5. Additional Concepts
Distribution:
The values of a variable along with their frequencies or probabilities.
Extraneous Variables:
Variables not under investigation that might affect outcomes; should be controlled (e.g., time of day).
Control Groups & Matched Pairs:
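A control group does not receive the treatment, so its results can be compared with the test group; matched pairs pair each member of one group with a similar member of the other (e.g., same age and gender) to minimize the effect of extraneous variables.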
Closed/Open Questions:
Closed questions have specific answer options; easier to analyze. Open questions have unrestricted answers; harder to analyze and often avoided.
Pilot Survey/Pre-test:
A small-scale test of a questionnaire to identify necessary changes before larger-scale use.
6. Random Response Techniques and Data Validity
Random Response:
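A technique for sensitive questions in which a chance device (e.g., a coin toss) decides whether the respondent answers truthfully or gives a set answer. Because no individual answer can be linked to the truth, people are more likely to answer honestly, and probability is then used to estimate the true proportion.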
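A worked sketch in Python, assuming one common coin scheme (not necessarily the one any particular exam uses): on heads the respondent must answer "Yes", on tails they answer truthfully. Then P(Yes) = 0.5 + 0.5p, so the true proportion is p = 2 × P(Yes) − 1. The survey counts are hypothetical.

```python
def estimate_true_proportion(yes_count, total):
    """Assumed design: heads -> answer "Yes"; tails -> answer truthfully.
    P(Yes) = 0.5 + 0.5 * p, so p = 2 * yes/total - 1."""
    return (2 * yes_count - total) / total

# Hypothetical survey: 65 "Yes" answers out of 100 respondents
print(estimate_true_proportion(65, 100))  # 0.3
```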
Reliability:
The extent to which repeating an investigation gives consistent results (e.g., larger samples usually give more reliable results).
Validity:
Measures whether a process accurately assesses what it intends to measure (e.g., collecting opinions from the correct demographic).
7. Data Presentation and Analysis
Choropleth Map:
A map that shows data by shading areas (usually, darker shading means higher values).
Central Tendency:
Measures of average: mean, median, mode.
Dispersion:
Measures of spread: range, IQR, standard deviation.
Standard Deviation:
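A measure of spread showing how far values typically lie from the mean: the square root of the mean of the squared deviations from the mean. A larger standard deviation means the data are more spread out.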
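A short Python sketch of the definition, dividing by n (the population form); the data set is a hypothetical example chosen so the arithmetic is easy to check by hand.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
# Square root of the mean squared deviation from the mean (divides by n)
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))
print(mean, sd)  # 5.0 2.0
```

Python's `statistics.pstdev(data)` computes the same quantity.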
Interpercentile/Interdecile Ranges:
Measures of spread between two chosen percentiles. The interdecile range (90th percentile − 10th percentile) covers the middle 80% of the data.
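A minimal Python sketch, assuming `statistics.quantiles` with its default method to find the deciles; other percentile conventions give slightly different cut points. The data are hypothetical.

```python
import statistics

data = list(range(1, 101))  # hypothetical data: the values 1 to 100

deciles = statistics.quantiles(data, n=10)  # nine cut points D1..D9
d1, d9 = deciles[0], deciles[-1]
interdecile_range = d9 - d1  # spread of the middle 80% of the data
print(d1, d9, interdecile_range)
```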
8. Scatter Diagrams and Correlation
Bivariate Data:
Data made up of pairs of values for two variables measured on the same item (e.g., each student's maths and science scores).
Association and Correlation:
Association is any relationship between two variables; correlation refers specifically to a linear relationship between two numerical variables.
Explanatory and Response Variables:
Explanatory variable (independent) is on the x-axis; response variable (dependent) is on the y-axis.
9. Regression and Causation
Regression Line:
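A line of best fit calculated from the data (usually by the method of least squares) rather than drawn by eye; its equation y = a + bx is used to predict the response variable from the explanatory variable.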
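A least-squares sketch in Python, with hypothetical data: the gradient is b = Sxy / Sxx and the intercept a = ȳ − b x̄, so the line passes through the mean point (x̄, ȳ).

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares regression line y = a + b x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx        # gradient
    a = y_bar - b * x_bar  # line passes through (x_bar, y_bar)
    return a, b

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
print(f"y = {a:.2f} + {b:.2f}x")  # y = 2.20 + 0.60x
```

In Python 3.10 or later, `statistics.linear_regression(x, y)` returns the same slope and intercept.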
Causation:
The change in one variable causes a change in another (e.g., variable x affecting variable y).
Spurious Correlation:
Apparent correlation without a causal connection.
Spearman’s Rank Correlation Coefficient:
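Measures the strength of agreement between two sets of rankings, from −1 to +1. It is calculated as r_s = 1 − 6Σd² / (n(n² − 1)), where d is the difference in ranks for each item and n is the number of items.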
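A Python sketch of the rank-difference formula, assuming there are no tied values (ties need averaged ranks, which this sketch does not handle). The scores are hypothetical.

```python
def ranks(values):
    """Rank from 1 (smallest) upwards; assumes there are no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Spearman's rank correlation: r_s = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

maths = [56, 75, 45, 71, 62]
science = [66, 70, 40, 60, 65]
print(spearman(maths, science))  # 0.6
```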
Pearson’s Product Moment Correlation Coefficient:
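Measures the strength of a linear relationship between two numerical variables, from −1 (perfect negative) to +1 (perfect positive); at GCSE it is interpreted rather than calculated by hand.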
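Although it is not calculated by hand at this level, a short Python sketch shows what the coefficient is: r = Sxy / √(Sxx × Syy). The data are hypothetical.

```python
import math

def pearson(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

r = pearson([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"{r:.3f}")  # 0.775
```

In Python 3.10 or later, `statistics.correlation` gives the same value.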
10. Time Series Data
Trend:
Long-term changes over time (not just short-term fluctuations).
Seasonal Variation:
Regular patterns repeating over specific intervals.
Index Numbers:
Represent values relative to a base year's values (base index = 100).
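A minimal Python sketch: index = value ÷ base-year value × 100. The prices (in pence) are hypothetical.

```python
def index_number(value, base_value):
    """Index relative to the base year, where the base year has index 100."""
    return value / base_value * 100

# Hypothetical prices of a loaf in pence, base year 2015
prices = {2015: 120, 2018: 132, 2020: 150}
for year, price in prices.items():
    print(year, round(index_number(price, prices[2015]), 1))
```

This prints an index of 100.0 for the base year 2015, 110.0 for 2018 and 125.0 for 2020.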
Weighted Index:
A weighted mean of several index numbers, where each is multiplied by a weight reflecting its importance (e.g., the amount of each material used).
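A Python sketch of the weighted mean, Σ(index × weight) ÷ Σweights, using hypothetical material indices and weights.

```python
def weighted_index(indices, weights):
    """Weighted mean of index numbers: sum(index x weight) / sum(weights)."""
    return sum(i * w for i, w in zip(indices, weights)) / sum(weights)

# Hypothetical price indices for three materials and their weights
indices = [110, 120, 95]  # steel, plastic, glass
weights = [5, 3, 2]       # e.g., relative quantity of each material used
print(weighted_index(indices, weights))  # 110.0
```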
11. Quality Assurance and Rates
Quality Assurance:
Processes to ensure production consistency.
Control Charts:
Plot mean or range results over time; includes action and warning lines for monitoring quality.
Crude Rates and Standardized Rates:
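A crude rate is the number of events per 1000 of the whole population (events ÷ population × 1000); a standardized rate adjusts for the age distribution by applying each age group's rate to a standard population, so different areas can be compared fairly.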
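A minimal Python sketch comparing a crude death rate with a standardized one. The town's figures and the standard population proportions are hypothetical, and the standardized rate is found by weighting each age group's rate by the standard population.

```python
# Hypothetical town: population and deaths in each age group
population = {"0-39": 6000, "40-64": 3000, "65+": 1000}
deaths = {"0-39": 12, "40-64": 15, "65+": 40}
# Hypothetical standard population: proportion in each age group
standard = {"0-39": 0.5, "40-64": 0.3, "65+": 0.2}

def rate_per_1000(events, people):
    return events / people * 1000

crude = rate_per_1000(sum(deaths.values()), sum(population.values()))
group_rates = {g: rate_per_1000(deaths[g], population[g]) for g in population}
# Weight each age group's rate by its share of the standard population
standardized = sum(group_rates[g] * standard[g] for g in standard)
print(f"crude {crude:.1f}, standardized {standardized:.1f} per 1000")
# crude 6.7, standardized 10.5 per 1000
```

The standardized rate is higher because the standard population has a larger share of people aged 65+ (20%) than this town does (10%).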
12. Basic Probability Concepts
Sample Space Diagram:
A table or diagram showing all possible outcomes of an experiment (e.g., a grid of totals when two dice are rolled).
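A Python sketch of the two-dice sample space: all 36 equally likely outcomes are listed, then the probability of each total is counted as an exact fraction.

```python
from collections import Counter
from fractions import Fraction

# All 36 equally likely outcomes (first die, second die)
outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]
totals = Counter(a + b for a, b in outcomes)

for total in sorted(totals):
    print(total, Fraction(totals[total], len(outcomes)))
```

For example, a total of 7 appears in 6 of the 36 cells, so P(total = 7) = 1/6.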
Mutually Exclusive Events:
Events that cannot occur together.
Exhaustive Events:
A set of events that together cover every possible outcome, so one of them must happen; if the events are also mutually exclusive, their probabilities sum to 1.
Relative Frequency:
An estimate of probability from experimental data: the number of times the event happens ÷ the total number of trials.
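A Python sketch, assuming a simulated fair die stands in for a real experiment: the relative frequency of rolling a six settles close to the theoretical 1/6 as the number of trials grows.

```python
import random

def relative_frequency(successes, trials):
    """Number of times the event happened / total number of trials."""
    return successes / trials

rng = random.Random(42)  # seeded so the simulation is repeatable
rolls = [rng.randint(1, 6) for _ in range(6000)]
rf = relative_frequency(rolls.count(6), len(rolls))
print(rf)  # close to 1/6 for a fair die
```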
Absolute vs. Relative Risk:
Absolute risk is the probability that an event happens to a group (e.g., 3 in 100); relative risk compares the risks for two groups as the ratio of their absolute risks.
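A minimal Python sketch with two hypothetical groups of 1000 people: the absolute risk is cases ÷ group size for each group, and the relative risk is the ratio of the two.

```python
def absolute_risk(cases, group_size):
    """Proportion of a group affected by the event."""
    return cases / group_size

# Hypothetical groups of 1000 people each
risk_exposed = absolute_risk(30, 1000)
risk_unexposed = absolute_risk(10, 1000)
relative_risk = risk_exposed / risk_unexposed
print(risk_exposed, risk_unexposed, round(relative_risk, 2))  # 0.03 0.01 3.0
```

A relative risk of 3 means the event is three times as likely in the first group, even though the absolute risk in that group is still only 3%.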
Binomial Distribution:
Models the number of successes in a fixed number, n, of independent trials, where each trial has two outcomes (success/failure) and the same probability of success, p.
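A Python sketch of the binomial probability P(X = r) = nCr × p^r × (1 − p)^(n − r), using `math.comb` (Python 3.8 or later) for nCr.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p): exactly r successes in n independent trials."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

# Probability of exactly 2 heads in 4 fair coin tosses
print(binomial_pmf(2, 4, 0.5))  # 0.375
```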
Normal Distribution:
A continuous, symmetrical, bell-shaped distribution that models many natural measurements (e.g., heights); it is described by its mean and standard deviation.
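A Python sketch using `statistics.NormalDist` (Python 3.8 or later), assuming a hypothetical height distribution with mean 170 cm and standard deviation 8 cm. It finds the proportion of values within 1, 2 and 3 standard deviations of the mean.

```python
from statistics import NormalDist

mu, sigma = 170, 8
heights = NormalDist(mu=mu, sigma=sigma)  # hypothetical heights in cm

# Proportion of values within k standard deviations of the mean
within = {k: heights.cdf(mu + k * sigma) - heights.cdf(mu - k * sigma)
          for k in (1, 2, 3)}
for k, proportion in within.items():
    print(f"within {k} s.d. of the mean: {proportion:.3f}")
```

The results are about 68%, 95% and 99.7%, whatever the mean and standard deviation.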