GCSE Statistics vocabulary

GCSE Statistics – Key Vocabulary and Concepts

Hypothesis:
- A statement that may be true or false and can be tested.
- A statistical investigation gathers evidence to support or reject it.
Population:
- The complete set of items or people under investigation.
- Examples: All students in Year 10; all fireworks made by a factory.
Sampling Frame:
- A comprehensive list of all members of the population (e.g., register or database).

Random Sample:
- Each item in the population has an equal chance of being selected.
Stratified (Random) Sampling:
- Population is divided into strata (e.g., gender), and samples are drawn to match the population proportions. Members in each stratum are chosen at random.
Judgement Sampling:
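- Non-random sampling in which the researcher uses their own judgement to choose members they consider representative. (Simply taking the first 20 people available is opportunity, or convenience, sampling.)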
Cluster Sampling:
- The population is split into naturally occurring groups (clusters); a random selection of clusters is made and every member of each chosen cluster is sampled (e.g., all pupils in randomly selected tutor groups).
Quota Sampling:
- Non-random sampling in which the interviewer fills pre-set quotas of people with given characteristics (e.g., a set number from each age group or gender).
Systematic Sampling:
- Selects items at fixed intervals from a random starting point (e.g., every 10th name on a list). Only the starting point is random, so it is not a simple random sample.
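
The sampling methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling frame of 200 pupils, the 120/80 split into strata, and all function names are hypothetical and are not taken from any exam specification.

```python
import random

def simple_random_sample(frame, n, rng):
    """Every member of the sampling frame has an equal chance of selection."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Take every k-th member after a random start within the first interval."""
    k = len(frame) // n        # sampling interval
    start = rng.randrange(k)   # random starting point: 0 .. k-1
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(strata, n, rng):
    """Sample each stratum at random, in proportion to its share of the population."""
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        size = round(n * len(members) / total)   # proportional allocation
        sample[name] = rng.sample(members, size)
    return sample

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame and strata, for illustration only.
year10 = [f"pupil{i:03d}" for i in range(1, 201)]
strata = {"girls": year10[:120], "boys": year10[120:]}

srs = simple_random_sample(year10, 20, rng)
sys_sample = systematic_sample(year10, 20, rng)
strat = stratified_sample(strata, 20, rng)
```

With 120 girls and 80 boys, a stratified sample of 20 takes 12 girls and 8 boys. In general the rounded stratum sizes may not add up exactly to the intended sample size, so they should be checked.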

Cleaning Data:
- Improves the reliability and usability of data by dealing with outliers and missing values and by standardizing formats.
Anomaly:
- A value that does not fit well with the rest of the data (e.g., far from the trend shown in scatter diagrams).
Outlier:
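- An extremely high or low value. Identified as any value more than 3 standard deviations from the mean (outside mean ± 3 × s.d.), or any value below LQ − 1.5 × IQR or above UQ + 1.5 × IQR.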
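
As a rough sketch, the two outlier rules (beyond the quartiles by more than 1.5 × IQR, or more than 3 standard deviations from the mean) could be checked as follows. The data values and function names are made up for illustration, and the quartiles use Python's "inclusive" interpolation, which may differ slightly from the convention in a given textbook.

```python
import statistics

def outliers_iqr(data):
    """Values below LQ - 1.5*IQR or above UQ + 1.5*IQR."""
    lq, _, uq = statistics.quantiles(data, n=4, method="inclusive")
    iqr = uq - lq
    low, high = lq - 1.5 * iqr, uq + 1.5 * iqr
    return [x for x in data if x < low or x > high]

def outliers_sd(data):
    """Values more than 3 (population) standard deviations from the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return [x for x in data if abs(x - mean) > 3 * sd]

# Hypothetical journey times in minutes.
times = [12, 14, 15, 15, 16, 17, 18, 19, 20, 48]

print(outliers_iqr(times))
print(outliers_sd(times))
```

For this made-up data the IQR rule flags 48 but the 3 × s.d. rule flags nothing, because the extreme value itself inflates the standard deviation. The two rules do not always agree.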

Variables:
- Characteristics being investigated whose values differ from one member to another; they can be quantitative (discrete or continuous) or qualitative.
Multivariate Problems:
- Involve more than one linked variable (e.g., driving test performance by gender and time of day).
Categorical Data:
- Fits into clear categories (e.g., gender, car make).
Ordinal Data:
- Indicates rank order (e.g., positions in a race).

Distribution:
- The values of a variable along with their frequencies or probabilities.
Extraneous Variables:
- Variables not under investigation that might affect outcomes; should be controlled (e.g., time of day).
Control Groups & Matched Pairs:
- Control groups allow comparisons with test groups; matched pairs aim to make groups similar to minimize extraneous effects.
Closed/Open Questions:
- Closed questions have specific answer options; easier to analyze. Open questions have unrestricted answers; harder to analyze and often avoided.
Pilot Survey/Pre-test:
- A small-scale test of a questionnaire to identify necessary changes before larger-scale use.

Random Response:
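- A technique for sensitive questions: a chance event (e.g., a coin toss) decides whether each person answers truthfully or gives a set answer, so no individual answer is revealing. The true proportion is then estimated from the overall results.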
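
One common version of the random response technique, assumed here purely for illustration, has each person toss a coin in private: heads means answer "yes" whatever the truth, tails means answer truthfully. The sketch below shows how the true proportion could then be estimated; the function name and the survey figures are hypothetical.

```python
def estimate_true_proportion(yes_count, n, p_forced=0.5):
    """Estimate the true 'yes' proportion under the assumed coin-toss design.

    Expected observed proportion = p_forced + (1 - p_forced) * true proportion,
    so the true proportion is recovered by rearranging.
    """
    observed = yes_count / n
    return (observed - p_forced) / (1 - p_forced)

# Hypothetical survey: 130 "yes" answers from 200 people.
estimate = estimate_true_proportion(130, 200)
print(round(estimate, 3))
```

Here about half of the 200 people are expected to say "yes" because of the coin, so the extra 30 "yes" answers among the remaining 100 or so suggest a true proportion of about 0.3.
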
Reliability:
- Consistency of results upon repeated trials (e.g., large samples yield more reliable outcomes).
Validity:
- Whether a process actually measures what it is intended to measure (e.g., collecting opinions from the correct demographic).

Choropleth Map:
- A map that shows data by shading regions (darker shading usually means higher values).
Central Tendency:
- Measures of average: mean, median, mode.
Dispersion:
- Measures of spread: range, IQR, standard deviation.
Standard Deviation:
- A measure of spread showing how far values typically lie from the mean; a larger standard deviation means more variable data.
Interpercentile/Interdecile Ranges:
- Measures of spread between chosen percentiles or deciles (e.g., the interdecile range, 9th decile − 1st decile, covers the middle 80% of the data).
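
The measures of spread listed above can be computed as in the sketch below. The data set is invented, and the quartiles and deciles use Python's "inclusive" interpolation, which is an assumption; exam boards and textbooks may locate them slightly differently.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)

# Quartiles by linear interpolation ("inclusive" method).
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Deciles: nine cut points; the interdecile range is 9th decile minus 1st decile.
deciles = statistics.quantiles(data, n=10, method="inclusive")
interdecile_range = deciles[-1] - deciles[0]

# Population standard deviation: square root of the mean squared deviation from the mean.
sd = statistics.pstdev(data)

print(data_range, iqr, interdecile_range, sd)
```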

Bivariate Data:
- Data in which each item has values for two variables, recorded as pairs (e.g., each pupil's scores on two tests).
Association and Correlation:
- Association is any relationship between two variables; correlation is the term used when both variables are numerical, and usually describes how closely the points follow a straight line.
Explanatory and Response Variables:
- Explanatory variable (independent) is on the x-axis; response variable (dependent) is on the y-axis.

Regression Line:
- The line of best fit describing the relationship between two variables; it is calculated (not drawn by eye) and passes through the mean point (x̄, ȳ).
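
A calculated regression line is usually the least-squares line y = a + bx. The sketch below assumes that method; the function name and the data points are hypothetical.

```python
def regression_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # gradient
    a = mean_y - b * mean_x  # intercept, so the line passes through (mean_x, mean_y)
    return a, b

# Hypothetical bivariate data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = regression_line(xs, ys)
print(round(a, 2), round(b, 2))
```

For these made-up points the line is approximately y = 2.2 + 0.6x.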
Causation:
- The change in one variable causes a change in another (e.g., variable x affecting variable y).
Spurious Correlation:
- Apparent correlation without a causal connection.
Spearman’s Rank Correlation Coefficient:
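- Measures how closely the rank orders of two variables agree, from −1 (ranks exactly reversed) to +1 (ranks identical); it is calculated from the ranks rather than the raw values.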
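
When there are no tied values, Spearman's coefficient is commonly calculated as 1 − 6Σd² ÷ (n(n² − 1)), where d is the difference between the two ranks for each item. A minimal sketch under that no-ties assumption, with invented scores from two hypothetical judges:

```python
def ranks(values):
    """Rank 1 = largest value. Assumes there are no tied values."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Spearman's rank correlation coefficient via the sum of squared rank differences."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical scores given to five contestants by two judges.
judge_a = [7, 9, 4, 8, 5]
judge_b = [6, 8, 5, 9, 3]

r_s = spearman(judge_a, judge_b)
print(round(r_s, 2))
```

Here the squared rank differences sum to 4, giving a coefficient of about 0.8, which indicates strong agreement between the two rankings.
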
Pearson’s Product Moment Correlation Coefficient:
- Measures the strength of linear correlation between two numerical variables, from −1 to +1. At GCSE it is typically interpreted rather than calculated by hand.

Trend:
- Long-term changes over time (not just short-term fluctuations).
Seasonal Variation:
- A regular pattern that repeats over a fixed period (e.g., each quarter of every year).
Index Numbers:
- Show a value as a percentage of its value in a base year (base-year index = 100): index number = value ÷ base-year value × 100.
Weighted Index:
- A weighted mean of index numbers, where the weights reflect the relative importance of each item (e.g., the proportions of different materials used).
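
An index number is the current value divided by the base-year value, multiplied by 100, and a weighted index is a weighted mean of several index numbers. The sketch below illustrates both; the prices, indices, and weights are invented, and the function names are hypothetical.

```python
def index_number(value, base_value):
    """Value expressed relative to the base-year value (base year = 100)."""
    return value / base_value * 100

def weighted_index(indices, weights):
    """Weighted mean of index numbers: sum(index * weight) / sum(weights)."""
    return sum(i * w for i, w in zip(indices, weights)) / sum(weights)

# Hypothetical price of one material: 2.50 in the base year, 3.00 now.
steel_index = index_number(3.00, 2.50)

# Hypothetical indices for two materials, used in the ratio 3 : 1.
overall = weighted_index([120, 105], [3, 1])

print(round(steel_index, 1), round(overall, 2))
```

The first material's price has risen by about 20% since the base year. Because it carries three times the weight of the second, the weighted index (about 116) sits closer to 120 than to 105.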

Quality Assurance:
- Processes to ensure production consistency.
Control Charts:
- Plot sample means or ranges over time and include warning and action lines for monitoring quality.
Crude Rates and Standardized Rates:
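- A crude rate is the number of events (e.g., births or deaths) per 1,000 of the population; a standardized rate adjusts for age distribution by using a standard population, so that areas with different age profiles can be compared fairly.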
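
One way to see the difference between crude and standardized rates is to weight each age group's crude rate by a standard population. The sketch below assumes that approach; the age groups, counts, and standard population figures are all invented for illustration.

```python
def crude_rate(events, population, per=1000):
    """Number of events per `per` members of the population."""
    return events / population * per

def standardized_rate(groups, per=1000):
    """Weight each age group's crude rate by its share of a standard population.

    groups: list of (events, population, standard_population) tuples.
    """
    total_standard = sum(std for _, _, std in groups)
    weighted = sum(crude_rate(e, p, per) * std for e, p, std in groups)
    return weighted / total_standard

# Hypothetical town with two age groups:
# (deaths, population, standard population out of 1000).
groups = [(10, 5000, 600), (40, 2000, 400)]

overall_crude = crude_rate(10 + 40, 5000 + 2000)
standardized = standardized_rate(groups)

print(round(overall_crude, 2), round(standardized, 2))
```

In this made-up example the crude rate is about 7.1 per 1,000, but the standardized rate is about 9.2 per 1,000, because the town has a smaller share of older people than the standard population.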

Sample Space Diagram:
- Visual representation of all possible outcomes (e.g., a probability table for dice rolls).
Mutually Exclusive Events:
- Events that cannot occur together.
Exhaustive Events:
- A set of events that covers every possible outcome; if the events are also mutually exclusive, their probabilities sum to 1.
Relative Frequency:
- An estimate of probability based on experimental data: number of times the event occurs ÷ total number of trials.
Absolute vs. Relative Risk:
- Absolute risk is the probability that an event occurs; relative risk compares the risk for one group with the risk for another (risk for one group ÷ risk for the other).
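
As a small worked sketch, absolute risk can be taken as the proportion of a group that experiences the event, and relative risk as the ratio of two such risks. The group sizes and counts below are invented and the function names are hypothetical.

```python
def absolute_risk(events, group_size):
    """Proportion of the group that experiences the event."""
    return events / group_size

def relative_risk(risk_group, risk_comparison):
    """How many times as likely the event is in one group as in the comparison group."""
    return risk_group / risk_comparison

# Hypothetical figures: 30 of 200 in group A and 10 of 200 in group B are affected.
risk_a = absolute_risk(30, 200)
risk_b = absolute_risk(10, 200)
rr = relative_risk(risk_a, risk_b)

print(round(risk_a, 2), round(risk_b, 2), round(rr, 1))
```

With these figures the absolute risks are 0.15 and 0.05, so the event is about three times as likely in group A as in group B.
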
Binomial Distribution:
- Models the number of successes in a fixed number of independent trials, each with two possible outcomes (success/failure) and the same probability of success.
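
For n independent trials, each with probability p of success, the probability of exactly k successes is (number of ways of choosing k trials from n) × p^k × (1 − p)^(n − k). A minimal sketch, using a fair coin tossed 5 times as a hypothetical example:

```python
from math import comb

def binomial_probability(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical example: exactly 2 heads in 5 tosses of a fair coin.
p_two_heads = binomial_probability(2, 5, 0.5)

# The probabilities for k = 0..5 cover every outcome, so they sum to 1.
total = sum(binomial_probability(k, 5, 0.5) for k in range(6))

print(p_two_heads, total)
```

There are 10 ways to place 2 heads among 5 tosses, and each sequence has probability 1/32, giving 10/32 = 0.3125.
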
Normal Distribution:
- Continuous and symmetrical distribution commonly found in natural data, characterized by mean and standard deviation.
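
A normal distribution is fixed by its mean and standard deviation, so the proportion of values within a given number of standard deviations of the mean can be checked directly. The sketch below uses Python's `statistics.NormalDist`; the mean of 170 and standard deviation of 8 are invented values for illustration.

```python
from statistics import NormalDist

# Hypothetical heights in cm: mean 170, standard deviation 8.
heights = NormalDist(mu=170, sigma=8)

# Proportion within 2 standard deviations of the mean (154 to 186).
within_2sd = heights.cdf(186) - heights.cdf(154)

# Proportion within 3 standard deviations of the mean (146 to 194).
within_3sd = heights.cdf(194) - heights.cdf(146)

print(round(within_2sd, 3), round(within_3sd, 3))
```

Roughly 95% of values lie within 2 standard deviations of the mean and nearly all (about 99.7%) lie within 3, whatever mean and standard deviation are chosen.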