GCSE Statistics – Key Vocabulary and Concepts

1. Data and Sampling

  • Hypothesis:

    • A statement that can be either true or false.

    • It requires statistical investigation to find supporting evidence.

  • Population:

    • The complete set of items or people under investigation.

    • Examples: All students in Year 10; all fireworks made by a factory.

  • Sampling Frame:

    • A comprehensive list of all members of the population (e.g., register or database).

2. Sampling Techniques

  • Random Sample:

    • Each item in the population has an equal chance of being selected.

  • Stratified (Random) Sampling:

    • Population is divided into strata (e.g., gender), and samples are drawn to match the population proportions. Members in each stratum are chosen at random.

  • Judgement Sampling:

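    • Non-random sampling in which the researcher uses their own judgement to choose members believed to be representative of the population. (Simply taking the first 20 people available is opportunity sampling.)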

  • Cluster Sampling:

    • The population is split into naturally occurring groups (clusters); a few clusters are chosen at random and every member of the chosen clusters is sampled (e.g., all pupils in randomly selected tutor groups).

  • Quota Sampling:

    • Non-random sampling in which the interviewer selects people until a pre-set quota is filled for each group, with groups defined by characteristics such as age or gender.

  • Systematic Sampling:

    • Non-random sampling method that selects items at fixed intervals (e.g., every 10th item on a list) after a random starting point.
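
Two of the techniques above involve a small calculation, and a minimal sketch may help. It assumes proportional allocation for stratified sampling and a list-based sampling frame for systematic sampling. The function names and the school data are made up for illustration.

```python
import random


def stratified_sample_sizes(strata_sizes, sample_size):
    """Share the sample between strata in proportion to their population sizes."""
    total = sum(strata_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}


def systematic_sample(sampling_frame, interval, rng=random):
    """Take every `interval`-th item, starting from a random position."""
    start = rng.randrange(interval)  # random start within the first interval
    return sampling_frame[start::interval]


# Hypothetical school with 500 pupils; a sample of 50 is wanted.
year_groups = {"Year 7": 200, "Year 8": 150, "Year 9": 150}
sizes = stratified_sample_sizes(year_groups, 50)
print(sizes)  # {'Year 7': 20, 'Year 8': 15, 'Year 9': 15}

# Hypothetical sampling frame of 100 numbered pupils, taking every 10th one.
frame = list(range(1, 101))
print(systematic_sample(frame, 10))
```

Rounded stratum sizes do not always add up to the intended sample size, so one stratum may need adjusting by hand. Within each stratum the members would still be chosen at random.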

3. Data Preparation

  • Cleaning Data:

    • Involves improving data reliability and usability by addressing outliers, missing data, and standardizing formats.

  • Anomaly:

    • A value that does not fit well with the rest of the data (e.g., far from the trend shown in scatter diagrams).

  • Outlier:

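    • An extremely high or low value. Common rules: a value is an outlier if it lies more than 3 standard deviations from the mean, or if it is below LQ − 1.5 × IQR or above UQ + 1.5 × IQR.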
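
The two outlier rules can be compared on a small data set. This is a sketch only: the data are made up, and the quartiles come from Python's `statistics.quantiles` with its default method, which may differ slightly from the quartile convention used in a given textbook.

```python
import statistics


def iqr_outliers(values):
    """Values below LQ - 1.5 x IQR or above UQ + 1.5 x IQR."""
    lq, _median, uq = statistics.quantiles(values, n=4)
    iqr = uq - lq
    low_fence = lq - 1.5 * iqr
    high_fence = uq + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


def sd_outliers(values):
    """Values more than 3 standard deviations away from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return [v for v in values if abs(v - mean) > 3 * sd]


data = [12, 14, 15, 15, 16, 17, 18, 19, 40]  # hypothetical measurements
print(iqr_outliers(data))  # [40]
print(sd_outliers(data))   # []
```

Here the IQR rule flags 40 but the 3 s.d. rule does not, because the extreme value itself inflates the standard deviation. The two rules need not agree, especially on small samples.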

4. Types of Variables

  • Variables:

    • Values being investigated that vary among different members (can be discrete, continuous, or qualitative).

  • Multivariate Problems:

    • Involve more than one linked variable (e.g., driving test performance by gender and time of day).

  • Categorical Data:

    • Fits into clear categories (e.g., gender, car make).

  • Ordinal Data:

    • Indicates rank order (e.g., positions in a race).

5. Additional Concepts

  • Distribution:

    • The values of a variable along with their frequencies or probabilities.

  • Extraneous Variables:

    • Variables not under investigation that might affect outcomes; should be controlled (e.g., time of day).

  • Control Groups & Matched Pairs:

    • Control groups allow comparisons with test groups; matched pairs aim to make groups similar to minimize extraneous effects.

  • Closed/Open Questions:

    • Closed questions have specific answer options; easier to analyze. Open questions have unrestricted answers; harder to analyze and often avoided.

  • Pilot Survey/Pre-test:

    • A small-scale test of a questionnaire to identify necessary changes before larger-scale use.

6. Random Response Techniques and Data Validity

  • Random Response:

    • Technique for asking sensitive questions: a random event (e.g., a coin toss) decides how each person responds, so individual answers stay private while the proportion of true "yes" answers can still be estimated.

  • Reliability:

    • Consistency of results upon repeated trials (e.g., large samples yield more reliable outcomes).

  • Validity:

    • Whether a process actually measures what it is intended to measure (e.g., collecting opinions from the correct demographic).
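
The estimate behind random response can be sketched as follows. The scheme assumed here is one common version, not the only one: each person tosses a fair coin in private, answers "yes" on heads regardless of the truth, and answers truthfully on tails. The function name and the survey figures are hypothetical.

```python
def estimate_true_proportion(total_responses, yes_responses, p_forced_yes=0.5):
    """Estimate the proportion of truthful "yes" answers.

    Assumes a fraction `p_forced_yes` of people were forced to say "yes"
    by the random event, and the rest answered truthfully.
    """
    forced_yes = total_responses * p_forced_yes
    truthful_responses = total_responses - forced_yes
    return (yes_responses - forced_yes) / truthful_responses


# Hypothetical survey: 100 people, 65 said "yes".
# About 50 were forced to say "yes", so about 15 of the other 50 said "yes" truthfully.
print(estimate_true_proportion(100, 65))  # 0.3
```

The result is only an estimate, because the number of heads will rarely be exactly half; larger samples make it more reliable.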

7. Data Presentation and Analysis

  • Choropleth Map:

    • A map showing data using shading (darker means higher values).

  • Central Tendency:

    • Measures of average: mean, median, mode.

  • Dispersion:

    • Measures of spread: range, IQR, standard deviation.

  • Standard Deviation:

    • A measure of how spread out the values of a distribution are about the mean; a larger standard deviation means more variability.

  • Interpercentile/Interdecile Ranges:

    • Measure spread between chosen percentiles; the interdecile range (10th to 90th percentile) covers the middle 80% of the data.
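
As an illustration of two of the measures of spread above, the sketch below computes a standard deviation and an interdecile range. It assumes the population form of the standard deviation (dividing by n) and uses `statistics.quantiles` for the deciles, whose default method may differ slightly from a textbook convention. The data sets are made up.

```python
import math
import statistics


def standard_deviation(values):
    """Square root of the mean squared distance from the mean."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def interdecile_range(values):
    """90th percentile minus 10th percentile (the middle 80% of the data)."""
    deciles = statistics.quantiles(values, n=10)  # 9 cut points: D1 ... D9
    return deciles[-1] - deciles[0]


scores = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical scores, mean 5
print(standard_deviation(scores))       # 2.0

times = list(range(1, 20))              # hypothetical times: 1, 2, ..., 19
print(interdecile_range(times))         # 16.0
```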

8. Scatter Diagrams and Correlation

  • Bivariate Data:

    • Paired data in which two variables are recorded for each item (e.g., each pupil's scores in two related tests).

  • Association and Correlation:

    • Association is any relationship between two variables; correlation refers specifically to association between two numerical variables.

  • Explanatory and Response Variables:

    • Explanatory variable (independent) is on the x-axis; response variable (dependent) is on the y-axis.

9. Regression and Causation

  • Regression Line:

    • The line representing the relationship between variables; calculated, not drawn by eye.

  • Causation:

    • The change in one variable causes a change in another (e.g., variable x affecting variable y).

  • Spurious Correlation:

    • Apparent correlation without a causal connection.

  • Spearman’s Rank Correlation Coefficient:

    • Indicates the strength of agreement between two sets of rankings, from −1 (rankings completely reversed) through 0 (no agreement) to +1 (identical rankings).

  • Pearson’s Product Moment Correlation Coefficient:

    • Measures the strength of linear correlation on a scale from −1 to +1; you are expected to interpret a given value rather than calculate it.
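
Spearman's rank correlation coefficient is usually calculated with the formula r_s = 1 − 6Σd² / (n(n² − 1)), where d is the difference between the two ranks of each item and n is the number of items. A minimal sketch, assuming there are no tied ranks; the judges' rankings are made up.

```python
def spearman_rank(ranks_a, ranks_b):
    """Spearman's rank correlation for two lists of ranks with no ties."""
    n = len(ranks_a)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical rankings of five contestants by two judges.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
print(spearman_rank(judge_a, judge_b))          # 0.8

# Completely reversed rankings give -1.
print(spearman_rank(judge_a, [5, 4, 3, 2, 1]))  # -1.0
```

A value of 0.8 suggests strong agreement between the two judges, although it says nothing about why they agree.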

10. Time Series Data

  • Trend:

    • Long-term changes over time (not just short-term fluctuations).

  • Seasonal Variation:

    • Regular patterns repeating over specific intervals.

  • Index Numbers:

    • Express a value as a percentage of its value in a base year (the base year index is 100).

  • Weighted Index:

    • A weighted mean of several index numbers, with weights reflecting the relative importance of each item (e.g., the different materials used in a product).
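
The two index calculations above can be sketched in a few lines. An index number is taken here as value ÷ base value × 100, and a weighted index as Σ(index × weight) ÷ Σ(weights). The prices and weights are hypothetical.

```python
def index_number(value, base_value):
    """Value as a percentage of the base year value (base year = 100)."""
    return value / base_value * 100


def weighted_index(indices, weights):
    """Weighted mean of several index numbers."""
    total = sum(i * w for i, w in zip(indices, weights))
    return total / sum(weights)


# Hypothetical price: 40 in the base year, 50 now.
print(index_number(50, 40))  # 125.0, i.e. a 25% rise since the base year

# Hypothetical product made of two materials in the ratio 3 : 1,
# whose individual index numbers are 110 and 120.
print(weighted_index([110, 120], [3, 1]))  # 112.5
```

The weighted index sits closer to 110 than to 120 because the first material carries three times the weight.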

11. Quality Assurance and Probabilities

  • Quality Assurance:

    • Processes to ensure production consistency.

  • Control Charts:

    • Plot mean or range results over time; includes action and warning lines for monitoring quality.

  • Crude Rates and Standardized Rates:

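    • Crude rates are raw rates per 1,000 of the population (e.g., births or deaths); standardized rates adjust for the age distribution so that different populations can be compared fairly.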
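
The difference between a crude and a standardized rate is easier to see with numbers. The sketch below assumes rates are quoted per 1,000 and that the standard population is given as the number of people in each age group per 1,000. All figures and names are made up for illustration.

```python
def crude_rate(events, population):
    """Number of events per 1,000 of the population."""
    return events * 1000 / population


def standardized_rate(age_groups, standard_population):
    """Sum over age groups of (crude rate / 1000) x standard population.

    `age_groups` maps a group name to (events, population);
    `standard_population` maps the same names to people per 1,000.
    """
    return sum(
        crude_rate(events, population) / 1000 * standard_population[name]
        for name, (events, population) in age_groups.items()
    )


# Hypothetical town: (deaths, population) for two age groups.
town = {"under 65": (10, 5000), "65 and over": (60, 3000)}
standard = {"under 65": 700, "65 and over": 300}

overall_crude = crude_rate(70, 8000)
print(overall_crude)                      # 8.75 deaths per 1,000
print(standardized_rate(town, standard))  # about 7.4 per 1,000
```

The standardized rate is lower than the crude rate here because this town has a larger share of older people than the standard population.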

12. Basic Probability Concepts

  • Sample Space Diagram:

    • Visual representation of all possible outcomes (e.g., a probability table for dice rolls).

  • Mutually Exclusive Events:

    • Events that cannot occur together.

  • Exhaustive Events:

    • A set of events that covers every possible outcome; if the events are also mutually exclusive, their probabilities sum to 1.

  • Relative Frequency:

    • An estimate of probability based on experimental data: number of times the event occurred ÷ total number of trials.

  • Absolute vs. Relative Risk:

    • Absolute risk is the probability that an event occurs; relative risk compares two groups by dividing the risk for one group by the risk for the other.

  • Binomial Distribution:

    • Models the number of successes in a fixed number of independent trials, each with two possible outcomes (success/failure) and the same probability of success.

  • Normal Distribution:

    • Continuous and symmetrical distribution commonly found in natural data, characterized by mean and standard deviation.
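
Three of the probability ideas above reduce to short calculations, sketched here with made-up figures. The binomial function uses the standard formula P(X = r) = C(n, r) × p^r × (1 − p)^(n − r) and assumes independent trials with a constant probability of success. All function names are illustrative.

```python
import math


def relative_frequency(times_event_occurred, total_trials):
    """Experimental estimate of a probability."""
    return times_event_occurred / total_trials


def relative_risk(risk_group_a, risk_group_b):
    """How many times more likely the event is in group A than in group B."""
    return risk_group_a / risk_group_b


def binomial_probability(n, r, p):
    """Probability of exactly r successes in n independent trials."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)


# Hypothetical experiment: a drawing pin lands point-up 18 times in 60 drops.
print(relative_frequency(18, 60))       # 0.3

# Hypothetical absolute risks of 0.03 and 0.01 in two groups:
# the relative risk is about 3, so group A is about 3 times as likely.
print(relative_risk(0.03, 0.01))

# Exactly 2 heads in 3 tosses of a fair coin.
print(binomial_probability(3, 2, 0.5))  # 0.375
```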