GCSE Statistics vocabulary

Data and Sampling

  • Hypothesis

    • A statement that may or may not be true.

    • A statistical investigation is carried out to test whether the data support it.

  • Population

    • All items/people under investigation (e.g., all students in Year 10).

  • Sampling Frame

    • A complete list of all members of the population from which a sample is chosen (e.g., a school register or a database).

  • Random Sample

    • Every item in the population has an equal chance of being included in the sample.

  • Stratified (Random) Sampling

    • Population divided into strata (e.g., gender, school year).

    • Proportions in the sample match those in the population.

    • Members in each stratum chosen randomly.

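  • Judgement Sampling

    • Non-random sampling in which the researcher uses their own knowledge or judgement to choose items/people they consider representative. (Simply taking the first 20 items/people available is convenience, or opportunity, sampling.)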

  • Cluster Sampling

    • The population is divided into clusters, some clusters are chosen at random, and every member of the chosen clusters is included (e.g., all pupils in 3 randomly chosen tutor groups).

  • Quota Sampling

    • Non-random sampling where an interviewer selects a predetermined number of people from each of several categories (e.g., age group, gender).

  • Systematic Sampling

    • Selecting members at fixed intervals from the sampling frame (e.g., every 10th name), starting from a randomly chosen point.
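The three methods above that involve random selection (simple random, stratified and systematic sampling) can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions, not a prescribed method: the sampling frame of 450 student IDs, the year-group strata, the seed and the function names are all hypothetical.

```python
import random

# Hypothetical sampling frame: student IDs grouped by school year (the strata)
frame = {
    "Year 9": [f"Y9-{i}" for i in range(1, 181)],    # 180 students
    "Year 10": [f"Y10-{i}" for i in range(1, 151)],  # 150 students
    "Year 11": [f"Y11-{i}" for i in range(1, 121)],  # 120 students
}
everyone = [s for members in frame.values() for s in members]
rng = random.Random(7)  # fixed seed so the sketch is repeatable

def simple_random_sample(population, n):
    """Every member is equally likely to be chosen; no one is chosen twice."""
    return rng.sample(population, n)

def stratified_sample(strata, n):
    """Sample each stratum in proportion to its size, choosing randomly within it."""
    total = sum(len(m) for m in strata.values())
    sample = []
    for members in strata.values():
        k = round(len(members) / total * n)   # stratum size / population size x sample size
        sample.extend(rng.sample(members, k))
    return sample

def systematic_sample(population, n):
    """Random starting point, then every k-th member of the frame."""
    k = len(population) // n                  # sampling interval
    start = rng.randrange(k)                  # random start within the first interval
    return population[start::k][:n]

srs = simple_random_sample(everyone, 30)
strat = stratified_sample(frame, 30)   # 12 from Year 9, 10 from Year 10, 8 from Year 11
syst = systematic_sample(everyone, 30)
```

Because stratum counts are rounded, they may not always add up exactly to the required sample size; in that case one stratum's count has to be adjusted by hand.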

Cleaning Data

  • Cleaning may be required to enhance reliability and usability.

  • Tasks may include:

    • Addressing outliers or missing data.

    • Standardising formats and units.

    • Removing unnecessary symbols.

  • Anomaly

    • A value that does not fit with the rest of the data (e.g., a point far from the line of best fit).

  • Outlier

    • A value that is unusually high or low compared with the rest of the data; the boundaries can be identified using either:

      • Mean ± 3 × s.d.

      • 1.5 × IQR above the upper quartile or below the lower quartile.
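Both sets of outlier boundaries can be computed as below. This sketch assumes the divide-by-n standard deviation (`statistics.pstdev`) and Python's default `statistics.quantiles` method, which places the quartiles at the (n + 1)/4 and 3(n + 1)/4 positions; other textbooks and calculators may use slightly different quartile rules. The data set is hypothetical.

```python
import statistics

def outlier_limits(data):
    """Return (lower, upper) boundaries for both outlier rules."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)                  # divide-by-n standard deviation
    q1, _, q3 = statistics.quantiles(data, n=4)   # lower quartile, median, upper quartile
    iqr = q3 - q1
    return {
        "sd": (mean - 3 * sd, mean + 3 * sd),     # mean ± 3 × s.d.
        "iqr": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),  # 1.5 × IQR beyond the quartiles
    }

data = [12, 15, 14, 13, 16, 15, 14, 40, 13, 15, 14]   # hypothetical data
limits = outlier_limits(data)
outliers = [x for x in data if not limits["iqr"][0] <= x <= limits["iqr"][1]]
print(limits)
print(outliers)   # [40]
```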

Variables

  • Variables

    • Quantities or characteristics being investigated, whose values can differ across the population.

    • Types include: quantitative (discrete or continuous) and qualitative.

  • Multivariate Problems

    • Problems in which more than one linked variable is analysed (e.g., driving test performance by gender and time).

  • Categorical Data

    • Data that can be sorted into distinct categories (e.g., gender).

  • Ordinal Data

    • Data reflecting a rank order (e.g., race positions).

  • Distribution

    • A set of values of a variable alongside their frequencies or probabilities.

Extraneous Variables

  • Variables not under investigation that may influence results.

  • Steps are taken to control or limit their effect (e.g., time of day when comparing reaction times).

Control Groups & Matched Pairs

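  • Control Group: a group that does not receive the treatment, used alongside the test group for comparison.

    • Example: the test group receives a new drug, the control group receives a placebo.

  • Matched Pairs: each member of one group is paired with a member of the other group who is similar in relevant characteristics (e.g., age, gender), to reduce the effects of extraneous variables.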

Question Types

  • Closed Questions

    • Require the respondent to choose from a set of given answers, which makes the responses easier to analyse.

  • Open Questions

    • Allow any answer; the responses are harder to analyse, so they are generally best avoided.

Pilot Survey / Pre-test

  • Trial questionnaire on a small scale to assess:

    • Clarity of questions.

    • Sufficiency of data collected.

    • Response rates.

    • Coverage of response options.

Random Response

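  • A method used to estimate the proportion of people who would answer "Yes" to a sensitive question.

  • Adds an element of chance so that no one can tell whether an individual's answer is truthful, which encourages honest responses (e.g., rolling a dice to decide whether a person answers the sensitive question truthfully or simply answers "Yes").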
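The estimate from a random response survey is found by working backwards from the proportion of "Yes" answers. This sketch assumes one particular hypothetical scheme: each person privately rolls a fair dice, answers "Yes" regardless on a 6, and otherwise answers truthfully, so P(Yes) = 1/6 + (5/6)p, where p is the true proportion. The survey figures are invented.

```python
def estimate_true_proportion(yes_count, total, p_forced_yes=1/6):
    """Estimate the true proportion p of 'Yes' answers.

    Assumed (hypothetical) scheme: roll a fair dice; on a 6 answer 'Yes',
    otherwise answer truthfully. Then P(Yes) = 1/6 + (5/6) * p,
    which is rearranged here to give p.
    """
    observed = yes_count / total
    return (observed - p_forced_yes) / (1 - p_forced_yes)

# Hypothetical survey: 100 of 300 people answered "Yes"
print(estimate_true_proportion(100, 300))   # approximately 0.2
```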

Reliability & Validity

  • Reliability:

    • Consistency of results when an investigation is repeated (e.g., a small sample may give unreliable results).

  • Validity:

    • The degree to which a process measures what it is intended to measure (e.g., surveying Year 7 students to find out Year 10 students' opinions about food would have poor validity).

Displaying & Comparing Data

  • Choropleth Map:

    • A map that uses shading to show values for different regions, with darker shading indicating higher values.

  • Frequency Density:

    • Frequency density = frequency ÷ class width; it gives the height of each bar on a histogram, so the area of each bar equals the frequency.
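Frequency densities for a histogram with unequal class widths can be calculated as follows. The grouped data (journey times in minutes) are hypothetical, and each class is assumed to be given as (lower bound, upper bound, frequency).

```python
def frequency_densities(classes):
    """Frequency density = frequency / class width for each (lower, upper, frequency) class."""
    return [freq / (upper - lower) for lower, upper, freq in classes]

# Hypothetical grouped data: journey times in minutes, unequal class widths
classes = [(0, 10, 5), (10, 15, 12), (15, 30, 18), (30, 60, 6)]
print(frequency_densities(classes))   # [0.5, 2.4, 1.2, 0.2]
```

Multiplying each bar's height by its class width gives back the frequency, which is why the area of each bar represents the frequency.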

Central Tendency

  • Refers to averages (mean, median, mode).

Dispersion

  • Indicates spread of data (range, IQR, standard deviation, variance).

    • Variance: square of standard deviation.
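The averages and measures of spread listed above can all be computed with Python's `statistics` module. The data are hypothetical, and the sketch assumes the divide-by-n forms (`pvariance`, `pstdev`); `statistics.variance` and `statistics.stdev` divide by n − 1 instead. Quartiles use the module's default method, which may differ slightly from other textbook rules.

```python
import statistics

data = [3, 7, 7, 2, 9, 4, 7, 5]   # hypothetical data set

# Central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Dispersion
data_range = max(data) - min(data)
q1, _, q3 = statistics.quantiles(data, n=4)   # lower quartile, median, upper quartile
iqr = q3 - q1
variance = statistics.pvariance(data)         # divide-by-n variance
sd = statistics.pstdev(data)                  # standard deviation = square root of variance

print(mean, median, mode, data_range, iqr, variance, sd)
```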

Interpercentile Range (IPR) & Interdecile Range

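  • Interpercentile range: the difference between two percentiles, giving the range of a central part of the distribution (e.g., P80 − P20 covers the middle 60%).

  • Interdecile range: D9 − D1 (the 90th percentile minus the 10th percentile), covering the middle 80%.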

Standardised Score

  • Standardised score = (value − mean) ÷ standard deviation; it measures how many standard deviations a value lies above or below the mean, so values from different distributions can be compared.
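A standardised score can be used to compare performances on two different tests. The marks, means and standard deviations below are hypothetical.

```python
def standardised_score(value, mean, sd):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Hypothetical results: which mark is better relative to its own class?
maths = standardised_score(72, mean=60, sd=8)     # 1.5
english = standardised_score(65, mean=55, sd=5)   # 2.0
print(maths, english)
```

The English mark is lower in raw terms but further above its mean in standard deviations, so relative to the class it is the better performance.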

Scatter Diagrams

  • Bivariate Data: paired data (e.g., scores in Maths and English).

Association

  • Relationship between two variables.

Correlation

  • Relationship between two numerical variables.

Explanatory & Response Variables

  • The explanatory (independent) variable goes on the x-axis.

  • The response (dependent) variable goes on the y-axis.

Regression Line / Regression Equation

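  • A line of best fit calculated by the least-squares method (usually with a calculator or statistical software), written y = a + bx; the gradient b gives the rate of change of y as x increases.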
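A regression line is usually found by the least-squares method; the sketch below shows that calculation directly so the gradient and intercept are visible. The revision-hours data are hypothetical.

```python
def regression_line(xs, ys):
    """Least-squares line y = a + b x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx               # gradient: change in y per unit increase in x
    a = mean_y - b * mean_x     # the line passes through (mean of x, mean of y)
    return a, b

hours = [1, 2, 3, 4, 5]          # hypothetical hours of revision (explanatory)
marks = [52, 55, 61, 64, 68]     # hypothetical test marks (response)
a, b = regression_line(hours, marks)
print(f"y = {a:.1f} + {b:.1f}x")   # y = 47.7 + 4.1x
```

On Python 3.10 or later, `statistics.linear_regression(hours, marks)` returns the same slope and intercept.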

Causation

  • When a change in one variable causes a change in another.

Spurious Correlation

  • Indicates correlation without causation (e.g., both may increase due to a third factor).

Spearman’s Rank Correlation Coefficient (SRCC)

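  • Measures the strength of agreement between two sets of ranks: rs = 1 − 6Σd² ÷ (n(n² − 1)), where d is the difference between each pair of ranks and n is the number of pairs. The value lies between −1 (perfect negative correlation) and +1 (perfect positive correlation).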
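The formula for Spearman's rank correlation coefficient can be sketched as below. The judges' scores are hypothetical, and the simple ranking function assumes there are no tied values (ties would need averaged ranks).

```python
def ranks(values):
    """Rank values from 1 (largest) downwards; assumes no tied values."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman(xs, ys):
    """rs = 1 - 6 * sum(d^2) / (n (n^2 - 1)), where d = difference in ranks."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical scores given by two judges to the same five entries
judge_a = [8, 6, 9, 4, 7]
judge_b = [7, 5, 9, 3, 8]
print(round(spearman(judge_a, judge_b), 2))   # 0.9, strong positive agreement
```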

Pearson’s Product Moment Correlation Coefficient (PMCC)

  • Measures the strength of the linear relationship between two numerical variables, on a scale from −1 to +1; it does not need to be calculated in the exam.

Time Series & Changes Over Time

  • Trend

    • The general long-term movement of the data, described as rising, falling or level, ignoring short-term fluctuations.

Seasonal Variation

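  • Patterns that repeat regularly over a fixed period (e.g., sales that peak at the same time every year).

  • Mean Seasonal Effect: the mean of the differences between the actual values and the trend line values for a particular season (e.g., all the first-quarter values).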
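Mean seasonal effects can be calculated by grouping the differences by season and averaging each group. This sketch assumes quarterly data, takes each difference as actual value − trend value, and uses hypothetical sales figures and trend-line readings.

```python
# Hypothetical quarterly sales (actual) and trend-line readings over two years
actual = [120, 95, 88, 140, 126, 101, 92, 148]
trend = [108, 110, 112, 114, 116, 118, 120, 122]

def mean_seasonal_effects(actual, trend, seasons=4):
    """Mean of (actual - trend) for each season: Q1, Q2, Q3, Q4."""
    diffs = [a - t for a, t in zip(actual, trend)]   # individual seasonal variations
    # diffs[s::seasons] picks out every value for season s
    return [sum(diffs[s::seasons]) / len(diffs[s::seasons]) for s in range(seasons)]

print(mean_seasonal_effects(actual, trend))   # [11.0, -16.0, -26.0, 26.0]
```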

Index Number & Base Year

  • An index number compares a value with its value in the base year, as a percentage: index = (value ÷ base-year value) × 100.

  • The index for the base year is therefore 100.
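Index numbers relative to a base year can be calculated as follows. The prices are hypothetical, and the first value is assumed to be the base year.

```python
def index_numbers(values, base_position=0):
    """Index = value / base-year value x 100, so the base year has index 100."""
    base = values[base_position]
    return [round(v / base * 100, 1) for v in values]

prices = [2.50, 2.60, 2.75, 3.00]   # hypothetical prices over four years; base = first year
print(index_numbers(prices))        # [100.0, 104.0, 110.0, 120.0]
```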

Weighted Index

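  • A weighted mean of the index numbers of several items, with weights reflecting each item's importance (e.g., how much is spent on it); this reflects the overall economic impact more accurately than a simple average.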
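A weighted index is a weighted mean of the individual index numbers. The three items, their index numbers and their spending weights below are hypothetical.

```python
def weighted_index(indices, weights):
    """Weighted mean of index numbers: sum(index x weight) / sum(weights)."""
    return sum(i * w for i, w in zip(indices, weights)) / sum(weights)

# Hypothetical index numbers for food, fuel and clothing, with spending weights
print(weighted_index([105, 120, 98], [50, 30, 20]))   # approximately 108.1
```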

RPI & CPI

  • RPI (Retail Prices Index): a weighted index of the prices of common living costs; a measure of inflation.

  • CPI (Consumer Prices Index): a similar measure of inflation that excludes mortgage interest payments.

GDP

  • Gross Domestic Product: the total value of goods and services produced by a country in a year; rising GDP indicates economic growth, while falling GDP can indicate a recession.

Chain Base Index Number

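  • An index number that uses the previous year as the base, so each year is compared with the one before; the geometric mean of the chain base index numbers gives the average annual percentage change.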
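Chain base index numbers and the average annual percentage change can be sketched as below. The prices are hypothetical; the average change is found by taking the geometric mean of the year-on-year growth factors (index ÷ 100) and converting back to a percentage.

```python
import math

def chain_base_indices(values):
    """Each year's value as a percentage of the previous year's value."""
    return [current / previous * 100 for previous, current in zip(values, values[1:])]

def average_annual_change(chain_indices):
    """Average % change per year from the geometric mean of the growth factors."""
    factors = [c / 100 for c in chain_indices]
    geometric_mean = math.prod(factors) ** (1 / len(factors))
    return (geometric_mean - 1) * 100

prices = [200, 210, 231, 231]        # hypothetical prices over four years
chain = chain_base_indices(prices)   # approximately [105, 110, 100]
avg = average_annual_change(chain)
print([round(c, 1) for c in chain], round(avg, 2))
```

Applying the average percentage change three times in a row reproduces the overall change from 200 to 231, which is why the geometric mean (not the ordinary mean) is used.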

Distribution of Sample Means

  • Different samples give different estimates of the population mean; the sample means are less spread out than the original values, and the spread decreases as the sample size increases.
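A quick simulation shows that sample means are less spread out than the original values. The population (10,000 values spread roughly evenly between 0 and 100), the sample size of 25 and the 500 repeats are all hypothetical choices, and the seed is fixed so the results are repeatable.

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical population: 10,000 values spread roughly evenly from 0 to 100
population = [rng.uniform(0, 100) for _ in range(10_000)]

def sample_means(pop, sample_size, repeats, rng):
    """Mean of each of `repeats` random samples of the given size."""
    return [statistics.mean(rng.sample(pop, sample_size)) for _ in range(repeats)]

means = sample_means(population, 25, 500, rng)
print(statistics.pstdev(population))   # spread of the original values
print(statistics.pstdev(means))        # much smaller spread of the sample means
```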

Quality Assurance & Control Charts

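  • Control charts are used to monitor the quality of a production process: the results of regular samples are plotted over time against a target value, with warning lines and action lines showing when the process needs checking or adjusting.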

Crude & Standardised Rates

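  • Crude Rate: the number of births, deaths, unemployed people, etc. per 1000 of the population, i.e., (number ÷ population) × 1000.

  • Standardised Rate: a rate adjusted to allow for differences in the age distributions of populations, so that rates for different areas can be compared fairly.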
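Crude and standardised rates can be sketched as below. For the standardised rate this assumes one common approach (direct standardisation): each age group's rate per 1000 is weighted by that age group's share of a standard population. The town figures, age groups and standard population shares are hypothetical.

```python
def crude_rate(events, population):
    """Crude rate: number of events per 1000 of the population."""
    return events / population * 1000

def standardised_rate(age_specific_rates, standard_proportions):
    """Direct standardisation: weight each age group's rate (per 1000) by that
    group's share of a standard population (shares assumed to sum to 1)."""
    return sum(rate * share for rate, share in zip(age_specific_rates, standard_proportions))

# Hypothetical town: 450 deaths in a population of 30,000
print(crude_rate(450, 30_000))                         # approximately 15 per 1000
# Hypothetical death rates per 1000 for ages 0-39, 40-64 and 65+,
# with a standard population that is 50%, 30% and 20% in those groups
print(standardised_rate([2, 5, 40], [0.5, 0.3, 0.2]))  # approximately 10.5 per 1000
```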

Probability

  • Sample Space Diagram: shows all possible outcomes, usually in a table (e.g., the 36 outcomes when two dice are rolled).

Mutually Exclusive Events

  • Events that cannot happen at the same time; for mutually exclusive events, P(A or B) = P(A) + P(B).

Exhaustive Outcomes

  • A set of outcomes that includes every possibility; if the outcomes are also mutually exclusive, their probabilities sum to 1.

Relative Frequency & Conditional Probability

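  • Relative Frequency: number of times an event occurs ÷ total number of trials; used as an experimental estimate of probability.

  • Conditional Probability: the probability of an event given that another event has already happened, P(B | A) = P(A and B) ÷ P(A).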

Independence

  • Two events are independent if the occurrence of one does not affect the probability of the other; for independent events, P(A and B) = P(A) × P(B).
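Independence and conditional probability can be checked exactly on a sample space. This sketch uses the 36 equally likely outcomes of rolling two fair dice; the three events and the helper names are hypothetical illustrations.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair dice: 36 equally likely outcomes (d1, d2)
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event), where event is a function that tests one outcome."""
    return Fraction(sum(1 for outcome in space if event(outcome)), len(space))

def first_is_six(o):
    return o[0] == 6

def second_is_even(o):
    return o[1] % 2 == 0

def total_at_least_10(o):
    return o[0] + o[1] >= 10

def both(e, f):
    return lambda o: e(o) and f(o)

# Independent: P(A and B) equals P(A) x P(B)
independent = prob(both(first_is_six, second_is_even)) == prob(first_is_six) * prob(second_is_even)
# Not independent: a first dice of 6 changes the chance of a total of 10 or more
dependent = prob(both(first_is_six, total_at_least_10)) == prob(first_is_six) * prob(total_at_least_10)
# Conditional probability P(total >= 10 | first is 6) = P(both) / P(first is 6)
conditional = prob(both(first_is_six, total_at_least_10)) / prob(first_is_six)
print(independent, dependent, conditional)   # True False 1/2
```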

Absolute vs. Relative Risk

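  • Absolute Risk: the probability of an event happening, e.g., the proportion of a group who experience it (such as being late for work).

  • Relative Risk: the risk in one group divided by the risk in another group, showing how many times as likely the event is in the first group.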
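Absolute and relative risk can be calculated from group counts as below. The study figures are hypothetical.

```python
def absolute_risk(cases, group_size):
    """Proportion of a group that experiences the event."""
    return cases / group_size

def relative_risk(risk_a, risk_b):
    """How many times as likely the event is in group A as in group B."""
    return risk_a / risk_b

# Hypothetical study: 30 of 200 smokers and 10 of 400 non-smokers develop a condition
smokers = absolute_risk(30, 200)       # 0.15
non_smokers = absolute_risk(10, 400)   # 0.025
print(round(relative_risk(smokers, non_smokers), 2))   # 6.0
```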

Binomial Distribution B(n, p)

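  • Conditions for a binomial distribution:

    • Each trial has only two possible outcomes (success/failure).

    • There is a fixed number of trials, n.

    • The trials are independent.

    • The probability of success, p, is the same for each trial.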
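Binomial probabilities follow from P(X = r) = nCr × p^r × (1 − p)^(n − r). The sketch below uses `math.comb` (Python 3.8 or later); the dice example is illustrative.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

# Probability of exactly 2 sixes in 5 rolls of a fair dice: X ~ B(5, 1/6)
print(round(binomial_pmf(2, 5, 1/6), 4))   # 0.1608
```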

Normal Distribution N(μ, σ²)

  • Described by its mean μ and variance σ² (standard deviation σ); the curve is bell-shaped and symmetrical about the mean.
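Normal probabilities can be found with `statistics.NormalDist` (Python 3.8 or later), which takes the standard deviation σ rather than the variance σ². The height distribution below is hypothetical; the sketch shows the proportions within 1, 2 and 3 standard deviations of the mean.

```python
from statistics import NormalDist

def proportion_within(dist, k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)

heights = NormalDist(mu=170, sigma=8)   # hypothetical: N(170, 8²), heights in cm
for k in (1, 2, 3):
    print(k, round(proportion_within(heights, k), 4))   # 0.6827, 0.9545, 0.9973
```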