GCSE Statistics Vocabulary

Data and Sampling

  • Hypothesis: A statement that may or may not be true, used in statistical investigations to test for evidence.

  • Population: The total group being studied (e.g., all students in Year 10, all fireworks produced by a factory).

  • Sample Frame: A comprehensive list of all members of the population (e.g., register, database).

  • Random Sample: Every member of the population has an equal chance of being selected.

  • Stratified Random Sampling: Population divided into strata (e.g., gender, school year); sample proportions match population proportions, with members chosen randomly from each stratum.

  • Judgement Sampling: Non-random sampling in which the researcher uses their own judgement to choose members (e.g., selecting the 20 individuals the researcher considers most representative).

  • Cluster Sampling: All members from randomly selected clusters are included (e.g., all students from 3 randomly chosen tutor groups).

  • Quota Sampling: Non-random sampling where a set number of people from different groups are selected based on predetermined criteria (e.g., age, gender).

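  • Systematic Sampling: Selecting members at fixed intervals from the sample frame (e.g., every 10th name on a list), starting from a randomly chosen point. Not fully random, because the starting point determines the rest of the sample.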
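
As a rough illustration of two of the methods above, the sketch below shares a stratified sample between strata in proportion to their sizes and draws a systematic sample from a numbered list. The school data, the function names, and the rounding approach are assumptions made for the example only.

```python
import random

def stratified_sample_sizes(strata_sizes, sample_size):
    """Share a sample between strata in proportion to each stratum's size."""
    total = sum(strata_sizes.values())
    # Rounding can make the sizes add up to slightly more or less than
    # sample_size; in that case one stratum has to be adjusted by hand.
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}

def systematic_sample(sample_frame, sample_size, rng=random):
    """Take every k-th member of the sample frame after a random start."""
    interval = len(sample_frame) // sample_size
    start = rng.randrange(interval)  # random starting point within the first interval
    return sample_frame[start::interval][:sample_size]

# Hypothetical school: 200 students across three year groups.
year_groups = {"Year 9": 80, "Year 10": 70, "Year 11": 50}
sizes = stratified_sample_sizes(year_groups, 40)
print(sizes)

register = list(range(1, 201))  # stands in for a numbered register
print(systematic_sample(register, 20))
```

Within each stratum, the members themselves would still be chosen at random, for example with `random.sample`.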

Cleaning Data

  • Data may need cleaning to make it more reliable and easier for statistical software to use. Cleaning includes:

    • Dealing with outliers or missing data

    • Standardizing data format and units

    • Removing unnecessary symbols

  • Anomaly: Data point that significantly deviates from others (e.g., out of line on a scatter diagram).

  • Outlier: Abnormally high or low value. A value is treated as an outlier if it lies:

    • More than 3 standard deviations from the mean (outside mean ± 3 × standard deviation), or

    • More than 1.5 × interquartile range (IQR) above the upper quartile or below the lower quartile.
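
The two outlier rules can be sketched in a few lines. The data set is made up, and the quartile method is an assumption: `statistics.quantiles` uses the (n + 1) position rule by default, and other quartile conventions can give slightly different boundaries.

```python
import statistics

def iqr_outliers(values):
    """Values more than 1.5 x IQR above the upper or below the lower quartile."""
    lower_q, _, upper_q = statistics.quantiles(values, n=4)
    iqr = upper_q - lower_q
    low, high = lower_q - 1.5 * iqr, upper_q + 1.5 * iqr
    return [v for v in values if v < low or v > high]

def sd_outliers(values):
    """Values more than 3 standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD (divides by n)
    return [v for v in values if abs(v - mean) > 3 * sd]

# Hypothetical data: the whole numbers 1 to 15 plus one extreme value.
values = list(range(1, 16)) + [60]
print(iqr_outliers(values))
print(sd_outliers(values))
```

Both rules flag only the value 60 here, but on other data sets they can disagree, so it is worth stating which rule is being used.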

Variables and Data Types

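  • Variables: Characteristics being measured that can vary among members of a population, categorized into:

    • Discrete: Takes only specific, separate values (e.g., number of siblings)

    • Continuous: Can take any value within a range (e.g., height)

    • Qualitative: Described in words rather than numbers (e.g., eye color)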

  • Multivariate: Problems involving more than one linked variable (e.g., analyzing how driving test performance is affected by gender and time of day).

  • Categorical Data: Fits into defined categories (e.g., gender, voting preference, car model).

  • Ordinal Data: Indicates rank order (e.g., race finishing positions).

  • Distribution: Set of variable values along with frequencies or probabilities.

  • Extraneous Variables: Variables that are not being investigated but may still influence the outcome of an investigation (e.g., time of day affecting reaction times).

Control Groups and Pairs

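  • Control Group: A group that does not receive the treatment being tested, used alongside the test group so that results can be compared.

  • Matched Pairs: Two individuals, one in each group, who are alike in relevant characteristics (e.g., age, gender); used to make the groups as similar as possible.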

  • Example: Test group receiving a new drug vs control group receiving a placebo.

Question Types

  • Closed Questions: Require choosing from fixed answers (e.g., tick boxes); simple to analyze.

  • Open Questions: No pre-set answers; responses are harder to analyze, so closed questions are usually preferred.

  • Pilot Survey: A small-scale trial of a questionnaire, carried out beforehand to check that the questions are clear and produce usable responses.


Displaying and Comparing Data

Data Representation

  • Choropleth Map: A map in which regions are shaded to show their values, with darker shading usually indicating higher values.

  • Frequency Density: Frequency ÷ class width; used as the bar height in a histogram, so that bar area represents frequency.

  • Central Tendency: Measures of average (mean, median, mode).

  • Dispersion: Measures of spread (range, IQR, standard deviation, variance).
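
To make the frequency density definition above concrete, here is a small sketch using invented grouped data with unequal class widths. The class boundaries and frequencies are hypothetical.

```python
# Hypothetical grouped data: (lower bound, upper bound) -> frequency
grouped = {(0, 10): 5, (10, 20): 12, (20, 40): 16, (40, 70): 9}

def frequency_density(lower, upper, frequency):
    """Frequency density = frequency / class width."""
    return frequency / (upper - lower)

densities = {bounds: frequency_density(*bounds, freq)
             for bounds, freq in grouped.items()}

for (lower, upper), density in densities.items():
    width = upper - lower
    # Bar area = class width x frequency density, which recovers the frequency.
    print(f"{lower}-{upper}: density {density:.2f}, area {width * density:.0f}")
```

The 20-40 class has the largest frequency but not the tallest bar, because its frequency is spread over a wider class.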

Variance and Standard Deviation

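  • Variance: Square of the standard deviation; the mean of the squared differences from the mean.

  • Standard Deviation: Square root of the variance; measures how far values typically lie from the mean, in the same units as the data.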

  • Interpercentile and Interdecile Ranges: Spread of data between certain percentiles or deciles, discarding extreme values from either end.
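
The relationship between variance and standard deviation can be checked directly. This is a minimal sketch on a made-up data set, assuming the population versions of both measures (dividing by n).

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5

variance = statistics.pvariance(scores)  # mean of squared differences from the mean
sd = statistics.pstdev(scores)           # square root of the variance

print(variance, sd)
print(abs(sd ** 2 - variance) < 1e-9)  # the variance is the square of the SD
```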

Standardized Scores

  • Indicate how many standard deviations a value is from the mean: (value − mean) ÷ standard deviation.

  • Positive scores indicate above mean, negative scores indicate below mean. Helps to compare values across different distributions.
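
A short sketch of how standardized scores allow comparison across distributions. The subject names, marks, means, and standard deviations are invented for illustration.

```python
def standardized_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Hypothetical results for one student in two subjects.
maths = standardized_score(70, mean=60, sd=8)
english = standardized_score(65, mean=50, sd=10)

print(maths, english)
# The raw Maths mark is higher, but the English mark is further above its mean.
print("Better relative performance:", "English" if english > maths else "Maths")
```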

Scatter Diagrams

  • Bivariate Data: Paired data (e.g., Maths vs. English scores).

  • Association: Relationship between two variables (e.g., height and gender).

  • Correlation: Relationship between two numerical variables (e.g., height and hand-span).

  • Explanatory & Response Variables: Explanatory goes on the x-axis and appears to affect the response variable on the y-axis.

  • Regression Line: Best-fit line calculated statistically, of the form y = a + bx; the gradient b is the rate of change of the response variable per unit of the explanatory variable.
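
One common way to calculate a regression line is least squares, sketched below. The choice of least squares, the function name, and the revision data are assumptions for the example; the notes above do not specify a method.

```python
def regression_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x  # the line passes through (mean_x, mean_y)
    return intercept, gradient

# Hypothetical data: hours of revision (explanatory) and test score (response).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

a, b = regression_line(hours, scores)
print(f"score = {a:.1f} + {b:.1f} x hours")
```

Here the gradient is read as "each extra hour of revision is associated with about b more marks". Predictions are only reliable within the range of the data.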

Causation and Correlation

  • Causation: When a change in one variable causes a change in another.

  • Spurious Correlation: Observed correlation with no causal link, often due to a third variable.

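  • Spearman's Rank Correlation Coefficient (SRCC): Measures how closely two sets of rankings agree, on a scale from -1 (perfect negative) through 0 (no agreement) to +1 (perfect positive); the relationship does not need to be linear.

  • Pearson's Product Moment Correlation Coefficient (PMCC): Measures the strength of a linear relationship between two numerical variables, on the same scale from -1 to +1.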
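
Both coefficients can be computed from their standard formulas. The sketch below assumes the rankings contain no ties (the simple Spearman formula needs this), and all data values are hypothetical.

```python
from math import sqrt

def spearman_rank(ranks_a, ranks_b):
    """SRCC = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid when there are no tied ranks."""
    n = len(ranks_a)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

def pmcc(xs, ys):
    """PMCC = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical rankings of five contestants by two judges.
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]
print(spearman_rank(judge_1, judge_2))

# Hypothetical heights (cm) and hand-spans (cm).
heights = [150, 160, 165, 172, 180]
spans = [17, 18, 19, 20, 22]
print(round(pmcc(heights, spans), 3))
```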


Time Series and Changes Over Time

Trend Analysis

  • Trend: Long-term changes; described as rising, falling, or stable.

  • Seasonal Variation: Regularly repeating patterns over time (e.g., quarterly sales fluctuations).

  • Mean Seasonal Effect: Average of the seasonal effects for the same period (e.g., every first quarter) across several cycles; added to a trend line value to make a prediction.

  • Seasonal Effect: Calculated as observed value minus trend line value.
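
The seasonal calculations above can be sketched as follows. The sales figures and trend line readings are invented, and the trend values are assumed to have been read from a trend line already.

```python
def seasonal_effects(observed, trend):
    """Seasonal effect = observed value - trend line value, for each period."""
    return [o - t for o, t in zip(observed, trend)]

# Hypothetical first-quarter sales over three years, with trend line readings.
q1_observed = [120, 130, 141]
q1_trend = [110, 118, 127]

effects = seasonal_effects(q1_observed, q1_trend)
mean_effect = sum(effects) / len(effects)
print(effects, mean_effect)

# Prediction for the next first quarter: trend line value + mean seasonal effect.
next_q1_trend = 136  # hypothetical reading from the extended trend line
print(next_q1_trend + mean_effect)
```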

Index Numbers

  • Index Number: New value as a percentage of the base year's value: (value ÷ base year value) × 100.

  • Base Year: Year used for comparison, typically assigned an index of 100.

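  • Weighted Index: Weighted mean of several index numbers, with weights reflecting each item's importance or quantity.

  • RPI: Retail Price Index; a measure of inflation based on the average change in the prices of goods and services bought by households.

  • CPI: Consumer Price Index; similar to RPI but excludes housing costs such as mortgage interest payments.

  • GDP: Gross Domestic Product; the total value of goods and services produced by a country in a given period, used to measure economic performance.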
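
A minimal sketch of the index number and weighted index calculations defined above. The prices, indices, and weights are made up.

```python
def index_number(value, base_value):
    """Value as a percentage of the base year's value (base year = 100)."""
    return value / base_value * 100

def weighted_index(indices, weights):
    """Weighted mean of index numbers."""
    return sum(i * w for i, w in zip(indices, weights)) / sum(weights)

# Hypothetical price of a loaf: 2.00 in the base year, 2.50 now.
print(index_number(2.50, 2.00))  # an index of 125 means a 25% rise

# Hypothetical indices for food, housing, and transport, weighted 5 : 3 : 2.
print(weighted_index([110, 120, 105], [5, 3, 2]))
```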

Control Charts and Quality Assurance

  • Control Chart: Visual tool to monitor process consistency over time.

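  • Action Line: Limit set a fixed number of standard deviations from the mean (typically 3); action is required if a value falls outside it.

  • Warning Line: Limit set closer to the mean (typically 2 standard deviations); additional checks are needed if a value reaches it, so it serves as a preemptive measure.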
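
The check made on a control chart can be sketched as a simple function. The multipliers of 2 and 3 standard deviations are a common convention taken here as an assumption, and the target, standard deviation, and sample values are hypothetical.

```python
def check_sample(value, target, sd, warning=2, action=3):
    """Classify a plotted value against the warning and action lines."""
    distance = abs(value - target)
    if distance > action * sd:
        return "action"   # outside an action line
    if distance > warning * sd:
        return "warning"  # between a warning line and an action line
    return "ok"

# Hypothetical process: bags with a target mass of 500 g and SD of 2 g.
for sample_mean in (503, 505, 507):
    print(sample_mean, check_sample(sample_mean, target=500, sd=2))
```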


Probability

Basic Probability Concepts

  • Sample Space Diagram: Represents all possible outcomes (e.g., table for two dice).

  • Mutually Exclusive Events: Cannot occur simultaneously; P(A) + P(B) = P(A or B).

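  • Exhaustive Events: A set of events that covers all possible outcomes; if the events are also mutually exclusive, their probabilities sum to 1.

  • Relative Frequency: Estimate of probability from an experiment: number of times the event occurs ÷ total number of trials.

  • Conditional Probability: Probability of event A given that B has occurred, written P(A | B).

  • Independence: Event A's probability is unaffected by whether event B occurs; P(A and B) = P(A) × P(B).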
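
Several of these ideas can be illustrated by counting outcomes in the sample space for two fair dice. The helper function and the chosen events are assumptions made for this sketch.

```python
from fractions import Fraction
from itertools import product

# Sample space for two dice: 36 equally likely (first, second) pairs.
sample_space = list(product(range(1, 7), repeat=2))

def probability(event, given=lambda outcome: True):
    """P(event | given), found by counting equally likely outcomes."""
    conditioned = [o for o in sample_space if given(o)]
    favourable = [o for o in conditioned if event(o)]
    return Fraction(len(favourable), len(conditioned))

def total_is_8(o):
    return o[0] + o[1] == 8

def first_is_even(o):
    return o[0] % 2 == 0

def first_is_6(o):
    return o[0] == 6

def second_is_6(o):
    return o[1] == 6

print(probability(total_is_8))                       # 5/36
print(probability(total_is_8, given=first_is_even))  # 1/6

# Independence: P(A and B) equals P(A) x P(B) for the two separate dice.
p_both = probability(lambda o: first_is_6(o) and second_is_6(o))
print(p_both == probability(first_is_6) * probability(second_is_6))  # True
```

Knowing that the first die is even changes the probability of a total of 8, so those two events are not independent.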

Risk and Distribution

  • Absolute Risk: Probability of an event occurring.

  • Relative Risk: Risk for one group compared with the risk for another group, calculated as the ratio of the two risks.

  • Binomial Distribution (B(n, p)): Models the number of successes in n independent trials, each with the same success probability p.

  • Normal Distribution (N(μ, σ²)): Describes continuous variables with a bell-shaped curve; characterizes many naturally occurring variables.

  • Key Properties: Symmetrical about the mean; about 95% of values lie within 2 SDs of the mean and almost all (99.8%) within 3 SDs.
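
The sketch below works through relative risk, a binomial probability, and the proportion of a normal distribution within 2 and 3 standard deviations of the mean. The group sizes and the coin example are hypothetical. The normal proportions come from Python's `statistics.NormalDist`, which gives about 99.7% within 3 SDs, so the rounded figure quoted at GCSE may differ slightly.

```python
from fractions import Fraction
from math import comb
from statistics import NormalDist

# Relative risk: ratio of the absolute risks in two hypothetical groups.
risk_group_a = Fraction(12, 200)  # 12 of 200 people affected
risk_group_b = Fraction(6, 300)   # 6 of 300 people affected
relative_risk = risk_group_a / risk_group_b
print(relative_risk)  # group A's risk is 3 times group B's

def binomial_probability(n, p, k):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability of exactly 2 heads in 5 tosses of a fair coin.
p_two_heads = binomial_probability(5, 0.5, 2)
print(p_two_heads)

# Proportion of a normal distribution within 2 and 3 SDs of the mean.
standard_normal = NormalDist(mu=0, sigma=1)
within_2 = standard_normal.cdf(2) - standard_normal.cdf(-2)
within_3 = standard_normal.cdf(3) - standard_normal.cdf(-3)
print(round(within_2, 4), round(within_3, 4))
```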

This information serves as critical vocabulary and concepts for understanding GCSE Statistics.