GCSE Statistics Vocabulary
Data and Sampling
Hypothesis: A statement that may or may not be true, used in statistical investigations to test for evidence.
Population: The total group being studied (e.g., all students in Year 10, all fireworks produced by a factory).
Sampling Frame (Sample Frame): A comprehensive list of all members of the population from which the sample is drawn (e.g., register, database).
Random Sample: Every member of the population has an equal chance of being selected.
Stratified Random Sampling: Population divided into strata (e.g., gender, school year); sample proportions match population proportions, with members chosen randomly from each stratum.
Judgement Sampling: Non-random sampling in which the researcher uses their own judgement to select members they consider representative of the population.
Cluster Sampling: All members from randomly selected clusters are included (e.g., all students from 3 randomly chosen tutor groups).
Quota Sampling: Non-random sampling where a set number of people from different groups are selected based on predetermined criteria (e.g., age, gender).
Systematic Sampling: Non-random sampling that starts from a randomly chosen point in the list and then selects members at fixed intervals (e.g., every 10th name on a register).
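The proportional allocation used in stratified random sampling can be sketched in Python. This is a minimal illustration rather than a prescribed method: the function name `stratified_sample`, the dictionary layout and the year-group data are all made up for the example.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Pick a proportional random sample from each stratum.

    strata: dict mapping a stratum name to a list of its members
    (a hypothetical layout chosen for this sketch).
    """
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # The stratum's share of the sample matches its share of the population.
        count = round(sample_size * len(members) / population_size)
        sample[name] = rng.sample(members, count)
    return sample

# Made-up population: 120 students in Year 10 and 80 in Year 11.
year_groups = {
    "Year 10": [f"Y10-{i}" for i in range(120)],
    "Year 11": [f"Y11-{i}" for i in range(80)],
}
chosen = stratified_sample(year_groups, sample_size=20, seed=1)
print({name: len(members) for name, members in chosen.items()})
# {'Year 10': 12, 'Year 11': 8}
```

When the proportions do not work out to whole numbers, rounding can make the total differ slightly from the requested sample size, so the counts may need adjusting by hand.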
Cleaning Data
Data may need cleaning to make it more reliable and easier for statistical software to process. Cleaning includes:
Dealing with outliers or missing data
Standardizing data format and units
Removing unnecessary symbols
Anomaly: Data point that significantly deviates from others (e.g., out of line on a scatter diagram).
Outlier: Abnormally high or low value. A value is commonly treated as an outlier if it lies:
More than 3 standard deviations from the mean (outside mean ± 3 × standard deviation), or
More than 1.5 × interquartile range (IQR) above the upper quartile or below the lower quartile.
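The two outlier boundaries can be worked out with a short Python sketch. The function names are hypothetical, the quartiles are passed in directly (quartile conventions vary between methods), and the standard deviation is assumed to be the population form that divides by n.

```python
import statistics

def iqr_outlier_bounds(lower_quartile, upper_quartile):
    """Boundaries at 1.5 x IQR below the lower and above the upper quartile."""
    iqr = upper_quartile - lower_quartile
    return lower_quartile - 1.5 * iqr, upper_quartile + 1.5 * iqr

def sd_outlier_bounds(values):
    """Boundaries at mean - 3 x SD and mean + 3 x SD (population SD assumed)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return mean - 3 * sd, mean + 3 * sd

low, high = iqr_outlier_bounds(lower_quartile=20, upper_quartile=30)
print(low, high)  # 5.0 45.0
```

With these made-up quartiles, any value below 5 or above 45 would be flagged as an outlier.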
Variables and Data Types
Variables: Values being measured that can vary among members of a population, categorized into:
Discrete
Continuous
Qualitative
Multivariate: Problems involving more than one linked variable (e.g., analyzing how driving test performance is affected by gender and time of day).
Categorical Data: Fits into defined categories (e.g., gender, voting preference, car model).
Ordinal Data: Indicates rank order (e.g., race finishing positions).
Distribution: Set of variable values along with frequencies or probabilities.
Extraneous Variables: Unrelated variables that may influence the outcome of an investigation (e.g., time of day affecting reaction times).
Control Groups and Pairs
Control Group: Used alongside a test group to facilitate comparisons (e.g., a test group receives a new drug while the control group receives a placebo).
Matched Pairs: Two comparable individuals, one placed in each group, so that the two groups are as similar as possible.
Question Types
Closed Questions: Require choosing from fixed answers (e.g., tick boxes); simple to analyze.
Open Questions: No pre-set answers; more complex to analyze – best avoided in favor of closed questions.
Pilot Survey: A small-scale trial of the questionnaire, carried out beforehand to check that the questions are clear and produce usable responses.
Displaying and Comparing Data
Data Representation
Choropleth Map: A map using shading to indicate higher or lower values.
Frequency Density: Frequency ÷ class width; used for histogram bar heights, so that each bar's area represents the frequency.
Central Tendency: Measures of average (mean, median, mode).
Dispersion: Measures of spread (range, IQR, standard deviation, variance).
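The frequency density used for histogram bar heights is a one-line calculation. The sketch below uses made-up class intervals and a hypothetical helper name.

```python
def frequency_density(frequency, lower_bound, upper_bound):
    """Height of a histogram bar: frequency divided by class width."""
    return frequency / (upper_bound - lower_bound)

# Made-up grouped data: (lower bound, upper bound, frequency).
classes = [(0, 10, 5), (10, 30, 20), (30, 40, 15)]
for lower, upper, freq in classes:
    print(f"{lower}-{upper}: {frequency_density(freq, lower, upper)}")
# 0-10: 0.5
# 10-30: 1.0
# 30-40: 1.5
```

The 10-30 class has the highest frequency but not the tallest bar, because its class width is twice that of the others.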
Variance and Standard Deviation
Variance: Square of the standard deviation.
Standard Deviation: Measure of how spread out the values are around the mean; the square root of the variance.
Interpercentile and Interdecile Ranges: Spread of data between certain percentiles or deciles, discarding extreme values from either end.
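The link between variance and standard deviation can be checked with Python's standard `statistics` module. This sketch assumes the population versions (dividing by n) and uses made-up data.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data with a mean of 5

# Variance: the mean of the squared deviations from the mean.
variance = statistics.pvariance(values)

# Standard deviation: the square root of the variance.
standard_deviation = statistics.pstdev(values)

print(variance, standard_deviation)
```

For this data the squared deviations sum to 32, so the variance is 32 ÷ 8 = 4 and the standard deviation is 2.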
Standardized Scores
Indicate how many standard deviations a value is from the mean: standardized score = (value − mean) ÷ standard deviation.
Positive scores indicate above mean, negative scores indicate below mean. Helps to compare values across different distributions.
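A standardized score can be computed as follows. The subject means and standard deviations are invented for illustration, and the helper name is hypothetical.

```python
def standardized_score(value, mean, standard_deviation):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / standard_deviation

# Made-up marks for one student in two subjects with different distributions.
maths = standardized_score(70, mean=60, standard_deviation=5)
english = standardized_score(75, mean=70, standard_deviation=10)
print(maths, english)  # 2.0 0.5
```

Although the English mark is higher, the Maths result is the stronger one relative to the rest of the group, because it is further above its own mean in standard-deviation terms.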
Scatter Diagrams
Bivariate Data: Paired data (e.g., Maths vs. English scores).
Association: Relationship between two variables (e.g., height and gender).
Correlation: Relationship between two numerical variables (e.g., height and hand-span).
Explanatory & Response Variables: Explanatory goes on the x-axis and appears to affect the response variable on the y-axis.
Regression Line: Line of best fit calculated from the data, of the form y = a + bx; the gradient b is interpreted as the rate of change of y per unit increase in x.
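One common way to calculate a regression line is the least-squares method, sketched below. The source does not specify a method, so treat this as an assumption; the function name and data are made up.

```python
def least_squares_line(xs, ys):
    """Return (intercept a, gradient b) of the least-squares line y = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    gradient = s_xy / s_xx
    # The line passes through the mean point (mean_x, mean_y).
    intercept = mean_y - gradient * mean_x
    return intercept, gradient

# Made-up bivariate data: x is explanatory, y is the response.
a, b = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(a, 2), round(b, 2))  # 2.2 0.6
```

Here the gradient of 0.6 means y rises by about 0.6 for each unit increase in x.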
Causation and Correlation
Causation: When a change in one variable causes a change in another.
Spurious Correlation: Observed correlation with no causal link, often due to a third variable.
Spearman's Rank Correlation Coefficient (SRCC): Measures how closely the rank orders of two variables agree, on a scale from -1 (perfect negative) through 0 (no correlation between ranks) to +1 (perfect positive).
Pearson’s Product Moment Correlation Coefficient (PMCC): Measures the strength of linear correlation between two numerical variables, on the same scale from -1 to +1.
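Spearman's rank correlation coefficient is usually calculated from the differences between ranks, using 1 − 6Σd² ÷ (n(n² − 1)). The sketch below assumes there are no tied ranks; the function name and the judges' rankings are invented.

```python
def spearman_rank(ranks_a, ranks_b):
    """Spearman's coefficient from two lists of ranks (no tied ranks assumed)."""
    n = len(ranks_a)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

# Made-up rankings of five contestants by two judges.
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]
print(spearman_rank(judge_1, judge_2))  # 0.8
```

A value of 0.8 suggests the two judges broadly agree on the order, even though they swap two neighbouring pairs.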
Time Series and Changes Over Time
Trend Analysis
Trend: Long-term changes; described as rising, falling, or stable.
Seasonal Variation: Regularly repeating patterns over time (e.g., quarterly sales fluctuations).
Mean Seasonal Effect: Average numerical difference from the trend line during specific periods.
Seasonal Effect: Calculated as observed value minus trend line value.
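The seasonal effect and mean seasonal effect can be illustrated with a few lines of Python. The quarterly figures are made up, and the helper name is hypothetical.

```python
import statistics

def seasonal_effect(observed, trend):
    """Seasonal effect: observed value minus the trend line value."""
    return observed - trend

# Made-up first-quarter sales over three years, with trend line readings.
q1_observed = [120, 130, 140]
q1_trend = [100, 112, 124]

q1_effects = [seasonal_effect(o, t) for o, t in zip(q1_observed, q1_trend)]
mean_q1_effect = statistics.mean(q1_effects)
print(q1_effects, mean_q1_effect)  # [20, 18, 16] 18
```

A prediction for a future first quarter would then be the trend line value for that quarter plus the mean seasonal effect of 18.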
Index Numbers
Index Number: New value as a percentage of the base year's value.
Base Year: Year used for comparison, typically assigned an index of 100.
Weighted Index: Average of index numbers weighted by significance or quantity.
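RPI: Retail Price Index; a measure of inflation reflecting average changes in the prices of household goods and services.
CPI: Consumer Price Index; similar to RPI but excludes mortgage interest payments and some other housing costs.
GDP: Gross Domestic Product; the total value of goods and services produced in a country over a given period, used to measure economic performance.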
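Index numbers and a weighted index can be calculated as in this sketch. The prices, weights and function names are invented for illustration.

```python
def index_number(new_value, base_value):
    """New value as a percentage of the base year's value (base year = 100)."""
    return new_value / base_value * 100

def weighted_index(indices, weights):
    """Weighted mean of index numbers."""
    total = sum(index * weight for index, weight in zip(indices, weights))
    return total / sum(weights)

# A made-up item costing 2.00 in the base year and 2.50 now.
print(index_number(2.50, 2.00))  # 125.0

# Two made-up index numbers weighted 3 : 1.
print(weighted_index([110, 120], [3, 1]))  # 112.5
```

An index of 125 means the price has risen by 25% since the base year.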
Control Charts and Quality Assurance
Control Chart: Visual tool to monitor process consistency over time.
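Action Line: Boundary (commonly set 3 standard deviations from the mean) beyond which the process is stopped and corrected.
Warning Line: Boundary (commonly set 2 standard deviations from the mean) beyond which additional checks, such as taking another sample, are needed; serves as a preemptive measure.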
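A control chart check can be sketched as below. The placement of the lines is an assumption: warning lines at 2 standard deviations from the target mean and action lines at 3 is a common convention, but the source does not state the multiples. The function name and the figures are hypothetical.

```python
def classify_sample(value, target_mean, standard_deviation):
    """Classify a sample value against assumed warning (2 SD) and action (3 SD) lines."""
    deviation = abs(value - target_mean)
    if deviation >= 3 * standard_deviation:
        return "action"      # on or beyond an action line: stop and reset the process
    if deviation >= 2 * standard_deviation:
        return "warning"     # between warning and action lines: take another sample
    return "in control"

# Made-up process: target mean 500 g, standard deviation 2 g.
for sample_mean in (503, 505, 507):
    print(sample_mean, classify_sample(sample_mean, 500, 2))
# 503 in control
# 505 warning
# 507 action
```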
Probability
Basic Probability Concepts
Sample Space Diagram: Represents all possible outcomes (e.g., table for two dice).
Mutually Exclusive Events: Cannot occur simultaneously; P(A) + P(B) = P(A or B).
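Exhaustive Events: A set of events that together cover every possible outcome; if they are also mutually exclusive, their probabilities sum to 1.
Relative Frequency: Number of times an outcome occurs ÷ total number of trials; used as an estimate of probability from an experiment.
Conditional Probability: Probability of event A given that B has occurred; P(A given B) = P(A and B) ÷ P(B).
Independence: Event A's probability is unaffected by whether event B occurs, so P(A and B) = P(A) × P(B).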
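Several of these ideas can be checked on the sample space for two dice. The sketch below uses exact fractions; the event functions are made-up examples.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair dice: 36 equally likely (first, second) pairs.
sample_space = list(product(range(1, 7), repeat=2))

def probability(event):
    """P(event), where event is a function returning True or False for an outcome."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

def total_is_7(outcome):
    return sum(outcome) == 7

def total_is_8(outcome):
    return sum(outcome) == 8

def first_is_even(outcome):
    return outcome[0] % 2 == 0

# Mutually exclusive: a total cannot be both 7 and 8, so the probabilities add.
p_7_or_8 = probability(lambda o: total_is_7(o) or total_is_8(o))
print(p_7_or_8 == probability(total_is_7) + probability(total_is_8))  # True

# Conditional probability: P(A given B) = P(A and B) / P(B).
p_both = probability(lambda o: total_is_7(o) and first_is_even(o))
p_7_given_even = p_both / probability(first_is_even)
print(p_7_given_even)  # 1/6

# Independence: P(A given B) equals P(A), so B does not affect A here.
print(p_7_given_even == probability(total_is_7))  # True
```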
Risk and Distribution
Absolute Risk: Probability of an event occurring.
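Relative Risk: Risk for one group divided by the risk for another group; shows how many times more (or less) likely the event is in the first group.
Binomial Distribution (B(n, p)): Models the number of successes in n independent trials, each with the same probability of success p.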
Normal Distribution (N(μ, σ²)): Describes continuous variables with a bell-shaped curve; characterizes many naturally occurring variables.
Key Properties: Symmetrical about the mean, with about 95% of values within 2 SDs and almost all values (about 99.7%, rounded to 99.8% in some GCSE materials) within 3 SDs of the mean.
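Both distributions can be explored with Python's standard library. This is a sketch with made-up parameters; note that `statistics.NormalDist` takes the standard deviation σ, not the variance σ² that appears in the N(μ, σ²) notation.

```python
import math
from statistics import NormalDist

def binomial_probability(successes, n, p):
    """P(X = successes) for X ~ B(n, p)."""
    ways = math.comb(n, successes)
    return ways * p ** successes * (1 - p) ** (n - successes)

# Probability of exactly 2 heads in 4 tosses of a fair coin.
print(binomial_probability(2, n=4, p=0.5))  # 0.375

# Made-up normal model: mean 170, standard deviation 6, i.e. N(170, 6 squared).
heights = NormalDist(mu=170, sigma=6)

# Proportion of values within 3 standard deviations of the mean.
within_3_sd = heights.cdf(170 + 3 * 6) - heights.cdf(170 - 3 * 6)
print(round(within_3_sd, 4))  # 0.9973
```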
This information serves as critical vocabulary and concepts for understanding GCSE Statistics.