III.A. Terms and Definitions
- Overview: Statistics as a body of knowledge to evaluate variability in all systems; variability exists in measurements, signatures, snowflake patterns, etc.
- Core topics from the Body of Knowledge overview:
  - Terms and Definitions
  - Data Types and Collection Methods
  - Sampling
  - Measurement Systems Analysis
  - Statistical Process Control (SPC)
  - Advanced Statistical Analysis
- This section introduces the foundational concepts used throughout Data Analysis for CQPA preparation.
III.A.1. Basic Statistics
Descriptive statistics explain characteristics of a sample or population, including:
- Measures of center: mean, median, mode
- Measures of variability: range, standard deviation, variance
- Location, frequency, and cumulative distributions
Central tendency (location of data):
- Mean (x̄): arithmetic average; used with many distributions, especially normal data
- Median: middle value that splits data into two equal halves; useful with skewed data
- Mode: most frequent value; can be multiple values (multimodal)
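Mean example: data set 1, 1, 2, 3, 4, 5, 5 → mean = $\bar{x} = \frac{1+1+2+3+4+5+5}{7} = \frac{21}{7} = 3$.
Median example: the data sets illustrated in the transcript have medians such as 3, 5, and 5.5, depending on the values; a non-integer median such as 5.5 typically arises when the count is even and the two middle values are averaged.
Mode example: data sets with two modes, 1 and 5 (for instance 1, 1, 2, 3, 4, 5, 5), are bimodal; bimodal distributions are discussed in the transcript.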
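The three measures of center can be checked on the data set from the mean example. The following is a minimal sketch using Python's standard `statistics` module (Python 3.8 or later is assumed for `multimode`); the variable names are illustrative and do not come from the transcript.

```python
from statistics import mean, median, multimode

data = [1, 1, 2, 3, 4, 5, 5]  # data set from the mean example

center = {
    "mean": mean(data),        # arithmetic average: sum of values / count
    "median": median(data),    # middle value of the sorted data
    "modes": multimode(data),  # every most-frequent value, in first-seen order
}
print(center)
```

For this data set the mean and median are both 3, and the two modes are 1 and 5, which is what makes the set bimodal.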
Variability and dispersion concepts:
- Variability (dispersion) describes how data spread around the center
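- Range: difference between max and min; simple dispersion measure; uses only the two extreme values, so it is sensitive to outliers and says little about how the rest of the data are spread
- Standard deviation (s for a sample, σ for a population): square root of the variance; loosely, the typical distance of data points from the mean; most commonly used measure of dispersion; linked to normal distributions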
- Variance: square of the standard deviation; for a population with mean $\mu$, $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2$; for a sample with mean $\bar{x}$, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$.
Example: for the data set 1, 2, 3, the mean is 2 and the squared deviations from the mean are 1, 0, 1. Using the sample formula (divide by n − 1), as in the transcript demonstration, the variance is $s^2 = \frac{1+0+1}{3-1} = 1$ and the standard deviation is $s = \sqrt{1} = 1$.
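The dispersion measures above can be computed step by step for the small example. This is a sketch only: the variable names are illustrative, and the population figure assumes the three values are treated as the entire population, so that the mean of the data plays the role of μ.

```python
from math import sqrt
from statistics import pvariance, variance, stdev

data = [1, 2, 3]
n = len(data)
x_bar = sum(data) / n                            # mean = 2.0

squared_devs = [(x - x_bar) ** 2 for x in data]  # [1.0, 0.0, 1.0]

data_range = max(data) - min(data)               # 3 - 1 = 2
pop_var = sum(squared_devs) / n                  # divide by N: 2/3
sample_var = sum(squared_devs) / (n - 1)         # divide by n - 1: 1.0
sample_sd = sqrt(sample_var)                     # 1.0

# Cross-check the hand calculation against the standard library
print(pop_var, pvariance(data))
print(sample_var, variance(data))
print(sample_sd, stdev(data))
```

The only difference between the two variance formulas is the divisor: N for a population, n − 1 for a sample, which is why the same three values give 2/3 in one case and 1 in the other.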
Summary points:
- Descriptive statistics describe center and spread
- Central tendency describes location; dispersion describes spread
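- The sample mean is an unbiased estimator of the population mean (its expected value equals the population mean); the Central Limit Theorem separately describes how the distribution of sample means approaches a normal distribution as the sample size grows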
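The unbiasedness of the sample mean can be illustrated by brute force on a tiny population. The sketch below is an illustration under stated assumptions, not part of the transcript: it treats the earlier data set as the whole population, assumes samples are drawn with replacement with every draw equally likely, and uses a sample size of 2 chosen arbitrarily.

```python
from fractions import Fraction
from itertools import product

population = [1, 1, 2, 3, 4, 5, 5]  # treated here as the full population
n = 2                               # sample size (arbitrary, for illustration)

mu = Fraction(sum(population), len(population))  # population mean

# Every equally likely ordered sample of size n, drawn with replacement
samples = list(product(population, repeat=n))
sample_means = [Fraction(sum(s), n) for s in samples]

# Averaging over all possible samples gives the expected value of the sample mean
expected_sample_mean = sum(sample_means) / len(sample_means)
print(mu, expected_sample_mean)
```

Individual sample means range from 1 to 5, but their average over all 49 possible samples equals the population mean exactly. Exact fractions are used so the comparison is not affected by floating-point rounding.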
Quick formulas (summary):
- Mean (sample): $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
- Population mean: …