Notes on Expectations, Standard Deviation, and Standard Error of the Mean
Scope of video: covers three core ideas (expectations/mean, standard deviation, standard error of the mean), how to graph them, and how to construct data tables for the class.
Expectations
Definition: The expectation (mean) is a measure of central tendency and represents the long-run average if the experiment were repeated many times.
Population vs. sample:
Population mean: \mu = E[X] = \sum_{i} x_i \, p_i for discrete variables, or \mu = E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx for continuous variables.
Sample mean (estimate of the population mean): \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
Properties (conceptual): the mean is the point around which observations are centered; deviations from the mean balance to zero (\sum_{i=1}^{n} (x_i - \bar{x}) = 0).
Role in data tables/graphs: the mean is often plotted or reported as the central value for a group or condition; used to compare groups when combined with an index of variability.
Relationship to the law of large numbers (intuition): for independent observations drawn from the same population, the sample mean converges to the population mean as the sample size grows.
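Sketch (Python, illustrative only): the definitions above can be checked numerically. The fair six-sided die is an assumed example distribution, not something taken from the video; the function name expectation and the random seed are likewise illustrative choices. The sample is the Group A data used in the worked example in these notes.

```python
import random
from statistics import fmean

def expectation(values, probs):
    """Discrete expectation: sum of x_i * p_i."""
    return sum(x * p for x, p in zip(values, probs))

# Assumed example distribution: a fair six-sided die.
faces = [1, 2, 3, 4, 5, 6]
mu = expectation(faces, [1 / 6] * 6)  # 3.5, up to floating-point rounding

# Deviations from the sample mean balance to zero (up to rounding).
sample = [2, 4, 5, 7, 9]
x_bar = fmean(sample)
deviation_sum = sum(x - x_bar for x in sample)

# Law of large numbers, informally: larger samples tend to give
# sample means closer to mu.
rng = random.Random(0)  # fixed seed so the run is repeatable
means_by_n = {}
for n in (10, 1_000, 100_000):
    rolls = [rng.choice(faces) for _ in range(n)]
    means_by_n[n] = fmean(rolls)
    print(n, round(means_by_n[n], 3))
```

The printed means should drift toward 3.5 as n grows, although any single small sample can land well away from it.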
Graphing Expectations (Means)
Common visual options:
Bar graphs or column charts with the mean as the central value for each group.
Line graphs showing mean across time or ordered categories.
Overlay of individual data points (dot plots) to show dispersion around the mean.
Error bars on means: show either the spread of the data (standard deviation) or the precision of the mean (standard error of the mean or a confidence interval); always state which one is plotted.
Considerations:
Check that the mean is an appropriate summary for the scale of measurement and the shape of the distribution.
Ensure clear labeling, units, and legend if multiple groups are shown.
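Sketch (Python, illustrative only): before plotting, it helps to compute the bar heights and error-bar lengths explicitly so the choice between SD and SEM is visible in the code. The helper name error_bar_data and the Group B values are hypothetical, made up for illustration.

```python
from math import sqrt
from statistics import fmean, stdev

def error_bar_data(groups, kind="sem"):
    """Return (labels, means, half_widths) for a chart of group means.

    kind="sd"  -> error bars show the sample SD (spread of the data)
    kind="sem" -> error bars show s / sqrt(n) (precision of the mean)
    """
    if kind not in ("sd", "sem"):
        raise ValueError("kind must be 'sd' or 'sem'")
    labels, means, half_widths = [], [], []
    for label, values in groups.items():
        s = stdev(values)  # sample SD, n - 1 divisor
        labels.append(label)
        means.append(fmean(values))
        half_widths.append(s if kind == "sd" else s / sqrt(len(values)))
    return labels, means, half_widths

# Hypothetical data for two groups.
groups = {"A": [2, 4, 5, 7, 9], "B": [3, 3, 4, 5, 5]}
labels, means, errs = error_bar_data(groups, kind="sem")
for label, m, e in zip(labels, means, errs):
    print(f"{label}: mean = {m:.2f}, error bar = +/- {e:.2f} (SEM)")
```

The three lists can then be handed to a plotting library. In matplotlib, for example, the bar function accepts a yerr argument for the error-bar lengths. Whichever tool is used, the axis label or caption should say whether the bars are SD or SEM.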
Standard Deviation
Purpose: measures how dispersed the data are around the mean; a summary of variability.
Population standard deviation: \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}
Sample standard deviation (estimate of the population SD, using the n-1 divisor): s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}
Interpretation:
Smaller values: observations cluster near the mean.
Larger values: greater spread of data.
Units: same as the data (since you take a square root of squared deviations).
Relationship to the variance: the SD is the square root of the variance, \mathrm{Var}(X) = \sigma^2 (population) or s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 (sample). The n-1 divisor makes s^2 an unbiased estimator of \sigma^2; s itself still slightly underestimates \sigma on average.
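Sketch (Python, illustrative only): the two SD formulas differ only in the divisor. The code below computes both by hand for the Group A data from the worked example in these notes and compares them with the standard library's pstdev (divisor N) and stdev (divisor n-1). Treating the same five numbers as a whole population is an assumption made only to show the contrast.

```python
from math import sqrt
from statistics import pstdev, stdev

data = [2, 4, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean.
sum_sq = sum((x - mean) ** 2 for x in data)

sd_population = sqrt(sum_sq / n)    # divisor N: data treated as the whole population
sd_sample = sqrt(sum_sq / (n - 1))  # divisor n - 1: data treated as a sample

print("population SD:", round(sd_population, 3), "| pstdev:", round(pstdev(data), 3))
print("sample SD:    ", round(sd_sample, 3), "| stdev: ", round(stdev(data), 3))
```

The sample SD is always the larger of the two, and the gap shrinks as n grows.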
Standard Error of the Mean (SEM)
Purpose: measures the precision with which the sample mean estimates the population mean; reflects sampling variability of the mean, not the spread of individual observations.
Formula (population): SE_{\bar{X}} = \frac{\sigma}{\sqrt{n}}
Estimator when population SD unknown: replace \sigma with the sample SD s: SE_{\bar{X}} \approx \frac{s}{\sqrt{n}}
Interpretation:
Smaller SEM implies more precise estimate of the population mean from the sample.
SEM decreases as sample size n increases, in proportion to 1/\sqrt{n}; for example, quadrupling n halves the SEM.
Relationship to confidence intervals:
Approximate 95% CI for the mean (assuming normality or large n): \bar{x} \pm z_{0.975} \cdot SE_{\bar{X}}, where z_{0.975} \approx 1.96.
If the population variance is unknown and the sample size is small: use the t-distribution with n-1 degrees of freedom: \bar{x} \pm t_{n-1,\,0.975} \cdot SE_{\bar{X}}
Graphical use: error bars on mean plots commonly show ±1 SEM, or ±1.96 SEM for an approximate 95% CI; the caption or legend must say which, to avoid misinterpretation.
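Sketch (Python, illustrative only): the SEM and the two interval formulas can be written as small helpers. The names sem and mean_ci are hypothetical. The z value comes from the standard library's NormalDist; the t critical value 2.776 (df = 4) is entered by hand from a t-table because the standard library has no t-distribution. The assumed \sigma = 2.0 at the end is made up to show the 1/\sqrt{n} scaling.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def sem(values):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return stdev(values) / sqrt(len(values))

def mean_ci(values, critical_value):
    """Interval x_bar +/- critical_value * SEM, returned as (low, high)."""
    x_bar = fmean(values)
    half_width = critical_value * sem(values)
    return x_bar - half_width, x_bar + half_width

data = [2, 4, 5, 7, 9]

z = NormalDist().inv_cdf(0.975)  # about 1.96, appropriate for large samples
t_df4 = 2.776                    # t critical value for df = 4, from a t-table

print("SEM:", round(sem(data), 3))
print("z-based 95% CI:", tuple(round(v, 2) for v in mean_ci(data, z)))
print("t-based 95% CI:", tuple(round(v, 2) for v in mean_ci(data, t_df4)))

# For a fixed SD, SEM scales as 1 / sqrt(n): quadrupling n halves it.
sigma = 2.0  # assumed SD, for illustration only
sems_by_n = {n: sigma / sqrt(n) for n in (25, 100, 400)}
print(sems_by_n)
```

With only five observations the z-based interval is shown for comparison only; it is narrower than the t-based interval and would overstate the precision of the mean.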
Data Tables: Constructing and Reporting
Key columns to include when presenting data:
Group/Condition identifier
n (sample size per group)
Mean (\bar{x})
Standard deviation: s (sample) or \sigma (population, if known)
Standard error of the mean (SE_{\bar{X}})
Formatting guidelines:
Clearly label units and scales.
Report numbers to an appropriate and consistent level of precision (e.g., two decimal places for means, SDs, and SEMs).
Include a note on whether SD is population or sample; specify if SEM is based on sample SD.
Example table structure:
Group | n | \bar{x} | s | SE_{\bar{X}} | 95% CI (optional)
Example data snippet (illustrative):
Group A: n = 5, data = {2, 4, 5, 7, 9}
Mean: \bar{x} = \frac{1}{5} (2+4+5+7+9) = \frac{27}{5} = 5.4
Deviations from the mean (x_i - \bar{x}): (-3.4, -1.4, -0.4, 1.6, 3.6)
Squared deviations: (11.56, 1.96, 0.16, 2.56, 12.96)
Sum of squared deviations = 29.2
Sample variance: s^2 = \frac{29.2}{n-1} = \frac{29.2}{4} = 7.3
Sample SD: s = \sqrt{7.3} \approx 2.702
SEM: SE_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{2.702}{\sqrt{5}} \approx 1.208
Approximate 95% CI (t with df = 4, t \approx 2.776): 5.4 \pm 2.776 \cdot 1.208 \approx 5.4 \pm 3.35, i.e. (2.05, 8.75)
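Sketch (Python, illustrative only): the Group A calculation above can be reproduced and formatted as one row of the example table structure. The helper name summary_row is hypothetical, and the t critical value 2.776 is the df = 4 value quoted in these notes, passed in by hand.

```python
from math import sqrt
from statistics import fmean, stdev

def summary_row(group, values, t_crit):
    """Return one table row: group, n, mean, s, SEM, and a t-based 95% CI."""
    n = len(values)
    x_bar = fmean(values)
    s = stdev(values)  # sample SD, n - 1 divisor
    se = s / sqrt(n)   # SEM based on the sample SD
    return {
        "group": group, "n": n, "mean": x_bar, "sd": s, "sem": se,
        "ci_low": x_bar - t_crit * se, "ci_high": x_bar + t_crit * se,
    }

row = summary_row("A", [2, 4, 5, 7, 9], t_crit=2.776)

header = "Group | n | mean | s | SEM | 95% CI"
line = (f"{row['group']} | {row['n']} | {row['mean']:.2f} | {row['sd']:.2f} | "
        f"{row['sem']:.2f} | ({row['ci_low']:.2f}, {row['ci_high']:.2f})")
print(header)
print(line)  # A | 5 | 5.40 | 2.70 | 1.21 | (2.05, 8.75)
```

For groups of a different size, the t critical value has to be looked up again for the matching degrees of freedom (n - 1).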
Connections to Foundations and Real-World Relevance
Relationship to sampling theory: SD and SEM reflect different aspects of variability: SD describes the spread of the data, while SEM describes the precision of the mean estimate.
Practical implications:
Misinterpreting SEM as data spread can mislead readers about variability within the data.
Reporting both mean and SD helps convey central tendency and dispersion; SEM helps understand the reliability of the mean estimate across samples.
Ethical/statistical diligence:
Use appropriate error measures for the context (e.g., SD for variability, SEM for precision of the mean).
When presenting, be transparent about which measure is used and the sample size, so conclusions are not overstated.
Quick Formulas to Memorize
Population mean: \mu = E[X] = \sum_{i} x_i \, p_i (discrete) or \mu = E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx (continuous)
Sample mean: \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
Population SD: \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}
Sample SD: s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}
Standard error of the mean: SE_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}}
Confidence interval for the mean (large n): \bar{x} \pm z_{0.975} \cdot SE_{\bar{X}}
Confidence interval for the mean (small n, t): \bar{x} \pm t_{n-1,\,0.975} \cdot SE_{\bar{X}}
Summary and Key Takeaways
Mean (expectation) summarizes the central tendency of a distribution; it is a population parameter (μ) or a sample statistic (x̄).
Standard deviation (SD) describes how spread out individual observations are around the mean; it is σ for population and s for sample.
Standard error of the mean (SEM) describes the precision of the sample mean as an estimate of the population mean; SEM decreases with larger n and is computed as σ/√n (or s/√n as an estimator).
Graphs of means often use error bars to convey variability or precision; choose SD vs SEM carefully and state what the error bars represent.
Data tables should report n, mean, SD (or σ), and SEM with clear labels, units, and rounding rules; provide a transparent example to illustrate calculation steps.
Always connect these concepts to their practical interpretation and be mindful of how conclusions may be affected by sample size and chosen error measures.