Central Tendency and Variability - PSYC 2021A Lecture Notes

Central Tendency

  • Descriptive statistic that best represents the center of a data set; the value around which data seem to gather.

  • Importance: Provides a compact summary of the data’s central location, around which other data cluster.

  • Major measures:

    • Mean: Arithmetic average; sensitive to outliers and skew.
    • Median: Middle score after ordering; robust to outliers.
    • Mode: Most frequent score; especially useful for nominal data; also informative for unimodal/bimodal/multimodal distributions.
  • Notation and terminology:

    • Statistic (sample-based): usually denoted by Latin letters. Mean is often written as \bar{X} or M.
    • Parameter (population-based): usually denoted by Greek letters. Mean is \mu (pronounced “mu”).
    • Distinction reminder: statistics describe samples; parameters describe populations.
  • Quick take on when to use which measure (based on distribution and data type):

    • Symmetric scale (interval/ratio) data with no outliers: use the mean.
    • Skewed distributions or data with outliers: use the median.
    • Nominal data: use the mode (since means/medians are not meaningful for nominal scales).
  • Worked example (from slide): If you have the values {100, 40, 40, 10, 8} (N = 5):

    • Sorted: 8, 10, 40, 40, 100
    • Mean: \bar{X} = \frac{100 + 40 + 40 + 10 + 8}{5} = \frac{198}{5} = 39.6
    • Median: 40 (the middle value)
    • Mode: 40 (appears twice)
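    • Interpretation: The 100 is an extreme score that pulls the mean upward (without it, the mean of the remaining four scores is 24.5). In this data set the mean (39.6) happens to land close to the median (40).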
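
The worked example can be checked with a minimal Python sketch. It uses only the standard-library statistics module; the variable names are illustrative and not from the slides.

```python
import statistics

# Values from the worked example (N = 5)
scores = [100, 40, 40, 10, 8]

mean = statistics.mean(scores)      # sum of scores divided by N
median = statistics.median(scores)  # middle value after sorting
mode = statistics.mode(scores)      # most frequent value

print(sorted(scores))      # [8, 10, 40, 40, 100]
print(mean, median, mode)  # 39.6 40 40
```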

Visuals and distribution shapes

  • Unimodal distribution: one clear peak (one mode).

  • Bimodal distribution: two distinct peaks (two modes).

  • Multimodal distribution: more than two modes.

  • Central tendency measures align with distribution shape:

    • In symmetric unimodal distributions, the peak sits at or near the mean (and the median and mode).
    • In skewed distributions, median better represents the center; mean can be distorted by outliers.
  • Outliers:

    • Definition: extreme scores far from the rest of the data.
    • Consequence: can heavily influence the mean; the median is more robust to outliers.
  • Skewness and outliers influence choice of summary statistic and interpretation of the data.
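
The effect of a single outlier on the mean versus the median can be illustrated with a short sketch. The two data sets below are hypothetical (they are not from the lecture) and differ only in their largest score.

```python
import statistics

# Hypothetical scores: identical except for the largest value
base = [10, 12, 13, 14, 15]
with_outlier = [10, 12, 13, 14, 150]

# The mean moves a lot; the median does not move at all
print(statistics.mean(base), statistics.mean(with_outlier))      # 12.8 39.8
print(statistics.median(base), statistics.median(with_outlier))  # 13 13
```

With the outlier present, the mean (39.8) is larger than four of the five scores, which is why the median is the safer summary here.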

Measures of Central Tendency (expanded)

  • A statistic is a number based on a sample; parameters are based on the whole population.

  • Stat vs parameter notation (recap):

    • Mean (sample): \bar{X} or M
    • Mean (population): \mu
  • Summary of the three measures:

    • Mean: Best for symmetric distributions; can be distorted by outliers.
    • Median: Best for skewed distributions or when outliers are present.
    • Mode: Best for nominal data; also useful when data are highly discrete or one value dominates; can be reported for unimodal, bimodal, or multimodal data.
  • Median as a percentile:

    • The median is the 50th percentile of the data.
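
As a quick check that the median and the 50th percentile coincide, here is a sketch using statistics.quantiles from the Python standard library (Python 3.8+). It reuses the five scores from the earlier worked example. Percentile conventions differ across software, so other percentiles may not match Jamovi exactly.

```python
import statistics

scores = [100, 40, 40, 10, 8]

# quantiles(..., n=100) returns the 99 cut points between percentiles;
# index 49 is the cut point for the 50th percentile.
percentiles = statistics.quantiles(scores, n=100)
p50 = percentiles[49]

print(p50, statistics.median(scores))  # 40.0 40
```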

Calculating the Range and Interquartile Range

  • Range: difference between the largest and smallest values.
    • Formula: range = X_{\text{highest}} - X_{\text{lowest}}
  • Interquartile Range (IQR): spread of the middle 50% of the data.
    • Process: Find Q1 (median of lower half) and Q3 (median of upper half); IQR = Q3 - Q1.
    • Note: The median is the 50th percentile; IQR focuses on the central portion.
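
The range and IQR procedure above can be sketched as follows. The function name range_and_iqr is hypothetical. One assumption goes beyond the notes: when N is odd, the overall median is left out of both halves. Other conventions exist, so software output can differ slightly.

```python
import statistics

def range_and_iqr(values):
    """Return (range, Q1, Q3, IQR) using the median-of-halves method.

    Assumption: with an odd number of scores, the overall median is
    excluded from both the lower and the upper half.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    q1 = statistics.median(lower)  # median of the lower half
    q3 = statistics.median(upper)  # median of the upper half
    return data[-1] - data[0], q1, q3, q3 - q1

# Scores from the earlier worked example
print(range_and_iqr([100, 40, 40, 10, 8]))  # (92, 9.0, 70.0, 61.0)
```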

Measures of Variability

  • Variability describes how spread out the data are.

  • Key measures:

    • Range
    • Interquartile Range (IQR)
    • Variance
    • Standard Deviation
  • Variance:

    • Population variance: \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (X_i - \mu)^2
    • Sample variance: s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2
    • Note: When estimating the population variance from a sample, divide by n-1 rather than n; dividing by n would tend to underestimate the population variance.
  • Standard Deviation:

    • Population standard deviation: \sigma = \sqrt{\sigma^2}
    • Sample standard deviation: s = \sqrt{s^2}
    • Interpretation: The typical distance of scores from the mean, in the same units as the data (strictly, the square root of the average squared deviation).
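
The variance and standard deviation formulas can be traced step by step in a short sketch. It reuses the five scores from the earlier worked example and treats them first as a whole population, then as a sample. The variable names are illustrative.

```python
import math
import statistics

scores = [100, 40, 40, 10, 8]
n = len(scores)
mean = sum(scores) / n

# Sum of squared deviations from the mean
ss = sum((x - mean) ** 2 for x in scores)

pop_var = ss / n           # sigma^2: divide by N
sample_var = ss / (n - 1)  # s^2: divide by n - 1
pop_sd = math.sqrt(pop_var)
sample_sd = math.sqrt(sample_var)

print(round(pop_var, 2), round(sample_var, 2))  # 1104.64 1380.8

# The standard library gives the same values
print(math.isclose(pop_var, statistics.pvariance(scores)))   # True
print(math.isclose(sample_var, statistics.variance(scores))) # True
```

The sample values are always the larger of the two, because the same sum of squares is divided by a smaller number.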

Choosing Appropriate Descriptive Statistics

  • For continuous data:
    • If the distribution is not skewed: use Mean with Standard Deviation.
    • If the distribution is skewed: use Median with Range or IQR.
  • For ordinal data: use Median (and possibly IQR).
  • For nominal data: use Mode.
  • Normality checks: Shapiro-Wilk test (W statistic and p-value) to assess if data are approximately normally distributed.
    • Example outputs frequently include W and p; small p-values suggest non-normality.
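
The decision rules in this section can be written as a small lookup. The helper below is a hypothetical sketch: its name, arguments, and string labels are not part of Jamovi or any library. Whether a distribution counts as skewed is left to the analyst, for example from a histogram or a Shapiro-Wilk result.

```python
def recommend_descriptives(data_type, skewed=False):
    """Return (center, spread) following the guidelines in these notes.

    Hypothetical helper; labels are illustrative only.
    """
    if data_type == "continuous":
        if skewed:
            return ("median", "range or IQR")
        return ("mean", "standard deviation")
    if data_type == "ordinal":
        return ("median", "IQR")
    if data_type == "nominal":
        return ("mode", None)  # no spread measure is listed for nominal data
    raise ValueError(f"unknown data type: {data_type!r}")

print(recommend_descriptives("continuous"))               # ('mean', 'standard deviation')
print(recommend_descriptives("continuous", skewed=True))  # ('median', 'range or IQR')
```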

Descriptive Statistics in Practice (Jamovi)

  • Jamovi provides a Descriptive Statistics module.

  • Typical outputs include:

    • N (sample size)
    • Mean
    • Standard error of the mean (SEM)
    • Median
    • Standard deviation (SD)
    • Range
    • Skewness and Kurtosis (with standard errors)
    • Shapiro-Wilk W and p-value for normality
  • Sample interface features:

    • Data view, Variables setup, Descriptives analysis panel, and results view.
    • Allows switching between descriptive summaries and visualizations.
  • From raw data to descriptive stats (process):

    • Look through raw data to ensure values look reasonable and consistent.
    • Use descriptive statistics to summarize the data compactly.
    • Check for data entry errors or anomalies before analysis.
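
The raw-data-to-descriptives process can also be sketched outside Jamovi. The functions and grade values below are hypothetical, and the code is not Jamovi's implementation. It flags impossible values first, then computes a few of the outputs listed above with the Python standard library.

```python
import math
import statistics

def out_of_range(values, low, high):
    """Return values outside [low, high]; a simple data-entry check."""
    return [v for v in values if not (low <= v <= high)]

def describe(values):
    """Return N, Mean, SEM, Median, SD, and Range for a list of scores."""
    n = len(values)
    sd = statistics.stdev(values)  # sample SD (divides by n - 1)
    return {
        "N": n,
        "Mean": statistics.mean(values),
        "SEM": sd / math.sqrt(n),
        "Median": statistics.median(values),
        "SD": sd,
        "Range": max(values) - min(values),
    }

# Hypothetical percentage grades; 740.0 stands in for a mistyped 74.0
raw = [81.5, 740.0, 68.0, 90.5, 77.0]
print(out_of_range(raw, 0, 100))  # [740.0]

# Summarize only after the suspicious value has been dealt with
clean = [v for v in raw if 0 <= v <= 100]
for name, value in describe(clean).items():
    print(name, round(value, 2))
```

In practice a flagged value should be checked against the original record rather than silently dropped; it is removed here only to keep the sketch short.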

Example: Describing Enjoyment of Statistics vs Grades (APA-style write-up)

  • Study question: Do students who enjoy statistics differ in PSYC 2021 grades compared to those who do not enjoy statistics or are undecided?

  • Reported descriptive results (example):

    • Yes group: mean M = 81.9%, N = 37
    • No group: mean M = 74.2%, N = 37
    • Undecided group: mean M = 75.9%, N = 55
    • Additional statistics typically reported: Median, SD, and SEM as in Table 1 and Figure 1 (illustrating means across groups).
  • Example APA style sentence:

    • Descriptive statistics indicated that students who enjoyed statistics had a higher mean grade (M = 81.9%) than those who did not enjoy statistics (M = 74.2%) or were undecided (M = 75.9%). See Table 1 for complete descriptive statistics (N, Mean, Median, SD) by group.
  • Table 1 (descriptive statistics) highlights:

    • Groups: Yes, No, Undecided
    • Sample sizes (N)
    • Means, Medians, Standard Deviations (SD)
    • Other statistics as provided (e.g., standard error of the mean, sometimes reported as SEM)
  • Figure 1 (visual): shows average grades by group (Yes/No/Undecided).

Practical and ethical implications

  • Misleading statistics and charts: a chart can visually mislead if axes aren’t scaled consistently or if data are truncated; critical evaluation of charts is essential.
  • Source examples: discussions of misleading charts (e.g., policy-related critiques) illustrate why context, data quality, and proper labeling matter for accurate interpretation.

Appendix: Key vocabulary and formulas

  • Statistics vs parameters

    • Statistic: sample-based value (e.g., \bar{X}, sample mean; s for SD)
    • Parameter: population-based value (e.g., \mu, population mean; \sigma for population SD)
  • Common symbols:

    • Mean: \bar{X} or M; population mean: \mu
    • Variance: s^2 (sample), \sigma^2 (population)
    • Standard deviation: s (sample), \sigma (population)
  • Quick reference formulas:

    • Range: range = X_{\text{max}} - X_{\text{min}}
    • IQR: IQR = Q3 - Q1
    • Population variance: \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (X_i - \mu)^2
    • Sample variance: s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2
    • Population SD: \sigma = \sqrt{\sigma^2}
    • Sample SD: s = \sqrt{s^2}
    • SEM: \text{SEM} = \frac{s}{\sqrt{N}}
  • Notes on interpretation:

    • The mean is informative for symmetric distributions but can be misleading for skewed data with outliers.
    • The median provides a robust central tendency for skewed data.
    • The mode offers the most frequent value and is essential for nominal data.
  • Chapter and course logistics

    • Chapter 4: Learning Curve (Due Today)
    • Chapter 5: Learning Curve reminder (Due Sept 25) and mini assignment details provided in class materials.