Notes on Psychology Statistics (Descriptive and Inferential)

Descriptive Statistics

  • Purpose: describe, organize, and summarize data
  • Frequency distribution
    • Definition: how often each data point appears
    • Visuals: frequency (y-axis) vs data points (x-axis)
    • Example shown: x-axis could be number of pets someone has
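
A frequency distribution like the pets example can be tallied in a few lines. This is a minimal sketch; the `pets` responses are made-up illustration data, not values from the notes.

```python
from collections import Counter

# Hypothetical survey responses: number of pets each person has.
pets = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0]

# Count how often each data point appears (the frequency distribution).
freq = Counter(pets)

# Text histogram: data point on the left (x-axis), one '#' per occurrence (frequency).
for value in sorted(freq):
    print(f"{value} pets | {'#' * freq[value]} ({freq[value]})")
```

Each row plays the role of one bar in the frequency plot: the longer the row, the more often that value appears.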

Measures of Central Tendency

  • Definitions
    • Mean (average): \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
    • Median (midpoint of an ordered set)
    • Mode (most common value)
  • Decision rules
    • Choice depends on data distribution: normal/symmetric (bell-shaped) vs skewed
    • Skewness: named for its tail direction
  • Skew and its implications
    • If the tail extends to the left (low values), the distribution is negatively skewed
    • If the tail extends to the right (high values), the distribution is positively skewed
    • When data are skewed, the mean is pulled toward the tail
    • In skewed distributions, the mode or median can be better descriptors of the typical value than the mean
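
The pull of the mean toward the tail is easy to see with a small positively skewed data set. The sketch below uses Python's standard `statistics` module; the `scores` values are hypothetical and chosen only to include one extreme high value.

```python
import statistics

# Hypothetical positively skewed data: most scores are low, one is very high.
scores = [2, 3, 3, 3, 4, 4, 5, 40]

mean = statistics.mean(scores)      # pulled toward the high (right) tail
median = statistics.median(scores)  # midpoint of the ordered set
mode = statistics.mode(scores)      # most common value

print(f"mean={mean}, median={median}, mode={mode}")
```

Here the single score of 40 drags the mean well above the median and mode, which stay close to where most scores actually sit.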

Measures of Variability

  • Purpose: describe how far scores are spread from the mean
  • Standard deviation (SD)
    • Definition: a numeric indicator of spread around the mean
    • Population vs sample nuance: the sample SD (s) divides by n - 1, while the population SD divides by n
    • Interpretation:
      • Far spread → high SD
      • Close spread → low SD
  • Relationship to central tendency
    • Measures of variability accompany measures of central tendency
    • Two samples can have the same mean but different SDs, indicating different data shapes or consistency
  • Example (illustrative): the set 1, 2, 4, 5, 3, 4, 2 has mean \bar{x} = 3; its sample SD, s = \sqrt{12/6} \approx 1.41, shows how far scores typically fall from that mean
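
To make the "same mean, different SD" point concrete, the sketch below computes the sample SD for the set in the notes and for a second, hypothetical set (`tight`) that was invented to share the same mean while clustering more closely. `statistics.stdev` uses the n - 1 (sample) formula.

```python
import statistics

data = [1, 2, 4, 5, 3, 4, 2]    # the set from the notes
tight = [3, 3, 3, 3, 3, 2, 4]   # hypothetical set with the same mean, less spread

mean_data = statistics.mean(data)
mean_tight = statistics.mean(tight)

# Sample standard deviation (divides by n - 1).
sd_data = statistics.stdev(data)
sd_tight = statistics.stdev(tight)

print(f"data:  mean={mean_data}, SD={sd_data:.2f}")
print(f"tight: mean={mean_tight}, SD={sd_tight:.2f}")
```

Both sets have a mean of 3, but the lower SD of `tight` signals that its scores are more consistent.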

Example: Describing a Sample’s Happiness

  • Question: On average, how happy is the sample?
  • Reported values: mean = 2.47 (sample is moderately happy)
  • Interpretation of spread: SD is low, indicating scores cluster closely around the mean

Inferential Statistics

  • Goal: can we infer that results from the sample apply to the population?
  • Key concepts
    • Sample vs Population: subset of individuals vs all individuals
    • Correlation: the relationship between two variables
    • Correlation coefficient: r, the numeric indicator of the strength and direction of the relationship
    • Example context (psychology course): the DS 101 students who were studied are the sample; all DS 101 students worldwide are the population

Direction of Correlation

  • Positive correlation
    • Definition: as one variable increases, the other also increases (or both decrease together)
    • Real-world examples:
      • Height and weight
      • GRE scores and graduate school success
      • High school GPA and university GPA

Negative (Inverse) Correlation

  • Negative correlation
    • Definition: as one variable increases, the other decreases
    • Real-world examples:
      • Smoking and health
      • Flossing and tooth decay
      • Absences and exam scores

Strength of Correlation

  • Strength measure: absolute value of the correlation coefficient
    • The closer |r| is to 1, the stronger the correlation
    • The closer |r| is to 0, the weaker the correlation
  • Interpretation guidance: stronger correlations imply more consistent directional association, but not causation
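
The direction and strength of a correlation can both be read from r. The sketch below implements the Pearson formula directly; the helper name `pearson_r` and the absences/exam-score numbers are hypothetical, made up to illustrate a strong negative relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of cross-products of deviations from each mean.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the sums of squared deviations.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data: more absences tend to go with lower exam scores.
absences = [0, 1, 2, 4, 6, 8]
exam_scores = [92, 88, 85, 75, 70, 60]

r = pearson_r(absences, exam_scores)
print(f"r = {r:.3f}, strength |r| = {abs(r):.3f}")
```

The sign of `r` gives the direction (negative here), and `abs(r)` close to 1 indicates a strong association. Neither says anything about whether absences cause lower scores.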

p-value and Statistical Significance

  • p-value: the probability that the observed result (or something more extreme) would occur if the null hypothesis were true
  • Common threshold: p < 0.05 indicates statistical significance
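    • Interpretation: if the null hypothesis were true, a result at least this extreme would occur less than 5% of the time; it is not the probability that the null hypothesis is true
  • Non-significance: p \ge 0.05 means a result this extreme could plausibly arise by chance under the null hypothesis, so the null is not rejected (which does not prove it true)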
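
One way to see what "at least this extreme under the null hypothesis" means is an exact permutation test, sketched below. This technique is not described in the notes; it is swapped in here because it needs no probability tables. The data and the helper name `pearson_r` are hypothetical. The null hypothesis is that the two variables are unrelated, so every re-pairing of the scores is equally likely.

```python
import itertools
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data: absences and exam scores for six students.
absences = [0, 1, 2, 4, 6, 8]
exam_scores = [92, 88, 85, 75, 70, 60]

observed = abs(pearson_r(absences, exam_scores))

# Under the null, any pairing of scores with absences is equally likely,
# so enumerate all 6! = 720 re-pairings and count those at least as extreme.
total = 0
as_extreme = 0
for shuffled in itertools.permutations(exam_scores):
    total += 1
    if abs(pearson_r(absences, shuffled)) >= observed - 1e-12:
        as_extreme += 1

p_value = as_extreme / total
print(f"two-sided p = {p_value:.4f}, significant at 0.05: {p_value < 0.05}")
```

Enumerating every re-pairing is only practical for very small samples; with larger samples a random subset of shuffles or a standard significance test would be used instead.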

Why Correlation is Not Causation

  • Key reasons
    • No random assignment across conditions in most observational studies
    • Extraneous variables (third variables) may drive the observed relationship
    • Causation requires ruling out alternative explanations and establishing temporal precedence (often via experimental design)

Important Formulas and Concepts (Summary)

  • Mean: \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
  • Standard deviation (sample): s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}
  • Correlation coefficient: r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \; \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
  • p-value concept: p = P( \text{extreme data} \mid H_0 ) (probability of observing data as extreme as, or more extreme than, what was observed under the null hypothesis)

Connections to Broader Principles

  • Central tendency and variability together give a full picture of a dataset
  • Understanding distribution shape (normal vs skewed) guides choice of descriptive statistics (mean vs median vs mode)
  • Inferential statistics connect sample observations to population inferences, tempered by concerns about causation vs correlation and potential confounds