Notes on Psychology Statistics (Descriptive and Inferential)

Descriptive Statistics

Purpose: describe, organize, and summarize data
Frequency distribution
- Definition: a count of how often each value occurs in the data
- Visual: a graph of frequency (y-axis) against each possible value (x-axis)
- Example: the x-axis could be the number of pets a person has
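A frequency distribution can be built by counting each value. The minimal Python sketch below uses `collections.Counter` on hypothetical pet counts (invented for illustration, not data from the notes) and prints a text bar chart with values along one axis and frequencies as bar lengths.

```python
from collections import Counter

# Hypothetical data: number of pets reported by 10 people (illustrative only)
pets = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0]

# Count how often each value occurs
freq = Counter(pets)

# Text "histogram": each value (x-axis) with a bar whose length is its frequency (y-axis)
for value in sorted(freq):
    print(f"{value} pets: {'#' * freq[value]} ({freq[value]})")
```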

Measures of central tendency: definitions
- Mean (average): \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i
- Median (middle value of an ordered set; with an even number of values, the average of the two middle values)
- Mode (most common value)
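The three definitions can be checked on a small example. This sketch assumes hypothetical scores and uses Python's standard `statistics` module for the median and mode, while the mean is written out to mirror the formula \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i.

```python
import statistics

# Hypothetical scores (illustrative only, not from the notes)
scores = [2, 3, 3, 4, 5, 7]

mean = sum(scores) / len(scores)    # (1/n) * sum of x_i = 24 / 6
median = statistics.median(scores)  # n is even, so this averages the two middle values (3 and 4)
mode = statistics.mode(scores)      # most common value

print(mean, median, mode)
```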
Decision rules
- The choice of measure depends on the shape of the distribution: normal/symmetric (bell-shaped) vs skewed
- Skew is named for the direction of the tail
Skew and its implications
- If the tail extends to the left (low values), the distribution is negatively skewed
- If the tail extends to the right (high values), the distribution is positively skewed
- When data are skewed, the mean is pulled toward the tail
- In skewed distributions, the mode or median can be better descriptors of the typical value than the mean
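A quick numeric check shows the mean being pulled toward the tail. In this sketch the scores are hypothetical: most values are low and a single extreme value creates a right (positive) tail, so the mean lands above both the median and the mode.

```python
import statistics

# Hypothetical, positively skewed scores: one extreme high value forms a right tail
scores = [1, 2, 2, 2, 3, 3, 4, 15]

mean = statistics.mean(scores)      # 32 / 8 = 4, pulled up by the value 15
median = statistics.median(scores)  # average of the middle values 2 and 3 = 2.5
mode = statistics.mode(scores)      # 2 occurs most often

print(mean, median, mode)
```

Here the median and mode describe a typical score better than the mean does, which matches the rule of thumb above.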

Measures of variability
Purpose: describe how far scores are spread from the mean
Standard deviation (SD)
- Definition: a numeric indicator of spread around the mean
- Population vs sample: the population SD (\sigma) divides the sum of squared deviations by N, while the sample SD (s) divides by n-1
- Interpretation:
  - Far spread → high SD
  - Close spread → low SD
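To illustrate the interpretation, the Python sketch below compares two hypothetical samples that share a mean of 5 but differ in spread. It assumes the standard `statistics` module, where `stdev` computes the sample SD (dividing by n-1).

```python
import statistics

# Two hypothetical samples with the same mean (5) but different spread
consistent = [4, 5, 5, 5, 6]  # scores cluster near the mean
variable = [1, 3, 5, 7, 9]    # scores lie far from the mean

for name, data in [("consistent", consistent), ("variable", variable)]:
    print(name, statistics.mean(data), round(statistics.stdev(data), 2))
```

The clustered sample has a low SD (about 0.71) and the spread-out sample a high one (about 3.16), even though their means are identical.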
Relationship to central tendency
- Measures of variability accompany measures of central tendency
- Two samples can have the same mean but different SDs, indicating different data shapes or consistency
Example (illustrative): the set 1, 2, 4, 5, 3, 4, 2 has mean \bar{x} = 3, but only the SD reveals how widely the scores spread around that mean
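Working the example set through the sample SD formula, s = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2}, gives s = \sqrt{12/6} = \sqrt{2} \approx 1.41. The sketch below is a minimal Python version of that arithmetic; it also computes the population SD (dividing by n) for contrast and cross-checks both against the standard `statistics` module.

```python
import math
import statistics

scores = [1, 2, 4, 5, 3, 4, 2]
n = len(scores)
mean = sum(scores) / n  # 21 / 7 = 3.0

# Sum of squared deviations from the mean: 4 + 1 + 1 + 4 + 0 + 1 + 1 = 12
ss = sum((x - mean) ** 2 for x in scores)

sample_sd = math.sqrt(ss / (n - 1))  # s: divide by n - 1, giving sqrt(2)
population_sd = math.sqrt(ss / n)    # sigma: divide by n, giving sqrt(12 / 7)

print(round(sample_sd, 3), round(population_sd, 3))
```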

Question: On average, how happy is the sample?
Reported values: mean = 2.47 (sample is moderately happy)
Interpretation of spread: SD is low, indicating scores cluster closely around the mean

Inferential Statistics

Goal: can we infer that results from the sample apply to the population?
Key concepts
- Sample vs population: a subset of individuals vs all individuals of interest
- Example: in a psychology course, a sample of DS 101 students vs the population of all DS 101 students (worldwide)
- Correlation: the relationship between two variables
- Correlation coefficient: r, a numeric indicator of the strength and direction of the relationship
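A small simulation can make the sample vs population distinction concrete. This is a minimal Python sketch under stated assumptions: the population is simulated (100,000 hypothetical ratings on a 1 to 7 scale), and the sample of 500 is drawn at random without replacement with `random.sample`. The sample mean only approximates the population mean; the gap between them is sampling error.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the simulation is repeatable

# Hypothetical population: 100,000 ratings on a 1-7 scale (simulated, not real data)
population = [rng.randint(1, 7) for _ in range(100_000)]

# Draw a random sample of 500 individuals without replacement
sample = rng.sample(population, 500)

mu = statistics.mean(population)  # population mean (usually unknown in practice)
x_bar = statistics.mean(sample)   # sample mean, used to estimate mu
print(round(mu, 2), round(x_bar, 2))
```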

Strength measure: the absolute value of the correlation coefficient, |r| (the sign of r gives the direction: positive or negative)
- The closer |r| is to 1, the stronger the correlation
- The closer |r| is to 0, the weaker the correlation
Interpretation guidance: a stronger correlation means the two variables move together more consistently, but it does not imply causation
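The coefficient r can be computed directly from its definitional formula, r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2} \sqrt{\sum (y_i - \bar{y})^2}}. The sketch below assumes a hypothetical helper `pearson_r` and hypothetical sleep and happiness data; on Python 3.10+ the standard library's `statistics.correlation` should give the same value.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient via the definitional formula (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]  # deviations of x from its mean
    dy = [yi - mean_y for yi in y]  # deviations of y from its mean
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    return numerator / denominator

# Hypothetical data: hours of sleep vs. happiness rating (illustrative only)
sleep = [5, 6, 7, 8, 9]
happiness = [2, 3, 3, 4, 5]

r = pearson_r(sleep, happiness)
print(round(r, 3))  # |r| near 1: a strong positive association in this made-up data
```

Reversing one variable flips the sign of r while leaving |r| unchanged, which is why strength is read from |r| and direction from the sign.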

p-value: the probability that the observed result (or something more extreme) would occur if the null hypothesis were true
Common threshold: p < 0.05 indicates statistical significance
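- Interpretation: if the null hypothesis were true, a result at least this extreme would occur less than 5% of the time (this is not the probability that the null hypothesis is true)
Non-significance: p \ge 0.05 means the data are not unusual enough under the null hypothesis to reject it; this does not prove the null hypothesis is true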

Correlation does not imply causation
Key reasons
- Observational (correlational) studies do not randomly assign participants to conditions, so groups may differ in more than the variable of interest
- Extraneous variables (third variables) may drive the observed relationship
- Causation requires ruling out alternative explanations and establishing temporal precedence (often via experimental design)

Mean: \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i
Standard deviation (sample): s = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2}
Correlation coefficient: r = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2} \; \sqrt{\sum_{i=1}^n (y_i - \bar{y})^2}}
p-value concept: p = P(\text{data at least as extreme as those observed} \mid H_0), i.e., the probability of observing data this extreme or more extreme if the null hypothesis were true

Central tendency and variability together give a more complete picture of a dataset than either does alone
Understanding the distribution's shape (normal vs skewed) guides the choice of descriptive statistic (mean vs median vs mode)
Inferential statistics connect sample observations to population inferences, tempered by concerns about causation vs correlation and potential confounds