Notes on Psychology Statistics (Descriptive and Inferential)
Descriptive Statistics
- Purpose: describe, organize, and summarize data
- Frequency distribution
- Definition: how often each value (score) occurs in the data
- Graph: frequency on the y-axis, possible values on the x-axis
- Example: the x-axis could show the number of pets a person has
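A frequency distribution can be built by counting how often each value appears. The sketch below assumes hypothetical pet counts for ten people (made-up data) and uses Python's `collections.Counter`. The text chart is drawn sideways (values down the left, bar length = frequency), so its axes are rotated relative to the usual graph.

```python
from collections import Counter

# Hypothetical data (made up for illustration): number of pets reported by 10 people
pets = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0]


def frequency_table(values):
    """Return (value, frequency) pairs sorted by value, i.e. in x-axis order."""
    return sorted(Counter(values).items())


def print_bar_chart(table):
    """Print a sideways bar chart: one row per value, bar length = frequency."""
    for value, freq in table:
        print(f"{value:>2} | {'#' * freq} ({freq})")


table = frequency_table(pets)
print_bar_chart(table)
```

Here the table is `[(0, 3), (1, 4), (2, 2), (3, 1)]`: one pet is the most frequent value.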
Measures of Central Tendency
- Definitions
- Mean (average): \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
- Median (midpoint of an ordered set)
- Mode (most common value)
- Decision rules
- The choice depends on the shape of the distribution: normal/symmetric (bell-shaped) vs skewed
- Skewness: a skewed distribution is named for the direction of its tail
- Skew and its implications
- If the tail extends to the left (low values), the distribution is negatively skewed
- If the tail extends to the right (high values), the distribution is positively skewed
- When data are skewed, the mean is pulled toward the tail
- In skewed distributions, the mode or median can be better descriptors of the typical value than the mean
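To see the mean being pulled toward the tail, the sketch below compares the three measures on two hypothetical score sets (made-up data): one roughly symmetric and one with a long right tail. It uses Python's standard `statistics` module.

```python
import statistics

# Hypothetical scores (illustrative only)
symmetric = [2, 3, 3, 4, 4, 4, 5, 5, 6]       # roughly bell-shaped around 4
right_skewed = [2, 3, 3, 4, 4, 4, 5, 15, 20]  # two high scores form a right tail


def describe_center(values):
    """Return the three measures of central tendency for a list of scores."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


for name, data in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    c = describe_center(data)
    print(f"{name}: mean={c['mean']:.2f}, median={c['median']}, mode={c['mode']}")
```

In the symmetric set all three measures equal 4. In the right-skewed set the median and mode stay at 4, while the mean rises to about 6.67 because the two extreme scores pull it toward the tail.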
Measures of Variability
- Purpose: describe how far scores are spread from the mean
- Standard deviation (SD)
- Definition: a numeric indicator of spread around the mean
- Population vs sample: the population SD (\sigma) divides the sum of squared deviations by N, while the sample SD (s) divides it by n - 1
- Interpretation:
- Far spread → high SD
- Close spread → low SD
- Relationship to central tendency
- Measures of variability accompany measures of central tendency
- Two samples can have the same mean but different SDs, meaning their scores differ in how tightly they cluster (how consistent they are)
- Example (illustrative): the set 1, 2, 4, 5, 3, 4, 2 has mean \bar{x} = 3, and its SD describes how far the scores spread around that mean
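The sketch below computes the SD for the example set 1, 2, 4, 5, 3, 4, 2, both with Python's `statistics` module and by hand, and then compares two hypothetical samples (made-up data) that share a mean of 3 but differ in spread. `statistics.stdev` uses the sample formula (divide by n - 1) and `statistics.pstdev` uses the population formula (divide by n).

```python
import math
import statistics

scores = [1, 2, 4, 5, 3, 4, 2]  # the example set from the notes

mean = statistics.mean(scores)
sample_sd = statistics.stdev(scores)       # divides by n - 1
population_sd = statistics.pstdev(scores)  # divides by n


def sample_sd_by_hand(values):
    """Sample SD: square root of summed squared deviations divided by n - 1."""
    m = sum(values) / len(values)
    squared_devs = [(x - m) ** 2 for x in values]
    return math.sqrt(sum(squared_devs) / (len(values) - 1))


# Two hypothetical samples with the same mean (3) but different spread
clustered = [2, 3, 3, 3, 3, 3, 4]
spread_out = [0, 1, 2, 3, 4, 5, 6]

print(f"mean={mean}, s={sample_sd:.3f}, sigma={population_sd:.3f}")
print(f"clustered: mean={statistics.mean(clustered)}, s={statistics.stdev(clustered):.3f}")
print(f"spread out: mean={statistics.mean(spread_out)}, s={statistics.stdev(spread_out):.3f}")
```

For the example set the squared deviations sum to 12, so s = \sqrt{12/6} = \sqrt{2} \approx 1.414. The clustered sample has s \approx 0.577 and the spread-out sample has s \approx 2.160, even though both means are 3.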
Example: Describing a Sample’s Happiness
- Question: On average, how happy is the sample?
- Reported value: mean = 2.47 (the sample is moderately happy)
- Interpretation of spread: SD is low, indicating scores cluster closely around the mean
Inferential Statistics
- Goal: can we infer that results from the sample apply to the population?
- Key concepts
- Sample vs population: a sample is a subset of individuals drawn from the population; the population is all individuals of interest
- Correlation: the relationship between two variables
- Correlation coefficient: r, the numeric indicator of the strength and direction of the relationship
- Example context: in a psychology course, a sample of DS 101 students vs the population of all DS 101 students (worldwide)
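The sample-vs-population idea can be simulated. The sketch below assumes a hypothetical population of 10,000 ratings on a 1-5 scale (random, made-up data standing in for "all students"), draws random samples of 100, and compares their means with the population mean. Each sample mean lands near the population mean but varies from sample to sample; inferential statistics is about judging how far sample results can be trusted to generalize.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the illustration is reproducible

# Hypothetical population: 10,000 ratings on a 1-5 scale (made-up data standing
# in for "all students" in the population of interest)
population = [rng.randint(1, 5) for _ in range(10_000)]

# A random sample of 100 individuals drawn from that population
sample = rng.sample(population, 100)

pop_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
print(f"population mean={pop_mean:.3f}, sample mean={sample_mean:.3f}")

# Different random samples give slightly different means (sampling variability)
other_means = [statistics.mean(rng.sample(population, 100)) for _ in range(5)]
print([round(m, 2) for m in other_means])
```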
Direction of Correlation
- Positive correlation
- Definition: as one variable increases, the other also increases (or both decrease together)
- Real-world examples:
- Height and weight
- GRE scores and graduate school success
- High school GPA and university GPA
Negative (Inverse) Correlation
- Negative correlation
- Definition: as one variable increases, the other decreases
- Real-world examples:
- Smoking and health
- Flossing and tooth decay
- Absences and exam scores
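The direction of a correlation is the sign of r. The sketch below implements Pearson's r directly from its sum-of-cross-products formula and applies it to two hypothetical datasets (made-up numbers) modeled on the examples above: height vs weight (positive) and absences vs exam scores (negative). The function name `pearson_r` is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson r: sum of cross-products of deviations divided by the
    square roots of the two sums of squared deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    # Raises ZeroDivisionError if either variable is constant (r is undefined)
    return cross / math.sqrt(ssx * ssy)


# Hypothetical data (made up for illustration)
height_cm = [150, 160, 165, 170, 180, 185]
weight_kg = [50, 58, 60, 68, 75, 80]     # rises with height -> positive r
absences = [0, 1, 2, 3, 4, 5]
exam_score = [92, 88, 85, 80, 74, 70]    # falls as absences rise -> negative r

print(f"height vs weight: r = {pearson_r(height_cm, weight_kg):+.2f}")
print(f"absences vs exam: r = {pearson_r(absences, exam_score):+.2f}")
```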
Strength of Correlation
- Strength measure: absolute value of the correlation coefficient
- The closer |r| is to 1, the stronger the correlation
- The closer |r| is to 0, the weaker the correlation
- Interpretation guidance: a stronger correlation means the data points fall closer to a straight line (a more consistent association), but even a strong correlation does not imply causation
p-value and Statistical Significance
- p-value: the probability that the observed result (or something more extreme) would occur if the null hypothesis were true
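- Common threshold: p < 0.05 is conventionally called statistically significant (this cutoff is the alpha level)
- Interpretation: if the null hypothesis were true, a result at least this extreme would occur less than 5% of the time; p is not the probability that the null hypothesis is true
- Non-significance: p \ge 0.05 means we fail to reject the null hypothesis; the result is consistent with chance variation, which is not the same as proving there is no effect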
Why Correlation is Not Causation
- Key reasons
- Correlational (observational) studies usually do not randomly assign participants to conditions
- Extraneous variables (third variables) may drive the observed relationship
- Causation requires ruling out alternative explanations and establishing temporal precedence (often via experimental design)
Key Formulas
- Mean: \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
- Standard deviation (sample): s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}
- Correlation coefficient: r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \; \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
- p-value concept: p = P(\text{data as extreme as or more extreme than observed} \mid H_0), the probability of observing data as extreme as, or more extreme than, what was observed if the null hypothesis were true
Connections to Broader Principles
- Central tendency and variability together give a full picture of a dataset
- Understanding distribution shape (normal vs skewed) guides choice of descriptive statistics (mean vs median vs mode)
- Inferential statistics connect sample observations to population inferences, tempered by concerns about causation vs correlation and potential confounds