Unit 7 - Measures of Association: Correlation and Regression
- Focus on understanding the relationships between variables through correlation and regression analyses.
Correlational Study Key Points
- Criterion Variable (Dependent): The outcome variable you are trying to predict.
- Predictor Variable (Independent): The variable you manipulate or measure for prediction.
- Scatterplot: Visual representation to examine the relationship between the two variables.
- Intra-ocular Test: Evaluate your subjective impression of the scatterplot; does the relationship make sense psychologically, physiologically, or sociologically?
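To illustrate the scatterplot step (and the intra-ocular test that follows it), here is a minimal plotting sketch using matplotlib; the variable names and values are hypothetical, not data from the unit.

```python
# Minimal scatterplot sketch for the intra-ocular test (hypothetical data).
import matplotlib.pyplot as plt

lectures_attended = [4, 8, 10, 12, 15, 18, 20, 22]   # predictor (X), hypothetical
gpa = [2.1, 2.4, 2.6, 2.9, 3.0, 3.3, 3.5, 3.6]       # criterion (Y), hypothetical

plt.scatter(lectures_attended, gpa)
plt.xlabel("Lectures attended (X)")
plt.ylabel("Grade point average (Y)")
plt.title("Scatterplot of criterion against predictor")
plt.show()
```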
Correlation Analysis Overview
- Describes:
- Direction and strength of the relationship between two variables.
- Consistency in paired (X, Y) scores.
- Variability of Y scores with respect to X scores.
- Regression Analysis:
- Fit of scatterplot to the regression line (line of best fit).
- Predictive accuracy from the analysis.
Relationship Types
Positive Relationship
- Example: Grade Point Average (Y) vs. Lectures Attended (X)
- Attending more lectures positively correlates with higher GPAs.
Negative Relationship
- Example: Time Spent in Gym (X) vs. Body Mass (Y)
- More time spent in the gym is associated with lower body mass: as X increases, Y tends to decrease (an inverse relationship), though the correlation alone does not show that gym time causes the reduction.
Correlation Coefficients
- r (Correlation Coefficient): Measures the strength and direction of linear relationships.
- Ranges from -1 (perfect negative) to +1 (perfect positive).
- Values Interpretation:
- 0.00 - 0.25: Very weak
- 0.25 - 0.50: Weak
- 0.50 - 0.75: Moderate
- >0.75: Strong
- A high r-value suggests a stronger predictive relationship but does not imply causation.
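A minimal sketch of computing r and labelling its strength using the bands above (interpreted here in terms of the absolute value of r); the data are hypothetical.

```python
# Compute Pearson's r and label its strength using the bands above (hypothetical data).
import numpy as np

x = np.array([4, 8, 10, 12, 15, 18, 20, 22], dtype=float)
y = np.array([2.1, 2.4, 2.6, 2.9, 3.0, 3.3, 3.5, 3.6])

r = np.corrcoef(x, y)[0, 1]          # off-diagonal entry of the 2x2 correlation matrix

strength = ("very weak" if abs(r) <= 0.25 else
            "weak"      if abs(r) <= 0.50 else
            "moderate"  if abs(r) <= 0.75 else
            "strong")
print(f"r = {r:.3f} ({strength})")
```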
Pearson Product Moment Correlation (PPMC)
- Quantifies the strength of a linear relationship: a high absolute value of r indicates strong predictability.
- Relies on the notion of co-variance between variables.
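One standard way to write the PPMC in terms of covariance (a common textbook form; the symbols follow the usual conventions rather than notation given in these notes):

```latex
r_{XY} = \frac{\operatorname{cov}(X, Y)}{s_X \, s_Y}
       = \frac{\sum_{i=1}^{N}(X_i - \bar{X})(Y_i - \bar{Y})}{(N - 1)\, s_X \, s_Y}
```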
Coefficient of Determination
- Measures the proportion of variability in one variable explained by the other, expressed as r_{XY}^2 (the squared correlation coefficient).
- Evaluates how much variation can be ascribed to the predictor variable.
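As an illustrative value (not one taken from the notes), a correlation of 0.80 corresponds to about 64% of the variance in Y being accounted for by X:

```latex
r_{XY} = 0.80 \;\Rightarrow\; r_{XY}^2 = 0.64
```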
Linear Regression Equation
- General form: Y = a + bX
- b is the slope of the line, determined by the covariance of X and Y relative to the variance of X (see the formulas below).
- a is the y-intercept, representing the value of Y when X = 0.
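In the notation above, the slope and intercept can be written in terms of the correlation and the descriptive statistics (standard textbook forms, stated here as a supplement rather than quoted from the notes):

```latex
b = r_{XY}\,\frac{s_Y}{s_X} = \frac{\operatorname{cov}(X, Y)}{s_X^2}, \qquad a = \bar{Y} - b\,\bar{X}
```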
Regression Analysis Steps
- Calculate descriptive statistics (mean, standard deviation) for X and Y.
- Calculate Pearson's r to find the correlation coefficient.
- Determine line of best fit (regression equation): Calculate slope b and intercept a.
- Calculate the Standard Error of Estimate (SEE): SEE = s_Y × √(1 − r²)
- With a hypothetical X value, predict Y using the regression equation; express predicted Y with the SEE.
- Calculate the range of predicted values considering error (predicted Y ± SEE); see the sketch after this list.
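A compact sketch of the steps above in NumPy; the data and the chosen X value are hypothetical, and the error range is expressed simply as predicted Y ± SEE, following the formula in the notes.

```python
# Regression analysis steps sketched end to end (hypothetical data).
import numpy as np

x = np.array([2, 4, 5, 7, 8, 10, 11, 13], dtype=float)    # predictor X (hypothetical)
y = np.array([55, 60, 62, 70, 74, 80, 79, 88], dtype=float)  # criterion Y (hypothetical)

# 1. Descriptive statistics for X and Y (sample standard deviations, ddof=1).
mean_x, mean_y = x.mean(), y.mean()
sd_x, sd_y = x.std(ddof=1), y.std(ddof=1)

# 2. Pearson's r.
r = np.corrcoef(x, y)[0, 1]

# 3. Line of best fit: slope b and intercept a.
b = r * sd_y / sd_x
a = mean_y - b * mean_x

# 4. Standard Error of Estimate: SEE = s_Y * sqrt(1 - r^2).
see = sd_y * np.sqrt(1 - r**2)

# 5. Predict Y for a hypothetical X value and express it with the SEE.
x_new = 9.0
y_pred = a + b * x_new

# 6. Range of predicted values considering error (predicted Y +/- SEE).
print(f"r = {r:.3f}, b = {b:.3f}, a = {a:.3f}, SEE = {see:.3f}")
print(f"Predicted Y for X = {x_new}: {y_pred:.1f} "
      f"(range {y_pred - see:.1f} to {y_pred + see:.1f})")
```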
Example: Correlation in Student Grades
- Correlation between first-year (X) and second-year (Y) psychology course grades.
- Based on sample scores (e.g., X: 50, 56, 52, …; Y: 80, 85, 83, …).
- Derived regression equation can help predict second-year grades based on first-year grades.
- Example linear regression output: Y = 36.08 + 0.88X
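As an illustration of using this output (the X value here is chosen arbitrarily, not taken from the notes), a first-year grade of 60 would give a predicted second-year grade of:

```latex
\hat{Y} = 36.08 + 0.88 \times 60 = 88.88 \approx 88.9
```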
Limitations of PPMC
- Data range influence: Only captures relationships in the studied range.
- Nonlinear relationships may go undetected.
- Extreme data points can skew results significantly.
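The nonlinearity caveat above is easy to demonstrate: for a perfectly U-shaped (quadratic) relationship with X values symmetric around zero, Pearson's r comes out near zero even though Y is completely determined by X. A minimal sketch with made-up values:

```python
# A perfect but nonlinear (quadratic) relationship that Pearson's r misses.
import numpy as np

x = np.arange(-5, 6, dtype=float)   # symmetric around 0
y = x**2                            # Y is fully determined by X, but not linearly

r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.3f}")               # approximately 0 despite the perfect relationship
```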
Hypothesis Testing in Correlation
- Statistical null hypothesis (H0): Presume no correlation between X and Y.
- Alternative hypothesis (H1): Posit a significant relationship exists.
- Degrees of freedom: Typically N - 2 for correlation significance tests.
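A common test statistic for H0 (a standard textbook form, stated here as a supplement to the notes) converts r to a t value with N − 2 degrees of freedom:

```latex
t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}, \qquad df = N - 2
```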
Covid-19 Case Study Example
- Examining correlation between country vaccination rates (predictor variable X) and 28-day new case counts (criterion variable Y).
- Found a statistically significant positive correlation (r = 0.214), suggesting that higher vaccination rates were associated with higher 28-day case counts across the studied countries; a reminder that correlation does not imply causation.