Unit 7-Measures of Association: Correlation and Regression
Measures of Association
Correlation & Regression
Focus on understanding the relationships between variables through correlation and regression analyses.
Correlational Study Key Points
Criterion Variable (Dependent): The outcome or variable you are trying to predict.
Predictor Variable (Independent): The variable you measure and use to make the prediction (in a correlational study it is typically measured rather than manipulated).
Scatterplot: Visual representation to examine the relationship between the two variables.
Intra-ocular Test: Evaluate your subjective impression of the scatterplot; does the relationship make sense psychologically, physiologically, or sociologically?
Correlation Analysis Overview
Describes:
Direction and strength of the relationship between two variables.
Consistency in paired (X, Y) scores.
Variability of Y scores with respect to X scores.
Regression Analysis:
Fit of scatterplot to the regression line (line of best fit).
Predictive accuracy from the analysis.
Relationship Types
Positive Relationship
Example: Grade Point Average (Y) vs. Lectures Attended (X)
Attending more lectures is associated with higher GPAs.
Negative Relationship
Example: Time Spent in Gym (X) vs. Body Mass (Y)
More time in the gym is associated with lower body mass (an inverse relationship); this alone does not establish that gym time causes the reduction.
Correlation Coefficients
r (Correlation Coefficient): Measures the strength and direction of linear relationships.
Ranges from -1 (perfect negative) to +1 (perfect positive).
Interpreting values (applied to the absolute value of r; the sign indicates direction only):
0.00 - 0.25: Very weak
0.25 - 0.50: Weak
0.50 - 0.75: Moderate
>0.75: Strong
A high r-value suggests a stronger predictive relationship but does not imply causation.
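The strength bands above can be turned into a small lookup. The sketch below is illustrative only: the function names are hypothetical, and because the listed bands share their boundary values, it assumes each boundary belongs to the lower band (consistent with "strong" being listed as strictly above 0.75). It also assumes the bands are applied to the absolute value of r.

```python
def describe_strength(r: float) -> str:
    """Return the strength label for r using the bands in these notes."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # assumption: strength is judged on |r|
    if size <= 0.25:  # assumption: shared boundaries go to the lower band
        return "very weak"
    if size <= 0.50:
        return "weak"
    if size <= 0.75:
        return "moderate"
    return "strong"


def describe_direction(r: float) -> str:
    """Return the direction implied by the sign of r."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"


# Illustrative values only, not results from the course
for r in (-0.80, 0.30, 0.60):
    print(r, describe_direction(r), describe_strength(r))
```

A value such as r = -0.80 is therefore labelled strong and negative: the sign and the size are read separately.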
Pearson Product Moment Correlation (PPMC)
Quantifies the strength of a linear relationship: an r close to -1 or +1 indicates strong predictability.
Relies on the covariance between the variables, that is, the extent to which paired X and Y scores vary together.
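One common way to express the covariance idea behind PPMC is r = cov(X, Y) / (s_X s_Y). The sketch below computes r from deviation scores using only the standard library. The function name and the sample data are hypothetical and are not taken from the course.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from paired scores, computed from deviation scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired (X, Y) scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations: the numerator of the covariance
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations: the numerators of the two variances
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable has no variability")
    # The (n - 1) divisors cancel, so r = SP_xy / sqrt(SS_x * SS_y)
    return sp_xy / math.sqrt(ss_x * ss_y)


# Hypothetical data: lectures attended (X) and GPA (Y)
lectures = [2, 4, 6, 8, 10]
gpa = [2.0, 2.5, 3.0, 3.2, 3.8]
print(round(pearson_r(lectures, gpa), 3))
```

Dividing the covariance by both standard deviations is what makes r unit-free and bounds it between -1 and +1.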
Coefficient of Determination
Measure of the proportion of variability in one variable explained by another (expressed as (r_{XY})^2).
Evaluates how much variation can be ascribed to the predictor variable.
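As a quick arithmetic check, squaring r gives the proportion of variability accounted for. A minimal sketch with illustrative r values (not results from the notes); the function name is hypothetical:

```python
def variance_explained(r: float) -> float:
    """Coefficient of determination: the square of the correlation."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    return r ** 2


# Illustrative correlations only
for r in (0.25, 0.50, -0.50, 0.75):
    share = variance_explained(r)
    print(f"r = {r:+.2f} -> r^2 = {share:.4f} ({share:.2%} of variability)")
```

Note that r = 0.50 accounts for only 25% of the variability, and that the sign of r drops out when squaring.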
Linear Regression Equation
General form: Y = a + bX
b is the slope of the line: the covariance of X and Y divided by the variance of X, or equivalently b = r \times (s_Y / s_X).
a is the y-intercept, representing the value of Y when X = 0.
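For reference, the least-squares slope and intercept are usually written as follows. The notation is assumed here rather than taken from the notes: s_X and s_Y are the standard deviations, \bar{X} and \bar{Y} the means, and s_{XY} the covariance.

```latex
b = \frac{s_{XY}}{s_X^{2}} = r_{XY}\,\frac{s_Y}{s_X},
\qquad
a = \bar{Y} - b\,\bar{X}
```

Because a = \bar{Y} - b\bar{X}, the fitted line always passes through the point of means (\bar{X}, \bar{Y}).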
Regression Analysis Steps
Calculate descriptive statistics (mean, standard deviation) for X and Y.
Calculate Pearson's r to find the correlation coefficient.
Determine line of best fit (regression equation): Calculate slope b and intercept a.
Calculate the Standard Error of Estimate (SEE): SEE = s_Y \times \sqrt{1 - r^2}
With a hypothetical X value, predict Y using the regression equation; express predicted Y with the SEE.
Calculate the range of predicted values considering error.
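The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions: the data are hypothetical, the function names are invented for the example, sample (n - 1) standard deviations are used, and the "range" is taken to be the predicted Y plus or minus one SEE, which is one common convention rather than something the notes specify.

```python
import math
import statistics


def regression_summary(xs, ys):
    """Steps 1 to 4: descriptives, Pearson r, slope, intercept, SEE."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired (X, Y) scores")
    # Step 1: descriptive statistics (sample standard deviations)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)
    # Step 2: Pearson r as covariance over the product of the SDs
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    r = cov_xy / (sd_x * sd_y)
    # Step 3: line of best fit, Y = a + bX
    b = r * sd_y / sd_x
    a = mean_y - b * mean_x
    # Step 4: SEE = s_Y * sqrt(1 - r^2); max() guards against tiny rounding error
    see = sd_y * math.sqrt(max(0.0, 1 - r ** 2))
    return {"r": r, "a": a, "b": b, "see": see}


def predict_with_band(summary, x_new):
    """Steps 5 and 6: predicted Y and an assumed +/- 1 SEE range."""
    y_hat = summary["a"] + summary["b"] * x_new
    return y_hat - summary["see"], y_hat, y_hat + summary["see"]


# Hypothetical paired scores, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
summary = regression_summary(xs, ys)
low, y_hat, high = predict_with_band(summary, 6)
print(f"Y = {summary['a']:.2f} + {summary['b']:.2f}X, r = {summary['r']:.3f}")
print(f"Predicted Y at X = 6: {y_hat:.2f} (range {low:.2f} to {high:.2f})")
```

Predicting at X = 6 goes beyond the observed X values (1 to 5) in this toy data set, so the result should be read with caution.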
Example: Correlation in Student Grades
Correlation between first-year (X) and second-year (Y) psychology course grades.
Based on sample scores: (e.g., X means: 50,56,52… Y means: 80,85,83…)
Derived regression equation can help predict second-year grades based on first-year grades.
Example linear regression output: Y = 36.08 + 0.88X
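Using the example equation above, a prediction is a single substitution. The sketch below uses the intercept and slope from the notes; the function name is hypothetical. Attaching an SEE to the prediction would also require s_Y and r, which are not given here.

```python
INTERCEPT = 36.08  # a, from the example regression output
SLOPE = 0.88       # b, from the example regression output


def predict_second_year(first_year_grade: float) -> float:
    """Predicted second-year grade from Y = 36.08 + 0.88X."""
    return INTERCEPT + SLOPE * first_year_grade


# First-year grades taken from the sample scores listed in the notes
for x in (50, 56, 52):
    print(x, round(predict_second_year(x), 2))
```

For a first-year grade of 50, the equation gives a predicted second-year grade of about 80.08 (36.08 + 0.88 × 50).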
Limitations of PPMC
Data range influence: Only captures relationships in the studied range.
Nonlinear relationships may go undetected.
Extreme data points can skew results significantly.
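Two of these limitations can be seen in a small demonstration. The data below are invented for illustration, and `pearson_r` is a hypothetical helper written for this sketch: a perfectly curved (U-shaped) relationship produces r = 0, and a single extreme point turns a weak negative correlation into a strong positive one.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from paired scores, computed from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp_xy / math.sqrt(ss_x * ss_y)


# Nonlinear relationship: Y = X^2 is fully determined by X, yet r is zero
curve_x = [-2, -1, 0, 1, 2]
curve_y = [x ** 2 for x in curve_x]
r_curve = pearson_r(curve_x, curve_y)

# Extreme data point: the same scores with and without one outlier at (10, 10)
base_x, base_y = [1, 2, 3, 4], [2, 1, 2, 1]
r_without = pearson_r(base_x, base_y)
r_with = pearson_r(base_x + [10], base_y + [10])

print(f"curved data:     r = {r_curve:.3f}")
print(f"without outlier: r = {r_without:.3f}")
print(f"with outlier:    r = {r_with:.3f}")
```

This is one reason to inspect the scatterplot before interpreting r.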
Hypothesis Testing in Correlation
Statistical null hypothesis (H0): Presumes no correlation between X and Y in the population.
Alternative hypothesis (H1): Posits that a correlation between X and Y exists in the population.
Degrees of freedom: Typically N - 2 for correlation significance tests.
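The notes do not state the test statistic, so the sketch below assumes the standard t-test for a Pearson correlation, t = r \sqrt{N - 2} / \sqrt{1 - r^2}, evaluated with N - 2 degrees of freedom. The function name and the example values are hypothetical.

```python
import math


def correlation_t_test(r: float, n: int) -> tuple[float, int]:
    """Return (t, df) for testing H0: no correlation, given r and N pairs."""
    if n < 3:
        raise ValueError("need at least 3 pairs so that df = N - 2 >= 1")
    if not -1.0 < r < 1.0:
        raise ValueError("t is undefined when |r| = 1")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df


# Illustrative values only: r = 0.50 observed in N = 27 pairs
t, df = correlation_t_test(0.50, 27)
print(f"t({df}) = {t:.3f}")
```

The resulting t is compared with the critical value from a t table at the chosen alpha level and df; that lookup is not included here. Because t grows with N, a small r can reach significance in a large sample.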
Covid-19 Case Study Example
Examining correlation between country vaccination rates (predictor variable X) and 28-day new case counts (criterion variable Y).
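Found a statistically significant positive correlation (r = 0.214): higher vaccination rates were associated with higher case counts in the countries studied. By the interpretation bands above this is a very weak relationship (r^2 ≈ 0.046, so roughly 4.6% of the variability in case counts is accounted for), and it does not imply that vaccination causes cases.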