Multivariate Correlational Designs Notes

Correlational Studies

  • Correlational studies examine the linear relationship between two variables.
  • A statistically significant correlation indicates that two variables covary, meaning they change together.
  • Further research is needed to understand why the two variables covary.

Establishing Causality

  • Experimental Design:
    • Manipulate an independent variable (IV) and observe the effect on a dependent variable (DV).
    • Covariation is indicated if an effect is observed.
    • Directionality is established because the IV is manipulated before the DV is measured.
    • Good experiments involve careful control to eliminate extraneous variables.
  • Criteria for Establishing Causality:
    • Covariation: Variables must be correlated (associated).
    • Directionality (Temporal Precedence): The cause must precede the effect.
    • Internal Validity: Elimination of extraneous variables.
Example: Smoking and Lung Cancer
  • Cigarette use correlates with higher rates of lung cancer.
  • Strong, positive, and consistent effects are seen across numerous studies.
  • Meta-analysis results in a correlation coefficient of r = .49, with a 95% confidence interval (CI) of .39 to .58.
  • Directionality: Smoking exposure must precede the onset of lung cancer.
    • The majority of patients report a history of cigarette use at the time of diagnosis.
    • Longitudinal/prospective studies support this directionality.
  • Internal Validity: Potential third variables are considered.
    • Studies control for factors like age, other drug use, pollution exposure, inherited conditions, diet, and exercise.
    • The correlation between smoking and lung cancer persists even with these factors controlled.
  • Conclusion: Smoking causes lung cancer.

Longitudinal Studies

  • Follow the same sample of people and observe/survey/measure two key variables across time (i.e., more than once).
Types of Correlation in Longitudinal Studies
  • Cross-sectional correlation: Association between two variables measured at the same time.
  • Autocorrelation: Association of a variable with itself, measured at two different times.
  • Cross-lag correlation: Association between an earlier measure of one variable and a later measure of another variable, which is critical for establishing directionality.
Example: TV Violence and Aggressive Behavior (Eron et al., 1972)
  • TV violence exposure and aggressive behavior were measured in the same children in both 3rd grade and "13th grade" (young adulthood).

  • Examples of correlations found include (see the code sketch after this list):

    • Cross-sectional correlation: The correlation between TV violence and aggression, both measured in the 3rd grade.
    • Autocorrelation: The correlation between TV violence in the 3rd grade and TV violence in the 13th grade.
    • Cross-lag correlation: The correlation between TV violence in the 3rd grade and aggression in the 13th grade was r = .31, while the correlation between aggression in the 3rd grade and TV violence in the 13th grade was r = .05. The asymmetry (.31 vs. .05) suggests that early TV violence predicts later aggression, not the reverse.
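
To make the three correlation types concrete, here is a minimal Python sketch on simulated longitudinal data. All variable names and generated values are hypothetical illustrations, not Eron et al.'s actual data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical longitudinal measurements (illustrative, not the Eron et al. data)
tv_g3 = rng.normal(size=n)                        # TV violence, 3rd grade
agg_g3 = 0.4 * tv_g3 + rng.normal(size=n)         # aggression, 3rd grade
tv_g13 = 0.5 * tv_g3 + rng.normal(size=n)         # TV violence, "13th grade"
agg_g13 = 0.3 * tv_g3 + 0.5 * agg_g3 + rng.normal(size=n)  # aggression, "13th grade"

def r(x, y):
    """Pearson correlation between two variables."""
    return np.corrcoef(x, y)[0, 1]

print("cross-sectional (TV & aggression, grade 3):", r(tv_g3, agg_g3))
print("autocorrelation (TV, grade 3 vs. 13):      ", r(tv_g3, tv_g13))
print("cross-lag (TV g3 -> aggression g13):       ", r(tv_g3, agg_g13))
print("cross-lag (aggression g3 -> TV g13):       ", r(agg_g3, tv_g13))
```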

Multiple Regression

  • Key idea: If two variables are correlated, one variable can be used to predict the other variable.
Linear Regression
  • Statistical technique for finding the straight line that best fits a set of data.

  • The regression line describes the value of Y that is most likely for each possible value of X, based on the sample data.

  • Criterion variable = b(Predictor variable) + constant

    • Criterion variable = dependent variable
    • Predictor variable = independent variable
  • β (Beta):

    • The sign of beta matches the direction of the correlation.
    • If the 95% CI for beta does not include zero, then one variable significantly predicts the other.
  • Equation of a straight line:

    • y = mx + b
    • Transformed to: y = bX + a
    • Or, y = a + bX
  • Multiple Regression Equation:

    • $y = a + b_1X_1 + b_2X_2 + b_3X_3 + b_4X_4 + \dots + b_iX_i$
  • Linear Regression Equation with Predicted Value:

    • $Y_{predicted} = \hat{Y} = a + b_{yx}X$
      • Where:
        • $Y_{predicted}$ = the predicted value of Y (the DV)
        • X = a given value of the IV
        • a = the "y-intercept," the value of Y when X = 0 (the "regression constant")
        • $b_{yx}$ = slope of the regression line (the "regression coefficient")
  • b is in the units of Y.

    • "A 1-unit increase in X predicts a b-unit increase in Y."
  • β (Beta):

    • β is a "standardized" version of b, transforming b to be in "standard deviation units."
    • "A 1-standard-deviation increase in X predicts a β-standard-deviation increase in Y."

    • βs (standardized coefficients) can be directly compared (within the same analysis) to interpret the relative influence of predictor variables – bs (unstandardized coefficients) cannot. (See the sketch below.)
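
A minimal sketch of the b-versus-β distinction on simulated data (all values hypothetical): standardizing both variables to z-scores before fitting turns the slope into β, which for a single predictor equals Pearson's r.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=80, scale=10, size=100)       # predictor in raw units (e.g., a test score)
y = 0.05 * x + rng.normal(scale=0.4, size=100)   # criterion in raw units (e.g., GPA)

# Unstandardized slope b and intercept a via least squares
b, a = np.polyfit(x, y, 1)

# Standardized beta: refit after converting both variables to z-scores
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()
beta, _ = np.polyfit(zx, zy, 1)

print(f"b    = {b:.3f}   (a 1-unit increase in X predicts a {b:.3f}-unit increase in Y)")
print(f"beta = {beta:.3f}   (a 1-SD increase in X predicts a {beta:.3f}-SD increase in Y)")
print(f"r    = {np.corrcoef(x, y)[0, 1]:.3f}   (equals beta when there is one predictor)")
```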

Least Squares Criterion
  • The sum of the squared distances from all points in a scatterplot to the prediction line (regression line) is the least possible.
  • These deviations are called residuals.
  • Ordinary Least Squares (OLS) regression computes the regression line that minimizes the sum of squared residuals – that is, it finds the values of b and a that create the best-fitting line (see the sketch below).
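
As an illustration (with made-up data), the closed-form OLS estimates b = Cov(X, Y)/Var(X) and a = M_y − b·M_x can be checked numerically: any nearby line produces a larger sum of squared residuals.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=50)
y = 1.0 + 0.5 * x + rng.normal(scale=0.5, size=50)

# Closed-form OLS estimates: b = Cov(x, y) / Var(x), a = mean(y) - b * mean(x)
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()

def ssr(a_, b_):
    """Sum of squared residuals for the line y = a_ + b_ * x."""
    return np.sum((y - (a_ + b_ * x)) ** 2)

print(ssr(a, b))        # the minimum
print(ssr(a, b + 0.1))  # any other slope does worse
print(ssr(a + 0.1, b))  # any other intercept does worse
```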

Correlation vs. Regression

  • Correlation:
    • Covariance – do values above the mean on X generally correspond with values above the mean on Y?
    • Formula (checked numerically in the sketch after this list):
      • $r = \frac{Cov_{xy}}{s_x \cdot s_y}$
      • $Cov_{xy} = \frac{\Sigma[(X - M_x)(Y - M_y)]}{n}$
  • Regression:
    • Involves fitting a line that minimizes the distance between the line and the individual data points.
    • After fitting the line that minimizes the residuals, does that line have a slope that is different from zero?
    • Example:
      • $\hat{Y} = a + b_{yx}X$
      • $\hat{Y} = 0.91 + 0.21X$
      • (Predicted GPA) = 0.91 + 0.21 × (High School Math Grades)
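
A quick numerical check of the correlation formula above, on made-up grade/GPA values: the mean cross-product of deviations, divided by the product of the standard deviations, matches NumPy's built-in correlation.

```python
import numpy as np

# Hypothetical values: high school math grades (x) and college GPA (y)
x = np.array([2.0, 3.0, 3.5, 4.0, 4.0])
y = np.array([2.2, 2.9, 3.0, 3.6, 3.8])

# Covariance: mean cross-product of deviations from the means (n in the denominator)
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# r = Cov_xy / (s_x * s_y); population SDs (ddof=0) match the n-denominator covariance
r = cov_xy / (x.std() * y.std())

print(r)                        # from the formula
print(np.corrcoef(x, y)[0, 1])  # NumPy's built-in Pearson r; should match
```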

Multiple Regression

  • Involves developing a linear regression model for predicting a criterion variable using 2+ predictors.
  • Example: Predicting college GPA from high school Math, Science, and English grades, plus sex (see the equation below).
Interpreting Regression Results
  • Example Equation:
    • $GPA_{predicted} = 0.60 + 0.17(\text{Math}) + 0.03(\text{Science}) + 0.05(\text{English}) + (-0.01)(\text{Sex})$
  • Controlling for the effect of Science grades, English grades, and Sex, Math grades have a significant positive relationship with College GPA: b = 0.17, β = 0.35, t = 4.74, p < .001.
  • Interpretation: Controlling for the effect of the covariates, a 1-unit increase in High School Math Grades predicts a 0.17-point increase in College GPA (see the sketch below).
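
A sketch of how a regression like this could be fit and read in Python with statsmodels, on simulated data; the variable names and generated values are hypothetical, not the data behind the example above.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 300

# Simulated high school predictors and college GPA (all values hypothetical)
df = pd.DataFrame({
    "math":    rng.normal(3.0, 0.5, n),
    "science": rng.normal(3.0, 0.5, n),
    "english": rng.normal(3.0, 0.5, n),
    "sex":     rng.integers(0, 2, n).astype(float),  # dummy-coded 0/1
})
df["gpa"] = 0.6 + 0.17 * df["math"] + 0.03 * df["science"] + rng.normal(0, 0.3, n)

# Fit GPA on all four predictors at once; each b is "controlling for" the others
X = sm.add_constant(df[["math", "science", "english", "sex"]])
model = sm.OLS(df["gpa"], X).fit()

print(model.summary())   # b, t, and p for each predictor
print(model.conf_int())  # 95% CIs; an interval excluding zero = significant predictor
```
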
Evaluating Causal Criteria
  • Covariation: Met, based on significant correlation.
  • Directionality (Temporal Precedence): Met, as high school grades precede college GPA.
  • Internal Validity: Partially met, as the effect of HS math is significant after controlling for HS Science, English, and sex.
  • Overall: There is some support for a causal relationship between high school math and college GPA.
  • High School Science Grades
    • Covariation: Yes
    • Directionality (Temporal Precedence): Yes – high school happens before college
    • Internal Validity: No – the effect of HS science is not significant after controlling for HS math, English, and sex