Multivariate Correlational Designs Notes

Correlational Studies

  • Correlational studies examine the linear relationship between two variables.
  • A statistically significant correlation indicates that two variables covary, meaning they change together.
  • Further research is needed to understand why the two variables covary.

Establishing Causality

  • Experimental Design:
    • Manipulate an independent variable (IV) and observe the effect on a dependent variable (DV).
    • Covariation is indicated if an effect is observed.
    • Directionality is established because the IV is manipulated before the DV is measured.
    • Good experiments involve careful control to eliminate extraneous variables.
  • Criteria for Establishing Causality:
    • Covariation: Variables must be correlated (associated).
    • Directionality (Temporal Precedence): The cause must precede the effect.
    • Internal Validity: Elimination of extraneous variables.
Example: Smoking and Lung Cancer
  • Cigarette use correlates with higher rates of lung cancer.
  • Strong, positive, and consistent effects are seen across numerous studies.
  • Meta-analysis results in a correlation coefficient of r = .49, with a 95% confidence interval (CI) of .39 to .58.
  • Directionality: Smoking exposure must precede the onset of lung cancer.
    • The majority of patients report a history of cigarette use at the time of diagnosis.
    • Longitudinal/prospective studies support this directionality.
  • Internal Validity: Potential third variables are considered.
    • Studies control for factors like age, other drug use, pollution exposure, inherited conditions, diet, and exercise.
    • The correlation between smoking and lung cancer persists even with these factors controlled.
  • Conclusion: Smoking causes lung cancer.

Longitudinal Studies

  • Follow the same sample of people and observe/survey/measure two key variables across time (i.e., more than once).
Types of Correlation in Longitudinal Studies
  • Cross-sectional correlation: Association between two variables measured at the same time.
  • Autocorrelation: Association of a variable with itself, measured at two different times.
  • Cross-lag correlation: Association between an earlier measure of one variable and a later measure of another variable, which is critical for establishing directionality.
Example: TV Violence and Aggressive Behavior (Eron et al., 1972)
  • TV violence exposure and aggressive behavior were measured in the same children in both 3rd grade and "13th grade" (young adulthood).

  • Examples of correlations found include (see the code sketch after this list):

    • Cross-sectional correlation: The correlation between TV violence and aggression, both measured in the 3rd grade.
    • Autocorrelation: The correlation between TV violence in the 3rd grade and TV violence in the 13th grade.
    • Cross-lag correlation: The correlation between TV violence in the 3rd grade and aggression in the 13th grade was r = .31, while the correlation between aggression in the 3rd grade and TV violence in the 13th grade was r = .05. The asymmetry (.31 vs. .05) suggests that early TV violence predicts later aggression, not the reverse.
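
To make the three correlation types concrete, here is a minimal Python sketch on simulated longitudinal data. All variable names and generated values are hypothetical illustrations, not Eron et al.'s actual data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical longitudinal measurements (illustrative, not the Eron et al. data)
tv_g3 = rng.normal(size=n)                        # TV violence, 3rd grade
agg_g3 = 0.4 * tv_g3 + rng.normal(size=n)         # aggression, 3rd grade
tv_g13 = 0.5 * tv_g3 + rng.normal(size=n)         # TV violence, "13th grade"
agg_g13 = 0.3 * tv_g3 + 0.5 * agg_g3 + rng.normal(size=n)  # aggression, "13th grade"

def r(x, y):
    """Pearson correlation between two variables."""
    return np.corrcoef(x, y)[0, 1]

print("cross-sectional (TV & aggression, grade 3):", r(tv_g3, agg_g3))
print("autocorrelation (TV, grade 3 vs. 13):      ", r(tv_g3, tv_g13))
print("cross-lag (TV g3 -> aggression g13):       ", r(tv_g3, agg_g13))
print("cross-lag (aggression g3 -> TV g13):       ", r(agg_g3, tv_g13))
```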

Multiple Regression

  • Key idea: If two variables are correlated, one variable can be used to predict the other variable.
Linear Regression
  • Statistical technique for finding the straight line that best fits a set of data.

  • The regression line describes the value of Y that is most likely for each possible value of X, based on the sample data.

  • Criterion variable = b(Predictor variable) + constant

    • Criterion variable = dependent variable
    • Predictor variable = independent variable
  • β (Beta):

    • The sign of beta matches the direction of the correlation.
    • If the 95% CI for beta does not include zero, then one variable significantly predicts the other.
  • Equation of a straight line:

    • y = mx + b
    • Transformed to: y = bX + a
    • Or, y = a + bX
  • Multiple Regression Equation:

    • $y = a + b_1X_1 + b_2X_2 + b_3X_3 + b_4X_4 + \dots + b_iX_i$
  • Linear Regression Equation with Predicted Value:

    • $Y_{predicted} = \hat{Y} = a + b_{yx}X$
      • Where:
        • $Y_{predicted}$ = the predicted value of Y (the DV)
        • X = a given value of the IV
        • a = the "y-intercept," the value of Y when X = 0 (the "regression constant")
        • $b_{yx}$ = slope of the regression line (the "regression coefficient")
  • b is in the units of Y.

    • "A 1-unit increase in X predicts a b-unit increase in Y."
  • β (Beta):

    • β is a "standardized" version of b, transforming b to be in "standard deviation units."
    • "A 1-standard-deviation increase in X predicts a β-standard-deviation increase in Y."

    • βs (standardized coefficients) can be directly compared (within the same analysis) to interpret the relative influence of predictor variables – bs (unstandardized coefficients) cannot. (See the sketch below.)
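
A minimal sketch of the b-versus-β distinction on simulated data (all values hypothetical): standardizing both variables to z-scores before fitting turns the slope into β, which for a single predictor equals Pearson's r.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=80, scale=10, size=100)       # predictor in raw units (e.g., a test score)
y = 0.05 * x + rng.normal(scale=0.4, size=100)   # criterion in raw units (e.g., GPA)

# Unstandardized slope b and intercept a via least squares
b, a = np.polyfit(x, y, 1)

# Standardized beta: refit after converting both variables to z-scores
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()
beta, _ = np.polyfit(zx, zy, 1)

print(f"b    = {b:.3f}   (a 1-unit increase in X predicts a {b:.3f}-unit increase in Y)")
print(f"beta = {beta:.3f}   (a 1-SD increase in X predicts a {beta:.3f}-SD increase in Y)")
print(f"r    = {np.corrcoef(x, y)[0, 1]:.3f}   (equals beta when there is one predictor)")
```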

Least Squares Criterion
  • The sum of the squared distances from all points in a scatterplot to the prediction line (regression line) is the least possible.
  • These deviations are called residuals.
  • Ordinary Least Squares (OLS) regression computes the regression line that minimizes the sum of squared residuals – that is, it finds the values of b and a that create the best-fitting line (see the sketch below).
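
As an illustration (with made-up data), the closed-form OLS estimates b = Cov(X, Y)/Var(X) and a = M_y − b·M_x can be checked numerically: any nearby line produces a larger sum of squared residuals.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=50)
y = 1.0 + 0.5 * x + rng.normal(scale=0.5, size=50)

# Closed-form OLS estimates: b = Cov(x, y) / Var(x), a = mean(y) - b * mean(x)
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()

def ssr(a_, b_):
    """Sum of squared residuals for the line y = a_ + b_ * x."""
    return np.sum((y - (a_ + b_ * x)) ** 2)

print(ssr(a, b))        # the minimum
print(ssr(a, b + 0.1))  # any other slope does worse
print(ssr(a + 0.1, b))  # any other intercept does worse
```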

Correlation vs. Regression

  • Correlation:
    • Covariance – do values above the mean on X generally correspond with values above the mean on Y?
    • Formula (checked numerically in the sketch after this list):
      • $r = \frac{Cov_{xy}}{s_x \cdot s_y}$
      • $Cov_{xy} = \frac{\Sigma[(X - M_x)(Y - M_y)]}{n}$
  • Regression:
    • Involves fitting a line that minimizes the distance between the line and the individual data points.
    • After fitting the line that minimizes the residuals, does that line have a slope that is different from zero?
    • Example:
      • $\hat{Y} = a + b_{yx}X$
      • $\hat{Y} = 0.91 + 0.21X$
      • (Predicted GPA) = 0.91 + 0.21 × (High School Math Grades)
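
A quick numerical check of the correlation formula above, on made-up grade/GPA values: the mean cross-product of deviations, divided by the product of the standard deviations, matches NumPy's built-in correlation.

```python
import numpy as np

# Hypothetical values: high school math grades (x) and college GPA (y)
x = np.array([2.0, 3.0, 3.5, 4.0, 4.0])
y = np.array([2.2, 2.9, 3.0, 3.6, 3.8])

# Covariance: mean cross-product of deviations from the means (n in the denominator)
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# r = Cov_xy / (s_x * s_y); population SDs (ddof=0) match the n-denominator covariance
r = cov_xy / (x.std() * y.std())

print(r)                        # from the formula
print(np.corrcoef(x, y)[0, 1])  # NumPy's built-in Pearson r; should match
```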

Multiple Regression

  • Involves developing a linear regression model for predicting a criterion variable using 2+ predictors.
  • Example: Predicting college GPA from high school Math, Science, and English grades, plus sex (see the equation below).
Interpreting Regression Results
  • Example Equation:
    • $GPA_{predicted} = 0.60 + 0.17(\text{Math}) + 0.03(\text{Science}) + 0.05(\text{English}) + (-0.01)(\text{Sex})$
  • Controlling for the effect of Science grades, English grades, and Sex, Math grades have a significant positive relationship with College GPA: b = 0.17, β = 0.35, t = 4.74, p < .001.
  • Interpretation: Controlling for the effect of the covariates, a 1-unit increase in High School Math Grades predicts a 0.17-point increase in College GPA (see the sketch below).
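
A sketch of how a regression like this could be fit and read in Python with statsmodels, on simulated data; the variable names and generated values are hypothetical, not the data behind the example above.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 300

# Simulated high school predictors and college GPA (all values hypothetical)
df = pd.DataFrame({
    "math":    rng.normal(3.0, 0.5, n),
    "science": rng.normal(3.0, 0.5, n),
    "english": rng.normal(3.0, 0.5, n),
    "sex":     rng.integers(0, 2, n).astype(float),  # dummy-coded 0/1
})
df["gpa"] = 0.6 + 0.17 * df["math"] + 0.03 * df["science"] + rng.normal(0, 0.3, n)

# Fit GPA on all four predictors at once; each b is "controlling for" the others
X = sm.add_constant(df[["math", "science", "english", "sex"]])
model = sm.OLS(df["gpa"], X).fit()

print(model.summary())   # b, t, and p for each predictor
print(model.conf_int())  # 95% CIs; an interval excluding zero = significant predictor
```
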
Evaluating Causal Criteria
  • Covariation: Met, based on significant correlation.
  • Directionality (Temporal Precedence): Met, as high school grades precede college GPA.
  • Internal Validity: Partially met, as the effect of HS math is significant after controlling for HS Science, English, and sex.
  • Overall: There is some support for a causal relationship between high school math and college GPA.
  • High School Science Grades
    • Covariation: Yes
    • Directionality (Temporal Precedence): Yes – high school happens before college
    • Internal Validity: No – the effect of HS science is not significant after controlling for HS math, English, and sex