Understanding Multiple Predictors and Variance in Dependent Variables

Predictors and Variance

  • This unit explores how multiple predictors relate to the variance in a dependent variable.
  • It covers statistics for understanding and partitioning variance.
  • Key statistics: tolerance statistic and semi-partial correlation.

Predictor Usefulness

  • A predictor is useful if it is:
    • Theoretically relevant to the outcome (Y).
    • Correlated with the outcome (Y).
    • Uniquely predictive of the outcome (Y).

Example: Predicting IQ (Y)

  • Dependent variable (Y): IQ
  • Potential predictors (X):
    • Age
    • Educational level
    • Crossword puzzle ability

Age

  • Theoretically relevant to IQ because vocabulary (a component of IQ) typically peaks in mid-adulthood.
  • Memory decline with age can affect vocabulary assessment.

Educational Level

  • Correlated with IQ, supported by empirical evidence.
  • Individuals with more educational experience tend to perform better on IQ tests.

Crossword Puzzle Ability

  • Uniquely predictive of IQ because it involves problem-solving and information recall.
  • Performance on crossword puzzles may indicate cognitive abilities related to IQ.

Ideal Scenario: Unique Explanations

  • Ideal: Predictors (X1, X2, X3) each explain unique portions of the variance in the outcome (Y) without overlap.
  • Example: Three predictors explaining 75% of the variance in IQ scores would be highly valuable.

Redundancy

  • Overlap among predictors leads to redundancy.
  • A redundant variable doesn't add unique explanatory value.
  • Example: If the variance X2 explains is largely shared with X1 and X3, then X2 is mostly redundant.

Types of Redundancy

  • Partly redundant: some of the variance the predictor explains is also explained by other predictors.
  • Wholly redundant: everything the predictor explains is already explained by other predictors.

Correlation

  • Independent variables (X's) are typically correlated with the dependent variable (Y), and often with each other.
  • Correlation among predictors means they share variance in explaining Y.

R^2

  • R^2 represents the proportion of variance in Y explained by the predictors; includes unique and joint contributions.
  • R^2 = (a + c + b) / (a + c + b + d), where:
    • a = unique variance explained by one predictor.
    • c = unique variance explained by another predictor.
    • b = overlapping variance explained by multiple predictors.
    • d = unexplained variance.
  • Example: If R^2 = 0.75, the predictors explain 75% of the variance in Y.
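
A minimal numpy sketch of this partition, using simulated data (the predictors x1 and x2, their coefficients, and the sample size are all hypothetical): a and c are each a squared semi-partial correlation, b is the overlap, and d is what remains unexplained.

```python
import numpy as np

rng = np.random.default_rng(0)                 # reproducible simulated data
n = 500
x1 = rng.normal(size=n)                        # hypothetical predictor 1
x2 = 0.6 * x1 + 0.8 * rng.normal(size=n)       # predictor 2, correlated with x1
y = 1.0 * x1 + 0.5 * x2 + rng.normal(size=n)   # outcome

def r_squared(y, *predictors):
    """R^2 from an OLS fit of y on the given predictors plus an intercept."""
    X = np.column_stack([np.ones_like(y)] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_full = r_squared(y, x1, x2)    # a + b + c: all explained variance
a = r2_full - r_squared(y, x2)    # unique to x1 (its squared semi-partial correlation)
c = r2_full - r_squared(y, x1)    # unique to x2
b = r2_full - a - c               # overlapping variance shared by x1 and x2
d = 1.0 - r2_full                 # unexplained variance
print(f"R^2 = {r2_full:.2f}; a = {a:.2f}, b = {b:.2f}, c = {c:.2f}, d = {d:.2f}")
```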

Multi-Collinearity

  • Multi-collinearity occurs when predictors are highly correlated with one another.
  • Two reasons to avoid high correlation among predictors:
    1. Power is maximized when each predictor explains a unique component of the outcome.
    2. High inter-predictor correlation violates the assumption of non-multi-collinearity.

Identifying Multi-Collinearity

  • Common indicator: correlation (r) > 0.9 between predictors.
  • Tolerance is another measure: a predictor's tolerance is 1 − R^2 from regressing it on all the other predictors, so values near 0 signal multi-collinearity (see the sketch below).
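
A minimal numpy sketch of the tolerance calculation, assuming the standard definition above; the simulated predictors (x3 built as a near-copy of x1) are hypothetical and chosen to produce a low tolerance value.

```python
import numpy as np

def tolerance(X):
    """Tolerance of each predictor: 1 - R^2 from regressing that column
    of X on all the remaining columns (plus an intercept)."""
    n, k = X.shape
    tols = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        tols[j] = resid.var() / target.var()   # equals 1 - R^2_j
    return tols

# Simulated predictors: x3 is almost a copy of x1, so both get tiny tolerances.
rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=300), rng.normal(size=300)
x3 = x1 + 0.05 * rng.normal(size=300)
print(tolerance(np.column_stack([x1, x2, x3])))   # low values flag multi-collinearity
```

A common rule of thumb treats tolerance below roughly 0.1–0.2 as a warning sign, which corresponds to a variance inflation factor (1 / tolerance) above about 5–10.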

Assumptions

  • Correlation assumptions.
  • Simple regression assumptions.
  • Multiple regression assumption: no highly redundant information (non-multi-collinearity).

Example: Animal Listing Task

  • Three students (Anthony, Louise, Joanna) each list as many animals as they can in 15 seconds, without hearing one another.
  • Total: 21 animals listed, 13 distinct.
  • Anthony: two unique animals (named by no one else).
  • Louise: three unique animals.
  • Joanna: no unique animals.
  • Outcome:
    • Joanna lists the most animals overall.
    • Because Joanna adds nothing unique, Anthony and Louise together cover all 13 distinct animals.
  • Conclusion: Combining Anthony and Louise yields more unique information than Joanna alone, despite Joanna's longer list (see the sketch below).
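
A small Python sketch of the same counting, using sets; the animal names are hypothetical, chosen only so the counts match the notes (21 listed, 13 distinct; 2, 3, and 0 unique contributions).

```python
# Hypothetical lists constructed to match the counts in the notes
# (21 animals listed in total, 13 distinct).
anthony = {"ocelot", "lemur", "cat", "dog", "fish", "bird", "horse"}     # 7 listed
louise = {"gecko", "heron", "ibex", "cow", "pig", "sheep"}               # 6 listed
joanna = {"cat", "dog", "fish", "bird", "horse", "cow", "pig", "sheep"}  # 8 listed

students = {"Anthony": anthony, "Louise": louise, "Joanna": joanna}
everyone = set.union(*students.values())   # 13 distinct animals

for name, animals in students.items():
    # A student's unique contribution: animals no other student listed.
    rest = set.union(*(s for n, s in students.items() if n != name))
    print(f"{name}: {len(animals)} listed, {len(animals - rest)} unique")

print(f"Anthony + Louise cover {len(anthony | louise)} of {len(everyone)}")
```

Run as-is, this prints that Joanna lists the most animals (8) yet adds 0 unique ones, while Anthony and Louise together cover all 13.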

Example: Employee Skill Sets

  • Anthony, Louise, and Joanna have different skill sets (typing, filing, communication, software proficiency).
  • If hiring two people, Anthony and Louise would be preferred because their skills do not overlap.
  • If hiring only one person, Joanna might be chosen for her broad overall skill set, but as a pair, Anthony and Louise's unique skill sets provide the better outcome.

Overlap in Predictors

  • As the number of predictors increases, the unique contribution of each predictor tends to decrease.
  • More predictors can raise the total variance explained (R^2), but overlap means each additional predictor typically adds less unique information.