Understanding Multiple Predictors and Variance in Dependent Variables

Predictors and Variance

  • This unit explores how multiple predictors relate to the variance in a dependent variable.
  • It covers statistics for understanding and partitioning variance.
  • Key statistics: tolerance statistic and semi-partial correlation.

Predictor Usefulness

  • A predictor is useful if it is:
    • Theoretically relevant to the outcome (Y).
    • Correlated with the outcome (Y).
    • Uniquely predictive of the outcome (Y).

Example: Predicting IQ (Y)

  • Dependent variable (Y): IQ
  • Potential predictors (X):
    • Age
    • Educational level
    • Crossword puzzle ability

Age

  • Theoretically relevant to IQ because vocabulary (a component of IQ) typically peaks in mid-adulthood.
  • Memory decline with age can affect vocabulary assessment.

Educational Level

  • Correlated with IQ, supported by empirical evidence.
  • Individuals with more educational experience tend to perform better on IQ tests.

Crossword Puzzle Ability

  • Uniquely predictive of IQ because it involves problem-solving and information recall.
  • Performance on crossword puzzles may indicate cognitive abilities related to IQ.

Ideal Scenario: Unique Explanations

  • Ideal: Predictors (X1, X2, X3) each explain unique portions of the variance in the outcome (Y) without overlap.
  • Example: Three predictors explaining 75% of the variance in IQ scores would be highly valuable.

Redundancy

  • Overlap among predictors leads to redundancy.
  • A redundant variable doesn't add unique explanatory value.
  • Example: If the variance X2 explains is largely shared with X1 and X3, then X2 is mostly redundant.

Types of Redundancy

  • Partly redundant: some of the variance the predictor explains is also explained by other predictors.
  • Wholly redundant: everything the predictor explains is already explained by other predictors.

Correlation

  • Independent variables (X's) are typically correlated with the dependent variable (Y), and often with each other.
  • Correlation among predictors means they share variance in explaining Y.

R^2

  • R^2 represents the proportion of variance in Y explained by the predictors; includes unique and joint contributions.
  • R^2 = (a + c + b) / (a + c + b + d), where:
    • a = unique variance explained by one predictor.
    • c = unique variance explained by another predictor.
    • b = overlapping variance explained by multiple predictors.
    • d = unexplained variance.
  • Example: If R^2 = 0.75, the predictors explain 75% of the variance in Y.
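
A minimal numpy sketch of this partition, using simulated data (the predictors x1 and x2, their coefficients, and the sample size are all hypothetical): a and c are each a squared semi-partial correlation, b is the overlap, and d is what remains unexplained.

```python
import numpy as np

rng = np.random.default_rng(0)                 # reproducible simulated data
n = 500
x1 = rng.normal(size=n)                        # hypothetical predictor 1
x2 = 0.6 * x1 + 0.8 * rng.normal(size=n)       # predictor 2, correlated with x1
y = 1.0 * x1 + 0.5 * x2 + rng.normal(size=n)   # outcome

def r_squared(y, *predictors):
    """R^2 from an OLS fit of y on the given predictors plus an intercept."""
    X = np.column_stack([np.ones_like(y)] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_full = r_squared(y, x1, x2)    # a + b + c: all explained variance
a = r2_full - r_squared(y, x2)    # unique to x1 (its squared semi-partial correlation)
c = r2_full - r_squared(y, x1)    # unique to x2
b = r2_full - a - c               # overlapping variance shared by x1 and x2
d = 1.0 - r2_full                 # unexplained variance
print(f"R^2 = {r2_full:.2f}; a = {a:.2f}, b = {b:.2f}, c = {c:.2f}, d = {d:.2f}")
```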

Multi-Collinearity

  • Multi-collinearity occurs when predictors are highly correlated with one another.
  • Two reasons to avoid high correlation among predictors:
    1. Power is maximized when each predictor explains a unique component of the outcome.
    2. High inter-predictor correlation violates the assumption of non-multi-collinearity.

Identifying Multi-Collinearity

  • Common indicator: correlation (r) > 0.9 between predictors.
  • Tolerance is another measure: a predictor's tolerance is 1 − R^2 from regressing it on all the other predictors, so values near 0 signal multi-collinearity (see the sketch below).
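
A minimal numpy sketch of the tolerance calculation, assuming the standard definition above; the simulated predictors (x3 built as a near-copy of x1) are hypothetical and chosen to produce a low tolerance value.

```python
import numpy as np

def tolerance(X):
    """Tolerance of each predictor: 1 - R^2 from regressing that column
    of X on all the remaining columns (plus an intercept)."""
    n, k = X.shape
    tols = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        tols[j] = resid.var() / target.var()   # equals 1 - R^2_j
    return tols

# Simulated predictors: x3 is almost a copy of x1, so both get tiny tolerances.
rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=300), rng.normal(size=300)
x3 = x1 + 0.05 * rng.normal(size=300)
print(tolerance(np.column_stack([x1, x2, x3])))   # low values flag multi-collinearity
```

A common rule of thumb treats tolerance below roughly 0.1–0.2 as a warning sign, which corresponds to a variance inflation factor (1 / tolerance) above about 5–10.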

Assumptions

  • Correlation assumptions.
  • Simple regression assumptions.
  • Multiple regression assumption: no highly redundant information (non-multi-collinearity).

Example: Animal Listing Task

  • Three students (Anthony, Louise, Joanna) each list as many animals as they can in 15 seconds, without hearing one another.
  • Total: 21 animals listed, 13 distinct.
  • Anthony: two unique animals (named by no one else).
  • Louise: three unique animals.
  • Joanna: no unique animals.
  • Outcome:
    • Joanna lists the most animals overall.
    • Because Joanna adds nothing unique, Anthony and Louise together cover all 13 distinct animals.
  • Conclusion: Combining Anthony and Louise yields more unique information than Joanna alone, despite Joanna's longer list (see the sketch below).
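
A small Python sketch of the same counting, using sets; the animal names are hypothetical, chosen only so the counts match the notes (21 listed, 13 distinct; 2, 3, and 0 unique contributions).

```python
# Hypothetical lists constructed to match the counts in the notes
# (21 animals listed in total, 13 distinct).
anthony = {"ocelot", "lemur", "cat", "dog", "fish", "bird", "horse"}     # 7 listed
louise = {"gecko", "heron", "ibex", "cow", "pig", "sheep"}               # 6 listed
joanna = {"cat", "dog", "fish", "bird", "horse", "cow", "pig", "sheep"}  # 8 listed

students = {"Anthony": anthony, "Louise": louise, "Joanna": joanna}
everyone = set.union(*students.values())   # 13 distinct animals

for name, animals in students.items():
    # A student's unique contribution: animals no other student listed.
    rest = set.union(*(s for n, s in students.items() if n != name))
    print(f"{name}: {len(animals)} listed, {len(animals - rest)} unique")

print(f"Anthony + Louise cover {len(anthony | louise)} of {len(everyone)}")
```

Run as-is, this prints that Joanna lists the most animals (8) yet adds 0 unique ones, while Anthony and Louise together cover all 13.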

Example: Employee Skill Sets

  • Anthony, Louise, and Joanna have different skill sets (typing, filing, communication, software proficiency).
  • If hiring two people, Anthony and Louise would be preferred because their skills do not overlap.
  • If hiring only one person, Joanna might be chosen for her broad overall skill set, but as a pair, Anthony and Louise's unique skill sets provide the better outcome.

Overlap in Predictors

  • As the number of predictors increases, the unique contribution of each predictor tends to decrease.
  • More predictors can raise the total variance explained (R^2), but overlap means each additional predictor typically adds less unique information.