Advanced Research Methods Lecture Flashcards

Exploratory Factor Analysis (EFA)

Purpose: To describe patterns of correlations in a dataset and identify variables that "go together" to measure underlying psychological traits (constructs).
Core Concepts:
- Vectors: Variables can be visualized as lines in a space; the angle between them represents their correlation.
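- Eigenproblem: Mathematically finding the weights ( $b$ ) that combine the variables into a composite (scale score) with the largest possible variance, subject to a constraint on the overall size of the weights.
- Eigenvector: The set of weights that defines a factor. Rescaled by the square root of the eigenvalue, the weights become loadings, which indicate each item's importance to the factor.
- Eigenvalue: The amount of variance captured by a factor. With standardized variables each variable contributes a variance of 1, so the total variance equals the number of variables.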
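
As a rough illustration of the eigenproblem, the sketch below uses power iteration (one of several ways to solve it, not necessarily what a statistics package does) to find the largest component of a correlation matrix. The matrix `R` and the function name `first_component` are hypothetical, chosen only for this example.

```python
import math

def first_component(R, iterations=200):
    """Return (eigenvalue, unit-length weights) for the largest component of R."""
    k = len(R)
    w = [1.0] + [0.0] * (k - 1)            # arbitrary starting weights
    for _ in range(iterations):
        v = [sum(R[i][j] * w[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in v))
        w = [x / norm for x in v]          # rescale so the weights keep length 1
    # Variance of the weighted composite: w' R w
    eigenvalue = sum(w[i] * R[i][j] * w[j] for i in range(k) for j in range(k))
    return eigenvalue, w

# Hypothetical correlation matrix: three items that all correlate 0.5
R = [[1.0, 0.5, 0.5],
     [0.5, 1.0, 0.5],
     [0.5, 0.5, 1.0]]

eigenvalue, weights = first_component(R)
total_variance = sum(R[i][i] for i in range(len(R)))   # equals the number of variables
loadings = [w * math.sqrt(eigenvalue) for w in weights]
proportion_explained = eigenvalue / total_variance
```

Here the first component has an eigenvalue of 2 out of a total variance of 3, so it captures two thirds of the variance in this made-up item set.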
Assessment of Factorability:
- Inter-item correlations: The correlation matrix should contain several correlations greater than 0.3.
- Kaiser-Meyer-Olkin (KMO): Measures the proportion of variance shared among variables. Should be at least $0.6$ .
- Bartlett's Test of Sphericity: Tests the null hypothesis that the correlation matrix is an identity matrix. Needs to be significant (p < 0.05).
- Anti-image Correlation Matrix: Diagonals (Measure of Sampling Adequacy) should be > 0.5; off-diagonals (partial correlations) should be low.
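
A minimal sketch of Bartlett's test statistic, assuming the usual chi-square approximation $\chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln|R|$ with $\text{df} = p(p-1)/2$. The correlation matrix and sample size are hypothetical, and the determinant helper only handles the 3 x 3 case.

```python
import math

def det3(m):
    """Determinant of a 3 x 3 matrix (enough for this small example)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def bartlett_sphericity(R, n):
    """Chi-square statistic and df for H0: R is an identity matrix."""
    p = len(R)
    chi_square = -(n - 1 - (2 * p + 5) / 6) * math.log(det3(R))
    df = p * (p - 1) // 2
    return chi_square, df

# Hypothetical data: three items correlating 0.5, measured on 100 people
R = [[1.0, 0.5, 0.5],
     [0.5, 1.0, 0.5],
     [0.5, 0.5, 1.0]]
chi_square, df = bartlett_sphericity(R, n=100)

CRITICAL_05_DF3 = 7.815                    # chi-square critical value, df = 3, alpha = .05
factorable = chi_square > CRITICAL_05_DF3
```

An identity matrix has a determinant of 1, so $\ln|R| = 0$ and the statistic is zero; the more the variables correlate, the smaller the determinant and the larger the statistic.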
Determining the Number of Factors:
1. Kaiser’s Criterion: Retain factors with eigenvalues > 1.
2. Scree Plot: Look for the "elbow" where the eigenvalues level off, and retain the factors above it.
3. Parallel Analysis Test (PAT): A Monte Carlo simulation comparing the data's eigenvalues against those generated from random data of the same size. A factor is retained if its observed eigenvalue exceeds the 95th percentile of the corresponding random eigenvalues.
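
The decision rules can be sketched as follows. The eigenvalues and the 95th-percentile thresholds are invented for illustration; in a real parallel analysis the thresholds would come from the simulation step, which is not shown here.

```python
def kaiser_count(eigenvalues):
    """Number of factors with eigenvalue > 1."""
    return sum(1 for value in eigenvalues if value > 1.0)

def parallel_count(eigenvalues, random_95th):
    """Retain factors until an observed eigenvalue fails to beat its threshold."""
    retained = 0
    for observed, threshold in zip(eigenvalues, random_95th):
        if observed <= threshold:
            break
        retained += 1
    return retained

# Hypothetical eigenvalues for 8 standardized variables (they sum to 8)
observed = [3.2, 1.6, 1.05, 0.7, 0.5, 0.45, 0.3, 0.2]
# Hypothetical 95th percentiles from random data sets of the same size
random_95th = [1.45, 1.30, 1.20, 1.10, 1.02, 0.95, 0.87, 0.78]

n_kaiser = kaiser_count(observed)                   # 3 factors
n_parallel = parallel_count(observed, random_95th)  # 2 factors
```

In this made-up case Kaiser's criterion keeps a third factor whose eigenvalue (1.05) is no larger than what random data would produce.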
Extraction Methods:
- Principal Components Analysis (PCA): Used to summarize or describe patterns of association; analyzes total variability.
- Common Factor Analysis (e.g., Principal Axis Factoring - PAF): Used to identify latent traits; analyzes only shared variance (communalities).
Rotation:
- Orthogonal (Varimax): Forces factor vectors to remain at right angles ( $90^{\circ}$ ); assumes factors are uncorrelated.
- Oblique (Direct Oblimin): Allows factor vectors to be correlated. If factor correlations are low (< 0.3), the simpler orthogonal solution is often preferred.
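
To see what an orthogonal rotation does, the sketch below turns a two-factor loading matrix by a fixed angle. The loadings are hypothetical and the 45-degree angle is picked by hand; Varimax would search for the angle itself.

```python
import math

def rotate(loadings, degrees):
    """Rotate two-factor loadings; the axes stay at right angles."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [[a * c + b * s, -a * s + b * c] for a, b in loadings]

def communalities(loadings):
    """Sum of squared loadings for each variable."""
    return [sum(x * x for x in row) for row in loadings]

# Hypothetical unrotated loadings: every item loads on both factors
unrotated = [[0.5, 0.5],
             [0.6, 0.5],
             [0.5, -0.5],
             [0.6, -0.5]]

rotated = rotate(unrotated, 45)
before = communalities(unrotated)
after = communalities(rotated)
```

After rotation each item loads mainly on one factor, while the communalities are unchanged: rotation redistributes explained variance across factors without changing how much of each variable is explained.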
Interpretation and Refinement:
- Simple Structure: Each factor should have several high loadings and several low loadings. Each variable should ideally load on only one factor.
- Loadings Cutoff: Usually set at $0.4$ for practical significance.
- Communalities: If a variable has low extraction communality, it isn't well-explained by the factor structure.
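
A small sketch of how these checks might be applied to a loading matrix, assuming an orthogonal solution so that a communality is the sum of squared loadings. Item names and loadings are hypothetical.

```python
CUTOFF = 0.4   # loading cutoff taken from the notes above

def salient_factors(row, cutoff=CUTOFF):
    """Indices of the factors on which an item loads at or above the cutoff."""
    return [i for i, loading in enumerate(row) if abs(loading) >= cutoff]

def communality(row):
    """Sum of squared loadings (orthogonal solution assumed)."""
    return sum(loading * loading for loading in row)

# Hypothetical rotated loadings on two factors
loadings = {
    "item1": (0.72, 0.10),
    "item2": (0.65, 0.05),
    "item3": (0.08, 0.70),
    "item4": (0.45, 0.48),   # loads on both factors
    "item5": (0.20, 0.15),   # loads on neither
}

cross_loading = [name for name, row in loadings.items()
                 if len(salient_factors(row)) > 1]
no_loading = [name for name, row in loadings.items()
              if not salient_factors(row)]
```

Items such as `item4` (cross-loading) and `item5` (low communality, about 0.06) would be candidates for removal when refining the scale.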

Multiple Linear Regression (MLR)

Concepts of Covariation: MLR examines how variables vary together. The correlation coefficient $r$ is standardized; $r^2$ represents shared variance.
Standardized ( $\beta$ ) vs. Unstandardized ( $b$ ) Coefficients:
- $b$ : Change in the criterion per 1-unit change in the predictor in original units.
- $\beta$ : Change in the criterion in Standard Deviation units per 1 SD change in the predictor. In simple regression, $\beta = r$ .
MLR Equation: $Z_{\hat{Y}} = \beta_1 Z_{X_1} + \beta_2 Z_{X_2} + ... + \beta_k Z_{X_k}$
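
For the two-predictor case the standardized weights can be computed directly from the correlations: $\beta_1 = \frac{r_{Y1} - r_{Y2}\,r_{12}}{1 - r_{12}^2}$ (and symmetrically for $\beta_2$). The sketch below uses hypothetical correlations and standard deviations.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized regression weights for a model with two predictors."""
    denom = 1 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2

def unstandardized(beta, sd_y, sd_x):
    """Convert beta (SD units) to b (original units)."""
    return beta * sd_y / sd_x

# Hypothetical correlations: each IV with the DV, and the IVs with each other
beta1, beta2 = standardized_betas(r_y1=0.5, r_y2=0.4, r_12=0.3)
r_squared = beta1 * 0.5 + beta2 * 0.4           # R^2 = sum of beta * r
b1 = unstandardized(beta1, sd_y=10.0, sd_x=2.0)  # hypothetical SDs

# With uncorrelated predictors each beta equals the bivariate r
beta1_uncorr, beta2_uncorr = standardized_betas(0.5, 0.4, 0.0)
```

Because the predictors overlap ( $r_{12} = 0.3$ ), each $\beta$ is smaller than the corresponding bivariate $r$.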
Model Significance:
- Multiple R: Correlation between observed $Y$ and predicted $\hat{Y}$ .
- $R^2$ : Proportion of total variance in the DV explained by the set of IVs.
- Adjusted $R^2$ : Corrects $R^2$ for sample size and number of predictors to estimate the proportion of variance explained in the population.
- F-test: Tests whether $R^2$ is significantly greater than zero.
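
A short sketch of the standard formulas behind these statistics, with $N$ cases and $k$ predictors: adjusted $R^2 = 1 - (1 - R^2)\frac{N - 1}{N - k - 1}$ and $F = \frac{R^2 / k}{(1 - R^2)/(N - k - 1)}$. The input values are hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """Shrink R^2 according to sample size n and number of predictors k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f_statistic(r2, n, k):
    """F-test of H0: R^2 = 0, with df = (k, n - k - 1)."""
    df1, df2 = k, n - k - 1
    f = (r2 / df1) / ((1 - r2) / df2)
    return f, df1, df2

# Hypothetical model: R^2 = .30 from 3 predictors and 100 cases
adj = adjusted_r_squared(0.30, n=100, k=3)
f, df1, df2 = f_statistic(0.30, n=100, k=3)
```

With a small sample or many predictors the adjustment is larger; here it lowers .30 only to about .28.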
Unique Contribution:
- Squared Semi-partial Correlation ( $sr^2$ ): The amount of variance in the DV uniquely explained by a single IV after controlling for all other IVs in the model.
Hierarchical Models: Entering variables in "Blocks" to see if a second set explains significantly more variance than the first. Checked via the " $R^2$ Change" and "F Change" statistics.
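
The "F Change" statistic can be sketched with the standard formula $F = \frac{\Delta R^2 / m}{(1 - R^2_{\text{full}})/(N - k_{\text{full}} - 1)}$, where $m$ is the number of IVs added in the new block. All numbers below are hypothetical.

```python
def f_change(r2_reduced, r2_full, n, k_reduced, k_full):
    """Test whether the added block explains significantly more variance."""
    m = k_full - k_reduced                   # number of IVs added
    r2_change = r2_full - r2_reduced
    df2 = n - k_full - 1
    f = (r2_change / m) / ((1 - r2_full) / df2)
    return r2_change, f, m, df2

# Hypothetical: Block 1 (2 IVs) gives R^2 = .20; adding Block 2 (2 more IVs) gives .30
r2_change, f, df1, df2 = f_change(0.20, 0.30, n=100, k_reduced=2, k_full=4)
```

When the added block contains a single IV, the $R^2$ change equals that IV's $sr^2$.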

Regression Diagnostics and Robustness

Linearity: Visual check of the residual scatterplot (ZPRED vs. ZRESID). The points should show no curved pattern.
Homoscedasticity: Variance of residuals should be consistent across all levels of predicted values. Violation looks like a "fan" or "butterfly" shape.
Normality of Residuals: Residuals ( $Y - \hat{Y}$ ) should be normally distributed. Check via histograms/P-P plots.
Outliers:
1. SDR (Studentized Deleted Residuals): High residual values on the Y-axis. Calculate p-value via t-distribution with $\text{df} = N - p - 1$ .
2. Leverage (Mahalanobis Distance): Unusual combinations of predictor scores. Distributes as $\chi^2$ with $\text{df} = p$ .
3. Influence (Cook's Distance): Impact of a case on the total solution. Generally, values > 1 are concerning.
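
A minimal sketch of leverage and influence for the simplest case, one predictor, using the standard hat-value and Cook's distance formulas. The data are invented so that the last case is both unusual on X and far from the trend of the other cases; studentized deleted residuals are not computed here.

```python
def simple_regression_diagnostics(x, y):
    """Leverage, Mahalanobis distance and Cook's D for a one-predictor model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    n_params = 2                                   # intercept + one slope
    mse = sum(e * e for e in residuals) / (n - n_params)
    leverage, mahalanobis, cooks = [], [], []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - mx) ** 2 / sxx           # hat value
        leverage.append(h)
        mahalanobis.append((n - 1) * (h - 1 / n))  # distance from the predictor mean
        cooks.append((e * e / (n_params * mse)) * h / (1 - h) ** 2)
    return leverage, mahalanobis, cooks

# Hypothetical data: nine cases on the line y = x + 1, plus one odd case
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2, 3, 4, 5, 6, 7, 8, 9, 10, 5]
leverage, mahalanobis, cooks = simple_regression_diagnostics(x, y)
flagged = [i for i, d in enumerate(cooks) if d > 1]   # cases with Cook's D > 1
```

Only the last case is flagged: it combines high leverage with a large residual, which is what makes a case influential.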
Multicollinearity and Singularity:
- Tolerance: $1 - R^2$ (for predicting that IV from all other IVs). Low values mean high overlap.
- VIF (Variance Inflation Factor): Reciprocal of tolerance. High values (> 10) indicate problematic inflation of standard errors.
- Singularity: Perfect redundancy among IVs (e.g., $r = 1.0$ , or one IV being an exact combination of others), often caused by using subscales and totals in the same model.
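
Tolerance and VIF follow directly from the $R^2$ obtained when one IV is predicted from the others. In the sketch below there are only two IVs, so that $R^2$ is simply their squared correlation; the correlations are hypothetical.

```python
import math

def tolerance(r2_with_other_ivs):
    """Proportion of an IV's variance NOT shared with the other IVs."""
    return 1 - r2_with_other_ivs

def vif(r2_with_other_ivs):
    """Variance inflation factor; infinite when the IVs are singular."""
    tol = tolerance(r2_with_other_ivs)
    return math.inf if tol == 0 else 1 / tol

# Hypothetical correlations between two IVs
results = {r: (tolerance(r ** 2), vif(r ** 2)) for r in (0.3, 0.8, 0.95, 1.0)}
```

At $r = 0.95$ the VIF just passes 10, and at $r = 1.0$ tolerance is zero, so no unique regression weights can be estimated.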
Suppressor Variables: A variable that has little or no correlation with the DV but correlates with another IV. Including it removes irrelevant variance (noise) from that IV, making that IV a stronger predictor. Signs include a $\beta$ larger in magnitude than the bivariate $r$ , or a change in the direction of a relationship when the suppressor is added.
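
A hypothetical numeric case of classical suppression, using the two-predictor formula for standardized weights. The correlations are invented: the suppressor correlates 0 with the DV and 0.5 with the other IV.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized weights for a model with two predictors."""
    denom = 1 - r_12 ** 2
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom

r_y1 = 0.4     # IV1 with the DV
r_y2 = 0.0     # suppressor with the DV
r_12 = 0.5     # IV1 with the suppressor

beta1, beta2 = standardized_betas(r_y1, r_y2, r_12)
r2_without = r_y1 ** 2                        # IV1 alone
r2_with = beta1 * r_y1 + beta2 * r_y2         # IV1 plus suppressor
```

Here $\beta_1 \approx 0.53$ exceeds the bivariate $r$ of 0.4, the suppressor receives a negative weight despite being uncorrelated with the DV, and $R^2$ rises from .16 to about .21.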
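Bootstrapping: A non-parametric resampling method that repeatedly draws samples with replacement from the observed data to estimate standard errors and confidence intervals. It does not assume normality of residuals. If the 95% confidence interval for a regression coefficient does not include zero, the coefficient is significant at the .05 level.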
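
A minimal sketch of a percentile bootstrap for a regression slope, assuming a single predictor and resampling whole cases. The data, the number of resamples, and the seed are arbitrary choices for illustration, and the slope here is the unstandardized $b$ .

```python
import random

def slope(pairs):
    """Least-squares slope; None if the resample has no variance in x."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    if sxx == 0:
        return None
    return sum((x - mx) * (y - my) for x, y in pairs) / sxx

def bootstrap_ci(pairs, n_boot=2000, seed=1):
    """95% percentile confidence interval for the slope."""
    rng = random.Random(seed)
    estimates = []
    while len(estimates) < n_boot:
        resample = [rng.choice(pairs) for _ in pairs]   # draw cases with replacement
        b = slope(resample)
        if b is not None:
            estimates.append(b)
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]

# Hypothetical data: y is roughly 2 * x with a small disturbance
noise = [0.4, -0.3, 0.1, -0.5, 0.2] * 4
pairs = [(x, 2 * x + noise[x - 1]) for x in range(1, 21)]

lower, upper = bootstrap_ci(pairs)
significant = not (lower <= 0 <= upper)   # interval excludes zero
```

Because each resample is drawn with replacement, some cases appear several times and others not at all; the spread of the resulting slopes stands in for the sampling distribution.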
Missing Data:
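- MCAR (Missing Completely at Random): Missingness is unrelated to any variable; listwise deletion does not bias the estimates, though it reduces sample size and power.
- MAR (Missing at Random): Missingness is related to other observed variables (e.g., other IVs) but not to the DV.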
- MNAR (Missing Not at Random): Missingness depends on the DV itself; creates bias.
- Little’s MCAR Test: Significant results (p < 0.05) indicate data are NOT missing completely at random.

Mediation and Moderation

Mediation ( $X \rightarrow M \rightarrow Y$ ): Testing if an intervening variable (M) explains the mechanism behind the relationship between X and Y.
- Baron & Kenny Steps:
  1. X predicts Y (path c).
  2. X predicts M (path a).
  3. M predicts Y, controlling for X (path b).
  4. X no longer predicts Y significantly once M is controlled (path c’).
- Sobel Test: Tests whether the indirect effect ( $a \times b$ ) is significantly different from zero.
- Full vs. Partial: Partial mediation occurs when X still predicts Y significantly, but the effect size is reduced after adding M.
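
The Sobel test can be sketched with its usual standard error, $SE_{ab} = \sqrt{b^2 SE_a^2 + a^2 SE_b^2}$ , giving $z = ab / SE_{ab}$ . The path estimates and standard errors below are hypothetical.

```python
import math

def sobel_test(a, se_a, b, se_b):
    """Indirect effect, z statistic and two-tailed p-value."""
    indirect = a * b
    se_indirect = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    z = indirect / se_indirect
    p = math.erfc(abs(z) / math.sqrt(2))   # two-tailed p from the standard normal
    return indirect, z, p

# Hypothetical path estimates from the two regressions
indirect, z, p = sobel_test(a=0.5, se_a=0.1, b=0.4, se_b=0.1)

c_prime = 0.1                       # hypothetical direct effect of X controlling for M
total_effect = c_prime + indirect   # in OLS, c = c' + a * b
```

In this made-up case the indirect effect (0.2) is significant and accounts for two thirds of the total effect (0.3).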
Moderation: Testing if the relationship between X and Y depends on the level of a third variable (Z). It is an interaction effect.
- Equation: $\hat{Y} = a + b_1X + b_2Z + b_3(X \times Z)$ .
- Centering: To reduce multicollinearity between the main effects and the product term, subtract each IV's mean from its scores (mean-centering) before multiplying them.
- Simple Slopes: Finding the effect of the IV at high ( $+1 \text{ SD}$ ) and low ( $-1 \text{ SD}$ ) levels of the moderator.
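
The sketch below illustrates both ideas with invented numbers: how centering changes the correlation between X and the product term, and how simple slopes follow from the equation (the slope of X at a given moderator value is $b_1 + b_3 Z$ ). The coefficients and the moderator SD are hypothetical.

```python
import math

def pearson(u, v):
    """Pearson correlation between two lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u)
                           * sum((b - mv) ** 2 for b in v))

def center(values):
    """Subtract the mean from every score."""
    m = sum(values) / len(values)
    return [v - m for v in values]

# Hypothetical scores: every combination of X and Z in {1, 2, 3}
x = [i for i in (1, 2, 3) for j in (1, 2, 3)]
z = [j for i in (1, 2, 3) for j in (1, 2, 3)]

raw_r = pearson(x, [a * b for a, b in zip(x, z)])           # X with raw product
xc, zc = center(x), center(z)
centered_r = pearson(xc, [a * b for a, b in zip(xc, zc)])   # X with centered product

def simple_slope(b1, b3, z_value):
    """Effect of X on Y at a given (centered) value of the moderator."""
    return b1 + b3 * z_value

# Hypothetical coefficients from a centered model; moderator SD = 1.5
slope_high = simple_slope(b1=0.5, b3=0.2, z_value=+1.5)   # +1 SD
slope_low = simple_slope(b1=0.5, b3=0.2, z_value=-1.5)    # -1 SD
```

With these symmetric scores, centering takes the correlation between X and the product term from about 0.68 to zero; with real data it typically shrinks rather than vanishes.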

Categorical Variables and Logistic Regression

Dummy Coding: Representing categorical variables in regression using binary (0/1) codes. A variable with $k$ categories requires $k-1$ dummy variables.
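
A minimal sketch of dummy coding, assuming the reference category is chosen by the analyst. The group labels are hypothetical.

```python
def dummy_code(values, reference):
    """Return k - 1 binary columns, one per non-reference category."""
    categories = sorted(set(values))
    others = [c for c in categories if c != reference]
    return {c: [1 if v == c else 0 for v in values] for c in others}

# Hypothetical three-group variable; "control" is the reference category
group = ["control", "drug", "therapy", "drug", "control"]
dummies = dummy_code(group, reference="control")
```

Cases in the reference category score 0 on every dummy, so each dummy's coefficient compares its group with the reference group.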
ANCOVA (Analysis of Covariance): Predicting a continuous DV from categorical IVs while controlling for a continuous covariate (reducing error variance).
MANOVA: Predicting multiple continuous DVs from categorical IVs. Uses Wilks’ Lambda ( $\Lambda$ ) for multivariate significance testing.
Logistic Regression: Used when the DV is categorical (Dichotomous/Binomial or Multicategorical/Multinomial).
- Odds: $\frac{P(\text{event occurring})}{P(\text{event not occurring})}$ .
- Odds Ratio (Exp(B)): The factor by which the odds change for a 1-unit increase in the predictor. Values > 1 increase odds; values < 1 decrease odds (use reciprocal $1/\text{Exp}(B)$ for easier interpretation).
- Logit: The natural log of the odds ( $\ln(\text{odds})$ ), which is linearly related to the predictors.
- Assumptions: Independence of observations, mutually exclusive categories, large sample size. No normality/homoscedasticity assumption for predictors.
- Classification Accuracy:
  - Sensitivity: Hit rate for the target category.
  - Specificity: Correct rejection rate for the reference category.
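
The quantities above can be sketched as follows. The intercept, coefficient, and classification results are hypothetical.

```python
import math

def odds(p):
    """Odds of the event given its probability."""
    return p / (1 - p)

def logit(p):
    """Natural log of the odds."""
    return math.log(odds(p))

def predicted_probability(intercept, b, x):
    """Invert the logit: probability implied by a one-predictor model."""
    return 1 / (1 + math.exp(-(intercept + b * x)))

# Hypothetical model: logit = -2.0 + 0.5 * x
p_at_2 = predicted_probability(-2.0, 0.5, 2)
p_at_3 = predicted_probability(-2.0, 0.5, 3)
odds_ratio = odds(p_at_3) / odds(p_at_2)    # equals Exp(B) = exp(0.5)

# Hypothetical classification results (1 = target category, 0 = reference)
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
hits = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
correct_rejections = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
sensitivity = hits / actual.count(1)                  # 3 of 4 targets detected
specificity = correct_rejections / actual.count(0)    # 5 of 6 non-targets rejected
```

A 1-unit increase in the predictor multiplies the odds by the same factor wherever it occurs, even though the change in probability differs along the curve.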
ROC Analysis (Receiver Operating Characteristic): A curve plotting Sensitivity against $1 - \text{Specificity}$ for every possible cutoff point. The "Area Under the Curve" (AUC) indicates how well the model discriminates between the two categories overall (0.5 = chance, 1.0 = perfect); researchers use the curve to find the optimal cutoff for their specific needs (balancing false positives vs. false negatives).
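
A minimal sketch of how the curve and its area might be computed from predicted probabilities. The AUC is obtained here through its rank interpretation (the proportion of target/non-target pairs in which the target case has the higher score), which is one of several equivalent approaches. Scores and labels are hypothetical.

```python
def roc_points(scores, labels):
    """(cutoff, sensitivity, 1 - specificity) for every observed cutoff."""
    n_pos = labels.count(1)
    n_neg = labels.count(0)
    points = []
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= cutoff and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= cutoff and l == 0)
        points.append((cutoff, tp / n_pos, fp / n_neg))
    return points

def auc(scores, labels):
    """Proportion of (target, non-target) pairs ranked correctly; ties count half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities and true categories (1 = target)
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

points = roc_points(scores, labels)
area = auc(scores, labels)
```

Lowering the cutoff from 0.7 to 0.55 raises sensitivity from 0.75 to 1.0 at the cost of one false positive, which is the kind of trade-off the curve makes visible.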