Advanced Research Methods Lecture Flashcards

Exploratory Factor Analysis (EFA)

  • Purpose: To describe patterns of correlations in a dataset and identify variables that "go together" to measure underlying psychological traits (constructs).

  • Core Concepts:

    • Vectors: Variables can be visualized as lines in a space; the angle between them represents their correlation.

    • Eigenproblem: Mathematically finding weights ($b$) to produce a vector (scale score) with the largest possible variance.

    • Eigenvector: A set of weights (loadings) indicating an item's importance to a factor.

    • Eigenvalue: The amount of variance captured by a factor. When the variables are standardized, the total variance in the set equals the number of variables.
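
The eigenvalue and eigenvector ideas above can be checked numerically. This is a minimal sketch assuming NumPy and a made-up 4-item correlation matrix `R` (the values are illustrative, not from the lecture). Scaling eigenvectors by the square root of the eigenvalue is the usual PCA convention for loadings.

```python
import numpy as np

# Hypothetical correlation matrix for four standardized items (illustrative values).
R = np.array([
    [1.0, 0.6, 0.5, 0.1],
    [0.6, 1.0, 0.4, 0.1],
    [0.5, 0.4, 1.0, 0.2],
    [0.1, 0.1, 0.2, 1.0],
])

# eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(R)

# Reorder so the factor capturing the most variance comes first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# With standardized variables the eigenvalues sum to the number of variables.
total_variance = eigenvalues.sum()
proportion_explained = eigenvalues / total_variance

# Component loadings: each eigenvector scaled by the square root of its eigenvalue.
loadings = eigenvectors * np.sqrt(eigenvalues)

print("eigenvalues:", np.round(eigenvalues, 3))
print("proportion of variance:", np.round(proportion_explained, 3))
```

Keeping every component reproduces the correlation matrix exactly (`loadings @ loadings.T` equals `R`); dropping the small ones is what makes the solution a summary.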

  • Assessment of Factorability:

    • Inter-item correlations: The correlation matrix should contain several correlations > 0.3.

    • Kaiser-Meyer-Olkin (KMO): Measures the proportion of variance shared among variables. Should be at least 0.6.

    • Bartlett's Test of Sphericity: Tests the null hypothesis that the correlation matrix is an identity matrix. Needs to be significant (p < 0.05).

    • Anti-image Correlation Matrix: Diagonals (Measure of Sampling Adequacy) should be > 0.5; off-diagonals (partial correlations) should be low.
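
The factorability checks above can be sketched directly from a correlation matrix. The code below assumes NumPy and uses the standard textbook formulas for KMO/MSA (squared correlations versus squared partial correlations) and for Bartlett's chi-square; the function names and the equicorrelated example matrix are hypothetical.

```python
import numpy as np

def kmo_and_msa(R):
    """Return the overall KMO and the per-variable MSA for correlation matrix R."""
    R = np.asarray(R, dtype=float)
    R_inv = np.linalg.inv(R)
    # Partial (anti-image) correlations come from the inverse correlation matrix.
    scale = np.sqrt(np.outer(np.diag(R_inv), np.diag(R_inv)))
    partial = -R_inv / scale
    off_diag = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.where(off_diag, R ** 2, 0.0)
    p2 = np.where(off_diag, partial ** 2, 0.0)
    kmo = r2.sum() / (r2.sum() + p2.sum())
    msa = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    return kmo, msa

def bartlett_chi_square(R, n):
    """Bartlett's sphericity statistic and its degrees of freedom."""
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    _, log_det = np.linalg.slogdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * log_det
    df = p * (p - 1) // 2
    return chi2, df

# Hypothetical example: three items that all correlate 0.5, with N = 100.
R = np.array([[1.0, 0.5, 0.5],
              [0.5, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
kmo, msa = kmo_and_msa(R)
chi2, df = bartlett_chi_square(R, n=100)
print(round(kmo, 3), np.round(msa, 3), round(chi2, 2), df)
```

The chi-square would then be compared against the critical value for `df` degrees of freedom (7.815 for df = 3 at p = 0.05).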

  • Determining the Number of Factors:

    1. Kaiser’s Criterion: Retain factors with Eigenvalues > 1.

    2. Scree Plot: Look for the "elbow" where eigenvalues drop off significantly.

    3. Parallel Analysis Test (PAT): A Monte Carlo simulation comparing the data's eigenvalues against those generated from random data. If a real eigenvalue is greater than the 95th percentile of the corresponding eigenvalues from the random samples, that factor is considered significant and is retained.
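
A rough sketch of Kaiser's criterion and parallel analysis side by side, assuming NumPy. The function name, the number of simulations, and the simulated one-factor data are all assumptions made for illustration; statistical packages may differ in details such as how random data are generated.

```python
import numpy as np

def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Count factors whose eigenvalue exceeds the random-data percentile."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    real = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    # Eigenvalues from uncorrelated normal data with the same N and item count.
    sims = np.empty((n_sims, p))
    for i in range(n_sims):
        noise = rng.standard_normal((n, p))
        sims[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    threshold = np.percentile(sims, percentile, axis=0)
    # Retain factors in order until one fails to beat its threshold.
    n_retain = 0
    for real_value, cutoff in zip(real, threshold):
        if real_value <= cutoff:
            break
        n_retain += 1
    return n_retain, real, threshold

# Simulated data: six items driven by a single latent factor (N = 300).
rng = np.random.default_rng(1)
factor = rng.standard_normal((300, 1))
items = factor + 0.5 * rng.standard_normal((300, 6))

n_retain, real, threshold = parallel_analysis(items)
kaiser_count = int((real > 1).sum())  # Kaiser's criterion: eigenvalues > 1
print("parallel analysis retains:", n_retain, "| Kaiser retains:", kaiser_count)
```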

  • Extraction Methods:

    • Principal Components Analysis (PCA): Used to summarize or describe patterns of association; analyzes total variability.

    • Common Factor Analysis (e.g., Principal Axis Factoring - PAF): Used to identify latent traits; analyzes only shared variance (communalities).

  • Rotation:

    • Orthogonal (Varimax): Forces factor vectors to remain at right angles ($90^{\circ}$); assumes factors are uncorrelated.

    • Oblique (Direct Oblimin): Allows factor vectors to be correlated. If factor correlations are low (< 0.3), the simpler orthogonal solution is often preferred.

  • Interpretation and Refinement:

    • Simple Structure: Each factor should have several high loadings and several low loadings. Each variable should ideally load on only one factor.

    • Loadings Cutoff: Usually set at 0.4 for practical significance.

    • Communalities: If a variable has low extraction communality, it isn't well-explained by the factor structure.
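
The interpretation rules above can be applied mechanically to a loading table. This is a small sketch with invented loadings; it assumes an orthogonal solution, where a communality is simply the sum of an item's squared loadings.

```python
# Hypothetical rotated loadings (rows = items, columns = factors).
loadings = {
    "item1": [0.78, 0.10],
    "item2": [0.71, 0.05],
    "item3": [0.12, 0.69],
    "item4": [0.45, 0.48],  # loads on both factors
    "item5": [0.20, 0.15],  # loads on neither factor
}

CUTOFF = 0.4  # loading cutoff from the notes

def communality(row):
    """Sum of squared loadings (valid for orthogonal solutions)."""
    return sum(value ** 2 for value in row)

def salient_factors(row, cutoff=CUTOFF):
    """Indices of the factors on which the item loads at or above the cutoff."""
    return [i for i, value in enumerate(row) if abs(value) >= cutoff]

report = {}
for item, row in loadings.items():
    hits = salient_factors(row)
    if len(hits) == 1:
        status = "clean"
    elif len(hits) > 1:
        status = "cross-loading"
    else:
        status = "no salient loading"
    report[item] = (round(communality(row), 3), status)

for item, (h2, status) in report.items():
    print(item, h2, status)
```

Items flagged as cross-loading or as having no salient loading are the usual candidates for removal when refining toward simple structure.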

Multiple Linear Regression (MLR)

  • Concepts of Covariation: MLR examines how variables vary together. The correlation coefficient $r$ is standardized; $r^2$ represents shared variance.

  • Standardized ($\beta$) vs. Unstandardized ($b$) Coefficients:

    • $b$: Change in the criterion per 1-unit change in the predictor in original units.

    • $\beta$: Change in the criterion in Standard Deviation units per 1 SD change in the predictor. In simple regression, $\beta = r$.

  • MLR Equation: $Z_{\hat{Y}} = \beta_1 Z_{X_1} + \beta_2 Z_{X_2} + \dots + \beta_k Z_{X_k}$
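
To make the difference between the unstandardized and standardized coefficients concrete, here is a small standard-library sketch for the one-predictor case. The data (`hours`, `score`) and the function names are made up; the point is only that rescaling the slope by the two standard deviations reproduces $r$.

```python
from statistics import mean, stdev

def simple_regression(x, y):
    """Return intercept a, slope b, standardized slope beta, and Pearson r."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx                      # change in Y per 1 raw unit of X
    a = my - b * mx
    beta = b * stdev(x) / stdev(y)     # change in Y (in SDs) per 1 SD of X
    r = sxy / (sxx * syy) ** 0.5
    return a, b, beta, r

def predict_z(z_x, beta):
    """Standardized prediction for one predictor: Z_Yhat = beta * Z_X."""
    return beta * z_x

# Hypothetical data: hours studied and exam score.
hours = [2, 4, 5, 7, 9, 10]
score = [50, 58, 57, 68, 74, 80]

a, b, beta, r = simple_regression(hours, score)
print(f"b = {b:.3f}, beta = {beta:.3f}, r = {r:.3f}")
```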

  • Model Significance:

    • Multiple R: Correlation between observed $Y$ and predicted $\hat{Y}$.

    • $R^2$: Proportion of total variance in the DV explained by the set of IVs.

    • Adjusted $R^2$: Corrects for sample size and number of predictors to estimate the proportion of variance explained in the population.

    • F-test: Tests whether $R^2$ is significantly greater than zero.
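
The model-level statistics above are linked by simple arithmetic. The sketch below uses the standard formulas for adjusted $R^2$ and the overall F ratio; the function name and the input values ($R^2 = 0.30$, $N = 104$, $k = 3$ predictors) are invented for illustration.

```python
def model_summary(r_squared, n, k):
    """Multiple R, adjusted R^2, and the overall F ratio for k predictors."""
    df_model = k
    df_resid = n - k - 1
    adjusted = 1 - (1 - r_squared) * (n - 1) / df_resid
    f_stat = (r_squared / df_model) / ((1 - r_squared) / df_resid)
    return {
        "multiple_R": r_squared ** 0.5,
        "adjusted_R2": adjusted,
        "F": f_stat,
        "df": (df_model, df_resid),
    }

summary = model_summary(r_squared=0.30, n=104, k=3)
print(summary)
```

Adjusted $R^2$ is always a little smaller than $R^2$, and the gap grows as predictors are added relative to the sample size.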

  • Unique Contribution:

    • Squared Semi-partial Correlation ($sr^2$): The amount of variance in the DV uniquely explained by a single IV after controlling for all other IVs in the model.

  • Hierarchical Models: Entering variables in "Blocks" to see if a second set explains significantly more variance than the first. Checked via the "$R^2$ Change" and "F Change" statistics.
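
The $R^2$ Change and F Change statistics can be computed by hand from the two block $R^2$ values. This sketch uses the standard formula; the function name and numbers are hypothetical. When the second block contains a single predictor, the $R^2$ change is that predictor's squared semi-partial correlation.

```python
def r2_change_test(r2_reduced, r2_full, n, k_full, k_added):
    """R^2 change and F change when k_added predictors join the model."""
    r2_change = r2_full - r2_reduced
    df_resid = n - k_full - 1
    f_change = (r2_change / k_added) / ((1 - r2_full) / df_resid)
    return r2_change, f_change, (k_added, df_resid)

# Hypothetical: Block 1 (2 predictors) gives R^2 = 0.20;
# Block 2 adds 1 predictor and raises R^2 to 0.26, with N = 104.
change, f_change, df = r2_change_test(0.20, 0.26, n=104, k_full=3, k_added=1)
print(f"R2 change = {change:.2f}, F change = {f_change:.2f}, df = {df}")
```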

Regression Diagnostics and Robustness

  • Linearity: Visual check of residual scatterplots (ZPRED vs. ZRESID). The points should show no curved pattern.

  • Homoscedasticity: Variance of residuals should be consistent across all levels of predicted values. Violation looks like a "fan" or "butterfly" shape.

  • Normality of Residuals: Residuals ($Y - \hat{Y}$) should be normally distributed. Check via histograms/P-P plots.

  • Outliers:

    1. SDR (Studentized Deleted Residuals): High residual values on the Y-axis. Calculate p-value via t-distribution with $\text{df} = N - p - 1$.

    2. Leverage (Mahalanobis Distance): Unusual combinations of predictor scores. Distributes as $\chi^2$ with $\text{df} = p$.

    3. Influence (Cook's Distance): Impact of a case on the total solution. Generally, values > 1 are concerning.
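
All three outlier statistics can be derived from the hat matrix of an ordinary least squares fit. The following is a sketch assuming NumPy and the standard textbook formulas; the function name, the simulated data, and the planted extreme case are assumptions, and here `p` counts predictors excluding the intercept.

```python
import numpy as np

def outlier_diagnostics(X, y):
    """Studentized deleted residuals, Mahalanobis distances, and Cook's D."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])   # add the intercept column
    hat = design @ np.linalg.pinv(design)       # hat (projection) matrix
    h = np.diag(hat)                            # leverage of each case
    resid = y - hat @ y
    df_resid = n - p - 1
    mse = resid @ resid / df_resid
    # Internally studentized residuals, then the deleted (external) version.
    student = resid / np.sqrt(mse * (1 - h))
    sdr = student * np.sqrt((df_resid - 1) / (df_resid - student ** 2))
    mahalanobis = (n - 1) * (h - 1 / n)
    cooks = student ** 2 * h / ((p + 1) * (1 - h))
    return sdr, mahalanobis, cooks

# Simulated data with one planted extreme case at index 0.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
y = 1.0 + X @ np.array([0.5, -0.3]) + 0.5 * rng.standard_normal(50)
X[0] = [5.0, 5.0]   # unusual combination of predictor scores
y[0] = -10.0        # far from what the other cases would predict

sdr, mahalanobis, cooks = outlier_diagnostics(X, y)
print("most influential case:", int(np.argmax(cooks)))
```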

  • Multicollinearity and Singularity:

    • Tolerance: $1 - R^2$ (for predicting that IV from all other IVs). Low values mean high overlap.

    • VIF (Variance Inflation Factor): Reciprocal of tolerance. High values (> 10) indicate problematic inflation of standard errors.

    • Singularity: Perfect correlation ($r = 1.0$), often caused by using subscales and totals in the same model.
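
Tolerance and VIF for every predictor can be read off the inverse of the predictors' correlation matrix, whose diagonal holds $1 / (1 - R^2)$ for each IV. A minimal sketch assuming NumPy; the function name and simulated predictors are hypothetical.

```python
import numpy as np

def tolerance_and_vif(X):
    """Tolerance and VIF for each column of the predictor matrix X."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    vif = np.diag(np.linalg.inv(R))   # VIF_j = 1 / (1 - R_j^2)
    tolerance = 1 / vif
    return tolerance, vif

# Simulated predictors: x2 overlaps heavily with x1, x3 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(200)
x2 = x1 + 0.3 * rng.standard_normal(200)
x3 = rng.standard_normal(200)

tolerance, vif = tolerance_and_vif(np.column_stack([x1, x2, x3]))
print("tolerance:", np.round(tolerance, 3))
print("VIF:", np.round(vif, 2))
```

Under singularity the correlation matrix cannot be inverted at all, which is why software refuses to estimate a model that includes both subscales and their total.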

  • Suppressor Variables: A variable that is uncorrelated with the DV but correlates with another IV, removing irrelevant variance (noise) and making the first IV a stronger predictor. Signs include a $\beta$ higher than the bivariate $r$, or a change in the direction of a relationship when the suppressor is added.

  • Bootstrapping: A non-parametric resampling method (cases are repeatedly drawn with replacement from the sample) used to estimate standard errors and confidence intervals. It does not assume normality of residuals. If the 95% confidence interval for a $\beta$ does not span zero, the coefficient is significant at the 0.05 level.
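
A bare-bones percentile bootstrap for a regression slope, using only the standard library. This is a sketch under assumptions: it resamples whole cases, works on the unstandardized slope of a one-predictor model, and uses invented data; statistical packages typically offer refinements such as bias-corrected intervals.

```python
import random

def slope(x, y):
    """Ordinary least squares slope for one predictor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def bootstrap_slope_ci(x, y, n_boot=2000, alpha=0.05, seed=42):
    """Percentile confidence interval from case resampling with replacement."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # draw cases with replacement
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2:                         # skip degenerate resamples
            continue
        estimates.append(slope(xs, ys))
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper

# Hypothetical data: true slope of 2 plus random noise.
noise = random.Random(0)
x = list(range(1, 21))
y = [2.0 * xi + noise.gauss(0, 3) for xi in x]

lower, upper = bootstrap_slope_ci(x, y)
print(f"95% CI for the slope: [{lower:.2f}, {upper:.2f}]")
print("significant:", not (lower <= 0 <= upper))
```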

  • Missing Data:

    • MCAR (Missing Completely at Random): No pattern; safe to use listwise deletion.

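    • MAR (Missing at Random): Missingness is related to other observed variables (e.g., other IVs) but not to the missing values themselves.

    • MNAR (Missing Not at Random): Missingness depends on the unobserved values themselves (e.g., on the DV itself); creates bias.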

    • Little’s MCAR Test: Significant results (p < 0.05) indicate data are NOT missing completely at random.

Mediation and Moderation

  • Mediation ($X \rightarrow M \rightarrow Y$): Testing if an intervening variable (M) explains the mechanism behind the relationship between X and Y.

    • Baron & Kenny Steps:

      1. X predicts Y (path c).

      2. X predicts M (path a).

      3. M predicts Y, controlling for X (path b).

      4. X no longer predicts Y significantly once M is controlled (path c’).

    • Sobel Test: Specifically tests whether the "indirect effect" ($a \times b$) is significantly different from zero.

    • Full vs. Partial: Partial mediation occurs when X still predicts Y significantly, but the effect size is reduced after adding M.
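
The Sobel test is a z-test on the product of paths a and b. The sketch below uses the common first-order standard error formula and the standard library only; the path estimates and standard errors are made-up inputs that would normally come from the two regressions described in the steps above.

```python
import math

def sobel_test(a, se_a, b, se_b):
    """Indirect effect a*b, its Sobel z statistic, and a two-tailed p-value."""
    indirect = a * b
    se_indirect = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    z = indirect / se_indirect
    # Two-tailed p from the standard normal distribution.
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return indirect, z, p_two_tailed

# Hypothetical estimates: path a (X -> M) and path b (M -> Y controlling for X).
indirect, z, p = sobel_test(a=0.50, se_a=0.10, b=0.40, se_b=0.10)
print(f"indirect = {indirect:.2f}, z = {z:.2f}, p = {p:.4f}")
```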

  • Moderation: Testing if the relationship between X and Y depends on the level of a third variable (Z). It is an interaction effect.

    • Equation: $\hat{Y} = a + b_1X + b_2Z + b_3(X \times Z)$.

    • Centering: To avoid multicollinearity between main effects and the product term, subtract the mean from IVs before multiplying them.

    • Simple Slopes: Finding the effect of the IV at high ($+1 \text{ SD}$) and low ($-1 \text{ SD}$) levels of the moderator.
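
Centering and simple slopes follow directly from the moderation equation: with a centered moderator, the slope of Y on X at a given value of Z is $b_1 + b_3 Z$. In this standard-library sketch the variable names (`stress`, `support`) and the coefficients `b1` and `b3` are hypothetical stand-ins for values a fitted model would supply.

```python
from statistics import mean, stdev

def center(values):
    """Subtract the mean so the variable has a mean of zero."""
    m = mean(values)
    return [v - m for v in values]

def simple_slope(b1, b3, z_value):
    """Slope of Y on X at a chosen value of the centered moderator Z."""
    return b1 + b3 * z_value

# Hypothetical raw scores for the IV (stress) and the moderator (support).
stress = [3, 5, 2, 8, 6, 7, 4, 9]
support = [10, 4, 8, 3, 7, 5, 9, 2]

stress_c = center(stress)
support_c = center(support)
# Product term built from the centered variables.
interaction = [x * z for x, z in zip(stress_c, support_c)]

# Assumed coefficients from fitting Y = a + b1*X + b2*Z + b3*(X*Z).
b1, b3 = 0.50, -0.20
sd_z = stdev(support_c)
slope_low = simple_slope(b1, b3, -sd_z)    # moderator at -1 SD
slope_high = simple_slope(b1, b3, +sd_z)   # moderator at +1 SD
print(f"slope at -1 SD: {slope_low:.3f}, slope at +1 SD: {slope_high:.3f}")
```

With a negative $b_3$, the effect of X weakens as the moderator increases, which is what the two simple slopes show.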

Categorical Variables and Logistic Regression

  • Dummy Coding: Representing categorical variables in regression using binary (0/1) codes. A variable with $k$ categories requires $k-1$ dummy variables.
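
A short sketch of dummy coding with an explicit reference category. The function name, the `is_` prefix, and the group labels are invented for illustration; the reference group is the one coded 0 on every dummy variable.

```python
def dummy_code(values, reference):
    """Return one dict of 0/1 dummy variables per case (k - 1 dummies)."""
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference {reference!r} is not among the categories")
    others = [c for c in categories if c != reference]
    return [{f"is_{c}": int(v == c) for c in others} for v in values]

# Hypothetical three-group variable with "control" as the reference category.
groups = ["control", "drug", "therapy", "drug"]
coded = dummy_code(groups, reference="control")
for group, row in zip(groups, coded):
    print(group, row)
```

Each dummy's regression coefficient is then interpreted as the difference between that group and the reference group.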

  • ANCOVA (Analysis of Covariance): Predicting a continuous DV from categorical IVs while controlling for a continuous covariate (reducing error variance).

  • MANOVA: Predicting multiple continuous DVs from categorical IVs. Uses Wilks’ Lambda ($\Lambda$) for multivariate significance testing.

  • Logistic Regression: Used when the DV is categorical (Dichotomous/Binomial or Multicategorical/Multinomial).

    • Odds: $\frac{P(\text{event occurring})}{P(\text{event not occurring})}$.

    • Odds Ratio (Exp(B)): The factor by which the odds change for a 1-unit increase in the predictor. Values > 1 increase odds; values < 1 decrease odds (use reciprocal $1/\text{Exp}(B)$ for easier interpretation).

    • Logit: The natural log of the odds ($\ln(\text{odds})$), which is linearly related to the predictors.

    • Assumptions: Independence of observations, mutually exclusive categories, large sample size. No normality/homoscedasticity assumption for predictors.

    • Classification Accuracy:

      • Sensitivity: Hit rate for the target category.

      • Specificity: Correct rejection rate for the reference category.
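
The odds, logit, and Exp(B) definitions in the Logistic Regression card can be verified with a few lines of arithmetic. The coefficients `b0` and `b1` below are invented for the example; the helper names are also assumptions.

```python
import math

def odds(p):
    """Odds = P(event) / P(no event)."""
    return p / (1 - p)

def logit(p):
    """Natural log of the odds."""
    return math.log(odds(p))

def probability(logit_value):
    """Convert a logit back to a probability."""
    return 1 / (1 + math.exp(-logit_value))

# Hypothetical fitted model: logit = b0 + b1 * x
b0, b1 = -1.0, 0.7
odds_ratio = math.exp(b1)   # Exp(B) for the predictor

# The odds at x = 3 are Exp(B) times the odds at x = 2.
odds_at_2 = math.exp(b0 + b1 * 2)
odds_at_3 = math.exp(b0 + b1 * 3)
print(f"Exp(B) = {odds_ratio:.3f}, ratio of odds = {odds_at_3 / odds_at_2:.3f}")

# An odds ratio below 1 is often easier to read as its reciprocal.
protective_or = 0.5
print(f"odds halve per unit; reciprocal = {1 / protective_or:.1f}")
```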

  • ROC Analysis (Receiver Operating Characteristic): A curve plotting Sensitivity against $1 - \text{Specificity}$ for every possible cutoff point. The "Area Under the Curve" (AUC) summarizes how well the model discriminates between the two categories (0.5 = chance, 1.0 = perfect); researchers use the curve to choose the cutoff that best suits their needs (balancing false positives vs. false negatives).
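
Sensitivity, specificity, the ROC curve, and the AUC can all be computed from predicted probabilities and observed outcomes. This standard-library sketch uses invented scores and labels. Picking the cutoff that maximizes sensitivity + specificity - 1 (Youden's J) is one common rule, shown here as an assumption rather than the method prescribed in the lecture.

```python
def confusion_rates(scores, labels, cutoff):
    """Sensitivity and specificity when cases with score >= cutoff are classed as 1."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    return tp / (tp + fn), tn / (tn + fp)

def roc_points(scores, labels):
    """(1 - specificity, sensitivity) for every distinct cutoff, strictest first."""
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        sens, spec = confusion_rates(scores, labels, cutoff)
        points.append((1 - spec, sens))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical predicted probabilities and observed outcomes (1 = target category).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

auc = area_under_curve(roc_points(scores, labels))

def youden_j(cutoff):
    sens, spec = confusion_rates(scores, labels, cutoff)
    return sens + spec - 1

best_cutoff = max(set(scores), key=youden_j)
print(f"AUC = {auc:.3f}, cutoff maximizing Youden's J = {best_cutoff}")
```

A study that cares more about avoiding false negatives would instead pick a lower cutoff, trading specificity for sensitivity.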