Biostats Final Exam Review

Module 6: ANOVA

General Overview

  • ANOVA (Analysis of Variance): Used for comparing three or more groups.

  • T-test is suitable for comparing only two groups.

  • Key assumptions:

    • Normal distribution of data.

    • Equal variances across groups.

  • Levene’s Test: Used to test for homogeneity of variance.

  • If significant, it indicates that variances differ across groups (the equal-variance assumption is violated).
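
  A minimal sketch of the Levene check followed by the ANOVA itself, assuming SciPy is available; the three group arrays below are hypothetical placeholder data:

    import numpy as np
    from scipy import stats

    # Hypothetical group data for illustration
    g1 = np.array([4.1, 5.0, 5.5, 4.8, 5.2])
    g2 = np.array([5.9, 6.3, 5.7, 6.1, 6.5])
    g3 = np.array([7.0, 6.8, 7.4, 7.1, 6.9])

    # Levene's test: a significant result (p < .05) means variances differ
    lev_stat, lev_p = stats.levene(g1, g2, g3)

    # One-way ANOVA comparing the three group means
    f_stat, p_val = stats.f_oneway(g1, g2, g3)
    print(f"Levene p = {lev_p:.3f}; ANOVA F = {f_stat:.2f}, p = {p_val:.4f}")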

Variance Components

  • Between Group Variance: Spread of group means around the grand mean, indicating the separation between groups.

  • Within Group Variance: Spread of scores within each group around their respective means.

F Ratio

  • Definition: Ratio of between-group variance to within-group variance.

  • F Ratio = MS between / MS within = (Explained Variance) / (Unexplained Variance).

  • Significance of F Ratio:

    • Larger F ratio suggests ANOVA is more likely to be significant.

    • Indicates a difference exists among group means.

  • Interpretation:

    • Large F ratio: Small p-value → reject null hypothesis (H0).

    • Small F ratio: Large p-value → fail to reject H0.

Power & Effect Size

  • Power: Probability of correctly rejecting a false null hypothesis (H0).

    • High Power: High probability of detecting a true difference.

    • Low Power: Increased risk of failing to detect a true difference (Type II error).

  • Effect Size: Quantifies how much group means differ.

    • Small Effect Size: Indicates a small difference.

    • Large Effect Size: Indicates a large difference among group means (effect size measures magnitude, not statistical significance).
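
  A common ANOVA effect size is eta-squared, the proportion of total variance that is between groups. A NumPy sketch with hypothetical groups (these values carry into the F ratio sketch under One Way ANOVA below):

    import numpy as np

    # Hypothetical groups (same placeholder data as above)
    g1 = np.array([4.1, 5.0, 5.5, 4.8, 5.2])
    g2 = np.array([5.9, 6.3, 5.7, 6.1, 6.5])
    g3 = np.array([7.0, 6.8, 7.4, 7.1, 6.9])
    groups = [g1, g2, g3]

    scores = np.concatenate(groups)
    grand_mean = scores.mean()

    # Between-group SS: group size times squared deviation of each group mean
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_total = ((scores - grand_mean) ** 2).sum()

    eta_squared = ss_between / ss_total   # proportion of variance explained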

One Way ANOVA

  • Involves one independent variable with three or more levels.

  • Components:

    • Comparison of between-group variance (explained) vs. within-group variance (unexplained).

    • Sum of Squares (SS): Reflects variance. Larger SS indicates higher variance.

    • Degrees of Freedom (df): Between-groups df = k - 1 (where k = number of groups); within-groups df = N - k (where N = total number of observations).
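
  Continuing the hypothetical sketch above, the F ratio is assembled from the sums of squares and degrees of freedom:

    from scipy import stats

    k, N = len(groups), len(scores)

    ss_within = ss_total - ss_between         # unexplained variance
    df_between, df_within = k - 1, N - k

    ms_between = ss_between / df_between      # mean square between
    ms_within = ss_within / df_within         # mean square within
    F = ms_between / ms_within                # explained / unexplained

    p = stats.f.sf(F, df_between, df_within)  # upper-tail p-value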

Two Way ANOVA

  • Involves two independent variables, each with two or more levels.

  • Analyzes:

    • Main effects of each independent variable.

    • Interaction effects between the variables.

  • No interaction is indicated by parallel lines; crossing or non-parallel lines indicate interaction.
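
  A hedged sketch of a two-way ANOVA with interaction using the statsmodels formula API; the column names (score, drug, exercise) and values are hypothetical:

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    # Hypothetical long-format data: one row per observation
    df = pd.DataFrame({
        "score":    [5, 6, 7, 4, 3, 4, 6, 7, 8, 9, 8, 9],
        "drug":     ["A"] * 6 + ["B"] * 6,
        "exercise": (["yes"] * 3 + ["no"] * 3) * 2,
    })

    # C(a) * C(b) expands to both main effects plus their interaction
    model = ols("score ~ C(drug) * C(exercise)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))   # main effects + interaction table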

Repeated Measures ANOVA

  • Each subject is tested under all experimental conditions (similar to a paired t-test).

  • Controls for differences between subjects.

  • Focus is on variance within subjects.
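
  Assuming the pingouin package is available, a repeated measures ANOVA is a single call; the long-format data below are hypothetical:

    import pandas as pd
    import pingouin as pg

    # Hypothetical data: 4 subjects, each measured pre and post
    long_df = pd.DataFrame({
        "subject":  [1, 1, 2, 2, 3, 3, 4, 4],
        "time":     ["pre", "post"] * 4,
        "exercise": ["yes"] * 4 + ["no"] * 4,   # used in the mixed sketch below
        "symptom":  [7, 4, 6, 3, 8, 7, 7, 6],
    })

    # Each subject serves as its own control; within-subject variance is the focus
    print(pg.rm_anova(data=long_df, dv="symptom", within="time", subject="subject"))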

Mixed Design ANOVA

  • Involves one between-subjects independent variable and one repeated (within-subjects) factor.

  • Example: Symptom treatment over time, with and without exercise.
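
  Continuing the hypothetical long_df above, pingouin's mixed_anova treats exercise as the between-subjects factor and time as the repeated factor:

    # time = within-subjects (repeated) factor, exercise = between-subjects factor
    print(pg.mixed_anova(data=long_df, dv="symptom", within="time",
                         between="exercise", subject="subject"))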

Multiple Comparison Tests

  • ANOVA indicates at least one significant difference but does not specify which group means are different.

  • Purpose of Multiple Comparison Tests: Contrast pairs of means against a critical value to identify which group differences are significant.

Liberal vs Conservative Tests

  • Liberal Tests: More likely to declare significance even when means are relatively close together; higher power but more Type I errors (e.g., Fisher’s LSD).

  • Conservative Tests: Require means to be further apart before declaring significance; lower power but fewer Type I errors (e.g., Scheffé’s comparison).

  • Middle ground: the Tukey HSD test (more conservative than SNK, less conservative than Scheffé).
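
  A sketch of the Tukey HSD test via statsmodels, with hypothetical scores and group labels:

    import numpy as np
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    scores = np.array([4, 5, 5, 6, 7, 7, 8, 9, 9])   # hypothetical data
    labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3

    # Contrasts every pair of group means at a familywise alpha of .05
    print(pairwise_tukeyhsd(scores, labels, alpha=0.05))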

Post Hoc Multiple Comparisons

  • Post Hoc: All pairwise contrasts explored after significant ANOVA (unplanned).

  • Planned Comparisons: Set beforehand to examine specific pairs of means, even when ANOVA isn't significant.

ANOVA with Multifactorial Variables

  • Simple Effects: Separate analyses of each row or column within a factorial design.

  • Examines the effect of one independent variable at specific levels of another.

Nonparametric Statistics

  • Suitable for nominal or ordinal data; generally have lower power than parametric tests.

  • Key Features:

    • Do not assume a normal distribution or homogeneity of variance.

    • Often used with small samples.

  • Parametric vs Nonparametric Tests:

    • Unpaired t-test → Mann-Whitney U test (two independent groups).

    • Paired t-test → Sign test or Wilcoxon signed-ranks test (two related groups).

    • One-way ANOVA → Kruskal-Wallis ANOVA (three or more independent groups).

    • One-way repeated measures ANOVA → Friedman two-way ANOVA (three or more related groups).

  • Useful when data are not normally distributed, contain outliers, or fail the homogeneity of variance assumption.

  • Ranking Scores: Ranks data from smallest to largest (negative values considered smallest). Ties are assigned average ranks.
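
  The SciPy counterparts to the pairings above, plus average ranking of ties, in one hedged sketch; all arrays are hypothetical:

    import numpy as np
    from scipy import stats

    g1, g2, g3 = np.array([4, 5, 6, 5]), np.array([6, 7, 8, 7]), np.array([9, 8, 10, 9])
    pre, post = np.array([7, 6, 8, 7, 9]), np.array([5, 4, 7, 6, 8])

    stats.mannwhitneyu(g1, g2)            # unpaired t-test analogue
    stats.wilcoxon(pre, post)             # paired t-test analogue (signed ranks)
    stats.kruskal(g1, g2, g3)             # one-way ANOVA analogue
    stats.friedmanchisquare(g1, g2, g3)   # repeated measures ANOVA analogue

    # Ranking: tied values share the average of the ranks they occupy
    stats.rankdata([3, -1, 3, 7])         # -> [2.5, 1.0, 2.5, 4.0]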

Module 7: Chi Square

Chi Square Basics

  • Purpose: Tests significance of proportions and determines if differences between observed and expected data are due to chance or relationships.

  • Assesses associations between categorical variables but does not imply causation.

  • Assumptions:

    • Data represent individual counts.

    • Categories are mutually exclusive.

    • No subject represented twice.

  • Hypothesis Testing:

    • If Chi-square > critical value → reject H0.

    • If Chi-square < critical value → fail to reject H0.
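
  The decision rule in code, assuming SciPy; the chi-square statistic and df below are hypothetical:

    from scipy import stats

    chi_sq, df, alpha = 7.8, 2, 0.05             # hypothetical values
    critical = stats.chi2.ppf(1 - alpha, df)     # critical value (about 5.99 here)

    # Statistic beyond the critical value -> reject H0
    decision = "reject H0" if chi_sq > critical else "fail to reject H0"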

Goodness of Fit

  • Evaluates if observed data aligns with a specific distribution or expected proportions.

  • Hypothesis Testing: H0 states the observed proportions do not differ from the expected proportions.

  • Degrees of Freedom: df = k - 1 (where k = number of categories).

Standardized Residuals

  • Identify which categories contribute most to the Chi-square value.

  • Formulas: Residual = Observed - Expected; Standardized Residual = (Observed - Expected) / √Expected.

  • Larger residuals = greater deviation from expected.
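
  A goodness-of-fit sketch with standardized residuals; the observed counts and expected proportions are hypothetical:

    import numpy as np
    from scipy import stats

    observed = np.array([50, 30, 20])                       # hypothetical counts
    expected = np.array([0.4, 0.4, 0.2]) * observed.sum()   # counts expected under H0

    chi_sq, p = stats.chisquare(observed, f_exp=expected)   # df = k - 1 = 2

    # Standardized residuals flag the categories driving the chi-square value
    std_resid = (observed - expected) / np.sqrt(expected)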

Independence Test

  • Examines association between two categorical variables.

  • Hypothesis Testing: H0 assumes no association.

  • Data is organized into contingency tables.

  • Degrees of Freedom: df = (rows - 1) × (columns - 1).
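
  A test of independence on a hypothetical 2x2 contingency table, assuming SciPy:

    import numpy as np
    from scipy import stats

    # Rows: exposed / unexposed; columns: disease / no disease (hypothetical)
    table = np.array([[30, 70],
                      [10, 90]])

    chi_sq, p, df, expected = stats.chi2_contingency(table)
    # df = (rows - 1) x (columns - 1) = 1 for a 2x2 table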

Correlation

  • Measures association between two variables.

  • Types of Relationships:

    • Positive: As X increases, Y increases.

    • Negative: As X increases, Y decreases.

  • Note: Does not imply causation.

Scatter Plots

  • Visual representation of data to clarify patterns.

  • The closer the points lie to a straight line, the stronger the association.

  • Outliers: Data points that lie outside the cluster.
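
  A minimal matplotlib scatter sketch; the data, including the outlier, are hypothetical:

    import matplotlib.pyplot as plt

    x = [1, 2, 3, 4, 5, 9]               # hypothetical data; the last point is an outlier
    y = [2.1, 3.9, 6.2, 8.1, 9.8, 4.0]

    plt.scatter(x, y)                    # points near a straight line = strong association
    plt.xlabel("X"); plt.ylabel("Y")
    plt.show()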

Correlation Coefficient

  • Used to assess strength and direction of relationship between two variables.

  • Range: From -1.0 to +1.0.

  • The sign indicates direction; a coefficient of 0 indicates no relationship.

Pearson Coefficient

  • Pearson r: The sample statistic; the corresponding population parameter is ρ (rho).

  • Applicable when X and Y are continuous variables, normally distributed, on interval/ratio scales.

  • Hypothesis Testing:

    • H0 states no relationship (ρ = 0).

    • HA states a relationship exists (ρ ≠ 0).

  • Spearman Rank Correlation Coefficient (rs): Nonparametric alternative to Pearson's coefficient, based on ranked data or ordinal data.
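
  Both coefficients via SciPy, on hypothetical paired data:

    from scipy import stats

    x = [1, 2, 3, 4, 5, 6]            # hypothetical paired observations
    y = [2, 1, 4, 3, 7, 8]

    r, p = stats.pearsonr(x, y)       # parametric; tests H0: rho = 0
    rs, p_s = stats.spearmanr(x, y)   # nonparametric; correlates the ranks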

Module 8: Regression

Linear Regression

  • The correlation coefficient describes the strength of a relationship but does not itself provide prediction.

  • Regression models predict outcomes based on shared variance.

  • Coefficient of Determination (r²): Represents proportion of variance explained by the independent variable. Ranges from 0 to 1.

Simple Linear Regression

  • Assesses how well one variable predicts another.

  • Variables:

    • X: Independent variable.

    • Y: Dependent variable.

  • Regression Equation: Ŷ = a + bX (a = intercept, b = slope).

  • H0: b = 0 (no relationship).

  • Least Squares Method: Fits regression line to minimize sum of squared residuals.
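
  A least-squares fit via scipy.stats.linregress; x and y are hypothetical:

    from scipy import stats

    x = [1, 2, 3, 4, 5]               # hypothetical predictor
    y = [2.0, 4.1, 5.9, 8.2, 9.8]     # hypothetical outcome

    fit = stats.linregress(x, y)      # least squares: minimizes squared residuals
    # Y-hat = a + bX; fit.pvalue tests H0: b = 0
    print(f"Y-hat = {fit.intercept:.2f} + {fit.slope:.2f}X, "
          f"r^2 = {fit.rvalue ** 2:.3f}, p = {fit.pvalue:.4f}")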

Multiple Regression

  • Utilizes multiple independent variables to predict one dependent variable.

  • Equation: Ŷ = a + b₁X₁ + b₂X₂ + …

  • R² indicates the proportion of total variance explained by the predictors.

  • H0: b = 0 for each independent variable.
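
  A statsmodels OLS sketch with two hypothetical predictors (reused in the next two sketches):

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical data: two predictors, one outcome
    x1 = np.array([1, 2, 3, 4, 5, 6])
    x2 = np.array([2, 1, 4, 3, 6, 5])
    y = np.array([3, 4, 8, 7, 12, 11])

    X = sm.add_constant(np.column_stack([x1, x2]))   # a + b1*X1 + b2*X2
    model = sm.OLS(y, X).fit()

    print(model.rsquared)    # R^2: proportion of variance explained
    print(model.pvalues)     # tests H0: b = 0 for each coefficient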

Standardized Regression Coefficients

  • Allows comparison across different units of independent variables.

  • Variables are converted to z-scores so the coefficients become standardized beta weights, measuring each predictor’s relative contribution to the prediction.
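
  Continuing the sketch above, standardized betas fall out of fitting the z-scored variables:

    from scipy.stats import zscore

    # z-score predictors and outcome, then refit: slopes become beta weights
    Xz = zscore(np.column_stack([x1, x2]))
    yz = zscore(y)
    betas = sm.OLS(yz, Xz).fit().params   # comparable across different units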

Collinearity

  • Occurs when independent variables are correlated with one another, which can make individual predictors appear less important.

  • Higher Variance Inflation Factor (VIF) indicates greater collinearity.
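
  Reusing the hypothetical design matrix X from the multiple regression sketch, one VIF per predictor:

    from statsmodels.stats.outliers_influence import variance_inflation_factor

    # Larger VIF = greater collinearity (column 0 is the constant term)
    vifs = [variance_inflation_factor(X, i) for i in range(X.shape[1])]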

Logistic Regression

  • Predicts probability of event occurrence with a dichotomous dependent variable.

  • Independent variables can be continuous or categorical.

  • Dummy Variables: Represent categorical variables numerically (e.g., coding each category as 0 or 1).

  • Outcome Coding: Target group is coded as 1, reference group as 0.

Odds Ratio

  • Measures likelihood of belonging to a target group compared to a reference group.

  • OR > 1: Increased odds of the outcome in the target group.

  • OR < 1: Decreased odds of the outcome in the target group.
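
  A logistic regression sketch via statsmodels; exponentiating a coefficient gives its odds ratio (data hypothetical; outcome coded 1 = target group, 0 = reference group):

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical data: one continuous predictor, dichotomous outcome
    x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    y = np.array([0, 0, 0, 1, 0, 1, 1, 1])   # 1 = target, 0 = reference

    result = sm.Logit(y, sm.add_constant(x)).fit()

    odds_ratios = np.exp(result.params)   # OR > 1: higher odds per unit increase in x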
