ANOVA (Analysis of Variance): Used for comparing three or more groups.
A t-test is suitable for comparing only two groups.
Key assumptions:
Normal distribution of data.
Equal variances across groups.
Levene’s Test: To test for homogeneity of variance.
If significant, indicates variances differ across groups.
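A minimal sketch of Levene's test with SciPy, using three hypothetical score arrays (group_a, group_b, group_c are made-up data):

```python
from scipy import stats

# Hypothetical scores for three groups
group_a = [4, 5, 6, 5, 7]
group_b = [6, 7, 8, 7, 9]
group_c = [5, 6, 5, 6, 7]

# Levene's test for homogeneity of variance across the groups
stat, p = stats.levene(group_a, group_b, group_c)
print(f"Levene W = {stat:.3f}, p = {p:.3f}")  # p < .05 suggests unequal variances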
Between-Group Variance: Spread of group means around the grand mean, indicating the separation between groups.
Within Group Variance: Spread of scores within each group around their respective means.
F Ratio: Ratio of between-group variance to within-group variance.
F Ratio = (Explained Variance) / (Unexplained Variance).
Significance of F Ratio:
Larger F ratio suggests ANOVA is more likely to be significant.
Indicates a difference exists among group means.
Interpretation:
Large F ratio: Small p-value → reject null hypothesis (H0).
Small F ratio: Large p-value → fail to reject H0.
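As a sketch, the F ratio and its p-value for three hypothetical groups can be obtained with scipy.stats.f_oneway:

```python
from scipy import stats

group_a = [4, 5, 6, 5, 7]
group_b = [6, 7, 8, 7, 9]
group_c = [5, 6, 5, 6, 7]

# One-way ANOVA: F = explained (between-group) / unexplained (within-group) variance
f_stat, p = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p:.4f}")  # large F with small p -> reject H0
```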
Power: Probability of correctly rejecting a false null hypothesis (H0).
High Power: High probability of detecting a true difference.
Low Power: Increased risk of failing to detect a true difference (Type II error).
Effect Size: Quantifies how much group means differ.
Small Effect Size: Indicates a small difference.
Large Effect Size: Indicates a substantial difference among group means.
One-Way ANOVA: Involves one independent variable with three or more levels.
Components:
Comparison of between-group variance (explained) vs. within-group variance (unexplained).
Sum of Squares (SS): Reflects variance. Larger SS indicates higher variance.
Degrees of Freedom (df): Between groups, df = k - 1 (where k = number of groups); within groups, df = N - k; total df = N - 1 (where N = total observations).
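A worked sketch of these components (sums of squares, degrees of freedom, F, and eta-squared as an effect size), again with hypothetical data:

```python
import numpy as np

groups = [np.array([4, 5, 6, 5, 7]),
          np.array([6, 7, 8, 7, 9]),
          np.array([5, 6, 5, 6, 7])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()

# Between-group SS: group means around the grand mean (weighted by group size)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group SS: scores around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

k, n = len(groups), len(all_scores)
df_between, df_within = k - 1, n - k

f_ratio = (ss_between / df_between) / (ss_within / df_within)
eta_squared = ss_between / (ss_between + ss_within)  # proportion of variance explained
print(f"F = {f_ratio:.2f}, eta^2 = {eta_squared:.2f}")
```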
Two-Way (Factorial) ANOVA: Involves two independent variables, each with two or more levels.
Analyzes:
Main effects of each independent variable.
Interaction effects between the variables.
No interaction is indicated by parallel lines; crossing or non-parallel lines indicate interaction.
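A minimal sketch of a 2 x 2 factorial ANOVA with statsmodels, using a hypothetical data set with factors drug and exercise and outcome score:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "drug":     ["A", "A", "A", "A", "B", "B", "B", "B"],
    "exercise": ["yes", "yes", "no", "no", "yes", "yes", "no", "no"],
    "score":    [7, 8, 5, 6, 9, 10, 4, 5],
})

# Main effects of drug and exercise plus their interaction
model = ols("score ~ C(drug) * C(exercise)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```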
Repeated Measures ANOVA: Each subject is tested under all experimental conditions (similar to a paired t-test).
Controls for differences between subjects.
Focus is on variance within subjects.
Mixed Design: Involves one between-subjects independent variable and one repeated (within-subjects) factor.
Example: Symptom treatment over time, with and without exercise.
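A minimal sketch of a one-way repeated measures ANOVA (each subject measured under every condition), using hypothetical symptom scores at three time points; statsmodels' AnovaRM expects balanced long-format data:

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

df = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "time":    ["t1", "t2", "t3"] * 4,
    "symptom": [7, 5, 3, 8, 6, 4, 6, 5, 4, 9, 7, 5],
})

# F test on the within-subjects factor "time"
res = AnovaRM(data=df, depvar="symptom", subject="subject", within=["time"]).fit()
print(res)
```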
ANOVA indicates at least one significant difference but does not specify which group means are different.
Purpose of Multiple Comparison Tests: Contrast pairs of means against a critical value to identify which group means differ significantly.
Liberal Tests: More likely to find significant differences with means that are closer together. Higher power, increased chance of Type I error (e.g., Fisher’s LSD).
Conservative Tests: Require means to be further apart for significant differences. Lower power, fewer Type I errors (e.g., Scheffe’s comparison).
Balanced Test: Tukey's HSD test (more conservative than SNK).
Post Hoc: All pairwise contrasts explored after a significant ANOVA (unplanned); a Tukey HSD sketch follows below.
Planned Comparisons: Set beforehand to examine specific pairs of means, even when ANOVA isn't significant.
Simple Effects: Separate analyses of each row or column within a factorial design.
Examines the effect of one independent variable at specific levels of another.
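The post hoc Tukey HSD mentioned above, sketched with statsmodels on hypothetical scores and group labels:

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

scores = np.array([4, 5, 6, 5, 7, 6, 7, 8, 7, 9, 5, 6, 5, 6, 7])
groups = np.array(["A"] * 5 + ["B"] * 5 + ["C"] * 5)

# All pairwise contrasts; the table flags which pairs of means differ significantly
print(pairwise_tukeyhsd(endog=scores, groups=groups, alpha=0.05))
```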
Nonparametric Tests: Suitable for nominal or ordinal data; generally have lower power than parametric tests.
Key Features:
Do not assume normality or homogeneity of variance.
Often used with small samples.
Parametric vs Nonparametric Tests:
Unpaired T-test: Mann-Whitney U test (two independent groups).
Paired T-test: Sign test or Wilcoxon signed-ranks test (two related groups).
One Way ANOVA: Kruskal-Wallis ANOVA (three or more independent groups).
One Way Repeated Measures ANOVA: Friedman two way ANOVA (three or more related groups).
Useful when data is not normally distributed, has outliers, or fails homogeneity of variance assumptions.
Ranking Scores: Ranks data from smallest to largest (negative values considered smallest). Ties are assigned average ranks.
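A minimal sketch pairing each parametric test with its nonparametric counterpart in SciPy, using small hypothetical samples:

```python
from scipy import stats

x = [4, 5, 6, 5, 7]
y = [5, 7, 9, 9, 12]
z = [3, 4, 5, 4, 6]

print(stats.mannwhitneyu(x, y))          # unpaired t-test   -> Mann-Whitney U
print(stats.wilcoxon(x, y))              # paired t-test     -> Wilcoxon signed-ranks
print(stats.kruskal(x, y, z))            # one-way ANOVA     -> Kruskal-Wallis
print(stats.friedmanchisquare(x, y, z))  # repeated measures -> Friedman
```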
Chi-Square Test: Tests the significance of proportions and determines whether differences between observed and expected frequencies are due to chance or to a relationship.
Assesses associations between categorical variables but does not imply causation.
Assumptions:
Data represent individual counts.
Categories are mutually exclusive.
No subject represented twice.
Hypothesis Testing:
If Chi-square > critical value → reject H0.
If Chi-square < critical value → fail to reject H0.
Goodness-of-Fit Test: Evaluates whether observed data align with a specific distribution or expected proportions.
Hypothesis Testing: H0 states observed proportion does not differ from expected proportion.
Degrees of Freedom: df = k - 1 (where k = number of categories).
Standardized Residuals: Identify which categories contribute most to the chi-square value.
Formula: Residual = Observed - Expected; Standardized Residual = Residual / √Expected.
Larger residuals indicate greater deviation from the expected count.
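A minimal sketch of a goodness-of-fit test with standardized residuals, assuming hypothetical counts across four categories and equal expected proportions:

```python
import numpy as np
from scipy import stats

observed = np.array([30, 20, 25, 25])
expected = np.full(4, observed.sum() / 4)  # equal expected counts

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)  # df = k - 1 = 3
std_resid = (observed - expected) / np.sqrt(expected)      # larger |value| = bigger deviation
print(chi2, p, std_resid)
```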
Test of Independence: Examines the association between two categorical variables.
Hypothesis Testing: H0 assumes no association.
Data is organized into contingency tables.
Degrees of Freedom: df = (rows - 1) × (columns - 1).
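A minimal sketch of the test of independence on a hypothetical 2 x 2 contingency table:

```python
import numpy as np
from scipy import stats

# Rows: two groups; columns: two outcome categories (hypothetical counts)
table = np.array([[30, 10],
                  [20, 25]])

chi2, p, dof, expected = stats.chi2_contingency(table)
print(chi2, p, dof)  # dof = (rows - 1) * (columns - 1) = 1
```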
Correlation: Measures the association between two variables.
Types of Relationships:
Positive: As X increases, Y increases.
Negative: As X increases, Y decreases.
Note: Does not imply causation.
Scatterplot: Visual representation of data to clarify patterns.
The closer the points lie to a straight line, the stronger the association.
Outliers: Data points that lie outside the cluster.
Correlation Coefficient: Used to assess the strength and direction of the relationship between two variables.
Range: From -1.0 to +1.0.
The sign indicates direction, while 0 indicates no relationship.
Pearson r: The correlation coefficient for sample data; the corresponding population parameter is ρ (rho).
Applicable when X and Y are continuous variables, normally distributed, on interval/ratio scales.
Hypothesis Testing:
H0 states no relationship (ρ = 0).
HA states a relationship exists (ρ ≠ 0).
Spearman Rank Correlation Coefficient (rs): Nonparametric alternative to Pearson's coefficient, based on ranked data or ordinal data.
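A minimal sketch comparing Pearson r with Spearman rs on hypothetical X and Y values:

```python
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 7, 8, 6, 9]

r, p_r = stats.pearsonr(x, y)     # interval/ratio, roughly normal data
rs, p_rs = stats.spearmanr(x, y)  # ranked or ordinal data
print(f"Pearson r = {r:.2f} (p = {p_r:.3f}), Spearman rs = {rs:.2f} (p = {p_rs:.3f})")
```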
The correlation coefficient describes strength but not prediction.
Regression models predict outcomes based on shared variance.
Coefficient of Determination (r²): Represents proportion of variance explained by the independent variable. Ranges from 0 to 1.
Simple Linear Regression: Assesses how well one variable predicts another.
Variables:
X: Independent variable.
Y: Dependent variable.
Regression Equation: Ŷ = a + bX.
H0: b = 0 (no relationship).
Least Squares Method: Fits regression line to minimize sum of squared residuals.
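A minimal sketch of a least-squares regression line Ŷ = a + bX with SciPy, using hypothetical X and Y values:

```python
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.9]

res = stats.linregress(x, y)
print(f"a (intercept) = {res.intercept:.2f}, b (slope) = {res.slope:.2f}")
print(f"r^2 = {res.rvalue ** 2:.3f}, p for H0: b = 0 -> {res.pvalue:.4f}")
```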
Multiple Regression: Uses multiple independent variables to predict one dependent variable.
Equation: Ŷ = a + b1X1 + b2X2 + …
R² indicates percentage of total variance explained by predictors.
H0: b = 0 for each independent variable.
Standardized Coefficients (Beta Weights): Allow comparison across independent variables measured in different units.
Variables are converted to z-scores so that the standardized beta weights reflect each predictor's relative contribution to prediction.
Multicollinearity: Occurs when independent variables are correlated with each other, making some appear less important.
Higher Variance Inflation Factor (VIF) indicates greater collinearity.
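A minimal sketch of multiple regression with a collinearity check, using simulated predictors x1 and x2 (deliberately correlated) and outcome y:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=50)  # correlated with x1
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=50)

X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2}))
model = sm.OLS(y, X).fit()
print(model.summary())  # R-squared, coefficients, and t tests of H0: b = 0

# VIF per predictor (skipping the constant); higher VIF indicates more collinearity
for i, name in enumerate(X.columns[1:], start=1):
    print(name, variance_inflation_factor(X.values, i))
```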
Logistic Regression: Predicts the probability of an event occurring when the dependent variable is dichotomous.
Independent variables can be continuous or categorical.
Dummy Variables: Used for categorical variables by assigning numerical values.
Outcome Coding: Target group is coded as 1, reference group as 0.
Odds Ratio (OR): Measures the likelihood of belonging to the target group compared with the reference group.
OR > 1: Increased odds of the outcome.
OR < 1: Decreased odds of the outcome.
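A minimal sketch of logistic regression with a dichotomous outcome (1 = target group, 0 = reference group), a continuous predictor, and a dummy-coded predictor; all data here are simulated for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
age = rng.normal(50, 10, size=100)          # continuous predictor
smoker = rng.integers(0, 2, size=100)       # dummy variable (1 = smoker, 0 = non-smoker)
logit = -8 + 0.15 * age + 0.8 * smoker      # simulated log-odds
outcome = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(pd.DataFrame({"age": age, "smoker": smoker}))
model = sm.Logit(outcome, X).fit(disp=0)

odds_ratios = np.exp(model.params)          # OR > 1 -> higher odds of the outcome
print(odds_ratios)
```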