ANOVA (Analysis of Variance): Used for comparing three or more groups.
A t-test is suitable for comparing only two groups.
Key assumptions:
Normal distribution of data.
Equal variances across groups.
Levene’s Test: To test for homogeneity of variance.
If significant, indicates variances differ across groups.
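A minimal sketch of Levene's test with SciPy, using three hypothetical score arrays (group_a, group_b, group_c are made-up data):

```python
from scipy import stats

# Hypothetical scores for three groups
group_a = [4, 5, 6, 5, 7]
group_b = [6, 7, 8, 7, 9]
group_c = [5, 6, 5, 6, 7]

# Levene's test for homogeneity of variance across the groups
stat, p = stats.levene(group_a, group_b, group_c)
print(f"Levene W = {stat:.3f}, p = {p:.3f}")  # p < .05 suggests unequal variances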
Between-Group Variance: Spread of group means around the grand mean, indicating the separation between groups.
Within Group Variance: Spread of scores within each group around their respective means.
F Ratio: Ratio of between-group variance to within-group variance.
F Ratio = (Explained Variance) / (Unexplained Variance).
Significance of F Ratio:
Larger F ratio suggests ANOVA is more likely to be significant.
Indicates a difference exists among group means.
Interpretation:
Large F ratio: Small p-value → reject null hypothesis (H0).
Small F ratio: Large p-value → fail to reject H0.
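As a sketch, the F ratio and its p-value for three hypothetical groups can be obtained with scipy.stats.f_oneway:

```python
from scipy import stats

group_a = [4, 5, 6, 5, 7]
group_b = [6, 7, 8, 7, 9]
group_c = [5, 6, 5, 6, 7]

# One-way ANOVA: F = explained (between-group) / unexplained (within-group) variance
f_stat, p = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p:.4f}")  # large F with small p -> reject H0
```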
Power: Probability of correctly rejecting a false null hypothesis (H0).
High Power: High probability of detecting a true difference.
Low Power: Increased risk of failing to detect a true difference (Type II error).
Effect Size: Quantifies how much group means differ.
Small Effect Size: Indicates a small difference.
Large Effect Size: Indicates a substantial difference among group means.
One-Way ANOVA: Involves one independent variable with three or more levels.
Components:
Comparison of between-group variance (explained) vs. within-group variance (unexplained).
Sum of Squares (SS): Reflects variance. Larger SS indicates higher variance.
Degrees of Freedom (df): Between groups, df = k - 1 (where k = number of groups); within groups, df = N - k; total df = N - 1 (where N = total observations).
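A worked sketch of these components (sums of squares, degrees of freedom, F, and eta-squared as an effect size), again with hypothetical data:

```python
import numpy as np

groups = [np.array([4, 5, 6, 5, 7]),
          np.array([6, 7, 8, 7, 9]),
          np.array([5, 6, 5, 6, 7])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()

# Between-group SS: group means around the grand mean (weighted by group size)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group SS: scores around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

k, n = len(groups), len(all_scores)
df_between, df_within = k - 1, n - k

f_ratio = (ss_between / df_between) / (ss_within / df_within)
eta_squared = ss_between / (ss_between + ss_within)  # proportion of variance explained
print(f"F = {f_ratio:.2f}, eta^2 = {eta_squared:.2f}")
```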
Two-Way (Factorial) ANOVA: Involves two independent variables, each with two or more levels.
Analyzes:
Main effects of each independent variable.
Interaction effects between the variables.
No interaction is indicated by parallel lines; crossing or non-parallel lines indicate interaction.
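A minimal sketch of a 2 x 2 factorial ANOVA with statsmodels, using a hypothetical data set with factors drug and exercise and outcome score:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "drug":     ["A", "A", "A", "A", "B", "B", "B", "B"],
    "exercise": ["yes", "yes", "no", "no", "yes", "yes", "no", "no"],
    "score":    [7, 8, 5, 6, 9, 10, 4, 5],
})

# Main effects of drug and exercise plus their interaction
model = ols("score ~ C(drug) * C(exercise)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```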
Repeated Measures ANOVA: Each subject is tested under all experimental conditions (similar to a paired t-test).
Controls for differences between subjects.
Focus is on variance within subjects.
Mixed Design: Involves one between-subjects independent variable and one repeated (within-subjects) factor.
Example: Symptom treatment over time, with and without exercise.
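A minimal sketch of a one-way repeated measures ANOVA (each subject measured under every condition), using hypothetical symptom scores at three time points; statsmodels' AnovaRM expects balanced long-format data:

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

df = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "time":    ["t1", "t2", "t3"] * 4,
    "symptom": [7, 5, 3, 8, 6, 4, 6, 5, 4, 9, 7, 5],
})

# F test on the within-subjects factor "time"
res = AnovaRM(data=df, depvar="symptom", subject="subject", within=["time"]).fit()
print(res)
```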
ANOVA indicates at least one significant difference but does not specify which group means are different.
Purpose of Multiple Comparison Tests: Contrast pairs of means against a critical value to identify which group means differ significantly.
Liberal Tests: More likely to find significant differences with means that are closer together. Higher power, increased chance of Type I error (e.g., Fisher’s LSD).
Conservative Tests: Require means to be further apart for significant differences. Lower power, fewer Type I errors (e.g., Scheffe’s comparison).
Balanced Test: Tukey's HSD test (more conservative than SNK).
Post Hoc: All pairwise contrasts explored after a significant ANOVA (unplanned); a Tukey HSD sketch follows below.
Planned Comparisons: Set beforehand to examine specific pairs of means, even when ANOVA isn't significant.
Simple Effects: Separate analyses of each row or column within a factorial design.
Examines the effect of one independent variable at specific levels of another.
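The post hoc Tukey HSD mentioned above, sketched with statsmodels on hypothetical scores and group labels:

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

scores = np.array([4, 5, 6, 5, 7, 6, 7, 8, 7, 9, 5, 6, 5, 6, 7])
groups = np.array(["A"] * 5 + ["B"] * 5 + ["C"] * 5)

# All pairwise contrasts; the table flags which pairs of means differ significantly
print(pairwise_tukeyhsd(endog=scores, groups=groups, alpha=0.05))
```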
Nonparametric Tests: Suitable for nominal or ordinal data; generally have lower power than parametric tests.
Key Features:
Do not assume normality or homogeneity of variance.
Often used with small samples.
Parametric vs Nonparametric Tests:
Unpaired T-test: Mann-Whitney U test (two independent groups).
Paired T-test: Sign test or Wilcoxon signed-ranks test (two related groups).
One Way ANOVA: Kruskal-Wallis ANOVA (three or more independent groups).
One Way Repeated Measures ANOVA: Friedman two way ANOVA (three or more related groups).
Useful when data is not normally distributed, has outliers, or fails homogeneity of variance assumptions.
Ranking Scores: Ranks data from smallest to largest (negative values considered smallest). Ties are assigned average ranks.
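A minimal sketch pairing each parametric test with its nonparametric counterpart in SciPy, using small hypothetical samples:

```python
from scipy import stats

x = [4, 5, 6, 5, 7]
y = [5, 7, 9, 9, 12]
z = [3, 4, 5, 4, 6]

print(stats.mannwhitneyu(x, y))          # unpaired t-test   -> Mann-Whitney U
print(stats.wilcoxon(x, y))              # paired t-test     -> Wilcoxon signed-ranks
print(stats.kruskal(x, y, z))            # one-way ANOVA     -> Kruskal-Wallis
print(stats.friedmanchisquare(x, y, z))  # repeated measures -> Friedman
```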
Chi-Square Test: Tests the significance of proportions and determines whether differences between observed and expected frequencies are due to chance or to a relationship.
Assesses associations between categorical variables but does not imply causation.
Assumptions:
Data represent individual counts.
Categories are mutually exclusive.
No subject represented twice.
Hypothesis Testing:
If Chi-square > critical value → reject H0.
If Chi-square < critical value → fail to reject H0.
Goodness-of-Fit Test: Evaluates whether observed data align with a specific distribution or expected proportions.
Hypothesis Testing: H0 states observed proportion does not differ from expected proportion.
Degrees of Freedom: df = k - 1 (where k = number of categories).
Standardized Residuals: Identify which categories contribute most to the chi-square value.
Formula: Residual = Observed - Expected; Standardized Residual = Residual / √Expected.
Larger residuals indicate greater deviation from the expected count.
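A minimal sketch of a goodness-of-fit test with standardized residuals, assuming hypothetical counts across four categories and equal expected proportions:

```python
import numpy as np
from scipy import stats

observed = np.array([30, 20, 25, 25])
expected = np.full(4, observed.sum() / 4)  # equal expected counts

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)  # df = k - 1 = 3
std_resid = (observed - expected) / np.sqrt(expected)      # larger |value| = bigger deviation
print(chi2, p, std_resid)
```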
Test of Independence: Examines the association between two categorical variables.
Hypothesis Testing: H0 assumes no association.
Data is organized into contingency tables.
Degrees of Freedom: df = (rows - 1) × (columns - 1).
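A minimal sketch of the test of independence on a hypothetical 2 x 2 contingency table:

```python
import numpy as np
from scipy import stats

# Rows: two groups; columns: two outcome categories (hypothetical counts)
table = np.array([[30, 10],
                  [20, 25]])

chi2, p, dof, expected = stats.chi2_contingency(table)
print(chi2, p, dof)  # dof = (rows - 1) * (columns - 1) = 1
```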
Correlation: Measures the association between two variables.
Types of Relationships:
Positive: As X increases, Y increases.
Negative: As X increases, Y decreases.
Note: Does not imply causation.
Scatterplot: Visual representation of data to clarify patterns.
The closer the points lie to a straight line, the stronger the association.
Outliers: Data points that lie outside the cluster.
Correlation Coefficient: Used to assess the strength and direction of the relationship between two variables.
Range: From -1.0 to +1.0.
The sign indicates direction, while 0 indicates no relationship.
Pearson r: The correlation coefficient for sample data; the corresponding population parameter is ρ (rho).
Applicable when X and Y are continuous variables, normally distributed, on interval/ratio scales.
Hypothesis Testing:
H0 states no relationship (ρ = 0).
HA states a relationship exists (ρ ≠ 0).
Spearman Rank Correlation Coefficient (rs): Nonparametric alternative to Pearson's coefficient, based on ranked data or ordinal data.
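A minimal sketch comparing Pearson r with Spearman rs on hypothetical X and Y values:

```python
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 7, 8, 6, 9]

r, p_r = stats.pearsonr(x, y)     # interval/ratio, roughly normal data
rs, p_rs = stats.spearmanr(x, y)  # ranked or ordinal data
print(f"Pearson r = {r:.2f} (p = {p_r:.3f}), Spearman rs = {rs:.2f} (p = {p_rs:.3f})")
```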
The correlation coefficient describes strength but not prediction.
Regression models predict outcomes based on shared variance.
Coefficient of Determination (r²): Represents proportion of variance explained by the independent variable. Ranges from 0 to 1.
Simple Linear Regression: Assesses how well one variable predicts another.
Variables:
X: Independent variable.
Y: Dependent variable.
Regression Equation: Ŷ = a + bX.
H0: b = 0 (no relationship).
Least Squares Method: Fits regression line to minimize sum of squared residuals.
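A minimal sketch of a least-squares regression line Ŷ = a + bX with SciPy, using hypothetical X and Y values:

```python
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.9]

res = stats.linregress(x, y)
print(f"a (intercept) = {res.intercept:.2f}, b (slope) = {res.slope:.2f}")
print(f"r^2 = {res.rvalue ** 2:.3f}, p for H0: b = 0 -> {res.pvalue:.4f}")
```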
Multiple Regression: Uses multiple independent variables to predict one dependent variable.
Equation: Ŷ = a + b1X1 + b2X2 + …
R² indicates percentage of total variance explained by predictors.
H0: b = 0 for each independent variable.
Standardized Coefficients (Beta Weights): Allow comparison across independent variables measured in different units.
Variables are converted to z-scores so that the standardized beta weights reflect each predictor's relative contribution to prediction.
Multicollinearity: Occurs when independent variables are correlated with each other, making some appear less important.
Higher Variance Inflation Factor (VIF) indicates greater collinearity.
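A minimal sketch of multiple regression with a collinearity check, using simulated predictors x1 and x2 (deliberately correlated) and outcome y:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=50)  # correlated with x1
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=50)

X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2}))
model = sm.OLS(y, X).fit()
print(model.summary())  # R-squared, coefficients, and t tests of H0: b = 0

# VIF per predictor (skipping the constant); higher VIF indicates more collinearity
for i, name in enumerate(X.columns[1:], start=1):
    print(name, variance_inflation_factor(X.values, i))
```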
Logistic Regression: Predicts the probability of an event occurring when the dependent variable is dichotomous.
Independent variables can be continuous or categorical.
Dummy Variables: Used for categorical variables by assigning numerical values.
Outcome Coding: Target group is coded as 1, reference group as 0.
Odds Ratio (OR): Measures the likelihood of belonging to the target group compared with the reference group.
OR > 1: Increased odds of the outcome.
OR < 1: Decreased odds of the outcome.
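A minimal sketch of logistic regression with a dichotomous outcome (1 = target group, 0 = reference group), a continuous predictor, and a dummy-coded predictor; all data here are simulated for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
age = rng.normal(50, 10, size=100)          # continuous predictor
smoker = rng.integers(0, 2, size=100)       # dummy variable (1 = smoker, 0 = non-smoker)
logit = -8 + 0.15 * age + 0.8 * smoker      # simulated log-odds
outcome = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(pd.DataFrame({"age": age, "smoker": smoker}))
model = sm.Logit(outcome, X).fit(disp=0)

odds_ratios = np.exp(model.params)          # OR > 1 -> higher odds of the outcome
print(odds_ratios)
```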