Between groups
Within groups
Main effects of each factor
Interaction effect between factors
Error variance
Represents the total variance in the data, encompassing all sources of variation (between and within groups).
Larger SS = higher variance
Larger difference between groups = more likely a significant difference will be found
Levene's test is used to assess equality of variances among groups.
If the test is significant, variances are different across groups
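A minimal sketch of checking homogeneity of variance with SciPy, using hypothetical group data:

```python
from scipy import stats

# hypothetical scores for three independent groups
g1 = [4, 5, 6, 5, 7]
g2 = [7, 8, 9, 8, 10]
g3 = [4, 6, 5, 5, 6]

stat, p = stats.levene(g1, g2, g3)  # default center='median'
print(p)  # p < .05 -> variances differ; homogeneity assumption violated
```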
df (between groups) = number of groups - 1
F statistic = mean square between groups / mean square within groups (each mean square = SS / df)
Larger F ratio = smaller p value = reject H0
Smaller F ratio = larger p value = fail to reject H0
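A one-way ANOVA sketch on the same hypothetical groups, showing the F ratio and p value with scipy.stats.f_oneway:

```python
from scipy import stats

g1 = [4, 5, 6, 5, 7]
g2 = [7, 8, 9, 8, 10]
g3 = [4, 6, 5, 5, 6]

F, p = stats.f_oneway(g1, g2, g3)  # F = MS_between / MS_within
print(F, p)  # large F -> small p -> reject H0 of equal means
```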
What is the main effect of variable A?
What is the main effect of variable B?
What are the interactions between variables A and B?
Non-parallel lines indicate interaction; the effect of one factor differs at levels of the other.
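A sketch of a two-way (factorial) ANOVA with statsmodels, using a hypothetical 2x2 data set; the C(A):C(B) row of the table tests the interaction:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# hypothetical 2x2 factorial design, two observations per cell
df = pd.DataFrame({
    "A": ["a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2"],
    "B": ["b1", "b1", "b2", "b2", "b1", "b1", "b2", "b2"],
    "score": [5, 6, 9, 10, 7, 8, 6, 5],
})

model = ols("score ~ C(A) * C(B)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # main effects of A and B, plus the A:B interaction
```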
To determine which specific group means are significantly different when ANOVA shows significant results.
Per Comparison Error Rate: Probability of a Type I error for each individual test.
Familywise Error Rate: Probability of at least one Type I error across a set of comparisons; it increases as the number of comparisons increases.
Post Hoc Comparisons: Conducted following a significant ANOVA
Planned Comparisons: Made based on hypotheses established prior to data collection, whether or not the ANOVA was significant
The smallest difference between group means that can be considered statistically significant.
Most Conservative Test: Scheffé's test.
Most Liberal Test: Fisher's Least Significant Difference (LSD) test.
A more liberal test will find more significant differences than a more conservative test for the same data.
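One common post hoc option is Tukey's HSD; a minimal sketch with SciPy on hypothetical groups, assuming a significant omnibus ANOVA:

```python
from scipy import stats

g1 = [4, 5, 6, 5, 7]
g2 = [7, 8, 9, 8, 10]
g3 = [4, 6, 5, 5, 6]

res = stats.tukey_hsd(g1, g2, g3)
print(res)  # pairwise mean differences with familywise-adjusted p values
```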
A separate analysis of each row or column within a factorial design.
Score Ranking: List of scores to rank: 7, 9, 10, 10, 14, 15, 15, 15, 20, 25, 36, 41, 43, 43, 50. Tied scores receive the average of the ranks they would occupy, as in the sketch below.
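A ranking sketch with SciPy; rankdata assigns tied scores the average rank by default:

```python
from scipy import stats

scores = [7, 9, 10, 10, 14, 15, 15, 15, 20, 25, 36, 41, 43, 43, 50]
print(stats.rankdata(scores))
# [ 1.   2.   3.5  3.5  5.   7.   7.   7.   9.  10.  11.  12.  13.5 13.5 15. ]
```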
Unpaired t-test: Mann-Whitney U Test.
Paired t-test: Wilcoxon Signed-Rank Test.
One-Way ANOVA: Kruskal-Wallis ANOVA by ranks
One-Way Repeated Measures ANOVA: Friedman two-way ANOVA by ranks
Non-normal data distribution, ordinal data, or unequal variances
Tests for Normal Distribution: Shapiro-Wilk test, Kolmogorov-Smirnov test.
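A sketch of the corresponding SciPy calls on hypothetical paired samples; check normality first, then fall back to the rank-based test if it fails:

```python
from scipy import stats

x = [4, 6, 5, 7, 6, 9]
y = [7, 8, 9, 8, 10, 12]

print(stats.shapiro(x))          # normality check (also: stats.kstest(x, 'norm'))
print(stats.mannwhitneyu(x, y))  # nonparametric unpaired t-test
print(stats.wilcoxon(x, y))      # nonparametric paired t-test
print(stats.kruskal(x, y))       # nonparametric one-way ANOVA
print(stats.friedmanchisquare(x, y, [1, 2, 3, 4, 5, 6]))  # nonparametric RM ANOVA
```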
Purpose: To evaluate if there are significant differences between observed and expected frequencies in categorical data.
Observations must be independent.
Expected frequencies must be sufficiently large (typically at least 5).
To see if observed categorical data fits a particular distribution.
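A goodness-of-fit sketch with SciPy, comparing hypothetical observed counts to expected counts under an even split:

```python
from scipy import stats

observed = [30, 14, 16]  # hypothetical category counts
expected = [20, 20, 20]  # even split across 3 categories

chi2, p = stats.chisquare(observed, f_exp=expected)
print(chi2, p)  # df = number of categories - 1 = 2
```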
Definition: The difference between observed and expected frequencies, standardized to evaluate significance.
Larger absolute value (roughly 2.0 or greater) = greater contribution to the chi-square statistic
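The standardized residual for each cell can be computed directly, as in this sketch using the same hypothetical counts:

```python
import numpy as np

observed = np.array([30, 14, 16])
expected = np.array([20, 20, 20])

std_resid = (observed - expected) / np.sqrt(expected)
print(std_resid)  # |value| >= ~2.0 flags cells driving the chi-square
```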
Definition: Examines if two categorical variables are independent.
Table Used: Contingency table.
df = (Rows - 1)(Columns - 1)
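A test-of-independence sketch with SciPy on a hypothetical 2x2 contingency table:

```python
from scipy import stats

table = [[20, 30],   # rows = categories of variable 1
         [40, 10]]   # columns = categories of variable 2

chi2, p, dof, expected = stats.chi2_contingency(table)
print(chi2, p, dof)  # dof = (2 - 1)(2 - 1) = 1
```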
Points clustered around a straight line indicate a strong correlation, while widely dispersed points indicate a weak correlation.
Range: -1 to 1.
Interpretation of -.89: Strong negative correlation; as one variable increases, the other decreases.
The correlation coefficient, r
Spearman's rank correlation and the Kendall tau.
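A sketch of the parametric and rank-based correlation coefficients in SciPy, using hypothetical paired data:

```python
from scipy import stats

x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 7, 8]

print(stats.pearsonr(x, y))    # Pearson r (interval/ratio data)
print(stats.spearmanr(x, y))   # Spearman rank correlation (ordinal/non-normal)
print(stats.kendalltau(x, y))  # Kendall tau (ordinal/non-normal)
```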
Purpose: To predict the value of a dependent variable based on one or more independent variables.
r² : the fraction of variance in the dependent variable explained by the independent variable(s).
Equation: Y = a + bX.
Where:
Y: Predicted value (dependent variable)
a: Y-intercept (regression constant)
b: Slope (regression coefficient)
X: Independent variable value
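A simple linear regression sketch with SciPy, recovering a, b, and r² for hypothetical data:

```python
from scipy import stats

x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 7, 8]

fit = stats.linregress(x, y)
print(fit.intercept, fit.slope)       # a and b in Y = a + bX
print(fit.rvalue ** 2)                # r-squared: variance in Y explained by X
print(fit.intercept + fit.slope * 7)  # predicted Y for X = 7
```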
It is the line of best fit to the data, i.e., the line with the smallest residuals (errors in predicting Y); it represents the average relationship between the independent and dependent variables.
Definition: Difference between observed values and predicted values.
Value Used: R² (coefficient of determination).
Purpose: Indicate the strength and direction of the effect of each predictor on the dependent variable.
Collinearity occurs when the independent (predictor) variables are correlated with each other. Some predictors may then appear less important to the prediction only because they are correlated with another variable, making them redundant in explaining variance.
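Collinearity is often screened with variance inflation factors (VIF); a sketch with statsmodels, using hypothetical predictors where x1 and x2 are nearly redundant:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

x1 = np.array([1.0, 2, 3, 4, 5, 6, 7, 8])
x2 = x1 * 2 + np.array([0.1, -0.2, 0.0, 0.3, -0.1, 0.2, 0.0, -0.3])  # nearly redundant with x1
x3 = np.array([5.0, 3, 8, 1, 9, 2, 7, 4])  # unrelated predictor

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for i in range(1, X.shape[1]):  # skip the constant column
    print(variance_inflation_factor(X, i))  # large VIF (e.g., >> 10) flags collinearity
```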
Definition: A binary variable representing categorical data; typically coded as 0 or 1.
Both use independent (X) variables to explain Y, but in logistic regression Y is a dichotomous categorical variable representing the presence or absence of a condition, or group membership. Linear regression produces a predicted value; logistic regression produces a probability related to the likelihood of an individual belonging to one of the outcome categories. Logistic regression also provides estimates of odds ratios for each of the independent variables.
The outcome is coded 0 for the reference group and 1 for the target group; the target group typically represents the group with the adverse outcome.
The ratio of the odds of an event occurring in one group to the odds of it occurring in another group.
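A logistic regression sketch with statsmodels on hypothetical data; exponentiating the coefficients gives the odds ratios:

```python
import numpy as np
import statsmodels.api as sm

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([0, 0, 1, 0, 1, 0, 1, 1, 1, 1])  # 0 = reference group, 1 = target group

X = sm.add_constant(x)
fit = sm.Logit(y, X).fit(disp=0)
print(fit.predict(X))      # predicted probability of membership in the target group
print(np.exp(fit.params))  # odds ratios for the intercept and predictor
```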