Mann-Whitney U Test
Nonparametric test used for two independent samples; equivalent to an unpaired t-test.
Wilcoxon Signed-Rank Test
Nonparametric test used for two dependent samples (paired data); equivalent to a paired t-test.
Underlying procedure of nonparametric tests
Rank the data and perform the test on the ranks rather than on the raw data itself.
Hypotheses for Wilcoxon signed-rank test
H0: the median difference is 0; H1: the median difference is not 0.
Corrected R code to install nortest package
install.packages("nortest")
Purpose of lillie.test(Differences) in R
Performs the Lilliefors (Kolmogorov-Smirnov) test of normality on 'Differences'.
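A minimal R sketch of the two cards above; the vector name Differences and its values are illustrative assumptions only:

install.packages("nortest")                        # install once
library(nortest)

Differences <- c(2.1, -0.5, 1.3, 0.8, -1.2, 0.4)   # illustrative paired differences
lillie.test(Differences)                           # Lilliefors test of normality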
Bootstrapping Method (Confidence Intervals)
Used when the sample size is small (n < 30) and the underlying distribution is not normal.
Default Resamples in Bootstrap Confidence Intervals
1000
R code for 99% BCa and Percentile Bootstrap Confidence Intervals
boot.ci(name_of_bootstrap_output, conf = 0.99, type = c("bca", "perc"))
95% Bootstrap Confidence Interval (using normal approximation)
x̄ ± 1.96 * standard error
99% Bootstrap Confidence Interval (using normal approximation)
x̄ ± 2.58 * standard error
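A hedged R sketch of the bootstrap cards above, using the boot package; the data, the chosen statistic (the mean) and all object names are illustrative assumptions:

library(boot)

x <- c(12, 15, 9, 22, 17, 11, 14, 20)             # illustrative small sample (n < 30)
boot_mean <- function(data, idx) mean(data[idx])  # statistic to resample

set.seed(1)
b <- boot(x, statistic = boot_mean, R = 1000)     # 1000 resamples, matching the default quoted above

boot.ci(b, conf = 0.99, type = c("bca", "perc"))  # 99% BCa and percentile intervals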
Mann-Whitney U Test
A test used to compare two independent groups when the sample size is small and the data is not normally distributed.
Null Hypothesis (H0) in Mann-Whitney U Test
The medians of the two groups are equal.
Alternative Hypothesis (H1) in Mann-Whitney U Test
The medians of the two groups are not equal.
U Statistic Calculation
U1 = R1 − n1(n1 + 1)/2 and U2 = R2 − n2(n2 + 1)/2, where U = min{U1, U2}
Decision Rule for Mann-Whitney U Test (Reject H0)
Reject H0 if U < Ucrit (critical value)
When do we use the bootstrapping method to create confidence intervals
Bootstrap confidence intervals are used when the sample is small (n < 30) and the underlying distribution is not normal.
Nonparametric equivalent of one-way repeated measures ANOVA
Friedman Test
Nonparametric equivalent of one-way ANOVA
Kruskal-Wallis H Test
Nonparametric equivalent of paired t-test
Wilcoxon signed rank test
Nonparametric equivalent of unpaired t-test
Mann-Whitney U test
What conditions need to be broken to use Mann-Whitney U instead of unpaired t-test
-normality of samples
-homogeneity of variances
When to use Mann-Whitney U test
-Ordinal or Continuous data
-2 independent groups
-test of difference
What assumptions need to be broken to use Wilcoxon signed rank instead of paired t-test
-assumption of normality of the differences of paired data is not met
- if we have ordinal data
What assumptions need to be broken to use Kruskal-Wallis H instead of one-way ANOVA
-normality of residuals
-homogeneity of variances
What assumptions need to be broken to use the Friedman test instead of one-way repeated measures ANOVA
-normality of residuals
- sphericity
When to use Kruskal-Wallis test?
-Extension of Mann-Whitney U
-3 or more independent groups
When to use Wilcoxon signed rank test
-ordinal or continuous data
-Distribution of the differences must be symmetric
-2 dependent groups
-test of difference
When to use Friedman test
-Extension of Wilcoxon Signed rank
-3 or more dependent groups
How to use the Wilcoxon signed rank test
-Work out the differences
-Put the differences in order, ignoring zeros and signs
-Rank the differences, sorting out tied ranks
-Put the signs back in a column
-Sum the ranks of the negative differences to get T− and of the positive differences to get T+
-The test statistic T is the minimum of (T−, T+)
-Compare T to the critical value from tables (n is the number of non-zero differences, and 0.05 is halved to give the α used in the tables)
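A minimal R sketch of the Wilcoxon signed-rank procedure above; the paired vectors before and after are illustrative assumptions:

before <- c(12, 15, 11, 14, 16, 13)             # illustrative paired measurements
after  <- c(11.5, 13.8, 11.9, 12.6, 14.9, 12.8)

wilcox.test(before, after, paired = TRUE)       # R reports V = sum of positive ranks (T+), not min(T−, T+)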
How to use the Mann-Whitney U test
-Put the data in order
-Work out the ranks, sorting out tied ranks
-Sum the ranks for each group
-Calculate U1 and U2 (formulae given)
-The test statistic U is the minimum of (U1, U2)
-Compare U to the critical value from tables, using half the significance level
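A minimal R sketch of the Mann-Whitney U procedure above; group1 and group2 are illustrative independent samples:

group1 <- c(3.1, 4.5, 2.8, 5.0, 3.9)  # illustrative independent samples
group2 <- c(5.6, 6.2, 4.9, 7.1, 5.8)

wilcox.test(group1, group2)           # R's Wilcoxon rank-sum test; the W it reports is a U statistic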
How to use Kruskal-Wallis H test?
-Put data in order
-Work out ranks sorting out tied ranks
-Sum up the ranks for each group
-Work out n for each group
-Work out H (formula given)
-If there are ties, work out the corrected statistic as H / C_H
-Use χ2 distribution tables with k-1 degrees of freedom
-If H/Hcorrected ≥ χ2crit we reject H0
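A minimal R sketch of the Kruskal-Wallis procedure above; scores and group are illustrative data for three independent groups:

scores <- c(5, 7, 6, 9, 10, 8, 12, 14, 11)        # illustrative observations
group  <- factor(rep(c("A", "B", "C"), each = 3)) # three independent groups

kruskal.test(scores ~ group)                      # H statistic compared against chi-squared with k - 1 = 2 df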
How to use Friedman test?
-Count the number of groups (not including the participants column)
-Rank each participant's scores across their own groups (e.g. with 3 groups, ranks 1 to 3 within each participant's row); ranking goes across, not down as in the other tests
-Account for tied ranks
-Sum each column
-Calculate F (formula given)
-For k=3 and n=2,3,...,9 or k=4 and n=2,3,4 we evaluate the test statistic by comparing it with values from Friedman tables.
- If F/Fcorrected ≥ Fcrit we reject H0
-Otherwise, we use tables of the χ2 distribution with k − 1 degrees of freedom.
- If F/Fcorrected ≥ χ2crit we reject H0
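A minimal R sketch of the Friedman procedure above; the matrix y is illustrative, with participants in rows and the k dependent conditions in columns:

y <- matrix(c(4, 6, 5,
              7, 9, 8,
              3, 5, 4,
              6, 8, 7), nrow = 4, byrow = TRUE)  # rows = participants, columns = conditions

friedman.test(y)                                 # ranks across each row and compares the column rank sums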
How to calculate E(X)?
Add together degrees of freedom or just the bottom number.
How to calculate Var(X)
Multiply little numbers next to χ together
How to calculate SD(X)
Take the square root of Var(X)
When to use chi-square goodness of fit test
-no expected frequency should be less than 5
-Each participant can only contribute to one category or cell in the frequency table, therefore we have independence
How to use chi-square goodness of fit test
-Work out each observed − expected
-Square each observed − expected
-Divide each squared value by its expected value
-The test statistic is the sum of these divided values
-Compare to the value in the chi-squared tables, using the critical value as is and k − 1 df (k is the number of categories)
-Table value > test statistic means accept H0
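A minimal R sketch of the goodness-of-fit procedure above; the observed counts and the hypothesised proportions are illustrative assumptions:

observed   <- c(18, 22, 30, 10)           # illustrative category counts
expected_p <- c(0.25, 0.25, 0.25, 0.25)   # hypothesised proportions under H0

chisq.test(observed, p = expected_p)      # chi-squared goodness of fit, k - 1 = 3 df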
When to use χ2-Test Contingency Tables
-test of relationship/association
-no expected frequency should be less than 5
-Each participant can only contribute to one category or cell in the frequency table, therefore we have independence
How to use χ2-Test Contingency Tables
-Total up every row and column
-For each cell, multiply its row total by its column total and divide by the grand total to get the expected value for that position (not the same as the actual value in the table)
-Then use the formula given to find the test statistic (in this formula y is the observed value in the table and ỹ is the expected value you just worked out)
-df is (r − 1)(c − 1)
-Compare to the χ² table value, with the critical value as is
-Table value > test statistic means accept H0
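A minimal R sketch of the contingency-table procedure above; the 2 × 3 table is illustrative:

tab <- matrix(c(30, 10, 25,
                20, 40, 15), nrow = 2, byrow = TRUE)  # illustrative 2 x 3 contingency table

chisq.test(tab)                                       # Pearson chi-squared test of association, df = (2 - 1)(3 - 1) = 2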
When to use Yates continuity correction
-when the degrees of freedom is 1
How to use Yates continuity correction
-Change the formula used in the χ²-test for contingency tables to (|y − ỹ| − 0.5)² / ỹ
(the 0.5 is subtracted from the absolute difference before squaring)
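A minimal R sketch for the Yates correction card above; the 2 × 2 table is illustrative (for a 2 × 2 table, correct = TRUE is R's default):

tab <- matrix(c(12, 8,
                 5, 15), nrow = 2, byrow = TRUE)  # illustrative 2 x 2 table, df = 1

chisq.test(tab, correct = TRUE)                   # applies the Yates continuity correction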
When to use Phi
-to test Effect Size/Strength of Association
-for 2 × 2 contingency tables
When to use Cramer's V
-to test Effect Size/Strength of Association
-two categorical variables when each variable has two or more categories
When to use Odds Ratio
-to find the ratio of the odds that an outcome will occur in group 1 to the odds of the outcome occurring in group 2
-2 × 2 contingency tables
How to use Phi
φ = √(χ² / n)
φ = 0.1: small effect
φ = 0.3: medium effect
φ = 0.5: large effect
How to use Cramer's V
V = √(χ² / (n × df*))
where df* = min(c − 1, r − 1)
How to use odds ratio
Using the (row, column) cell positions in the 2 × 2 table:
OR= ((1,1)/(1,2)) / ((2,1)/(2,2))
We evaluate odds ratios in the following way
• OR = 1: belonging to group 1 has not affected the odds of outcome A;
• OR > 1: belonging to group 1 has increased the odds of outcome A;
• OR < 1: belonging to group 1 has decreased the odds of outcome A
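A hedged base-R sketch computing the three effect-size measures above for an illustrative 2 × 2 table (all values are made up):

tab <- matrix(c(30, 10,
                20, 40), nrow = 2, byrow = TRUE)                    # illustrative 2 x 2 contingency table

chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
n    <- sum(tab)

phi        <- sqrt(chi2 / n)                                        # phi for 2 x 2 tables
cramer_v   <- sqrt(chi2 / (n * min(nrow(tab) - 1, ncol(tab) - 1)))  # Cramer's V
odds_ratio <- (tab[1, 1] / tab[1, 2]) / (tab[2, 1] / tab[2, 2])     # OR from cell positions

c(phi = phi, cramer_v = cramer_v, odds_ratio = odds_ratio)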
When to use the poisson distribution
-the Poisson distribution is typically used in situations where we count events
-also used for claim frequency
(Predicting the number....)
When to use the Logistic regression
-To predict a binary outcome from a linear combination of independent variables
-also used in insurance to calculate the propensity to claim
-predict the probability of......
What is the canonical link function of logistic regression
The logit, which is log(odds) = log(p / (1 − p))
when to use gamma regression
-used to predict a gamma distributed outcome from a linear combination of independent variables
-eg predicting the size of an insurance claim based on the age of driver.
what is the canonical link function of gamma regression
reciprocal function which is 1/λ
what is the canonical link function of the poisson
The log link function, which is log(µ)
what is the canonical link function of the normal distribution
The identity link function
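A hedged R sketch tying the GLM cards above together: glm() fitted with each family's canonical link. The data frame d and its simulated columns are illustrative assumptions only:

set.seed(1)
d <- data.frame(x = rnorm(50))
d$claim      <- rbinom(50, 1, plogis(0.5 * d$x))                  # binary outcome
d$n_claims   <- rpois(50, exp(0.3 * d$x))                         # count outcome
d$claim_size <- rgamma(50, shape = 2, rate = 2 / exp(0.4 * d$x))  # positive continuous outcome
d$y          <- 1 + 0.5 * d$x + rnorm(50)                         # normal outcome

glm(claim ~ x,      family = binomial(link = "logit"),    data = d)  # logistic: logit link
glm(n_claims ~ x,   family = poisson(link = "log"),       data = d)  # Poisson: log link
glm(claim_size ~ x, family = Gamma(link = "inverse"),     data = d)  # gamma: reciprocal link
glm(y ~ x,          family = gaussian(link = "identity"), data = d)  # normal: identity link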
When to reject H0
-For MWU and WSR: table value > test statistic means reject
-For all other tests: table value < test statistic means reject
What is the key difference between the binomial distribution and the hypergeometric distribution?
Binomial is sampling with replacement.
Hypergeometric is sampling without replacement
Under what assumption is the null hypothesis for these tests that all group medians are equal? Why is this preferred and how would we check this?
The assumption is that the distributions of the groups have the same shape. This is preferred as it is more in keeping with its role as an alternative to ANOVA (which compares means) and it gives more useful interpretations. We could check this using histograms and/or boxplots.
PCA or FA?
PCA is used simply to reduce the observed variables into a smaller set of components.
FA is used when there are suspected latent variables/factors.