Vocabulary flashcards for reviewing nonparametric procedures in STA 333.
Null Hypothesis
A hypothesis about an unobserved population that is actually tested.
Test Statistic
A statistic calculated from sample data that is useful for testing the null hypothesis.
Null Distribution
The distribution of the test statistic when the null hypothesis is true.
Permutation Tests
Tests that compare an observed sample result to a distribution of expected results under some null hypothesis.
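For concreteness, here is a minimal Python sketch of a permutation test for a difference in means between two independent samples; the data values are made up and NumPy is assumed (the course software may differ).

```python
import numpy as np

rng = np.random.default_rng(333)

# Hypothetical data: two small independent samples (made-up numbers).
x = np.array([12.1, 9.8, 11.4, 13.0, 10.6])
y = np.array([8.9, 10.2, 9.5, 7.8, 9.1, 10.0])

observed = x.mean() - y.mean()          # observed test statistic
pooled = np.concatenate([x, y])

n_perm = 10_000
count = 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)      # relabel observations at random under H0
    stat = perm[:len(x)].mean() - perm[len(x):].mean()
    if abs(stat) >= abs(observed):      # two-sided comparison
        count += 1

p_value = count / n_perm
print(observed, p_value)
```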
Bootstrapping
A way to simulate the collection of new samples of the same size from the population being studied, by resampling from the observed sample.
Bootstrap Sample
A sample drawn with replacement from the collected sample, equal in size to the original sample, with the resampling done in a way that respects the structure of the original sample.
Percentile-based Bootstrap CI
A bootstrap confidence interval obtained by finding bracketing quantiles in the bootstrap distribution to determine lower and upper bounds.
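A minimal sketch tying the three bootstrap cards together: resample with replacement at the original sample size, collect the statistic, and read off bracketing quantiles for a percentile-based CI. Data values are made up and NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(333)

# Hypothetical sample (made-up numbers); statistic of interest is the median.
sample = np.array([4.3, 5.1, 6.8, 5.5, 7.2, 4.9, 6.1, 5.8, 6.4, 5.0])

boot_medians = np.empty(5000)
for b in range(5000):
    # Bootstrap sample: same size as the original, drawn with replacement.
    boot = rng.choice(sample, size=sample.size, replace=True)
    boot_medians[b] = np.median(boot)

# Percentile-based 95% bootstrap CI: bracketing quantiles of the bootstrap distribution.
lower, upper = np.quantile(boot_medians, [0.025, 0.975])
print(lower, upper)
```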
Recursive Partitioning
A process used to obtain tree-based models by splitting a sample based on predictor values.
Regression Trees
Tree-based models that handle numeric responses.
Classification Tree
A tree-based model that handles categorical responses.
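A minimal sketch of recursive partitioning via scikit-learn (an assumption about tooling; the course may use other software): a regression tree for a numeric response and a classification tree for a categorical one, fit to made-up data.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier

rng = np.random.default_rng(333)
X = rng.normal(size=(100, 3))                      # three numeric predictors

y_numeric = X[:, 0] ** 2 + rng.normal(size=100)    # numeric response -> regression tree
reg_tree = DecisionTreeRegressor(max_depth=3).fit(X, y_numeric)

y_class = (X[:, 1] > 0).astype(int)                # categorical response -> classification tree
clf_tree = DecisionTreeClassifier(max_depth=3).fit(X, y_class)

print(reg_tree.predict(X[:2]), clf_tree.predict(X[:2]))
```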
Cross-Validation (CV)
Methods that hold out parts of the data in order to obtain independent predictions for new observations, used to build trees with better predictive performance.
k-fold CV
A cross-validation technique that is most appropriate for optimizing predictive accuracy.
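A minimal k-fold CV sketch, assuming scikit-learn and made-up data: each fold is held out once and predicted by a tree fit on the remaining folds, and the held-out errors are averaged.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(333)
X = rng.normal(size=(100, 3))
y = X[:, 0] ** 2 + rng.normal(size=100)

# 5-fold CV: each fold is held out once and predicted by a tree fit on the rest.
cv = KFold(n_splits=5, shuffle=True, random_state=333)
scores = cross_val_score(DecisionTreeRegressor(max_depth=3), X, y,
                         cv=cv, scoring="neg_mean_squared_error")
print(-scores.mean())   # average held-out MSE across the folds
```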
Cost-Complexity Value
A metric used to determine the appropriate level of complexity when pruning a tree.
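A hedged sketch of cost-complexity pruning using scikit-learn's ccp_alpha parameter (again an assumption about tooling): larger cost-complexity values prune more aggressively and give smaller, simpler trees.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(333)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# The pruning path lists the candidate cost-complexity values (alphas);
# larger alpha -> more pruning -> simpler tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
print(path.ccp_alphas)

mid_alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]
pruned = DecisionTreeClassifier(ccp_alpha=mid_alpha, random_state=0).fit(X, y)
print(pruned.get_n_leaves())   # number of leaves after pruning at mid_alpha
```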
Bagging Techniques
Bootstrap aggregating: building multiple different decision tree models from a single training data set by repeatedly drawing bootstrap samples of the data and averaging over the resulting models.
Random Forest
A special type of bagging applied to decision trees in which each split considers only a random subset of the predictors; the resulting ensemble is an example of a strong learner.
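A minimal random-forest sketch, assuming scikit-learn and made-up data; the out-of-bag score illustrates how bagging provides built-in validation from the observations left out of each bootstrap sample.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(333)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 2] > 0).astype(int)

# Each tree is grown on a bootstrap sample of the rows, and each split
# considers only a random subset of the predictors (max_features).
forest = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                                oob_score=True, random_state=333).fit(X, y)
print(forest.oob_score_)   # out-of-bag estimate of accuracy
```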
Binomial test for a pop median
H0: M = M0, where M is the population median. Test statistic: T = number of sample observations exceeding M0.
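A minimal sketch of the binomial test for a population median, assuming SciPy ≥ 1.7 (binomtest) and made-up data; the null value M0 = 10 is arbitrary.

```python
import numpy as np
from scipy.stats import binomtest

# Hypothetical sample (made-up numbers); H0: M = 10.
sample = np.array([12.3, 9.1, 10.8, 11.5, 13.2, 9.9, 10.4, 12.0, 8.7, 11.1])
M0 = 10

t = np.sum(sample > M0)                 # T = number of observations exceeding M0
n = np.sum(sample != M0)                # observations tied with M0 are dropped
result = binomtest(int(t), n=int(n), p=0.5, alternative="two-sided")
print(t, result.pvalue)
```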
Sign test
Test to compare two pop medians M1 and M2. H0: M1 − M2 = 0. Test statistic: T = number of sample pairs with a positive difference score.
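The sign test is the same binomial calculation applied to the signs of paired differences; a sketch with made-up before/after values, again assuming SciPy's binomtest.

```python
import numpy as np
from scipy.stats import binomtest

# Hypothetical paired data (made-up numbers), e.g. before/after measurements.
before = np.array([14.2, 11.8, 13.5, 12.9, 15.1, 10.7, 13.0, 12.2])
after  = np.array([13.0, 12.1, 12.8, 11.5, 14.0, 10.9, 12.1, 11.0])

diff = before - after
t = np.sum(diff > 0)                    # T = number of pairs with a positive difference
n = np.sum(diff != 0)                   # zero differences are dropped
result = binomtest(int(t), n=int(n), p=0.5, alternative="two-sided")
print(t, result.pvalue)
```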
McNemar’s test
Test for paired proportions p1 and p2. H0: p1 − p2 = 0. Test statistic: T = number of individuals whose response changes.
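A minimal McNemar sketch, assuming statsmodels and a made-up 2 x 2 table of paired responses; only the discordant cells (individuals whose response changed) drive the test.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical paired yes/no responses (made-up counts):
# rows = response at time 1, columns = response at time 2.
table = np.array([[30, 12],   # yes -> yes, yes -> no
                  [ 5, 23]])  # no  -> yes, no  -> no

# Only the discordant cells (12 and 5) carry information about p1 - p2.
result = mcnemar(table, exact=True)
print(result.statistic, result.pvalue)
```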
Wilcoxon Signed-Ranks Test
Test to compare two pop medians M1 and M2. H0: M1 − M2 = 0. Test statistic: V = sum of the ranks associated with the positive difference scores.
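A minimal signed-rank sketch with SciPy on the same made-up paired data; note that the statistic SciPy reports may be defined slightly differently from V depending on the version (e.g., the smaller of the two rank sums for a two-sided test).

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical paired data (made-up numbers), as in the sign test sketch above.
before = np.array([14.2, 11.8, 13.5, 12.9, 15.1, 10.7, 13.0, 12.2])
after  = np.array([13.0, 12.1, 12.8, 11.5, 14.0, 10.9, 12.1, 11.0])

# Signed-rank statistic (closely related to V, the sum of positive-difference ranks).
stat, p = wilcoxon(before, after, alternative="two-sided")
print(stat, p)
```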
Wilcoxon Rank-Sum test
Test to compare the locations of two distributions. H0: the two pops have the same location. Test statistic: W = sum of the joint-sample ranks for observations from the first sample.
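A minimal rank-sum sketch on made-up data; SciPy exposes the equivalent Mann-Whitney U, which converts back to the rank-sum W for the first sample.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical independent samples (made-up numbers).
x = np.array([7.1, 8.4, 6.9, 9.2, 7.8])
y = np.array([5.9, 6.3, 7.0, 5.4, 6.8, 6.1])

# U is equivalent to the rank-sum W: U = W - n1*(n1 + 1)/2,
# where W is the sum of x's joint-sample ranks.
u, p = mannwhitneyu(x, y, alternative="two-sided", method="exact")
print(u, u + len(x) * (len(x) + 1) / 2, p)   # U, W, p-value
```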
RMD test
Test to compare two pop variances. H0: the two pops have the same spread. Test statistic: RMD = ratio of mean deviances.
Ansari-Bradley test
Test to compare two pop variances. H0: the two pops have the same spread. Test statistic based on ranks assigned starting from the extremes at both ends.
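A sketch covering both spread tests above, assuming SciPy for Ansari-Bradley and computing an RMD-type statistic by hand (here taken as the ratio of mean absolute deviations from each sample's median, one common form); data are made up.

```python
import numpy as np
from scipy.stats import ansari

# Hypothetical independent samples (made-up numbers) with similar centers
# but possibly different spreads.
x = np.array([10.1, 9.4, 10.8, 9.0, 11.2, 10.5, 8.8])
y = np.array([10.0, 9.9, 10.2, 9.8, 10.15, 10.3])

# Ansari-Bradley: ranks are assigned starting from the extremes at both ends.
stat, p = ansari(x, y)
print(stat, p)

# RMD-type statistic (ratio of mean absolute deviations from each sample's median);
# its null distribution would be built by permuting the group labels.
rmd = np.mean(np.abs(x - np.median(x))) / np.mean(np.abs(y - np.median(y)))
print(rmd)
```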
Permutation F-test
Test to compare the locations of k ≥ 2 distributions. H0: the k pops have the same distribution. Test statistic: the usual F-statistic from a parametric one-way ANOVA.
Kruskal-Wallis test
Test to compare the locations of k ≥ 2 distributions. H0: the k pops have the same location. Test statistic: the permutation F-statistic above, but computed on the joint ranks instead of the actual sample values.
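A sketch covering both k-sample cards, assuming SciPy and made-up data: Kruskal-Wallis directly, and a permutation F-test built by reshuffling observations among the groups and recomputing the one-way ANOVA F.

```python
import numpy as np
from scipy.stats import kruskal, f_oneway

rng = np.random.default_rng(333)

# Hypothetical samples from k = 3 groups (made-up numbers).
g1 = np.array([5.2, 6.1, 5.8, 6.4])
g2 = np.array([7.0, 6.8, 7.5, 7.2, 6.9])
g3 = np.array([5.9, 6.0, 6.3, 5.7])

# Kruskal-Wallis uses joint ranks in place of the raw values.
print(kruskal(g1, g2, g3))

# Permutation F-test: the usual one-way ANOVA F, with its null distribution
# built by reshuffling observations among the groups.
obs_f = f_oneway(g1, g2, g3).statistic
pooled = np.concatenate([g1, g2, g3])
sizes = [len(g1), len(g2), len(g3)]
count = 0
for _ in range(5000):
    perm = rng.permutation(pooled)
    groups = np.split(perm, np.cumsum(sizes)[:-1])
    if f_oneway(*groups).statistic >= obs_f:
        count += 1
print(count / 5000)   # permutation p-value
```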
Spearman rank correlation test
Test for assessing association between ordinal variables. H0: the true pop rank correlation ρ = 0. Test statistic: r_S, the Spearman rank correlation coefficient (the Pearson correlation computed on the ranks).
Kendall correlation test
Test for assessing association between quantitative variables. H0: the true pop concordance correlation ρ = 0. Test statistic: τ_b, the Kendall correlation coefficient.
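A minimal sketch of both correlation tests with SciPy on made-up data, including a check that r_S equals the Pearson correlation of the ranks.

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau, pearsonr, rankdata

# Hypothetical paired measurements (made-up numbers).
x = np.array([3, 1, 4, 1, 5, 9, 2, 6])
y = np.array([2, 1, 4, 2, 6, 8, 3, 5])

rho_s, p_s = spearmanr(x, y)
tau_b, p_t = kendalltau(x, y)       # SciPy's default is the tau-b variant
print(rho_s, p_s, tau_b, p_t)

# r_S is just the Pearson correlation computed on the ranks:
print(pearsonr(rankdata(x), rankdata(y))[0])
```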
Fisher’s exact test
Test for a 2 x 2 contingency table. H0: the two binary nominal variables are unrelated.
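A minimal Fisher's exact test sketch with SciPy on a made-up 2 x 2 table of counts.

```python
import numpy as np
from scipy.stats import fisher_exact

# Hypothetical 2 x 2 table of counts for two binary variables (made-up).
table = np.array([[8, 2],
                  [1, 5]])

odds_ratio, p = fisher_exact(table, alternative="two-sided")
print(odds_ratio, p)
```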