Negative Binomial Regression
The response variable is a count of burbot captured in gill net samples. Most of the counts are 0, 1, and 2, but a few nets yield as many as 50. The predictor variables are water temperature (x1) and substrate type (sand, cobble, mud) (x2).
Kruskal-Wallis Test
You set out to compare the mean number of mayflies in kick net samples in three different habitats (riffle, pool, beaver pond). However, the assumption of normality is not met, and with smaller sample sizes, you would like to compare the abundance of mayflies using a conservative, nonparametric test based on ranking your data.
Standard Deviation (SD)
You want to describe the variation in the diameters of zebrafish eggs in one sample of 250 eggs. You are interested only in the variation within this sample of 250 eggs, nothing more, and would like to know the range of diameters within which 68% of the egg diameters fall.
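A minimal sketch of the calculation, assuming the diameters are roughly normally distributed so that about 68% of values fall within one SD of the mean. The measurements are made-up values for illustration only.

```python
import statistics

# Hypothetical egg diameters in mm (illustrative values, not real data)
diameters = [0.68, 0.70, 0.71, 0.69, 0.72, 0.70, 0.73, 0.67, 0.70, 0.71]

mean_d = statistics.mean(diameters)
sd_d = statistics.stdev(diameters)  # sample SD (n - 1 in the denominator)

# For roughly normal data, about 68% of values lie within mean +/- 1 SD
low, high = mean_d - sd_d, mean_d + sd_d
print(f"mean = {mean_d:.3f} mm, SD = {sd_d:.3f} mm")
print(f"approximate 68% range: {low:.3f} to {high:.3f} mm")
```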
visreg
A graphics package used in multiple regression that can plot a focal predictor variable while controlling for the other Xs in the model, something ggplot does not do on its own.
Welch's t-test
We want to compare the mean length of lake trout vs hybrid splake. The samples of fish from each group are random and independent. However, the splake group has far more variance in length than the lake trout group (P=0.0001). Which "spinoff version" of this common test best handles the unequal variances?
Pearson's r
You have two continuous variables, density of hemlock woolly adelgids and density of emerald ash borers, two nasty exotic insects killing trees in Michigan. You just want to see if there is an association between the abundance of the two exotics, and how strong the association is. You do NOT want to plot a trendline or create a functional model of the relationship because there is no clear X or Y.
Cook's D
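A minimal sketch of how the coefficient is computed from two paired lists. The function name `pearson_r` and the density values are hypothetical, chosen only to illustrate the formula.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical insect densities per plot (illustrative values only)
adelgids = [12, 30, 25, 8, 40, 18]
ash_borers = [5, 14, 11, 3, 20, 9]
r = pearson_r(adelgids, ash_borers)
print(f"r = {r:.3f}")
```

Because neither variable is treated as X or Y, swapping the two arguments gives the same r.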
You have just created a regression line, and you have one apparent outlier point that could be strongly affecting the slope of your line. To see if it is truly influential, what metric would you use?
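A minimal sketch of the metric for a simple (one-predictor) regression, using the standard formula that combines each point's residual with its leverage. The function name and data are hypothetical; common rules of thumb flag points with D above 1 (or above 4/n), but those cutoffs are conventions rather than strict tests.

```python
def cooks_distance(x, y):
    """Cook's D for each point in a simple linear regression of y on x."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - p)
    # Leverage of each point in simple regression
    leverage = [1 / n + (a - mx) ** 2 / sxx for a in x]
    return [(e ** 2 / (p * mse)) * h / (1 - h) ** 2
            for e, h in zip(resid, leverage)]

# Hypothetical data: the last point sits far from the others in x
x = [1, 2, 3, 4, 5, 10]
y = [1.1, 2.0, 2.9, 4.2, 5.1, 3.0]
d = cooks_distance(x, y)
for xi, di in zip(x, d):
    print(f"x = {xi:>2}: D = {di:.3f}")
```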
Shapiro-Wilk Test of Normality
You are examining the assumptions before running an independent sample t test or one-way ANOVA. First you graph a histogram of your measurements (response) to see if your data look normally distributed. Then, you follow this up with a formal test to see whether your data differ significantly from normal. What is this test?
One-Way ANOVA
You want to compare mean THC levels in 4 varieties of Cannabis. Y is THC, X is variety. All assumptions of normality, equal variance, and independence are met.
Tukey's HSD
In the previous question you find that mean THC levels were significantly different among varieties. Now you want to know which specific varieties are different from each other.
Dunnett's Test
Suppose one of the varieties above is a "wild" type and you just wanted to compare 3 modern cultivars versus the wild type.
Linearized log regression
You want to create a regression between weight of deer (y) and their age (x), two continuous variables. But the preliminary run produces this trend
Linearized polynomial regression
You want to graph the relationship between egg production (y) and age (x) in sturgeon, but the trend looks sort of like a hump. Eggs is a count, but each fish typically produces hundreds or more.
Poisson regression
You are managing deer, and one goal is to provide quality bucks to hunters. Quality is measured as the number of points on the antlers. Most of the deer in the herd are young (1-3 years old) and have only 0-3 points. A few of the larger deer have up to 10 or 12 points. You want to model the relationship between point count (y) and deer age (x1) and habitat (x2). The residual deviance:df ratio is about 1.3.
Quasipoisson regression
Same as previous, but some of the older deer in this region have up to 18 points, and the residual deviance:df ratio is 1.7.
Standard Error (SE)
You have a standard deviation that describes the variation of the measurements in your data, but you want to make inferences about the variation in means in the population.
95% CI
With the same scenario as above, what if you want to make inferences about variation in means in the population and also estimate the range (high and low) within which you expect to find the true mean with a probability of 0.95?
Kolmogorov-Smirnov Test
You are interested in whether the age distributions of elephants in South Africa differ from those in Zambia. You plot out the age frequency histograms from both countries. The mean and median ages seem about the same, but the two distributions have different shapes. What test could you use to show that the age distributions of the two elephant populations are different, even though their mean and median ages are not?
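A minimal sketch connecting SD, SE, and the 95% CI, assuming a normal approximation (critical value of about 1.96). With small samples a t critical value would normally replace it. The measurements are hypothetical.

```python
import math
import statistics

# Hypothetical measurements (illustrative values only)
sample = [4.1, 3.8, 4.5, 4.0, 4.3, 3.9, 4.2, 4.4, 4.0, 4.1]

n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)   # variation among individual measurements
se = sd / math.sqrt(n)          # variation expected among sample means

# Normal-approximation critical value for a two-sided 95% interval
z = statistics.NormalDist().inv_cdf(0.975)
ci_low, ci_high = mean - z * se, mean + z * se

print(f"mean = {mean:.3f}, SD = {sd:.3f}, SE = {se:.3f}")
print(f"approximate 95% CI: {ci_low:.3f} to {ci_high:.3f}")
```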
Anderson-Darling Test
In the previous question, what if the differences in age distributions occur mainly due to differences in the number of calves and the oldest animals? What alternative to the previous test is more sensitive to differences at the "extremes" of the age-frequency histogram?
Spearman's rank correlation coefficient
You want a simple statistic/index that tells you the degree of association between aggressiveness in mice and size. Size is measured in grams. Aggressiveness is an index between 0-5, which is loosely based on the number of aggressive encounters per minute. What constitutes an aggressive encounter is not always obvious, so the researchers used the index instead.
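A minimal sketch of the idea behind the statistic: convert both variables to ranks (ties share the average rank) and compute Pearson's r on the ranks. The helper names and the mouse data are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranked data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical mice: body mass in grams and aggressiveness index (0-5)
mass = [18.2, 21.5, 19.8, 25.1, 23.4, 20.7]
aggression = [1, 3, 2, 5, 3, 2]
print(f"rho = {spearman_rho(mass, aggression):.3f}")
```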
Kendall's Tau
In the previous question, suppose the aggressiveness index is based on a very subjective score of the intensity of the aggressive behavior. What more conservative alternative to the previous statistic would be used?
QQ plot
In complicated ANOVAs and in regression, checking normality by exploring the raw distribution of your data does not work very well. What tool is used to assess the overall normality of the residuals as a proxy for testing normality in these scenarios?
R² (r2)
In ANOVA and regression, this metric tells you the % of variation in Y that is explained by X(s).
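A minimal sketch for the one-predictor case: fit a least-squares line, then compare the leftover (residual) variation to the total variation in Y. The function name and data are hypothetical.

```python
def r_squared(x, y):
    """Proportion of variation in y explained by a least-squares line on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Hypothetical paired observations (illustrative values only)
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.7]
print(f"R^2 = {r_squared(x, y):.3f}  ({100 * r_squared(x, y):.1f}% of variation explained)")
```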
AICwt
If you are trying to fit a regression model and have many predictors (Xs) as well as 1-2 interaction terms involving your Xs, what metric would help you decide which X terms you should keep in the final model to explain as much variation in Y with the fewest predictors?
Partial Correlation
You are looking at the degree of association between toxoplasmosis (parasite encysted in brain) and murder rates (per 100,000) in different cities. Toxoplasmosis is a rate: positive antigen tests per 100 people. However, you realize that there may be confounding variables, like the size of the city and economic prosperity. If you want to look at the association between toxoplasmosis and murder rate, but control for city population size and mean household income, what test would you use?
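A minimal sketch of how Akaike weights are derived from a set of AIC values: each model's difference from the best (lowest) AIC is converted to a relative likelihood, then normalized so the weights sum to 1. The model labels and AIC values are hypothetical.

```python
import math

def akaike_weights(aic_values):
    """Convert AIC values for competing models into Akaike weights."""
    best = min(aic_values)
    rel_likelihood = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel_likelihood)
    return [r / total for r in rel_likelihood]

# Hypothetical candidate models and their AIC values (illustrative only)
models = {"x1": 212.4, "x1 + x2": 208.1, "x1 + x2 + x1:x2": 209.6}
weights = akaike_weights(list(models.values()))
for name, w in zip(models, weights):
    print(f"{name:<18} AICwt = {w:.3f}")
```

The model with the lowest AIC always receives the largest weight; the weights show how much support the runners-up retain.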
Multiple Linear Regression
You want to model the functional relationship between bird-tower collisions (counts are usually very high and not Poisson-like) and various predictors: (a) tower height, (b) elevation, (c) distance from city, (d) number of foggy days per year, and (e) mean annual windspeed. All x-y scatters point towards good linearity.
Independent-samples t-test
You want to compare the mean height of white pines versus red pines in a forest. Random samples of each species are collected.
GLM binomial logistic regression
A researcher wants to model seed germination in response to seed weight (x). Each seed is weighed and then planted. The response is either (1) germinated or (0) not germinated.
Binomial Trials/proportion
Suppose the researcher puts 10-20 seeds in each of 100 petri dishes. One of 10 different doses of a plant hormone is applied to each dish, with 10 dishes for each dose. At the end of the experiment the researcher records the number of germinated and the number of un-germinated seeds in each dish.
ANCOVA
A researcher wants to compare mean weight gain (y) of chickens in response to 4 different feeding treatments. They also record the pre-treatment weight of each chicken, because they know that the initial weight of a chicken may affect how it responds to the treatment (e.g. scrawny chickens may not recover enough to respond to the treatments in the same way as others). How would they compare means for the 4 treatments, controlling for pre-treatment chicken weight?
Paired t-test
Researchers want to compare mean PCB levels in lakes in 1980 vs today (two groups). The same lakes are used in 1980 and 2023.
Linear Mixed Model (LMM)
You want to compare mean dace densities in pools vs riffles. However, you take multiple samples from 10 different streams, each stream with at least 5 samples of pools and riffles. Because the data from each stream are not independent, you can't ignore that the data were collected in different locations. How would you compare dace densities, controlling for the location of samples (stream_ID)?
Two-Way ANOVA
The response variable is hemoglobin levels. The predictor variables are both factors: X1 = disease_status (sickle cell homozygous, sickle cell heterozygous, normal) and X2 = gender (M, F). The goal is to compare mean hemoglobin levels in response to disease status and gender, and also to consider the interaction between disease status and gender.
emmeans
You want a follow-up test to compare mean hemoglobin levels between pairwise combinations of each disease type and gender.
Bootstrap/Permutation t
You want to compare means of two groups, but both groups are definitely skewed, and to make it worse, they are skewed in different directions. You have sample sizes of 40 for each. What is the best option to compare the means without having to resort to a weaker test based on ranks?
Coefficient of Variation (CV)
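A minimal sketch of the permutation approach: shuffle the group labels many times and count how often the shuffled difference in means is at least as large as the observed one. The function name, number of permutations, seed, and data are all hypothetical choices for illustration.

```python
import random
import statistics

def permutation_test_means(a, b, n_perm=2000, seed=1):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = statistics.mean(a) - statistics.mean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign observations to groups at random
        diff = statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add 1 to numerator and denominator so the p-value is never exactly 0
    return observed, (extreme + 1) / (n_perm + 1)

# Hypothetical skewed samples (illustrative values only)
group_a = [1, 1, 2, 2, 3, 3, 4, 9, 15, 22]
group_b = [20, 21, 21, 22, 22, 23, 23, 24, 16, 9]
diff, p = permutation_test_means(group_a, group_b)
print(f"observed difference = {diff:.2f}, permutation p = {p:.4f}")
```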
You have basic summary statistics on the bill size of several species of Darwin's Finches on the Galapagos. You want to determine which species is most variable with respect to bill size. You realize that SD won't work, because the larger finch species will always have higher SD because their bills are larger, so their means and SD will also be large. How can you compare variation in bill size among birds in a way that doesn't depend on their size?
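A minimal sketch of the comparison, dividing each species' SD by its mean so the result is unitless. The species labels and bill measurements are made up for illustration.

```python
import statistics

def coefficient_of_variation(values):
    """SD as a percentage of the mean, so it does not scale with body size."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical bill depths in mm for two species (illustrative values only)
bills = {
    "small_finch": [7.8, 8.5, 9.2, 8.1, 8.9],
    "large_finch": [15.0, 16.5, 17.0, 15.5, 16.0],
}

cv = {species: coefficient_of_variation(v) for species, v in bills.items()}
sd = {species: statistics.stdev(v) for species, v in bills.items()}
most_variable = max(cv, key=cv.get)

for species in bills:
    print(f"{species}: SD = {sd[species]:.2f} mm, CV = {cv[species]:.1f}%")
print("most variable relative to its size:", most_variable)
```

In these made-up data the larger species has the larger SD, but the smaller species has the larger CV, which is the distinction the scenario describes.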
Wilcoxon Rank Sum
You want to compare the mean eye diameter in two strains of fruit flies, but you only have 8 observations for each strain and you are unsure of the distribution of eye sizes. For a small sample size, what conservative test based on ranks would allow you to compare the two groups?
Welch's ANOVA
You are running a one-way ANOVA but discover that the Levene F test gives you a P=0.000001. What is the simplest alternative to a regular ANOVA in this case?
Games-Howell
In the previous question, which post hoc test applies for the pairwise comparisons?
Simple Linear Regression
You want to draw a trendline and model the relationship between cholera levels in drinking water and population size. The relationship looks quite linear.