BI412 BIOSTATS FINAL

40 Terms

New cards

Negative Binomial Regression

The response variable is a count of burbot captured in gill net samples. Most of the counts are 0, 1, and 2, but a few nets yield as many as 50. The predictor variables are water temperature (x1) and substrate type (sand, cobble, mud) (x2).

Kruskal-Wallis test

You set out to compare the mean number of mayflies in kick net samples in three different habitats (riffle, pool, beaver pond). However, the assumption of normality is not met, and with smaller sample sizes, you would like to compare the abundance of mayflies using a conservative, nonparametric test based on ranking your data.
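
The rank-based comparison can be sketched by hand. This is a minimal illustration, assuming made-up mayfly counts with no tied values (the tie correction is left out), and it computes only the H statistic, not a P value.

```python
# Hypothetical mayfly counts per kick-net sample (no tied values).
samples = {
    "riffle": [12, 15, 18],
    "pool": [7, 9, 11],
    "beaver_pond": [1, 3, 5],
}

# Rank all observations together (1 = smallest); valid only without ties.
pooled = sorted(v for group in samples.values() for v in group)
rank = {value: i + 1 for i, value in enumerate(pooled)}

n_total = len(pooled)
# Sum over habitats of (rank sum squared / group size).
rank_term = sum(
    sum(rank[v] for v in group) ** 2 / len(group) for group in samples.values()
)
h_stat = 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)
print(round(h_stat, 3))
```

For these invented counts H works out to 7.2. With larger samples H is usually compared against a chi-square distribution with (number of groups - 1) degrees of freedom.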

Standard Deviation (SD)

You want to describe the variation in the diameters of zebrafish eggs in one sample of 250 eggs. You are just interested in the variation in the sample of 250 eggs, nothing more, and would like to know the range of diameters within which 68% of the egg diameters fall.
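
A minimal sketch of the calculation, assuming a handful of made-up diameters in place of the 250 eggs. It uses the sample SD (n - 1 denominator); the "68% within one SD" reading assumes roughly normal data.

```python
import statistics

# Hypothetical egg diameters in mm (five values stand in for the 250 eggs).
diameters = [1.0, 1.2, 1.1, 0.9, 1.3]

mean_d = statistics.mean(diameters)
sd = statistics.stdev(diameters)  # sample SD, n - 1 in the denominator

# Roughly 68% of values fall within one SD of the mean if the data are normal.
low, high = mean_d - sd, mean_d + sd
print(f"mean={mean_d:.2f} SD={sd:.3f} range=({low:.3f}, {high:.3f})")
```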

visreg

A graphics package used in multiple regression that can graph a focal predictor variable while controlling for the other Xs in the model, something ggplot does not do directly.

Welch's t-test

We want to compare the mean length of lake trout vs. hybrid splake. The samples of fish from each group are random and independent. However, the splake group has far more variance in length than the lake trout group (P=0.0001). Which "spinoff version" of this common test is best for the variance issue?
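
What sets this version apart is that each group keeps its own variance and the degrees of freedom are adjusted. A minimal sketch, assuming made-up lengths in arbitrary units; `welch_t` is a hypothetical helper name, and the P value lookup is omitted.

```python
import math
import statistics

def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error, group a
    vb = statistics.variance(b) / len(b)  # squared standard error, group b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df

# Hypothetical lengths; the second group is far more variable.
lake_trout = [1, 2, 3, 4, 5]
splake = [2, 4, 6, 8, 10]
t_stat, df = welch_t(lake_trout, splake)
print(round(t_stat, 3), round(df, 2))
```

The degrees of freedom come out below the pooled value of n1 + n2 - 2 = 8, which is how the test accounts for unequal variances.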

Pearson's r

You have two continuous variables: the density of hemlock woolly adelgids and the density of emerald ash borers, two nasty exotic insects killing trees in Michigan. You just want to see if there is an association between the abundance of the two exotics, and how strong the association is. You do NOT want to plot a trendline or create a functional model of the relationship because there is no clear X or Y.
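
The coefficient treats both variables symmetrically, which is why no X or Y has to be chosen. A minimal sketch, assuming made-up densities; `pearson_r` is a hypothetical helper written out by hand.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariation scaled by the spread of each variable."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical insect densities at five sites.
adelgid = [1, 2, 3, 4, 5]
ash_borer = [2, 4, 5, 4, 5]
r = pearson_r(adelgid, ash_borer)
print(round(r, 3))
```

Swapping the two arguments gives the same value.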

Cook's D

You have just created a regression line, and you have one apparent outlier point that could be strongly affecting the slope of your line. To see if it is truly influential, what metric would you use?
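
For a simple linear regression the metric can be computed from each point's residual and leverage. A minimal sketch, assuming made-up data in which the last point sits far from the others in x; `cooks_distance` is a hypothetical helper name.

```python
def cooks_distance(x, y):
    """Cook's D for each point in a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    p = 2  # fitted parameters: intercept and slope
    mse = sum(e**2 for e in resid) / (n - p)
    leverage = [1 / n + (a - mx) ** 2 / sxx for a in x]
    return [e**2 / (p * mse) * h / (1 - h) ** 2 for e, h in zip(resid, leverage)]

# Hypothetical data: four points on a line plus one apparent outlier.
x = [1, 2, 3, 4, 10]
y = [1, 2, 3, 4, 2]
d = cooks_distance(x, y)
print([round(v, 3) for v in d])
```

Here the last point has D = 17.25, far above the others. A common rule of thumb treats values above 1 as influential, though the cutoff used in a given course may differ.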

Shapiro-Wilk Test of Normality

You are examining the assumptions before running an independent sample t test or one-way ANOVA. First you graph a histogram of your measurements (response) to see if your data look normally distributed. Then, you follow this up with a formal test to see whether your data differ significantly from normal. What is this test?

One-Way ANOVA

You want to compare mean THC levels in 4 varieties of Cannabis. Y is THC, X is variety. All assumptions of normality, equal variance, and independence are met.
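
The test compares variation among group means with variation within groups. A minimal sketch of the F ratio, assuming made-up THC values for four varieties labelled A to D; the P value lookup is omitted.

```python
import statistics

# Hypothetical THC measurements (%) for four varieties, three plants each.
thc = {
    "A": [1, 2, 3],
    "B": [2, 3, 4],
    "C": [5, 6, 7],
    "D": [6, 7, 8],
}
values = [v for group in thc.values() for v in group]
grand_mean = statistics.mean(values)

# Between-group and within-group sums of squares.
ss_between = sum(
    len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in thc.values()
)
ss_within = sum(
    sum((v - statistics.mean(g)) ** 2 for v in g) for g in thc.values()
)
df_between = len(thc) - 1
df_within = len(values) - len(thc)
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f_stat, df_between, df_within)
```

For these invented values F = 17 on 3 and 8 degrees of freedom.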

Tukey's HSD

In the previous question you find that mean THC levels were significantly different among varieties. Now you want to know which specific varieties are different from each other.

Dunnett's Test

Suppose one of the varieties above is a "wild" type and you just wanted to compare 3 modern cultivars versus the wild type.

Linearized log regression

You want to create a regression between the weight of deer (y) and their age (x), two continuous variables. But the preliminary run produces a curved rather than linear trend.

Linearized polynomial regression

You want to graph the relationship between egg production (y) and age (x) in sturgeon, but the trend looks sort of like a hump. Egg production is a count, but each fish typically produces hundreds of eggs or more.

Poisson regression

You are managing deer, and one goal is to provide quality bucks to hunters. Quality is measured as the number of points on the antlers. Most of the deer in the herd are young (1-3 years old) and have only 0-3 points. A few of the larger deer have up to 10 or 12 points. You want to model the relationship between point count (y) and deer age (x1) and habitat (x2). The residual deviance:df ratio is about 1.3.

Quasipoisson regression

Same as previous, but some of the older deer in this region have up to 18 points, and the residual deviance:df ratio is 1.7.

Standard Error (SE)

You have a standard deviation that describes the variation of the measurements in your data, but you want to make inferences about the variation in means in the population.
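
The step from variation among measurements to variation among means is a division by the square root of the sample size, which gives the standard error of the mean. A minimal sketch, assuming made-up measurements:

```python
import math
import statistics

# Hypothetical measurements from one sample.
measurements = [1.0, 1.2, 1.1, 0.9, 1.3]

sd = statistics.stdev(measurements)      # spread of individual measurements
se = sd / math.sqrt(len(measurements))   # expected spread of sample means
print(round(sd, 4), round(se, 4))
```

The SE is always smaller than the SD for n > 1, and it shrinks as the sample grows.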

95% CI

With the same scenario as above, what if you want to make inferences about variation in means in the population and also estimate the range (low and high) within which you expect to find the true mean with a probability of 0.95?
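
The interval is the mean plus or minus a multiplier times the standard error. A minimal sketch, assuming made-up summary statistics and a large sample, so the normal multiplier (about 1.96) stands in for the t multiplier that would be used with small samples.

```python
import math
from statistics import NormalDist

# Hypothetical summary statistics for a large sample.
mean, sd, n = 1.10, 0.15, 250

se = sd / math.sqrt(n)
z = NormalDist().inv_cdf(0.975)  # about 1.96; large-sample approximation
ci_low, ci_high = mean - z * se, mean + z * se
print(f"95% CI: ({ci_low:.4f}, {ci_high:.4f})")
```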

Kolmogorov-Smirnov test

You are interested in whether the age distributions of elephants in South Africa differ from those in Zambia. You plot out the age frequency histograms from both countries. The mean and median ages seem about the same, but the two distributions have different shapes. What test could you use to show that the age distributions of the two elephant populations differ, even though their mean and median ages do not?
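
The test statistic is the largest vertical gap between the two cumulative frequency curves. A minimal sketch, assuming made-up ages chosen so both samples have a mean and median of 30; `ks_statistic` is a hypothetical helper, and the P value is omitted.

```python
from bisect import bisect_right

def ks_statistic(a, b):
    """Largest absolute gap between the two empirical cumulative distributions."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in a + b
    )

# Hypothetical elephant ages: same mean and median, different shapes.
south_africa = [20, 25, 30, 35, 40]
zambia = [5, 10, 30, 50, 55]
d_stat = ks_statistic(south_africa, zambia)
print(d_stat)
```

Here D = 0.4 even though a comparison of means or medians would show no difference at all.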

Anderson-Darling Test

In the previous question, what if the differences in age distributions occur mainly because of differences in the number of calves and the oldest animals? What alternative to the previous test is more sensitive to differences at the "extremes" of the age-frequency histogram?

Spearman's rank correlation coefficient

You want a simple statistic/index that tells you the degree of association between aggressiveness in mice and size. Size is measured in grams. Aggressiveness is an index from 0 to 5, which is loosely based on the number of aggressive encounters per minute. What constitutes an aggressive encounter is not always obvious, so the researchers used the index instead.
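
The rank-based coefficient is a Pearson correlation computed on ranks rather than raw values, which suits an ordinal index. A minimal sketch, assuming made-up mouse data; `average_ranks`, `pearson_r`, and `spearman_rho` are hypothetical helper names.

```python
import math

def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical mice: body mass in grams and a 0-5 aggressiveness index.
size_g = [20, 22, 25, 27, 30]
aggression = [1, 0, 3, 2, 5]
rho = spearman_rho(size_g, aggression)
print(round(rho, 3))
```

For these invented values rho = 0.8.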

Kendall's Tau

In the previous question, suppose aggressiveness is based on a very subjective score of the intensity of the aggressive behavior. What more conservative alternative to the previous statistic would be used?
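
This statistic only asks whether each pair of animals is ordered the same way on both variables. A minimal sketch of the simplest form (tau-a, with no correction for ties), assuming made-up data; `kendall_tau_a` is a hypothetical helper name.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical mice: body mass in grams and a subjective 0-5 score.
size_g = [20, 22, 25, 27, 30]
aggression = [1, 0, 3, 2, 5]
tau = kendall_tau_a(size_g, aggression)
print(tau)
```

Here 8 of the 10 pairs are concordant and 2 are discordant, so tau = 0.6, lower than the rank correlation of 0.8 obtained from the same numbers.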

QQ plot

In complicated ANOVAs and also in regression, checking normality by exploring the distribution of your raw data does not work very well. What tool is used to assess the global normality of the residuals as a proxy for testing normality in these scenarios?

R-squared (R²)

In ANOVA and regression, this metric tells you the % of variation in Y that is explained by X(s).
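
The proportion explained (R-squared) is one minus the ratio of unexplained to total variation. A minimal sketch for a single predictor, assuming made-up data; `r_squared` is a hypothetical helper name.

```python
def r_squared(x, y):
    """R-squared for a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2 = r_squared(x, y)
print(f"{100 * r2:.0f}% of the variation in y is explained by x")
```

For these invented values R-squared is 0.6, so 60% of the variation in y is explained.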

AICwt

If you are trying to fit a regression model and have many predictors (Xs) as well as 1-2 interaction terms involving your Xs, what metric would help you decide which X terms you should keep in the final model to explain as much variation in Y as possible with the fewest predictors?

Partial correlation

You are looking at the degree of association between toxoplasmosis (parasite encysted in brain) and murder rates (per 100,000 people) in different cities. Toxoplasmosis is a rate: positive antigen tests per 100 people. However, you realize that there may be confounding variables, like the size of the city and economic prosperity. If you want to look at the association between toxoplasmosis and murder rate, but control for city population size and mean household income, what test would you use?

Multiple Linear Regression

You want to model the functional relationship between bird-tower collisions (counts are usually very high and not Poisson-like) and various predictors: (a) tower height, (b) elevation, (c) distance from city, (d) number of foggy days per year, and (e) mean annual windspeed. All x-y scatters point towards good linearity.

Independent samples t-test

You want to compare the mean height of white pines versus red pines in a forest. Random samples of each are collected in the forest.

GLM binomial logistic regression

A researcher wants to model seed germination in response to seed weight (x). Each seed is weighed and then planted. The response is binary: each seed either germinated (1) or did not germinate (0).

Binomial Trials/proportion

Suppose the researcher puts 10-20 seeds in each of 100 petri dishes. Each dish receives one of 10 different doses of a plant hormone, with 10 dishes for each dose. At the end of the experiment, the researcher records the number of germinated and the number of un-germinated seeds in each dish.

ANCOVA

A researcher wants to compare mean weight gain (y) of chickens in response to 4 different feeding treatments. They also record the pre-treatment weight of each chicken, because they know that the initial weight of a chicken may affect how it responds to the treatment (e.g., scrawny chickens may not recover enough to respond to the treatments in the same way as others). How would they compare means for the 4 treatments, controlling for pre-treatment chicken weight?

Paired t-test

A researcher wants to compare mean PCB levels in lakes in 1980 vs. today (two groups). The same lakes are sampled in 1980 and 2023.
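
Because each lake contributes a value in both years, the test works on the per-lake differences rather than on two separate groups. A minimal sketch, assuming made-up PCB values for five lakes; the P value lookup (t distribution with n - 1 degrees of freedom) is omitted.

```python
import math
import statistics

# Hypothetical PCB levels for the same five lakes in each year.
pcb_1980 = [10, 12, 9, 11, 13]
pcb_2023 = [8, 11, 7, 10, 9]

# One difference per lake removes lake-to-lake variation.
diffs = [before - after for before, after in zip(pcb_1980, pcb_2023)]
se_diff = statistics.stdev(diffs) / math.sqrt(len(diffs))
t_stat = statistics.mean(diffs) / se_diff
df = len(diffs) - 1
print(round(t_stat, 3), df)
```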

Linear Mixed Model (LMM)

You want to compare mean dace densities in pools vs. riffles. However, you take multiple samples from 10 different streams, each stream with at least 5 samples of pools and riffles. Because the data from each stream are not independent, you can't ignore that the data were collected in different locations. How would you compare dace densities, controlling for the location of samples (stream_ID)?

Two-way ANOVA

The response variable is hemoglobin levels. The predictor variables are both factors: X1 = disease_status (sickle cell homozygous, sickle cell heterozygous, normal) and X2 = gender (M, F). The goal is to compare mean hemoglobin levels in response to disease status and gender, and also to consider the interaction between disease status and gender.

emmeans

You want a follow-up test to compare mean hemoglobin levels between pairwise combinations of each disease type and gender.

Bootstrap/Permutation t

You want to compare means of two groups, but both groups are definitely skewed, and to make it worse, they are skewed in different directions. You have sample sizes of 40 for each. What is the best option to compare the means without having to resort to a weaker test based on ranks?
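
The permutation version builds its own null distribution by repeatedly shuffling the group labels, so it does not lean on normality. A minimal sketch, assuming small made-up groups rather than 40 per group; `permutation_p_value`, the number of shuffles, and the seed are illustrative choices.

```python
import random
import statistics

def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign observations to groups at random
        diff = abs(
            statistics.mean(pooled[: len(a)]) - statistics.mean(pooled[len(a):])
        )
        if diff >= observed:
            extreme += 1
    # Add 1 to numerator and denominator so the P value is never exactly 0.
    return (extreme + 1) / (n_perm + 1)

# Hypothetical, clearly separated groups.
group_a = [1, 2, 3, 4, 5, 6, 7, 8]
group_b = [11, 12, 13, 14, 15, 16, 17, 18]
p_value = permutation_p_value(group_a, group_b)
print(p_value)
```

The means themselves are still what is being compared, so no information is lost to ranking.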

Coefficient of variation

You have basic summary statistics on the bill size of several species of Darwin's Finches on the Galapagos. You want to determine which species is most variable with respect to bill size. You realize that SD won't work, because the larger finch species have larger bills, so both their means and their SDs will tend to be larger. How can you compare variation in bill size among birds in a way that doesn't depend on their size?
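
Dividing the SD by the mean puts every species on the same relative scale. A minimal sketch, assuming invented summary statistics for two placeholder species:

```python
def coefficient_of_variation(mean, sd):
    """SD expressed as a percentage of the mean."""
    return 100 * sd / mean

# Hypothetical (mean, SD) of bill size in mm for two placeholder species.
finches = {
    "species_a": (10.0, 1.0),
    "species_b": (20.0, 1.5),
}
cv = {name: coefficient_of_variation(m, s) for name, (m, s) in finches.items()}
most_variable = max(cv, key=cv.get)
print(cv, most_variable)
```

In this made-up case the larger species has the bigger SD (1.5 vs. 1.0) but the smaller CV (7.5% vs. 10%), so the smaller species is the more variable one in relative terms.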

Wilcoxon Rank Sum

You want to compare the mean eye diameter in two strains of fruit flies, but you only have 8 observations for each strain and you are unsure of the distribution of eye sizes. For a small sample size, what conservative test based on ranks would allow you to compare the two groups?

Welch's ANOVA

You are running a one-way ANOVA but discover that Levene's test gives you a P=0.000001, so the equal-variance assumption is not met. What is the simplest alternative to a regular ANOVA in this case?

Games-Howell

In the previous question, which post hoc test applies for pairwise comparisons?

Simple Linear Regression

You want to draw a trendline and model the relationship between cholera levels in drinking water and population size. The relationship looks quite linear.