Comprehensive flashcards covering distribution of means, t-tests, ANOVA, non-parametric methods, correlation, and various regression models based on Chapters 11-13 and 15-17.
Distribution of sample means
The distribution obtained by taking a sample from the population, calculating its sample mean, repeating the process many times, and plotting all the resulting sample means.
Standard error (SE)
The standard deviation of the distribution of sample means, calculated as SE = σ/√n.
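As a quick numeric sketch (the sample values are made up), the standard error can be estimated from a single sample by substituting the sample standard deviation s for σ:

```python
import math
import statistics

def standard_error(sample):
    """Estimate SE of the mean as s / sqrt(n), using the sample SD s in place of σ."""
    n = len(sample)
    return statistics.stdev(sample) / math.sqrt(n)

sample = [4.2, 5.1, 4.8, 5.6, 4.9]  # hypothetical measurements
se = standard_error(sample)
```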
Student’s t distribution
A distribution used when the true population standard deviation is unknown, characterized by fatter tails than the standard normal distribution and defined by n−1 degrees of freedom.
95% Confidence interval for the mean
Calculated using the formula Ȳ − t_0.05(2),df × SE_Ȳ < μ < Ȳ + t_0.05(2),df × SE_Ȳ, providing a range of values consistent with the population mean.
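A minimal sketch of the interval formula. The critical value t_0.05(2),df must come from a t table; 2.262 is the two-tailed 5% value for df = 9, matching this hypothetical sample of n = 10:

```python
import math
import statistics

def ci95_mean(sample, t_crit):
    """95% CI for μ: Ȳ ± t_crit * SE, with t_crit taken from a t table for df = n − 1."""
    n = len(sample)
    ybar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return ybar - t_crit * se, ybar + t_crit * se

sample = [12.1, 11.4, 13.0, 12.7, 11.9, 12.3, 12.8, 11.6, 12.5, 12.2]  # hypothetical, n = 10
lo, hi = ci95_mean(sample, t_crit=2.262)  # t_0.05(2),9 ≈ 2.262 from a t table
```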
One-sample t-test
A test that compares the mean of a random sample from a normal population with a proposed population mean (μ₀) specified in a null hypothesis.
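The test statistic is t = (Ȳ − μ₀) / SE_Ȳ with df = n − 1; a bare-bones sketch with made-up data:

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, df) for a one-sample t-test: t = (Ȳ − μ0) / SE_Ȳ, df = n − 1."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / se, n - 1

# hypothetical body-temperature readings tested against μ0 = 98.6
t, df = one_sample_t([98.2, 98.6, 97.9, 98.4, 98.8, 98.1], mu0=98.6)
```

The resulting t is then compared with the critical value from the t distribution with df degrees of freedom.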
Type II error (β)
The error that occurs when a researcher fails to reject a false null hypothesis.
Paired design
A study design where every sampled unit receives both treatments, allowing two measurements from the same unit and usually increasing statistical power by controlling for variation among subjects.
Two-sample design
A study design where each treatment group is composed of an independent, random sample of units.
Pooled sample variance (s_p²)
The average of the variances of two samples weighted by their degrees of freedom, calculated as s_p² = (df₁s₁² + df₂s₂²) / (df₁ + df₂).
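A direct sketch of the weighted-average formula (sample values are invented):

```python
import statistics

def pooled_variance(s1, s2):
    """s_p² = (df1*s1² + df2*s2²) / (df1 + df2): variances weighted by degrees of freedom."""
    df1, df2 = len(s1) - 1, len(s2) - 1
    return (df1 * statistics.variance(s1) + df2 * statistics.variance(s2)) / (df1 + df2)

sp2 = pooled_variance([1.0, 2.0, 3.0], [4.0, 6.0, 8.0])  # variances 1.0 and 4.0, equal df
```

With equal sample sizes the pooled variance reduces to the simple average of the two variances.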
Welch's t-test
A version of the two-sample t-test used to compare means when the variances of the two independent groups are not equal.
Nonparametric method
A statistical method that makes fewer assumptions about the distribution of variables than parametric methods and is often based on ranks of data points rather than actual values.
Sign test
A nonparametric test that compares the median of a sample to a constant specified in the null hypothesis, making no assumptions about the distribution of the population.
Mann-Whitney U-test
A nonparametric alternative to the two-sample t-test used to compare the distributions of two independent groups based on ranks.
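One common form of the statistic is U₁ = R₁ − n₁(n₁+1)/2, where R₁ is the sum of the first sample's ranks in the combined data; a sketch that ignores tied values:

```python
def mann_whitney_u(x, y):
    """U for sample x: U1 = R1 − n1(n1+1)/2, with R1 the rank sum of x
    in the combined sample. This sketch assumes all values are distinct (no ties)."""
    combined = sorted(x + y)
    ranks = {v: i + 1 for i, v in enumerate(combined)}  # rank 1 = smallest value
    r1 = sum(ranks[v] for v in x)
    n1 = len(x)
    return r1 - n1 * (n1 + 1) / 2

u = mann_whitney_u([1.2, 2.5, 3.1], [4.0, 5.3, 6.7])  # x entirely below y
```

U ranges from 0 (every x below every y) to n₁n₂ (every x above every y).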
Permutation test
A computer-based nonparametric method that tests hypotheses by randomly rearranging ("permuting") data thousands of times to generate a null distribution.
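A minimal sketch of a permutation test for a difference in means: shuffle group labels many times and count how often the permuted statistic is at least as extreme as the observed one.

```python
import random

def permutation_test(x, y, n_perm=5000, seed=0):
    """Two-sided p-value for a difference in means, estimated by randomly
    reassigning the pooled observations to the two groups n_perm times."""
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        px, py = pooled[:len(x)], pooled[len(x):]
        if abs(sum(px) / len(px) - sum(py) / len(py)) >= observed:
            count += 1
    return count / n_perm

p = permutation_test([1, 2, 3, 4], [1, 2, 3, 5])  # made-up data
```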
ANOVA (Analysis of variance)
A method used to compare the means of multiple groups simultaneously by testing for variation among group means relative to variation within groups.
Mean square error (MSerror)
In ANOVA, this value estimates the variance among subjects that belong to the same group (variation within groups).
F-ratio
The test statistic for ANOVA, calculated as F = MS_groups / MS_error, which should be approximately 1 if the null hypothesis is true.
R² (Variation explained)
Measures the fraction of variation in Y that is explained by group differences, calculated as R² = SS_groups / SS_total.
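The F-ratio and R² both fall out of the same sums of squares; a sketch with invented data for three groups:

```python
def anova_oneway(groups):
    """One-way ANOVA from sums of squares: returns (F, R²).
    F = MS_groups / MS_error; R² = SS_groups / SS_total."""
    all_y = [y for g in groups for y in g]
    n = len(all_y)
    grand = sum(all_y) / n
    ss_groups = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_error = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    df_groups = len(groups) - 1
    df_error = n - len(groups)
    f = (ss_groups / df_groups) / (ss_error / df_error)
    r2 = ss_groups / (ss_groups + ss_error)
    return f, r2

f, r2 = anova_oneway([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
```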
Planned comparison
A comparison between specific means identified during the design of the study before the data are examined.
Tukey-Kramer method
An unplanned comparison procedure that tests all pairs of means while keeping the probability of making at least one Type I error at or below the significance level α.
Pearson’s correlation coefficient (r)
A statistic that measures the strength and direction of the linear association between two numerical variables, ranging from −1 to 1.
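A sketch of r computed from sums of cross-products, r = Σ(xᵢ−x̄)(yᵢ−ȳ) / √(Σ(xᵢ−x̄)² · Σ(yᵢ−ȳ)²):

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r([1.0, 2.0, 3.0, 4.0], [1.9, 4.1, 6.0, 8.2])  # hypothetical data
```

Perfectly linear increasing data give r = 1, perfectly linear decreasing data give r = −1.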
Bivariate normal distribution
A distribution that is bell-shaped in two dimensions, where both variables are normal, their relationship is linear, and the cloud of points is elliptical or circular.
Spearman’s rank correlation
Measures the strength and direction of the linear association between the ranks of two variables, used for ordinal data or when bivariate normality is violated.
Least-squares regression
A linear regression method that finds the line where the sum of all the squared deviations in the response variable (Y) is smallest.
Regression slope (b)
The rate of change in Y per unit of X in a linear regression model, calculated as b = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) / Σ(Xᵢ − X̄)².
Residual
The difference between the measured value of Y and the value of Y predicted by the regression line (Yᵢ − Ŷᵢ).
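A sketch tying the slope formula and residuals together (x and y values are invented):

```python
def least_squares(x, y):
    """Slope b = Σ(Xᵢ−X̄)(Yᵢ−Ȳ) / Σ(Xᵢ−X̄)²; intercept a = Ȳ − b·X̄."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
    a = ybar - b * xbar
    return a, b

x = [1, 2, 3, 4]
y = [2.1, 3.9, 6.2, 7.8]  # hypothetical responses
a, b = least_squares(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # Yᵢ − Ŷᵢ
```

A useful check: least-squares residuals always sum to zero (up to rounding).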
Extrapolation
The prediction of a response variable value outside the range of explanatory variable values (X) present in the original data.
Regression toward the mean
A pattern seen when two measurements are correlated with r less than one: individuals far from the mean on the first measurement tend to lie closer to the mean on the second.
Logistic regression
A regression method that predicts the probability of occurrence of a binary response variable (coded as 0 or 1) as a function of a continuous numerical explanatory variable.
LD50 (Lethal Dose 50)
On a dose-response regression curve, the estimated dose of a treatment predicting 50% mortality.
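A sketch of how a fitted logistic model turns dose into predicted mortality, and how LD50 falls out of the coefficients; b0 and b1 here are made-up fitted values, not estimates from real data:

```python
import math

def logistic_p(dose, b0, b1):
    """Predicted probability of death: p = 1 / (1 + e^-(b0 + b1·dose))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * dose)))

b0, b1 = -4.0, 0.8   # hypothetical fitted logistic-regression coefficients
ld50 = -b0 / b1      # the dose where b0 + b1·dose = 0, i.e. predicted p = 0.5
```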
Log-binomial regression
A regression model used for binary outcomes that directly estimates risk ratios (RR).
Multinomial logistic regression
A model used when the outcome variable is nominal (categorical) with more than two outcome categories, comparing them to a reference group.
Cox regression
A time-to-event regression model used in survival analysis where the measure of association is the Hazard Ratio (HR).