Comprehensive flashcards covering distribution of means, t-tests, ANOVA, non-parametric methods, correlation, and various regression models based on Chapters 11-13 and 15-17.
Distribution of sample means
The distribution obtained by taking a sample from the population, calculating its sample mean, repeating the process many times, and plotting all the resulting sample means.
Standard error (SE)
The standard deviation of the distribution of sample means, calculated as SE = σ/√n.
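As a quick numeric sketch (the sample values are made up), the standard error can be estimated from a single sample by substituting the sample standard deviation s for σ:

```python
import math
import statistics

def standard_error(sample):
    """Estimate SE of the mean as s / sqrt(n), using the sample SD s in place of σ."""
    n = len(sample)
    return statistics.stdev(sample) / math.sqrt(n)

sample = [4.2, 5.1, 4.8, 5.6, 4.9]  # hypothetical measurements
se = standard_error(sample)
```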
Student’s t distribution
A distribution used when the true population standard deviation is unknown, characterized by fatter tails than the standard normal distribution and defined by n−1 degrees of freedom.
95% Confidence interval for the mean
Calculated using the formula Ȳ − t_0.05(2),df × SE_Ȳ < μ < Ȳ + t_0.05(2),df × SE_Ȳ, providing a range of values consistent with the population mean.
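A minimal sketch of the interval formula. The critical value t_0.05(2),df must come from a t table; 2.262 is the two-tailed 5% value for df = 9, matching this hypothetical sample of n = 10:

```python
import math
import statistics

def ci95_mean(sample, t_crit):
    """95% CI for μ: Ȳ ± t_crit * SE, with t_crit taken from a t table for df = n − 1."""
    n = len(sample)
    ybar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return ybar - t_crit * se, ybar + t_crit * se

sample = [12.1, 11.4, 13.0, 12.7, 11.9, 12.3, 12.8, 11.6, 12.5, 12.2]  # hypothetical, n = 10
lo, hi = ci95_mean(sample, t_crit=2.262)  # t_0.05(2),9 ≈ 2.262 from a t table
```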
One-sample t-test
A test that compares the mean of a random sample from a normal population with a proposed population mean (μ₀) specified in a null hypothesis.
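The test statistic is t = (Ȳ − μ₀) / SE_Ȳ with df = n − 1; a bare-bones sketch with made-up data:

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, df) for a one-sample t-test: t = (Ȳ − μ0) / SE_Ȳ, df = n − 1."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / se, n - 1

# hypothetical body-temperature readings tested against μ0 = 98.6
t, df = one_sample_t([98.2, 98.6, 97.9, 98.4, 98.8, 98.1], mu0=98.6)
```

The resulting t is then compared with the critical value from the t distribution with df degrees of freedom.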
Type II error (β)
The error that occurs when a researcher fails to reject a false null hypothesis.
Paired design
A study design where every sampled unit receives both treatments, allowing two measurements from the same unit and usually increasing statistical power by controlling for variation among subjects.
Two-sample design
A study design where each treatment group is composed of an independent, random sample of units.
Pooled sample variance (s_p²)
The average of the variances of two samples weighted by their degrees of freedom, calculated as s_p² = (df₁s₁² + df₂s₂²) / (df₁ + df₂).
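A direct sketch of the weighted-average formula (sample values are invented):

```python
import statistics

def pooled_variance(s1, s2):
    """s_p² = (df1*s1² + df2*s2²) / (df1 + df2): variances weighted by degrees of freedom."""
    df1, df2 = len(s1) - 1, len(s2) - 1
    return (df1 * statistics.variance(s1) + df2 * statistics.variance(s2)) / (df1 + df2)

sp2 = pooled_variance([1.0, 2.0, 3.0], [4.0, 6.0, 8.0])  # variances 1.0 and 4.0, equal df
```

With equal sample sizes the pooled variance reduces to the simple average of the two variances.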
Welch's t-test
A version of the two-sample t-test used to compare means when the variances of the two independent groups are not equal.
Nonparametric method
A statistical method that makes fewer assumptions about the distribution of variables than parametric methods and is often based on ranks of data points rather than actual values.
Sign test
A nonparametric test that compares the median of a sample to a constant specified in the null hypothesis, making no assumptions about the distribution of the population.
Mann-Whitney U-test
A nonparametric alternative to the two-sample t-test used to compare the distributions of two independent groups based on ranks.
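One common form of the statistic is U₁ = R₁ − n₁(n₁+1)/2, where R₁ is the sum of the first sample's ranks in the combined data; a sketch that ignores tied values:

```python
def mann_whitney_u(x, y):
    """U for sample x: U1 = R1 − n1(n1+1)/2, with R1 the rank sum of x
    in the combined sample. This sketch assumes all values are distinct (no ties)."""
    combined = sorted(x + y)
    ranks = {v: i + 1 for i, v in enumerate(combined)}  # rank 1 = smallest value
    r1 = sum(ranks[v] for v in x)
    n1 = len(x)
    return r1 - n1 * (n1 + 1) / 2

u = mann_whitney_u([1.2, 2.5, 3.1], [4.0, 5.3, 6.7])  # x entirely below y
```

U ranges from 0 (every x below every y) to n₁n₂ (every x above every y).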
Permutation test
A computer-based nonparametric method that tests hypotheses by randomly rearranging ("permuting") data thousands of times to generate a null distribution.
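A minimal sketch of a permutation test for a difference in means: shuffle group labels many times and count how often the permuted statistic is at least as extreme as the observed one.

```python
import random

def permutation_test(x, y, n_perm=5000, seed=0):
    """Two-sided p-value for a difference in means, estimated by randomly
    reassigning the pooled observations to the two groups n_perm times."""
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        px, py = pooled[:len(x)], pooled[len(x):]
        if abs(sum(px) / len(px) - sum(py) / len(py)) >= observed:
            count += 1
    return count / n_perm

p = permutation_test([1, 2, 3, 4], [1, 2, 3, 5])  # made-up data
```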
ANOVA (Analysis of variance)
A method used to compare the means of multiple groups simultaneously by testing for variation among group means relative to variation within groups.
Mean square error (MSerror)
In ANOVA, this value estimates the variance among subjects that belong to the same group (variation within groups).
F-ratio
The test statistic for ANOVA, calculated as F = MS_groups / MS_error, which should be approximately 1 if the null hypothesis is true.
R² (Variation explained)
Measures the fraction of variation in Y that is explained by group differences, calculated as R² = SS_groups / SS_total.
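The F-ratio and R² both fall out of the same sums of squares; a sketch with invented data for three groups:

```python
def anova_oneway(groups):
    """One-way ANOVA from sums of squares: returns (F, R²).
    F = MS_groups / MS_error; R² = SS_groups / SS_total."""
    all_y = [y for g in groups for y in g]
    n = len(all_y)
    grand = sum(all_y) / n
    ss_groups = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_error = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    df_groups = len(groups) - 1
    df_error = n - len(groups)
    f = (ss_groups / df_groups) / (ss_error / df_error)
    r2 = ss_groups / (ss_groups + ss_error)
    return f, r2

f, r2 = anova_oneway([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
```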
Planned comparison
A comparison between specific means identified during the design of the study before the data are examined.
Tukey-Kramer method
An unplanned comparison procedure that tests all pairs of means while keeping the probability of making at least one Type I error at or below the significance level α.
Pearson’s correlation coefficient (r)
A statistic that measures the strength and direction of the linear association between two numerical variables, ranging from −1 to 1.
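A sketch of r computed from sums of cross-products, r = Σ(xᵢ−x̄)(yᵢ−ȳ) / √(Σ(xᵢ−x̄)² · Σ(yᵢ−ȳ)²):

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r([1.0, 2.0, 3.0, 4.0], [1.9, 4.1, 6.0, 8.2])  # hypothetical data
```

Perfectly linear increasing data give r = 1, perfectly linear decreasing data give r = −1.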
Bivariate normal distribution
A distribution that is bell-shaped in two dimensions, where both variables are normal, their relationship is linear, and the cloud of points is elliptical or circular.
Spearman’s rank correlation
Measures the strength and direction of the linear association between the ranks of two variables, used for ordinal data or when bivariate normality is violated.
Least-squares regression
A linear regression method that finds the line where the sum of all the squared deviations in the response variable (Y) is smallest.
Regression slope (b)
The rate of change in Y per unit of X in a linear regression model, calculated as b = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) / Σ(Xᵢ − X̄)².
Residual
The difference between the measured value of Y and the value of Y predicted by the regression line (Yᵢ − Ŷᵢ).
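A sketch tying the slope formula and residuals together (x and y values are invented):

```python
def least_squares(x, y):
    """Slope b = Σ(Xᵢ−X̄)(Yᵢ−Ȳ) / Σ(Xᵢ−X̄)²; intercept a = Ȳ − b·X̄."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
    a = ybar - b * xbar
    return a, b

x = [1, 2, 3, 4]
y = [2.1, 3.9, 6.2, 7.8]  # hypothetical responses
a, b = least_squares(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # Yᵢ − Ŷᵢ
```

A useful check: least-squares residuals always sum to zero (up to rounding).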
Extrapolation
The prediction of a response variable value outside the range of explanatory variable values (X) present in the original data.
Regression toward the mean
A pattern seen when two measurements are correlated with r less than one: individuals far from the mean on the first measurement tend to lie closer to the mean on the second.
Logistic regression
A regression method that predicts the probability of occurrence of a binary response variable (coded as 0 or 1) as a function of a continuous numerical explanatory variable.
LD50 (Lethal Dose 50)
On a dose-response regression curve, the estimated dose of a treatment predicting 50% mortality.
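A sketch of how a fitted logistic model turns dose into predicted mortality, and how LD50 falls out of the coefficients; b0 and b1 here are made-up fitted values, not estimates from real data:

```python
import math

def logistic_p(dose, b0, b1):
    """Predicted probability of death: p = 1 / (1 + e^-(b0 + b1·dose))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * dose)))

b0, b1 = -4.0, 0.8   # hypothetical fitted logistic-regression coefficients
ld50 = -b0 / b1      # the dose where b0 + b1·dose = 0, i.e. predicted p = 0.5
```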
Log-binomial regression
A regression model used for binary outcomes that directly estimates risk ratios (RR).
Multinomial logistic regression
A model used when the outcome variable is nominal (categorical) with more than two outcome categories, comparing them to a reference group.
Cox regression
A time-to-event regression model used in survival analysis where the measure of association is the Hazard Ratio (HR).