Vocabulary flashcards covering Units 1 through 9 of the Statistics curriculum, including data distributions, regression, sampling, probability, and inference.
Standard deviation
A measure of spread about the mean, used when a distribution is symmetric; it is non-resistant and greatly affected by outliers.
Joint relative frequency
Calculated as the value in a specific cell divided by the grand total.
Conditional relative frequency
Calculated as the value in a specific cell divided by its row total (or its column total, depending on the condition).
Marginal relative frequency
Calculated as a row total or column total divided by the grand total.
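The three relative frequencies above differ only in which total goes in the denominator. A minimal sketch, assuming a made-up two-way table of counts (the categories and numbers are hypothetical, not from the flashcards):

```python
# Hypothetical two-way table of counts (rows: grade level, columns: preferred sport).
table = {
    "Junior": {"Soccer": 20, "Tennis": 30},
    "Senior": {"Soccer": 40, "Tennis": 10},
}

grand_total = sum(sum(row.values()) for row in table.values())
row_total = sum(table["Junior"].values())
cell = table["Junior"]["Soccer"]

joint = cell / grand_total          # cell divided by the grand total
conditional = cell / row_total      # cell divided by its row total
marginal = row_total / grand_total  # row total divided by the grand total

print(joint, conditional, marginal)
```

Here the joint value answers "what share of everyone is a Junior who prefers soccer", while the conditional value answers "what share of Juniors prefer soccer".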
Bimodal
A characteristic of a graph that has two separate peaks.
Median (Q2)
The middle value of a data set when the values are ordered from least to greatest; with an even number of values, it is the mean of the two middle values.
Five number summary
The first step in creating a box plot; consists of the minimum, Q1, median, Q3, and maximum.
Interquartile range (IQR)
Shows the spread of the middle 50% of the data set, calculated as Q3 − Q1.
SOCCS
An acronym for describing distributions: Shape, Outliers, Center, Context, and Spread.
Outlier rule (Skewed)
Any value less than Q1 − 1.5(IQR) or greater than Q3 + 1.5(IQR); identified as a star on a box plot.
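The five number summary, IQR, and the 1.5(IQR) outlier rule can be worked through together. A rough sketch, assuming made-up data and the common textbook convention of taking the quartiles as the medians of the lower and upper halves (software packages may use slightly different quartile rules); `five_number_summary` is a hypothetical helper name:

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]          # values below the median position
    upper = data[half + n % 2:]  # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

data = [2, 4, 5, 7, 8, 9, 11, 12, 30]  # made-up values
minimum, q1, q2, q3, maximum = five_number_summary(data)

iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print((minimum, q1, q2, q3, maximum), iqr, outliers)
```

With these values the fences fall at −6 and 22, so only 30 is flagged.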
Outlier rule (Symmetric)
Any value more than 2 standard deviations away from the mean.
Response variable
Also known as the dependent variable or the y value on a scatter plot; it represents the result.
Explanatory variable
Also known as the independent variable, factor, treatment, or the x value on a scatter plot.
Percentile
The pth percentile is the value that has p% of the data less than or equal to it.
Z-score
A measure of a data value's distance from the mean in standard deviations, calculated as z = (x − μ)/σ.
Empirical rule (68-95-99.7 rule)
In a normal distribution, about 68% of the data is within 1σ of the mean, 95% is within 2σ, and 99.7% is within 3σ.
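Both the z-score and the empirical rule can be checked numerically. A small sketch using Python's standard-library `statistics.NormalDist`; the mean of 100, standard deviation of 15, and data value of 130 are hypothetical, and `share_within` is an illustrative helper name:

```python
from statistics import NormalDist

# z-score: distance from the mean measured in standard deviations.
mu, sigma = 100, 15  # hypothetical population mean and standard deviation
x = 130              # hypothetical data value
z = (x - mu) / sigma

# Empirical rule: share of a normal distribution within k standard deviations.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

print(z)
for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

The printed shares are the more precise values that the 68-95-99.7 figures round off.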
Correlation coefficient (r)
A value between −1 and 1 that describes the strength and direction of the linear relationship between two quantitative variables.
CDUFS
Acronym for describing scatter plots: Context, Direction (positive/negative/neutral), Unusual features (outliers/clusters), Form (linear/non-linear), and Strength.
Least-squares regression line (LSRL)
Also known as the line of best fit; it is the line that most closely matches the linear relationship by minimizing the sum of the squared residuals.
Coefficient of determination (r²)
The proportion of the variation in y explained by the linear relationship with x; often reported as a percentage.
Residuals
The difference between the actual value and the predicted value, calculated as actual y−predicted y.
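The correlation coefficient, regression line, coefficient of determination, and residuals all come from the same few sums. A minimal sketch with made-up (x, y) pairs, computed by hand rather than with a statistics package so each step is visible:

```python
from statistics import mean

x = [1, 2, 3, 4, 5]  # made-up explanatory values
y = [2, 4, 5, 4, 5]  # made-up response values

x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

slope = sxy / sxx                  # slope of the least-squares line
intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
r = sxy / (sxx * syy) ** 0.5       # correlation coefficient
r_squared = r ** 2                 # proportion of variation in y explained

# Residual = actual y minus predicted y.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(slope, intercept, round(r, 4), round(r_squared, 4))
print([round(e, 2) for e in residuals])
```

For a least-squares line the residuals sum to zero (up to rounding), which is a quick sanity check on the arithmetic.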
Extrapolation
Predicting a value for an x that is far outside the range of the observed data, which makes the prediction less reliable.
High leverage points
Points with very large or very small x values compared to the rest of the data.
Influential points
Points that, if removed, significantly change the slope or y-intercept of the regression line.
Bias
A systematic over- or underestimation of a population characteristic.
Simple Random Sample (SRS)
A sampling method where every individual, and every possible group of a given size, has an equal chance of being chosen.
Confounding variable
A variable not accounted for that can influence the response variable and is related to the explanatory variable.
Law of large numbers
States that simulated probabilities tend to get closer to the true probability as the number of trials increases.
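A quick simulation illustrates the idea. This sketch assumes a fair coin (true probability 0.5) and a fixed random seed so the run is repeatable; `simulated_probability` is a hypothetical helper name:

```python
import random

def simulated_probability(trials, seed=0):
    """Estimate P(heads) for a fair coin from the given number of simulated flips."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

for n in (10, 1_000, 100_000):
    print(n, simulated_probability(n))
```

The estimate from a handful of flips can be well off 0.5, while the estimate from many flips should sit very close to it.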
Statistically significant
A result that is unlikely to occur by chance alone, typically defined as having a probability of less than 5%.
Mutually exclusive
Also known as disjoint events; two events that cannot occur simultaneously, so P(A and B) = 0. This is different from independence, where the outcome of one event does not affect the probability of the other.
Central Limit Theorem
As sample size grows, the sampling distribution of the sample mean becomes approximately normal regardless of the population's shape.
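The theorem can be seen in a simulation. This sketch assumes a right-skewed exponential population with mean 1 and standard deviation 1, a sample size of 30, and 2,000 repeated samples; all of these choices are illustrative:

```python
import random
from statistics import mean, stdev

rng = random.Random(1)  # fixed seed so the run is repeatable

def sample_mean(n):
    # Draw n values from a right-skewed exponential population (mean 1, sd 1).
    return mean([rng.expovariate(1.0) for _ in range(n)])

n = 30
sample_means = [sample_mean(n) for _ in range(2_000)]

center = mean(sample_means)   # expected to be near the population mean, 1
spread = stdev(sample_means)  # expected to be near 1 / sqrt(30), about 0.18
print(round(center, 3), round(spread, 3))
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is strongly skewed.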
Confidence intervals (CI)
A range of plausible values for the true parameter, found by point estimate ± margin of error.
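As one concrete case of point estimate ± margin of error, here is a sketch of a one-proportion z-interval. The sample proportion, sample size, and 95% confidence level are hypothetical, and other parameters (such as means) use a different margin-of-error formula:

```python
from math import sqrt
from statistics import NormalDist

p_hat = 0.60       # hypothetical sample proportion (the point estimate)
n = 100            # hypothetical sample size
confidence = 0.95  # assumed confidence level

# Critical value z* for the chosen confidence level (about 1.96 for 95%).
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin_of_error = z_star * sqrt(p_hat * (1 - p_hat) / n)
interval = (p_hat - margin_of_error, p_hat + margin_of_error)

print(round(margin_of_error, 4), tuple(round(v, 4) for v in interval))
```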
Type 1 error (α)
A false positive; rejecting the null hypothesis (H0) when it is actually true.
Type 2 error (β)
A false negative; failing to reject the null hypothesis (H0) when it is actually false.
Power (1−β)
The probability of correctly rejecting a false null hypothesis in favor of a specific alternative.
Chi-square statistic (χ²)
A type of statistic that measures the difference between observed and expected frequencies in categorical data.
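The statistic is commonly computed as the sum of (observed − expected)² / expected over all categories. A minimal sketch with made-up counts, assuming four categories that are expected to be equally likely:

```python
observed = [18, 22, 20, 40]  # made-up observed counts
expected = [25, 25, 25, 25]  # counts expected if all four categories were equally likely

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_square, 2))
```

Larger values indicate a bigger gap between what was observed and what was expected.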