Flashcards covering key vocabulary from introductory statistics.
Categorical Variable
Places an individual into a group or category (e.g., eye color, genre of music).
Discrete Quantitative Variable
Takes specific numerical values with gaps in between (e.g., number of students in a class, shoe size).
Continuous Quantitative Variable
Can take any value, including decimals, within a range (e.g., height, weight).
Mean
The sum of the data divided by the number of observations; a measure of center.
Median
The middle value of the ordered data: at most 50% of the observations lie below it and at most 50% lie above it; a measure of center.
Range
Maximum value minus minimum value; a measure of variability/spread.
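A minimal sketch of the mean, median, and range, using Python's standard `statistics` module. The `scores` list is hypothetical data invented for illustration.

```python
from statistics import mean, median

# Hypothetical quiz scores, invented for illustration.
scores = [70, 85, 90, 75, 100]

center_mean = mean(scores)               # sum of the data / number of observations
center_median = median(scores)           # middle value of the sorted data
data_range = max(scores) - min(scores)   # maximum minus minimum

print(center_mean, center_median, data_range)
```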
Interquartile Range (IQR)
Q3 minus Q1; a measure of variability/spread.
Outlier
By the 1.5 × IQR rule, any observation greater than Q3 + (1.5 * IQR) or less than Q1 - (1.5 * IQR).
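The IQR and the outlier fences can be sketched as follows. Textbooks and software compute quartiles in slightly different ways, so the `method="inclusive"` choice here is an assumption and Q1 and Q3 may differ from a hand calculation. The `data` list is hypothetical.

```python
from statistics import quantiles

# Hypothetical data set with one unusually large value.
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]

# Assumption: the "inclusive" quartile method; other conventions can give
# slightly different quartiles for the same data.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Observations outside the fences are flagged as outliers.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(q1, q3, iqr, outliers)
```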
Standard Deviation
Roughly the typical distance of the values from the mean; a measure of spread/variability.
Variance
The square of standard deviation; a measure of spread/variability.
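A short sketch of how standard deviation and variance relate. It assumes the sample formulas (dividing by n - 1), which the cards do not specify. The `data` list is hypothetical.

```python
from statistics import stdev, variance

# Hypothetical data, invented for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Assumption: sample versions (divide by n - 1), not population versions.
sample_var = variance(data)
sample_sd = stdev(data)

# The variance is the square of the standard deviation.
print(sample_sd, sample_var, sample_sd ** 2)
```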
Percentile
The pth percentile is the value in a distribution with p percent of the observations less than it.
Z-Score
Measures how many standard deviations above or below the mean a specific observation is.
Z-Distribution
The normal distribution with a mean of 0 and a standard deviation of 1.
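As an illustration, a z-score is computed as (observation - mean) / standard deviation. If the data are assumed to be roughly normal, the z-distribution then gives the proportion of observations below that value. The helper `z_score` and the numbers used are hypothetical.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical exam: mean 70, standard deviation 10, one student's score 85.
z = z_score(85, 70, 10)

# Assumption: scores are approximately normal, so the standard normal
# (mean 0, standard deviation 1) gives the proportion below this z-score.
proportion_below = NormalDist().cdf(z)
print(z, proportion_below)
```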
Explanatory Variable
May help explain or predict changes in a response variable; the x-variable or “input.”
Response Variable
Measures the outcome of a study; the y-variable or “output.”
Scatterplot
Shows the relationship between two quantitative variables measured on the same individuals.
Correlation Coefficient (r)
Measures the strength and direction of the linear relationship between two quantitative variables; a number between -1 and 1.
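One way to compute r by hand is sketched below, using the standard sums of products and squares of deviations from the means. The function name `correlation` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean

def correlation(xs, ys):
    """Correlation coefficient r for paired quantitative data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired data: hours studied and quiz score.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = correlation(hours, scores)
print(round(r, 4))
```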
Regression Line
A line that describes how a response variable y changes as an explanatory variable x changes; used to predict y from x in a scatterplot.
Residual
The difference between an observed y value and the y value predicted by the regression line (observed minus predicted); the prediction error of the regression line.
Least-Squares Regression Line
The line that minimizes the sum of the squared residuals.
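A minimal sketch of fitting a least-squares line and computing its residuals, using the usual closed-form slope and intercept. The data and variable names are hypothetical.

```python
from statistics import mean

# Hypothetical paired data, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)

# Closed-form least-squares slope and intercept.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

# Residual = observed y minus predicted y.
predicted = [intercept + slope * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

print(slope, intercept, residuals)
```

For a least-squares line, the residuals sum to zero up to floating-point rounding.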
Residual Plot
Scatterplot with the residuals on the y-axis instead of the response variable; used to check the appropriateness of a linear model.
Standard Deviation of Residuals (s)
The approximate average size of the residuals or prediction errors; indicates how well the regression line fits the data.
Coefficient of Determination (r^2)
The proportion of the variation in the values of y that is explained by the least-squares regression line of y vs. x.
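The sketch below computes s and r^2 for a small hypothetical data set. It assumes the common convention of dividing the sum of squared residuals by n - 2 when computing s.

```python
from math import sqrt
from statistics import mean

# Hypothetical paired data, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

# Sum of squared residuals (SSE) and total variation in y (SST).
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
sst = sum((y - y_bar) ** 2 for y in ys)

# Assumption: s uses n - 2 in the denominator.
s = sqrt(sse / (len(xs) - 2))
r_squared = 1 - sse / sst  # proportion of variation in y explained by the line

print(round(s, 4), round(r_squared, 4))
```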
Influential Point
Any point that, if removed, substantially changes the slope, y-intercept, correlation coefficient r, standard deviation of residuals s, or coefficient of determination r^2 of the linear regression.
High-Leverage Point
Any point with an x value that is significantly above or below the rest of the x values in the data set.
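To illustrate, the sketch below fits a least-squares line with and without one high-leverage point and compares the slopes. The helper `fit_line` and the data are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

# Hypothetical data; the last point (15, 3) has an x value far from the rest.
xs = [1, 2, 3, 4, 5, 15]
ys = [2, 4, 5, 4, 5, 3]

slope_with, _ = fit_line(xs, ys)
slope_without, _ = fit_line(xs[:-1], ys[:-1])

# A large change in slope suggests the high-leverage point is also influential.
print(round(slope_with, 4), round(slope_without, 4))
```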