What is statistics?
The process of collecting, analyzing, and presenting data.
What are descriptive statistics?
Methods that summarize collected data using graphs and calculations.
What are inferential statistics?
Methods that use sample data to make conclusions or predictions about a larger population.
What procedure should you use for one quantitative variable?
Mean, median, range, quartiles, IQR, standard deviation, histogram, dot plot, or boxplot.
What procedure should you use for one categorical variable?
Frequency table, relative frequency, bar chart, or pie chart.
What procedure should you use for two categorical variables?
Two-way table, conditional probability, independence/dependence, or segmented bar chart.
What procedure should you use for two quantitative variables?
Scatterplot, correlation, regression, prediction, residuals, r, or r^2.
What procedure should you use when a problem describes success/failure outcomes over a fixed number n of independent trials?
Binomial distribution.
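A minimal sketch of a binomial calculation, assuming independent trials that all share the same success probability. The helper name `binomial_pmf` and the coin-flip numbers are illustrative assumptions, not part of the deck.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example: 10 fair coin flips, "success" = heads.
n, p = 10, 0.5
prob_exactly_3 = binomial_pmf(3, n, p)
# "At most 3" adds up the probabilities for k = 0, 1, 2, 3.
prob_at_most_3 = sum(binomial_pmf(k, n, p) for k in range(4))

print(round(prob_exactly_3, 4))
print(round(prob_at_most_3, 4))
```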
What procedure should you use when a continuous variable is normally distributed with a known mean and SD?
Normal distribution, z-score, NORM.DIST, or NORM.INV.
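The sketch below shows the same three calculations in Python, using the standard library's `NormalDist`. The exam-score mean of 70 and SD of 10 are made-up values for illustration. `cdf` plays the role of a cumulative NORM.DIST, and `inv_cdf` plays the role of NORM.INV.

```python
from statistics import NormalDist

# Hypothetical exam scores: mean 70, SD 10.
scores = NormalDist(mu=70, sigma=10)

# z-score: how many SDs the value 85 is above the mean.
z = (85 - scores.mean) / scores.stdev

# Area to the left of 85 (proportion of scores at or below 85).
p_below_85 = scores.cdf(85)

# Value with 90% of the distribution at or below it.
cutoff_90th = scores.inv_cdf(0.90)

print(z, round(p_below_85, 4), round(cutoff_90th, 2))
```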
What procedure should you use to estimate a population mean or proportion?
Confidence interval.
What procedure should you use to test a claim about a population mean or proportion?
Hypothesis test.
Fastest way to avoid wrong answers
Before calculating, identify the data type and the goal of the problem.
What is a variable?
A characteristic, number, or quantity that can be measured or counted.
Quantitative variable
A numeric variable where an average makes sense, such as age, GPA, weight, or exam score.
Categorical variable
A group, quality, or label where an average usually does not make sense, such as color, major, or yes/no.
Discrete quantitative variable
A countable/listable integer-valued variable, such as number of siblings or number correct.
Continuous quantitative variable
A variable that can take values in a range, including decimals, such as height, weight, time, or temperature.
Nominal categorical variable
Categories with no natural order, such as colors, yes/no, or dog/cat/bird.
Ordinal categorical variable
Categories with a meaningful order, such as low/medium/high or first/second/third.
Why are ZIP codes and ID numbers usually not quantitative?
They are labels, not measurements where an average makes sense.
Mean formula
Mean = sum of values / number of values.
Excel command for mean
=AVERAGE(range)
Median
The middle value after sorting; the 50th percentile.
Excel command for median
=MEDIAN(range)
When should you use the median instead of the mean?
When the distribution has outliers or is strongly skewed.
What happens if the mean is higher than the median?
High values/outliers are pulling the mean up; the distribution may be right-skewed.
What happens if the mean is lower than the median?
Low values/outliers are pulling the mean down; the distribution may be left-skewed.
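To see how one extreme value separates the mean from the median, here is a small sketch. The salary figures are invented for illustration.

```python
from statistics import mean, median

# Hypothetical salaries in thousands; 200 is a high outlier.
salaries = [40, 42, 45, 47, 50, 52, 200]

avg = mean(salaries)      # pulled up by the outlier
middle = median(salaries) # barely affected by the outlier

print(avg, middle)
# A mean well above the median suggests a right-skewed distribution.
print("mean > median:", avg > middle)
```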
Range formula
Range = maximum - minimum.
Excel command for range
=MAX(range)-MIN(range)
Q1 meaning
The first quartile; about 25% of data are at or below Q1.
Q3 meaning
The third quartile; about 75% of data are at or below Q3.
Excel command for Q1
=QUARTILE.INC(range,1)
Excel command for Q3
=QUARTILE.INC(range,3)
IQR formula
IQR = Q3 - Q1.
What does IQR measure?
The spread of the middle 50% of the data.
Lower outlier fence formula
Lower fence = Q1 - 1.5(IQR).
Upper outlier fence formula
Upper fence = Q3 + 1.5(IQR).
Outlier rule
Any value below the lower fence or above the upper fence is considered an outlier.
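The quartile, IQR, and fence cards above can be sketched together as follows. The data set is hypothetical. The `method="inclusive"` option of `statistics.quantiles` is used here because it follows the same interpolation idea as QUARTILE.INC; other quartile methods can give slightly different values.

```python
from statistics import quantiles

# Hypothetical data set with one suspiciously large value.
data = [2, 4, 5, 7, 8, 9, 10, 12, 30]

# n=4 splits the data into quarters and returns Q1, median, Q3.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are flagged as outliers.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```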
Standard deviation meaning
The typical distance of values from the mean; a larger SD means the data are more spread out.
Excel command for population standard deviation
=STDEV.P(range)
Excel command for sample standard deviation
=STDEV.S(range)
What is the standard deviation if all values are identical?
0, because every value is exactly 0 away from the mean.
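A short sketch of the population and sample standard deviations in Python, where `pstdev` corresponds to STDEV.P and `stdev` to STDEV.S. Both data lists are made up for illustration.

```python
from statistics import pstdev, stdev

# Hypothetical data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

population_sd = pstdev(data)  # divides by n
sample_sd = stdev(data)       # divides by n - 1, so it is slightly larger

# Identical values: every value is 0 away from the mean.
identical = [6, 6, 6, 6]
identical_sd = pstdev(identical)

print(population_sd, round(sample_sd, 3), identical_sd)
```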
Dot plot is best for…
Small-to-medium quantitative data sets where individual values should be visible.
Histogram is best for…
Quantitative data grouped into intervals; good for shape, center, spread, gaps, and skew.
Boxplot is best for…
Quantitative data, especially when comparing groups using the five-number summary.
What does a boxplot show?
Minimum, Q1, median, Q3, maximum, and possible outliers.
Segmented bar chart is best for…
Comparing proportions for two categorical variables.
When describing a distribution, mention…
Shape, center, spread, and outliers; also clusters or gaps if present.
Right-skewed distribution
Long tail to the right; mean is usually greater than median.
Left-skewed distribution
Long tail to the left; mean is usually less than median.
When comparing two distributions, compare…
Center, spread, shape, and outliers - not only averages.
Frequency
A count of how many times a category occurs.
Relative frequency formula
Relative frequency = count / total.
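As a small illustration of frequency and relative frequency for one categorical variable, assuming a made-up list of color responses:

```python
from collections import Counter

# Hypothetical survey responses for one categorical variable.
colors = ["red", "blue", "red", "green", "red", "blue"]

frequency = Counter(colors)  # count of each category
total = len(colors)

# Relative frequency = count / total for each category.
relative = {color: count / total for color, count in frequency.items()}

print(frequency)
print(relative)
```

The relative frequencies should add up to 1 (up to rounding), which is a quick check on the table.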
What does a two-way table show?
Counts or proportions for two categorical variables at the same time.
What words signal conditional probability?
Given, among, of those, or within.
Conditional proportion formula from a table
Category count within the given group / total of the given group.
Denominator for a one-event probability
Usually the grand total.
Denominator for a conditional probability
The total for the GIVEN group.
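The difference between the two denominators can be sketched with a tiny two-way table. The class-year groups and yes/no counts below are hypothetical.

```python
# Hypothetical two-way table: rows are class year, columns are a yes/no answer.
table = {
    "freshman": {"yes": 30, "no": 20},
    "senior": {"yes": 10, "no": 40},
}

grand_total = sum(sum(row.values()) for row in table.values())

# One-event probability: P(yes) uses the grand total as the denominator.
p_yes = sum(row["yes"] for row in table.values()) / grand_total

# Conditional probability: P(yes | freshman) uses the freshman row total.
freshman_total = sum(table["freshman"].values())
p_yes_given_freshman = table["freshman"]["yes"] / freshman_total

print(p_yes, p_yes_given_freshman)
```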
Why use percentages instead of raw counts in segmented bar charts?
Percentages allow fair comparison when group sizes are different.
Bivariate data
Data with two variables.
In regression/correlation, what kind of variables are x and y?
Both variables should be quantitative.
Explanatory or predictor variable
The x variable; the variable used to make a prediction.
Response variable
The y variable; the variable being predicted or explained.
Why look at the scatterplot first?
To check form, direction, strength, and unusual points before calculating correlation/regression.
Scatterplot form options
Linear, curvilinear, or no relationship.
Scatterplot direction options
Positive, negative, or none.
Scatterplot strength options
Strong, moderate, or weak.
Positive association
As x increases, y tends to increase.
Negative association
As x increases, y tends to decrease.
Correlation r
Measures strength and direction of a linear relationship; ranges from -1 to 1.
Excel command for correlation
=CORREL(x_range,y_range)
What does r close to 1 mean?
A strong positive linear relationship.
What does r close to -1 mean?
A strong negative linear relationship.
What does r close to 0 mean?
Little or no linear relationship.
Correlation caution
Correlation does not prove causation; lurking/confounding variables may explain the association.
Coefficient of determination r^2
The percent of variation in y described by the regression line using x.
How do you interpret r^2 = 0.588?
58.8% of the variation in y is described by the regression line using x.
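A minimal sketch of computing r and r^2 by hand, which is the quantity =CORREL returns. The function name `correlation` and the hours/scores data are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean

def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours studied (x) and quiz score (y).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = correlation(hours, scores)
r_squared = r**2

print(round(r, 4), round(r_squared, 4))
```

With this data, about 60% of the variation in the scores is described by the regression line using hours.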
Regression equation form
y-hat = a + bx, where a is the intercept and b is the slope.
What is y-hat?
The predicted value of y.
Slope interpretation
For every 1-unit increase in x, the predicted y changes by b units.
Positive slope
Predicted y increases as x increases.
Negative slope
Predicted y decreases as x increases.
Y-intercept interpretation
When x = 0, the predicted y is the intercept value; only interpret if x = 0 makes sense in context.
How do you make a regression prediction?
Plug the given x-value into the regression equation and solve for y-hat.
Interpolation
Predicting within the range of observed x-values; usually more reliable.
Extrapolation
Predicting outside the range of observed x-values; riskier and less reliable.
Regression residual formula
Residual = actual y - predicted y.
Positive residual
Actual y is above the regression line; the model underestimated y.
Negative residual
Actual y is below the regression line; the model overestimated y.
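The prediction and residual cards can be sketched as follows. The least-squares formulas in `fit_line` are the standard ones for a line of best fit; the function name and the data are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data: hours studied (x) and quiz score (y).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)

# Prediction at x = 4, which is inside the observed x-range (interpolation).
y_hat = a + b * 4

# Residual = actual y - predicted y; the observed y at x = 4 is 4.
residual = 4 - y_hat

print(round(a, 2), round(b, 2), round(y_hat, 2), round(residual, 2))
```

Here the residual is negative, so the point lies below the line and the model overestimated y.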
Regression outlier
A point with a large residual, that is, a large vertical distance from the line of best fit.
Can outliers simply be removed?
No. They must be investigated to determine whether they are valid data.
Probability
The long-run relative frequency of an outcome: as the number of trials increases, the proportion of times the outcome occurs tends to stabilize.
P(A)
The probability that event A occurs.
Complement of A
The event that A does not occur, written A′ or A^c.
Complement rule
P(A′) = 1 - P(A).
Intersection
A and B; the outcomes where both events occur.
Union
A or B; the outcomes where A occurs, B occurs, or both occur.
Union formula
P(A or B) = P(A) + P(B) - P(A and B).
Conditional probability formula
P(A | B) = P(A and B) / P(B), defined when P(B) > 0.
What does the vertical bar | mean in probability?
Given.
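The complement, union, and conditional probability formulas can be checked on a small example. The single die roll and the events A and B below are assumptions chosen for illustration, with all six outcomes equally likely.

```python
from fractions import Fraction

# Hypothetical experiment: one roll of a fair six-sided die.
outcomes = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # roll is even
B = {4, 5, 6}  # roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

p_not_a = 1 - prob(A)                     # complement rule
p_a_and_b = prob(A & B)                   # intersection
p_a_or_b = prob(A) + prob(B) - p_a_and_b  # union formula
p_a_given_b = p_a_and_b / prob(B)         # conditional probability

print(p_not_a, p_a_and_b, p_a_or_b, p_a_given_b)
```

Counting the union directly gives the same answer as the formula, which shows why P(A and B) is subtracted: the outcomes 4 and 6 would otherwise be counted twice.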