AP Statistics | 2024-2025
What is univariate data?
a one-variable data set
What is bivariate data?
a data set that describes the relationship between 2 variables
What is a response variable?
the variable that measures the outcome of a study
What is an explanatory variable?
the variable that may help predict or explain changes in a response variable
What does a scatterplot show?
the relationship (association) between two quantitative variables measured on the same individuals
Why do we study relationships between 2 variables?
to help us explain how one variable affects another and why something happens
How do you make a scatterplot?
Label the axes
Scale the axes
Plot individual data values
How do you describe a scatterplot?
using DUFS – direction, unusual features, form and strength
What is the easiest way to lose points when making a scatterplot?
not labeling the axes
How do you know which variable to put on what axis?
the explanatory variable always goes on the x-axis, and the response variable goes on the y-axis; if there is no explanatory variable, either variable can go on the x-axis
Where do you start each axis of a scatterplot?
at a number smaller than the smallest value of that variable
When do 2 variables have a positive association?
when above average values of one variable tend to accompany above average values of the other variable
When do 2 variables have a negative association?
when above average values of one variable tend to accompany below average values of the other variable
When do 2 variables have no association?
if knowing one variable does not help us predict the value of the other variable
How do you describe direction of a scatterplot?
positive association, negative association, or no association
How do you describe form of a scatterplot?
linear or nonlinear
How do you describe strength of a scatterplot?
weak, moderate, or strong
How do you describe unusual features of a scatterplot?
outliers or clusters
What does correlation r measure?
the direction and strength of the linear association between two quantitative variables
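To make the idea concrete, here is a minimal Python sketch that computes r as the sum of the z-score products divided by n - 1. The data and the names `hours`, `scores`, and `correlation` are made up for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Return r as the sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

hours = [1, 2, 3, 4, 5]    # made-up explanatory values
scores = [2, 4, 5, 4, 5]   # made-up response values

r = correlation(hours, scores)
print(round(r, 4))  # 0.7746
```

The positive value matches the upward trend in the made-up data, and it falls between -1 and 1 as r always does.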
What interval does r always fall between?
-1 to 1
What does r > 0 indicate?
a positive association
What does r < 0 indicate?
a negative association
What only occurs in the case of a perfect linear relationship?
the extreme values r = -1 and r = 1
What do values of r close to -1 or 1 indicate?
a very strong relationship
What do values of r close to 0 indicate?
a very weak relationship
What does correlation not imply?
causation
What type of relationships should correlation only be used on?
linear
True or false: Correlation is not a resistant measure of strength
true
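A small sketch of why correlation is not resistant: one unusual point can change r dramatically. The data below are invented, and `correlation` is a helper written just for this example.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation r from sums of deviations about the means."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]          # perfectly linear, made-up data
r_clean = correlation(x, y)

# Add one made-up outlier at (6, 0) and recompute.
r_with_outlier = correlation(x + [6], y + [0])

print(round(r_clean, 3), round(r_with_outlier, 3))  # 1.0 0.143
```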
True or false: You can determine the form of a relationship using only correlation
false
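As an illustration of why r alone cannot reveal form, the sketch below uses invented data that follow a perfect curve (y = x squared), yet the correlation comes out to zero. The helper name `correlation` is an assumption of this example.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation r from sums of deviations about the means."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]   # a perfect, but nonlinear, relationship

r = correlation(x, y)
print(r)  # 0.0
```

A value of r near 0 here says only that there is no linear association; a scatterplot is still needed to see the curved pattern.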
How do you find r on a calculator?
after entering the values in the lists, press STAT → CALC → 8: LinReg(a+bx) → Calculate
What does correlation require of both variables?
that they be quantitative
What does correlation make no distinction between?
explanatory and response variables
True or false: r does not change when we change the units of measurement of x, y, or both
true
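A quick check of this property, as a sketch with made-up heights and weights: converting inches to centimeters and pounds to kilograms leaves r unchanged (up to floating-point rounding). The variable names and data are assumptions for illustration.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation r from sums of deviations about the means."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

inches = [60, 62, 65, 68, 70]        # made-up heights
pounds = [110, 120, 140, 155, 170]   # made-up weights

# Change units: 1 inch = 2.54 cm, 1 pound is about 0.4536 kg.
centimeters = [2.54 * v for v in inches]
kilograms = [0.4536 * v for v in pounds]

r_original = correlation(inches, pounds)
r_converted = correlation(centimeters, kilograms)
print(abs(r_original - r_converted) < 1e-9)  # True
```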
What is a regression line?
a line that describes how a response variable (y) changes as an explanatory variable (x) changes
What form are regression lines expressed in?
ŷ = a + bx
What is the regression line used to predict?
the value of y for a given value of x
What is extrapolation?
the use of a regression line for prediction far outside the interval of values of the explanatory variable x used to obtain the line
Why is extrapolation dangerous?
there is no guarantee the linear pattern we see will continue beyond the given data
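The sketch below shows one way to make predictions from ŷ = a + bx while flagging extrapolation. The intercept, slope, and x-values are hypothetical, and how far outside the data counts as "far" is a judgment call; this example simply flags any x outside the observed range.

```python
def predict(a, b, x):
    """Predicted response from the line y-hat = a + b*x."""
    return a + b * x

def is_extrapolation(x, observed_xs):
    """True when x lies outside the range of x-values used to fit the line."""
    return x < min(observed_xs) or x > max(observed_xs)

a, b = 2.2, 0.6                 # hypothetical intercept and slope
observed_xs = [1, 2, 3, 4, 5]   # hypothetical x-values behind the fit

for x_new in (3, 50):
    y_hat = predict(a, b, x_new)
    if is_extrapolation(x_new, observed_xs):
        note = "extrapolation, treat with caution"
    else:
        note = "within the data range"
    print(f"x = {x_new}: predicted y = {y_hat:.1f} ({note})")
```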
True or false: The regression line will pass exactly through all the points in a scatterplot
false
What is a residual?
the difference between an observed value of the response variable and the value of y predicted by the regression line
What is the equation to find a residual?
y - ŷ
How do you interpret a residual?
give the size and direction of the residual
The actual value of [response variable] is [residual value] more/less than the value predicted by the regression line with x = [explanatory variable]
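Here is a short sketch of computing residuals as y - ŷ and stating their size and direction. The data and the line ŷ = 2.2 + 0.6x are made up for illustration.

```python
a, b = 2.2, 0.6          # hypothetical intercept and slope
xs = [1, 2, 3, 4, 5]     # made-up explanatory values
ys = [2, 4, 5, 4, 5]     # made-up observed responses

# residual = observed y minus predicted y
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

for x, res in zip(xs, residuals):
    direction = "more" if res > 0 else "less"
    print(f"At x = {x}, the actual y is {abs(res):.1f} {direction} than predicted")
```

A positive residual means the point lies above the line; a negative residual means it lies below.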
What does a represent in the regression line equation ŷ = a + bx?
the y-intercept, the predicted value of y when x = 0
What does b represent in the regression line equation ŷ = a + bx?
the slope, the amount by which the predicted value of y changes when x increases by 1 unit
How do you interpret slope?
The predicted value of [response variable] goes up/down by [b] for each additional [unit of x].
How do you interpret the y-intercept?
The predicted [response variable] for a [individual] with 0 [units of x] is [y-intercept]
What regression line do we want?
the one that minimizes the sum of the squared residuals
What is the least-squares regression line?
the line that makes the sum of the squared residuals as small as possible
What point is always guaranteed to be on the least squares regression line?
(x̄, ȳ)
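The following sketch computes a least-squares line using the standard formulas b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄, then checks two claims from the cards above: the line passes through (x̄, ȳ), and nudging the intercept or slope makes the sum of squared residuals larger. The data and function names are assumptions for this example.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

def sum_squared_residuals(a, b, xs, ys):
    """Sum of (y - y-hat)^2 for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]   # made-up data
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)

best = sum_squared_residuals(a, b, xs, ys)
nudged_intercept = sum_squared_residuals(a + 0.1, b, xs, ys)
nudged_slope = sum_squared_residuals(a, b + 0.1, xs, ys)

print(f"y-hat = {a:.1f} + {b:.1f}x")
print(abs((a + b * x_bar) - y_bar) < 1e-9)             # line passes through (x-bar, y-bar)
print(best < nudged_intercept and best < nudged_slope)  # other lines fit worse
```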
How do outliers affect the least squares regression line?
they strongly influence the line
What is a residual plot?
a scatterplot of the residuals on the vertical axis and the explanatory variable on the horizontal axis
How do you find the residual plot on a calculator?
2nd → y= → Plot 1 → Enter → Adjust settings → Zoom → 9: ZoomStat → Enter
How does a residual plot work?
it magnifies the deviations of the points from the line, making it easier to see unusual observations and patterns
What is the purpose of a residual plot?
to assess linearity with a tool other than the actual scatterplot
What do you look for in a residual plot?
randomly scattered points above and below the horizontal line at residual = 0
How can you tell if a linear model is appropriate?
if the residual plot shows no obvious patterns
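To show what an "obvious pattern" can look like, this sketch fits a least-squares line to made-up curved data (y = x squared) and prints the residuals in order of x. It is a text-only stand-in for a residual plot, and the helper name is an assumption.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]   # made-up curved data

a, b = least_squares_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
signs = " ".join("+" if r > 0 else "-" for r in residuals)

print(residuals)  # [2.0, -1.0, -2.0, -1.0, 2.0]
print(signs)      # + - - - +
```

The residuals run positive, then negative, then positive again. That U-shaped pattern suggests a linear model is not appropriate for these data.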
What is the standard deviation of the residuals?
s, which gives the typical size of a prediction error (residual)
How do you calculate the standard deviation of the residuals?
2nd → STAT → Math → 7: stdDev( → 2nd → STAT → RESID
How do you interpret the standard deviation of the residuals?
Using the least-squares regression line (LSRL) that predicts [y] from [x], our predictions will typically be off by about [s] [units of y]
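A minimal sketch of computing s by hand, assuming the usual textbook formula s = √(Σ residuals² / (n - 2)); the card above does not state a formula, so the n - 2 divisor is an assumption here. The data and the line ŷ = 2.2 + 0.6x are made up.

```python
from math import sqrt

a, b = 2.2, 0.6          # hypothetical least-squares intercept and slope
xs = [1, 2, 3, 4, 5]     # made-up data
ys = [2, 4, 5, 4, 5]

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
sum_sq = sum(r ** 2 for r in residuals)

# Divide by n - 2 (assumed textbook convention for regression).
s = sqrt(sum_sq / (len(xs) - 2))
print(round(s, 3))  # 0.894
```

With these numbers, predictions from the line would typically be off by about 0.89 units of y.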
What is the coefficient of determination?
r², which measures the percent reduction in the sum of squared residuals when using the least-squares regression line to make predictions, rather than the mean value of y
How do you calculate r²?
STAT → CALC → 8: LinReg(a+bx)
How do you interpret r²?
[r² as a percentage]% of the variation in [y variable] is accounted for by the least-squares regression line with x = [x variable]
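The "percent reduction" idea can be sketched directly: compare the sum of squared residuals from the regression line with the sum of squared deviations from the mean of y. The data and the line ŷ = 2.2 + 0.6x are made up, and the names `sse` and `sst` are labels chosen for this example.

```python
a, b = 2.2, 0.6          # hypothetical least-squares intercept and slope
xs = [1, 2, 3, 4, 5]     # made-up data
ys = [2, 4, 5, 4, 5]

y_bar = sum(ys) / len(ys)

# Squared prediction errors using the regression line.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
# Squared prediction errors using the mean of y for every prediction.
sst = sum((y - y_bar) ** 2 for y in ys)

r_squared = 1 - sse / sst
print(f"{r_squared:.0%}")  # 60%
```

Read with the template above: about 60% of the variation in y is accounted for by the least-squares regression line with x.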
What are points with high leverage in regression?
points that have much larger or much smaller x-values than the other points in the data set
What is an outlier in regression?
a point that does not follow the pattern of the data and has a large residual
What is an influential point in regression?
any point that, if removed, substantially changes the slope, y-intercept, correlation, coefficient of determination, or standard deviation of the residuals
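One way to check whether a point is influential is to fit the line with and without it and compare. In this sketch the data are invented, and the point (20, 2) is a made-up high-leverage point whose x-value is far from the others.

```python
def slope(xs, ys):
    """Least-squares slope b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

xs = [1, 2, 3, 4, 5]   # made-up data
ys = [2, 4, 5, 4, 5]

slope_without = slope(xs, ys)
slope_with = slope(xs + [20], ys + [2])   # add the high-leverage point (20, 2)

print(round(slope_without, 3), round(slope_with, 3))
```

Here the slope switches from positive to negative when the single point is included, so that point would count as influential.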