Names for y-variable
Response Variable, Output Variable, Dependent Variable
Names for x-variable
Predictor, Explanatory Variable, Input Variable, Independent Variable
What type of graph should be used for depicting the relationship between two quantitative, continuous variables?
Scatterplot
What must be captured in describing a scatterplot?
Direction, Form, Strength, Unusual Features
Correlation
Measure of linear dependence between two continuous quantitative variables.
What does correlation measure?
Strength and Direction for a linear relationship.
5 Properties for the linear correlation coefficient r
r is between -1 and 1.
r measures the strength of a linear relationship.
r is very sensitive to outliers.
r is unitless, so a different scale for a variable does not change r.
r is not affected by the choice of x or y. x and y are interchangeable.
Regression
Statistical method of estimating the value of a variable from another variable or variables.
Regression Equation
Equation of best fit line for a scatterplot using regression
Simple Linear Regression
Regression when done with one independent variable and the regression equation is linear.
How many y values are there per x value on a scatterplot?
A distribution of values, so no specific number.
Least-Squares Regression
Method that estimates slope and y-intercept to yield the best fit line for a scatterplot.
What range of data is safe to use with a regression
Only the given range of x-values. The farther you go from this range, the less accurate the prediction will be.
Residual
The difference between the observed y value and the predicted y value.
Residual less than 0 means that
the predicted value is an overestimate
Residual greater than 0 means that
the predicted value is an underestimate.
How do you verify that a linear regression model is appropriate for a data set?
Check if
The scatterplot suggests a positive or negative linear relationship
The distribution of residuals is approximately normal with a mean of 0
The residual plot contains no obvious pattern
Smaller residuals mean what about the model?
The model is a better fit.
r²
The amount of total variation explained by the regression line.