These flashcards cover key terms and definitions related to simple linear regression, helping to consolidate knowledge for exam preparation.
Scatterplot
A graphical representation that shows the relationship between two numerical variables.
Predictor variable
The independent variable, plotted on the horizontal axis of a scatterplot.
Response variable
The dependent variable, plotted on the vertical axis of a scatterplot.
Strong relationship
A clear pattern of dependence between the predictor and response variables; the pattern may be non-linear.
Weak relationship
A lack of discernible pattern between data points in a scatterplot, making it difficult to identify relationships.
Correlation coefficient (r)
A measure that indicates the strength and direction of a linear relationship between two numerical variables, ranging from -1 to 1.
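As a rough illustration of the definition above, the following sketch computes r from the standard Pearson formula (the sum of cross-deviations divided by the square root of the product of the sums of squared deviations). The function name `pearson_r` and the `hours`/`scores` data are made up for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied (predictor) and quiz score (response)
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = pearson_r(hours, scores)
print(round(r, 3))  # 0.775: a fairly strong positive linear relationship
```

A value near 1 or -1 indicates a strong linear relationship; a value near 0 indicates a weak linear relationship, even if a strong non-linear pattern exists.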
Line of best fit
A straight line that best represents the data on a scatterplot, used to make predictions.
Least squares method
A technique used to determine the best-fitting line by minimizing the sum of the squares of the residuals.
Linear model
A mathematical representation of the relationship between variables in a linear regression that can be used for prediction and inference.
Slope
The rate of change in the response variable for every one-unit increase in the predictor variable in a linear regression model.
Y-intercept
The predicted value of the response variable when the predictor variable is zero in a linear regression model.
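The least squares slope and y-intercept for a single predictor have closed-form solutions, sketched below under the assumption of one numerical predictor. The function name `least_squares_fit` and the data are hypothetical and chosen only to keep the arithmetic easy to check by hand.

```python
def least_squares_fit(xs, ys):
    """Return (slope, intercept) of the least squares line for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                    # change in response per one-unit increase in predictor
    intercept = mean_y - slope * mean_x  # predicted response when the predictor is zero
    return slope, intercept

# Hypothetical data: hours studied (predictor) and quiz score (response)
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

slope, intercept = least_squares_fit(hours, scores)
print(round(slope, 2), round(intercept, 2))  # 0.6 2.2

# Prediction inside the observed range of the predictor (no extrapolation)
predicted = intercept + slope * 3.5
```

Here each additional hour is associated with a predicted increase of 0.6 points, and the line predicts a score of 2.2 at zero hours.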
Residual
The difference between the observed value and the predicted value in a regression analysis.
R-squared (R²)
The proportion of the variance in the response variable that is explained by the predictor variable or variables in a regression model.
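To connect residuals and R-squared, this sketch computes both for a small made-up data set. The slope (0.6) and intercept (2.2) are assumed to be the least squares values for this particular data; all variable names are illustrative.

```python
# Hypothetical data: hours studied (predictor) and quiz score (response)
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

# Assumed least squares line for this data: predicted = 2.2 + 0.6 * hours
slope, intercept = 0.6, 2.2

# Residual = observed value - predicted value
residuals = [y - (intercept + slope * x) for x, y in zip(hours, scores)]

mean_y = sum(scores) / len(scores)
sse = sum(e ** 2 for e in residuals)          # unexplained variation
sst = sum((y - mean_y) ** 2 for y in scores)  # total variation in the response

r_squared = 1 - sse / sst
print(round(r_squared, 2))  # 0.6: the line explains about 60% of the variance
```

With a single predictor, this value equals the square of the correlation coefficient r.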
Assumptions of linear regression
Includes linearity, constant variability (homoscedasticity), independent observations, and normally distributed residuals.
Outliers
Data points that differ significantly from other observations and may affect the results of regression analysis.
Influential point
A data point whose presence substantially changes the slope or intercept of the regression line; removing it would noticeably alter the fitted line.
Statistical inference
The process of using data from a sample to make conclusions about a population, often involving hypothesis testing.
Confidence interval
A range of values derived from sample data that is likely to contain the population parameter with a certain level of confidence.
Prediction interval
A range of values that is likely to contain the value of a new observation based on the estimated regression line.
Extrapolation
The act of predicting values outside the range of the observed data, which can lead to unreliable predictions.
Categorical predictor variable
A predictor variable that contains categories rather than numerical values.
Multicollinearity
A situation in regression analysis where two or more predictor variables are highly correlated.
Multiple linear regression
A type of linear regression that models the relationship between two or more predictor variables and a single response variable.