These flashcards cover key concepts and vocabulary related to multiple linear regression, which are fundamental for understanding the material in Chapter 7.
Multiple Linear Regression
A statistical technique that models the relationship between a dependent variable and multiple independent variables.
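With k explanatory variables, the population model can be written as \( y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon \), where \(\varepsilon\) is the error term.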
Explanatory Variable
An independent variable that is used to explain variations in the dependent variable.
Response Variable
The dependent variable that researchers are trying to predict or explain.
Confounding Variable
An outside variable that is associated with both an explanatory variable and the response variable, which can distort the apparent relationship between them.
ANOVA (Analysis of Variance)
A statistical method used to test differences between two or more group means.
Interaction in Regression
Occurs when the effect of one independent variable on the dependent variable changes depending on the level of another independent variable.
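For example, with two predictors and an interaction term, \( \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2 \); the effect of \(x_1\) on \(\hat{y}\) is \(b_1 + b_3 x_2\), which depends on the value of \(x_2\).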
Residuals
The differences between observed and predicted values in a regression model.
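In symbols, for observation i: \( e_i = y_i - \hat{y}_i \), where \(y_i\) is the observed value and \(\hat{y}_i\) is the value predicted by the model.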
R-squared (R²)
A statistical measure that represents the proportion of variance for the dependent variable that's explained by the independent variables.
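In symbols: \( R^2 = 1 - \dfrac{SSE}{SST} \), where SSE is the sum of squared residuals and SST is the total sum of squares of the response about its mean.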
Adjusted R-squared
A modified version of R-squared that adjusts for the number of predictors in the model, providing a more accurate measure for multiple regression.
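With n observations and k predictors: \( R^2_{adj} = 1 - \dfrac{SSE/(n-k-1)}{SST/(n-1)} \); unlike \(R^2\), it can decrease when an uninformative predictor is added.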
Hypothesis Testing in Regression
A method used to determine whether there is enough statistical evidence in a sample to infer that a certain condition holds true for the entire population.
Statistical Significance
A determination that the observed data would be very unlikely under the null hypothesis.
Model Selection
The process of selecting a statistical model from a set of candidate models based on how well they explain the data.
Categorical Predictors
Variables that represent categories, often used in regression to assess the impacts of different groups.
Least Squares Estimation
A method to estimate the coefficients of a linear regression model which minimizes the sum of the squares of the residuals.
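The coefficients are chosen to minimize \( \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \); in matrix form the solution is \( \hat{\beta} = (X^\top X)^{-1} X^\top y \).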
Multicollinearity
A statistical phenomenon where two or more predictors in a model are highly correlated, leading to unreliable estimates.
Standard Error of Coefficient
An estimate of the variability of a regression coefficient, indicating how much the estimated coefficient would be expected to vary from sample to sample.
F-statistic
A ratio used in ANOVA and regression analysis to compare model fits; in multiple regression it tests whether the model as a whole explains a significant amount of variation in the response.
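For a model with k predictors and n observations: \( F = \dfrac{MSR}{MSE} = \dfrac{(SST - SSE)/k}{SSE/(n-k-1)} \), compared against an F distribution with k and n − k − 1 degrees of freedom.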
Prediction Interval
A range of values that is likely to contain a single new observation of the response variable for given values of the predictors; it is wider than the confidence interval for the mean response.
Confidence Interval
A range of values derived from a data set that is likely to contain the true value of an unknown population parameter.
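For a regression coefficient \(b_j\): \( b_j \pm t^{*}_{df} \cdot SE(b_j) \), where the critical value \(t^{*}\) uses df = n − k − 1 degrees of freedom.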
Assumptions of Multiple Linear Regression
Key conditions that should be met for the model to be valid: linearity, independence of residuals, normality of residuals, and homoscedasticity.
Dummy Variables
Binary variables (0 or 1) used to represent the levels of a categorical predictor in a regression model; a predictor with k levels is encoded with k − 1 dummy variables.
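A minimal sketch in Python (hypothetical data and column names) showing how a three-level categorical predictor can be encoded as two dummy variables with pandas:

```python
import pandas as pd

# Hypothetical data: a 3-level categorical predictor "color" and a response "price".
df = pd.DataFrame({
    "color": ["red", "blue", "green", "blue", "red"],
    "price": [10.0, 12.5, 11.0, 13.0, 9.5],
})

# drop_first=True keeps k - 1 = 2 dummy columns for the 3 levels,
# leaving one level as the reference category.
encoded = pd.get_dummies(df, columns=["color"], drop_first=True)
print(encoded)
```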
Heteroscedasticity
A violation of regression assumptions where the variance of the residuals is not constant across all levels of the independent variables.
Variance Inflation Factor (VIF)
A measure used to detect multicollinearity by quantifying how much the variance of an estimated regression coefficient is inflated due to correlation with other predictors.
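For predictor j: \( VIF_j = \dfrac{1}{1 - R_j^2} \), where \(R_j^2\) is the \(R^2\) from regressing predictor j on all the other predictors; values above roughly 5 to 10 are commonly taken to signal problematic multicollinearity.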
P-values in Hypothesis Tests
P < 0.05: reject H0
P ≥ 0.05: fail to reject H0 (at the α = 0.05 significance level)
Residual Confounders
Potential confounders that have not been examined or measured; these can be other variables in the dataset, or variables outside it, that could still distort the estimated relationships.
Scatterplot Matrix
A grid of scatterplots useful for visualizing the relationship between each predictor and the response variable, as well as the relationships among the predictors themselves.
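A minimal sketch in Python (simulated data, hypothetical variable names) using pandas' scatter_matrix to inspect these relationships:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

# Simulated data: two correlated predictors and a response.
rng = np.random.default_rng(seed=1)
x1 = rng.normal(size=100)
x2 = 0.5 * x1 + rng.normal(size=100)            # correlated with x1
y = 2 + 1.5 * x1 - 0.8 * x2 + rng.normal(size=100)
df = pd.DataFrame({"y": y, "x1": x1, "x2": x2})

# One scatterplot per pair of variables; the diagonal shows histograms by default.
scatter_matrix(df, figsize=(6, 6))
plt.show()
```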