These flashcards cover key vocabulary and concepts related to data science, particularly focused on regression analysis and model evaluation.
Data Science Process
A systematic approach for collecting, structuring, and analyzing data to gain insights.
Linear Regression Model
A statistical method to model the relationship between a dependent variable and one or more explanatory variables by fitting a linear equation.
Coefficients
Estimated values in a regression model that give the expected change in the dependent variable for a one-unit change in an independent variable, holding the other independent variables constant.
Residuals
The differences between observed and predicted values in a regression model.
Sum of Squares Error (SSE)
The sum of the squared differences between the observed response values and the values fitted by the model; it measures the variation the model leaves unexplained.
Adjusted R-squared
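A modified version of R-squared that penalizes the number of predictors in a model, so it increases only when a new predictor improves the fit more than would be expected by chance. It discourages overfitting but does not prevent it.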
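A minimal sketch of the usual adjusted R-squared formula, 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The function name and the example numbers are illustrative assumptions, not part of the card set.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjust R-squared for the number of predictors (hypothetical helper)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / dof


# Same raw R-squared, same sample size, different numbers of predictors.
few_predictors = adjusted_r_squared(0.80, n_obs=30, n_predictors=3)
many_predictors = adjusted_r_squared(0.80, n_obs=30, n_predictors=10)
print(few_predictors, many_predictors)
```

With the raw R-squared held fixed, the model with more predictors gets the lower adjusted value.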
Dummy Variables
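Binary (0/1) variables created to represent the categories of a qualitative variable so it can be used in a regression model. A variable with k categories needs k - 1 dummies; the omitted category serves as the reference level.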
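One way dummy coding could look in plain Python. The helper name `make_dummies`, the `is_` prefix, and the sample data are assumptions made for illustration; the first category in sorted order is dropped as the reference level.

```python
def make_dummies(values, drop_first=True):
    """Build 0/1 indicator columns for each category (hypothetical helper)."""
    categories = sorted(set(values))
    if drop_first:
        # The dropped category becomes the reference level.
        categories = categories[1:]
    return {f"is_{c}": [1 if v == c else 0 for v in values] for c in categories}


colors = ["red", "green", "blue", "green"]
dummies = make_dummies(colors)
print(dummies)
```

Here "blue" is the reference level, so a row with zeros in every column represents a blue observation.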
Least Squares Method
A statistical technique used to estimate the parameters of a linear regression model by minimizing the sum of squared residuals.
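A small sketch of least squares for a single predictor, using the closed-form slope and intercept. It also computes the residuals and SSE defined in the earlier cards. The function names and the data are made up for illustration.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_error(x, y, intercept, slope):
    """SSE: sum of squared differences between observed and fitted values."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_ols(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
sse = sum_squared_error(x, y, intercept, slope)
print(intercept, slope, sse)
```

Any other line through the same data, for example one with a slightly different slope, gives a larger SSE than the fitted line.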
Prediction Interval
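A range in which a single new observation is expected to fall with a stated probability, given the values of the predictors. It is wider than the corresponding confidence interval because it also accounts for the random error of the individual observation.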
Confidence Interval
A range that is expected to contain the true value of a parameter, such as the mean of the dependent variable at given predictor values, at a stated confidence level.
Outlier
An observation that deviates significantly from the other data points, which may indicate an error or an unusual occurrence.
Nonlinear Regression
A form of regression analysis in which the data are fit to a model that is nonlinear in its parameters.
Exploratory Data Analysis (EDA)
The process of analyzing data sets to summarize their main characteristics, often visualizing them to gain insights.
Model Fit
A term that refers to how well a regression line approximates the real data points.
Variance Inflation Factor (VIF)
A measure of how much the variance of a regression-coefficient estimate is increased due to multicollinearity.
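A sketch of the VIF for the special case of two predictors. In general, VIF for predictor j is 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the other predictors; with only two predictors, R_j² equals their squared Pearson correlation. The function names and data are illustrative assumptions.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two of them."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]  # correlated with x1
x3 = [2.0, 1.0, 3.0, 1.0, 2.0]  # uncorrelated with x1

vif_correlated = vif_two_predictors(x1, x2)
vif_uncorrelated = vif_two_predictors(x1, x3)
print(vif_correlated, vif_uncorrelated)
```

A VIF of 1 means no inflation from collinearity; larger values mean the coefficient's variance is inflated by that factor.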
Homoscedasticity
A property of a regression model's errors in which their variance is constant across all levels of the independent variables.
Heteroscedasticity
A condition in which the variance of the errors differs across levels of the independent variables.
P-value
The probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. A small p-value is evidence against the null hypothesis and is used to judge statistical significance.