A set of vocabulary flashcards covering key concepts in linear regression and business analytics.
Simple Linear Regression
A statistical method that models the relationship between a dependent variable and one independent variable.
Dependent Variable
The variable being predicted in a regression analysis.
Independent Variable
A variable used to predict the value of the dependent variable.
Multiple Linear Regression
A regression analysis that involves two or more independent variables.
Error Term
The part of the dependent variable that cannot be explained by the independent variables in a model.
Coefficient of Determination (r²)
A statistical measure of the proportion of variance in the dependent variable that is explained by the independent variable(s).
Least Squares Method
A statistical technique used to determine the best-fitting line or model by minimizing the sum of the squares of the residuals.
Residuals
The differences between the observed values and the predicted values in a regression analysis.
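To make the least squares, residual, and r² cards concrete, here is a minimal NumPy sketch; the data and variable names are made up purely for illustration.

```python
import numpy as np

# Illustrative data (hypothetical advertising spend vs. sales).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.3])

# Least squares estimates of slope and intercept:
# b1 = Sxy / Sxx, b0 = ybar - b1 * xbar
x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

# Fitted values and residuals (observed minus predicted).
y_hat = b0 + b1 * x
residuals = y - y_hat

# Coefficient of determination: 1 - SSE / SST.
sse = np.sum(residuals ** 2)
sst = np.sum((y - y_bar) ** 2)
r_squared = 1 - sse / sst

print(f"slope={b1:.3f}, intercept={b0:.3f}, r^2={r_squared:.3f}")
```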
Goodness of Fit
A measure of how well a statistical model fits the data.
Extrapolation
The act of estimating values outside the range of the data used to fit the model.
Dummy Variable
A variable coded 0 or 1 that is used in regression analysis to represent categorical data.
Interaction Term
A variable that represents the interaction between two or more independent variables in regression analysis.
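A short pandas sketch of the last two cards, using a hypothetical price/region/sales table; pd.get_dummies creates the dummy column, and the interaction term is simply the product of two predictors.

```python
import pandas as pd

# Hypothetical data: price, region (categorical), and sales.
df = pd.DataFrame({
    "price": [10, 12, 9, 11, 13, 8],
    "region": ["East", "West", "East", "West", "East", "West"],
    "sales": [100, 80, 110, 85, 95, 120],
})

# Dummy variable: encode the categorical 'region' as a 0/1 column.
df = pd.get_dummies(df, columns=["region"], drop_first=True, dtype=int)

# Interaction term: the product of two predictors, letting the
# effect of price differ between regions.
df["price_x_west"] = df["price"] * df["region_West"]

print(df)
```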
Quadratic Regression
A form of regression analysis in which the relationship between the independent variable and the dependent variable is modeled as a second-degree polynomial.
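A quadratic fit is still estimated by least squares, just with x and x² as predictors; the sketch below uses np.polyfit on made-up data.

```python
import numpy as np

# Hypothetical data with a curved relationship.
x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([2.0, 4.8, 9.1, 15.9, 25.2, 35.8, 49.5])

# Quadratic regression: fit y = b0 + b1*x + b2*x^2 by least squares.
b2, b1, b0 = np.polyfit(x, y, deg=2)  # polyfit returns the highest degree first

y_hat = b0 + b1 * x + b2 * x ** 2
print(f"y_hat = {b0:.2f} + {b1:.2f}*x + {b2:.2f}*x^2")
```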
Piecewise Linear Regression
A regression method that models different linear relationships for different segments of data.
Stepwise Regression
An iterative method for selecting independent variables in a regression model by adding or removing predictors based on specified criteria.
Predictive Accuracy
The degree to which a regression model accurately predicts values and outcomes.
Statistical Independence
Condition in which the probability of one event occurring does not affect the probability of another event occurring.
Confidence Intervals
Range of values that is likely to contain the true parameter of the model, expressed at a certain confidence level.
Prediction Intervals
Range of values that predicts the value of a new observation based on the regression model.
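The difference between the two kinds of interval is easiest to see in code. The sketch below uses the standard simple-regression formulas and SciPy's t distribution on hypothetical data; the prediction interval for a single new observation is always wider than the confidence interval for the mean response at the same x value.

```python
import numpy as np
from scipy import stats

# Hypothetical data for a simple linear regression.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.3, 3.8, 6.1, 7.7, 10.2, 11.8, 14.1, 15.9])
n = len(x)

# Least squares fit and residual standard deviation.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))

x0 = 5.5                      # new value of the independent variable
y0_hat = b0 + b1 * x0         # point prediction
sxx = np.sum((x - x.mean()) ** 2)
t_crit = stats.t.ppf(0.975, df=n - 2)

# Confidence interval for the *mean* response at x0.
se_mean = s * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / sxx)
ci = (y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean)

# Prediction interval for a *single new* observation at x0 (wider).
se_pred = s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / sxx)
pi = (y0_hat - t_crit * se_pred, y0_hat + t_crit * se_pred)

print(f"95% CI for mean response: {ci}")
print(f"95% prediction interval:  {pi}")
```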
Multicollinearity
A phenomenon in multiple regression where independent variables are highly correlated, making it difficult to determine the individual effect of each variable.
Outlier
An observation in a dataset that is distant from other observations, potentially influencing the results of the analysis.
ANOVA
Analysis of variance; a statistical method used to compare the means of three or more samples.
Scatter Plot
A graphical representation of the relationship between two quantitative variables.
Residual Plot
A plot that displays residuals on the vertical axis and fitted values on the horizontal axis, used to assess the fit of a model.
Statistical Software
Computer programs that perform statistical analysis.
Intercept
The predicted value of the dependent variable when all independent variables are equal to zero.
Slope
The change in the dependent variable associated with a one-unit increase in an independent variable.
Effect Size
A quantitative measure of the magnitude of the difference or relationship in a dataset.
Regularization
A technique used in regression that adds a penalty to the loss function to avoid overfitting.
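As one illustration, ridge regression adds an L2 penalty lam * ||beta||^2 to the least-squares objective, which has the closed form beta = (X'X + lam*I)^-1 X'y. A minimal NumPy sketch on simulated data follows; intercept handling is deliberately simplified.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Ridge regression: least squares plus an L2 penalty lam * ||beta||^2.

    Closed form: beta = (X'X + lam * I)^-1 X'y.
    An intercept is conventionally left unpenalized; this sketch
    penalizes every coefficient for brevity.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Simulated data: setting lam = 0 recovers ordinary least squares.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=40)
print(ridge_fit(X, y, lam=0.0))   # ~ OLS coefficients
print(ridge_fit(X, y, lam=10.0))  # shrunk toward zero
```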
F-Test
A statistical test for comparing variances; in regression analysis, it is used to test the overall significance of the model.
Standard Error
An estimate of the standard deviation of the sampling distribution of a statistic.
Variance Inflation Factor (VIF)
A measure used to detect the severity of multicollinearity in regression analysis.
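The VIF for predictor j is 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the remaining predictors. A small NumPy sketch with simulated, deliberately collinear data:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of the predictor matrix X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on all the other columns (plus an intercept).
    """
    n, p = X.shape
    vifs = []
    for j in range(p):
        y_j = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y_j, rcond=None)
        resid = y_j - others @ beta
        r2 = 1 - resid @ resid / np.sum((y_j - y_j.mean()) ** 2)
        vifs.append(1 / (1 - r2))
    return np.array(vifs)

# Simulated predictors: x2 is nearly a copy of x1, so both get large VIFs.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.1, size=50)
x3 = rng.normal(size=50)
print(vif(np.column_stack([x1, x2, x3])))
```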
Homoscedasticity
The assumption that the variance of errors is constant across all levels of the independent variable.
Normal Distribution
A probability distribution that is symmetric around the mean, describing a bell-shaped curve.
Sample Size
The number of observations or data points used in a study.
Hypothesis Testing
A statistical procedure that uses sample data to evaluate a hypothesis about a population parameter.
Parametric Tests
Statistical tests that assume a specific distribution for the population from which the sample is drawn.
Nonparametric Tests
Statistical tests that do not assume a specific distribution for the population.
Bootstrap Method
A resampling technique used to estimate statistics on a dataset by sampling with replacement.
Hierarchical Models
Statistical models that incorporate multiple levels of analysis.
Bootstrap Confidence Interval
A method for calculating confidence intervals using resampling techniques.
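Both bootstrap cards above can be illustrated together: resample the data with replacement, recompute the statistic each time, and read a percentile confidence interval off the bootstrap distribution. The data below are simulated.

```python
import numpy as np

# Hypothetical sample (e.g., daily revenue figures).
rng = np.random.default_rng(42)
sample = rng.exponential(scale=100.0, size=60)

# Bootstrap: resample with replacement many times and recompute
# the statistic of interest (here, the mean) each time.
boot_means = np.array([
    rng.choice(sample, size=len(sample), replace=True).mean()
    for _ in range(5000)
])

# Percentile bootstrap confidence interval: the 2.5th and 97.5th
# percentiles of the bootstrap distribution.
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"Sample mean: {sample.mean():.1f}")
print(f"95% bootstrap CI for the mean: ({lo:.1f}, {hi:.1f})")
```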
Model Specification
The process of developing a regression model based on the theoretical framework and the data.
Adjusted R-Squared
A modified version of r-squared that penalizes additional predictors, giving a fairer comparison between models with different numbers of predictors.
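One common form of the adjustment is 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the sample size and p the number of predictors; a tiny sketch showing how extra predictors can lower it:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1),
    where n is the sample size and p the number of predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Adding predictors can only increase plain R^2, but adjusted R^2
# falls if the new predictors do not earn their keep.
print(adjusted_r2(0.80, n=50, p=3))   # ~0.787
print(adjusted_r2(0.81, n=50, p=10))  # ~0.761
```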
Data Transformation
The process of converting a variable to a new scale or form (for example, taking logarithms or square roots), often to improve linearity or stabilize variance.
Categorical Data
Data that can be divided into groups or categories and is often represented with dummy variables.
Extreme Value
A data point that lies far outside the overall distribution, potentially skewing results.
Influential Point
An observation that significantly affects the slope of a regression line.
Forecasting
The process of making predictions about future outcomes based on historical data.
Endogeneity
A situation in a statistical model where an explanatory variable is correlated with the error term.
Sampling Error
The error caused by observing a sample instead of the whole population.
Latent Variable
An unobservable variable that is inferred from observable variables.
Causal Inference
The process of determining whether a relationship between two variables is causal.
Propensity Score Matching
A statistical matching technique that attempts to estimate the effect of a treatment by accounting for covariates that predict receiving the treatment.
Cross-Validation
A method for estimating how well a model will generalize, by repeatedly fitting and evaluating it on different subsets of the data.
Holdout Method
A method for validating a predictive model by partitioning data into training and test sets.
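The two validation cards above differ only in how the data are split. A short scikit-learn sketch on simulated data (the dataset and scores are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score

# Simulated dataset.
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=1.0, size=200)

# Holdout method: fit on a training set, evaluate on a held-out test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
print("Holdout R^2:", model.score(X_test, y_test))

# Cross-validation: repeat the split k times so every observation is
# used for testing exactly once, then average the scores.
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print("5-fold CV R^2:", scores.mean())
```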
Residual Analysis
An examination of the residuals from a regression model to check for any violations of assumptions.
Statistical Significance
A determination that a relationship observed in data is not likely to be due to chance.
Statistical Power
The probability that a statistical test will correctly reject a false null hypothesis.
Bayesian Statistics
A statistical paradigm that uses Bayes' theorem to update the probability for a hypothesis as more evidence or information becomes available.
Model Robustness
The ability of a model to perform well across different conditions and assumptions.
Type I Error
The incorrect rejection of a true null hypothesis (false positive).
Type II Error
The failure to reject a false null hypothesis (false negative).
Null Hypothesis
A default hypothesis that there is no effect or no difference.
Alternative Hypothesis
The hypothesis that indicates the presence of an effect or a difference.
Power Analysis
A method to determine the sample size required to detect an effect of a given size.
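Power can also be estimated by simulation: generate data under a specific alternative many times, run the test, and count how often the (false) null hypothesis is rejected. A sketch using SciPy's two-sample t-test; the effect size and sample sizes are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

def simulated_power(n_per_group, effect_size=0.5, alpha=0.05,
                    n_sims=2000, seed=0):
    """Estimate the power of a two-sample t-test by simulation."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        a = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
        b = rng.normal(loc=effect_size, scale=1.0, size=n_per_group)
        _, p_value = stats.ttest_ind(a, b)
        if p_value < alpha:  # correctly rejecting a false null hypothesis
            rejections += 1
    return rejections / n_sims

# Power rises with sample size for a fixed effect size.
for n in (20, 40, 64, 100):
    print(n, simulated_power(n))
```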
Chi-Squared Test
A statistical test used to determine if there is a significant association between categorical variables.
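A minimal SciPy example on a hypothetical 2x2 contingency table (ad campaign vs. purchase); the counts are made up.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: campaign (rows) vs. purchase (columns).
observed = np.array([
    [30, 70],   # campaign A: purchased, did not purchase
    [45, 55],   # campaign B
])

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2={chi2:.2f}, p={p_value:.4f}, dof={dof}")
```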
Time Series Analysis
Statistical techniques used to analyze time-ordered data points.
Data Visualization
The graphical representation of information and data.
Bivariate Analysis
The analysis of two variables to determine the empirical relationship between them.
Multivariate Analysis
The analysis of more than two variables simultaneously.
Factor Analysis
A statistical method used to identify underlying relationships between variables.
Cluster Analysis
A technique used to group a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups.
Regression Coefficients
The estimates that represent the relationship between each independent variable and the dependent variable.
Sensitivity Analysis
The study of how changes in the input of a model can affect its output.
Influence Function
A measure of the effect of a small change in the data on a statistical estimate.
Residual Standard Deviation
The standard deviation of the residuals, indicating the spread of the residuals around zero.