What is a regression?
An equation that represents how a set of factors explains an outcome and how the outcome moves with each factor.
What are the four main objectives of regression analysis?
1. Quantify how one factor causally affects another. 2. Forecast or predict an outcome. 3. Determine the predictors of a factor. 4. Adjust an outcome for various factors.
How can regression analysis help in policy intervention?
It can assess the impact of actions, such as whether legalizing lane filtering reduced motorbike accidents.
What is a dependent variable in regression analysis?
The outcome variable, often denoted as Y.
What is an explanatory variable in regression analysis?
The variable that explains changes in the dependent variable, often denoted as X.
What does the coefficient of the explanatory variable represent?
It indicates how the outcome is estimated to move with a one-unit change in the explanatory variable.
What does the intercept term in a regression model signify?
The expected value of Y when X equals zero.
What is the error term in regression analysis?
It measures how far an individual data point's actual Y value lies from the true regression line. Its sample counterpart, the residual, is the difference between the actual and predicted Y values.
What is the purpose of calculating the slope in regression?
To indicate how the dependent variable and explanatory variable move together relative to the variation in the explanatory variable.
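A minimal sketch of the slope calculation for a simple (one-X) regression, assuming ordinary least squares: the slope is the co-movement of X and Y divided by the variation in X. The function name `ols_simple` and the schooling/wage numbers are hypothetical, chosen only for illustration.

```python
from statistics import mean

def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    # Numerator: how X and Y move together (sum of cross-deviations).
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: variation in X (sum of squared deviations).
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    # The fitted line passes through the point of means.
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: years of schooling and hourly wage.
schooling = [10, 12, 12, 14, 16, 16, 18]
wage = [12.0, 15.0, 14.0, 18.0, 22.0, 21.0, 26.0]
intercept, slope = ols_simple(schooling, wage)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```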
What does it mean to adjust an outcome for various factors in regression?
It involves gauging relative performance or value added by factoring out environmental and situational variables.
How does regression analysis help in forecasting outcomes?
It allows for the assessment of likelihoods, such as predicting extreme weather events or expected time on market for a new house.
What is the significance of determining predictors in regression analysis?
It identifies factors that correlate with an outcome, such as education and experience affecting wage.
What is an example of using regression to assess behavior?
Analyzing whether watching TikTok in class negatively impacts performance on the final exam.
What is the difference between theoretical and estimated regression equations?
Theoretical regression represents the true relationship, while estimated regression is based on sample data and may include errors.
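For a single explanatory variable, the two equations can be written side by side. The notation is an assumption (the cards do not fix symbols): Greek letters for the true parameters, hats for sample estimates, and i for the individual.

```latex
% Theoretical (true) regression equation
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i

% Estimated regression equation from a sample
Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i
```

Here \varepsilon_i is the error term and \hat{\varepsilon}_i is the residual, the sample counterpart of the error term.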
What is the role of the subscript 'i' in regression analysis?
It refers to the individual i in the sample of N individuals.
How can regression analysis detect anomalies?
By gauging performance relative to expected outcomes and identifying deviations from the norm.
What factors might determine the selling price of a house in regression analysis?
Characteristics such as age, number of rooms, baths, garages, and size.
What is a practical application of regression analysis in insurance?
To assess the likelihood of extreme weather events affecting premiums.
What is a common use of regression analysis in real estate?
To evaluate an agent's revenue while accounting for market price increases and supply and demand.
What is total variation in regression analysis?
Total variation (total sum of squares, TSS) is the sum of the squared deviations of Y from its mean.
What does r squared (r²) represent in regression?
R squared represents the proportion of total variation in the dependent variable Y that is explained by the independent variable(s) X.
What does a high r squared value indicate?
A high r squared value indicates that the data points are close to the regression line, meaning the model explains a large portion of the variation in the dependent variable.
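A short sketch of how R squared follows from total variation, assuming the usual definition R² = 1 - RSS/TSS, where RSS is the sum of squared residuals. The function name `r_squared` and the numbers are hypothetical.

```python
from statistics import mean

def r_squared(y, y_hat):
    """Share of the total variation in y explained by the predictions y_hat."""
    y_bar = mean(y)
    # Total variation: squared deviations of Y from its mean.
    tss = sum((yi - y_bar) ** 2 for yi in y)
    # Unexplained variation: squared residuals.
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - rss / tss

# Hypothetical actual and predicted outcomes.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
print(r_squared(actual, predicted))  # TSS = 5, RSS = 1, so R squared = 0.8
```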
What is adjusted r squared used for?
Adjusted r squared is used to evaluate whether adding a new explanatory variable increases the explanatory power of a model, accounting for the number of predictors.
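One common form of the adjustment, given here as an assumption since the cards do not state the formula, is 1 - (1 - R²)(N - 1)/(N - K - 1), with N the sample size and K the number of explanatory variables. The sketch below uses hypothetical numbers to show that a tiny gain in R squared from an extra variable can lower adjusted R squared.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R squared for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical case: a fourth variable raises R squared from 0.800 to 0.801.
before = adjusted_r_squared(0.800, n=30, k=3)
after = adjusted_r_squared(0.801, n=30, k=4)
print(f"before={before:.4f}, after={after:.4f}")
```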
What does MSE stand for and what does it measure?
MSE stands for Mean Squared Error and measures the estimated variance of the error, or the variance of the residuals.
What is the relationship between MSE and RMSE?
The square root of MSE is called the Root Mean Square Error (RMSE), which provides a measure of the standard error of the estimate.
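A minimal sketch of both quantities computed from a list of residuals. It assumes the degrees-of-freedom divisor N - K - 1 (some texts and software divide by N instead), and the names `mse` and `rmse` are illustrative.

```python
from math import sqrt

def mse(residuals, k):
    """Estimated error variance, dividing by N - K - 1 (an assumption)."""
    n = len(residuals)
    rss = sum(e ** 2 for e in residuals)
    return rss / (n - k - 1)

def rmse(residuals, k):
    """Root mean squared error: the square root of the MSE."""
    return sqrt(mse(residuals, k))

# Hypothetical residuals from a model with one explanatory variable.
residuals = [0.5, -0.5, 0.5, -0.5]
print(mse(residuals, k=1), rmse(residuals, k=1))
```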
What does correlation indicate?
Correlation indicates the degree to which two variables move together, either positively or negatively, but does not imply causation.
What is the difference between correlation and causation?
Causation implies that one variable has a direct effect on another, while correlation only indicates that two variables move together without implying a direct effect.
What is a Key X variable in regression analysis?
Key X is the variable that is being analyzed for its causal effect on the dependent variable.
What are control variables in regression analysis?
Control variables are factors suspected of having an association with the dependent variable and are included to isolate the effect of the Key X variable.
What are the conditions necessary for regression models?
The conditions are: (1) the average error term equals 0; (2) the error terms are independently and identically distributed; (3) the error terms are normally distributed; (4) the error terms are homoskedastic; and (5) the Key X variable(s) are uncorrelated with the error term.
What does it mean if residuals are small?
Small residuals indicate that the model's predictions are close to the actual outcomes, suggesting good model performance.
What can large positive residuals indicate?
Large positive residuals indicate that the model underpredicted the outcome.
What can large negative residuals indicate?
Large negative residuals indicate that the model overpredicted the outcome.
What is a dummy variable?
A dummy variable is a qualitative variable that takes on values of 1 or 0 to indicate the presence or absence of a categorical effect.
What is a reference category in regression analysis?
A reference category is a category that is excluded from the model to serve as a baseline for comparison.
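A small sketch of how a categorical variable could be turned into dummy variables while leaving out the reference category. The helper `make_dummies`, the `is_` prefix, and the region data are all hypothetical.

```python
def make_dummies(values, reference):
    """Build one 0/1 column per category, excluding the reference category."""
    categories = sorted(set(values) - {reference})
    return {
        f"is_{c}": [1 if v == c else 0 for v in values]
        for c in categories
    }

# Hypothetical categorical variable with "north" as the reference category.
region = ["north", "south", "west", "south", "north"]
dummies = make_dummies(region, reference="north")
for name, column in dummies.items():
    print(name, column)
```

Observations in the reference category have 0 in every dummy column, so each dummy's coefficient is read as a comparison with that group.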
What is the role of mediating factors in regression analysis?
Mediating factors describe how the Key X variable affects the outcome Y, illustrating the mechanism of the effect.
What are potential issues that can lead to incorrect regression results?
Potential issues include small sample size, randomness in Y, and omitted or confounding factors.
What is the importance of assessing regression model conditions?
Assessing conditions helps ensure the model is reliable and identifies potential problems that may affect the conclusions drawn from the analysis.
What does it mean if Key X is uncorrelated with the error term?
It means that the Key X variable does not have a systematic relationship with the error term, which is crucial for valid regression results.
What is the significance of the average error term being zero?
An average error term of zero is an automatic property of regression models with an intercept, indicating that the model's predictions are unbiased on average.
What does homoskedasticity refer to in regression analysis?
Homoskedasticity refers to the condition where the variability of the errors is constant across all levels of the independent variable(s).
What is the purpose of regression flowcharts?
Regression flowcharts visually represent the relationships between variables, including causal effects and unobservable factors.
What is the difference between cross-sectional and time series data?
Cross-sectional data refers to observations collected at a single point in time, while time series data refers to observations collected over multiple time periods.
What is the interpretation of category coefficients in regression with dummy variables?
They are comparisons to the reference group.
What are interaction effects in regression analysis?
They occur when the effect of a factor differs across categories, acting as a moderating factor.
How can non-linear relationships be modeled in regression?
Using logarithmic transformations, quadratic terms, or spline functions.
What is the purpose of using weighted regression models?
To account for unequal importance of observations, especially in oversampled subpopulations.
What does the term 'holding other factors constant' mean in regression?
It refers to controlling for confounding variables to isolate the effect of the key independent variable.
What is the Average Treatment Effect (ATE)?
It measures how much the outcome would change on average if all subjects received one more unit of the key independent variable.
What is the significance of standard error in regression analysis?
It measures the precision of the coefficient estimates.
What factors can reduce the size of the standard error?
Larger sample size, less unexplained variation in Y, and greater standard deviation in X.
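For a simple regression with one X, the standard error of the slope makes these three factors visible. The symbols are assumptions: \hat{\sigma}^2 is the estimated error variance (MSE) and s_X is the sample standard deviation of X.

```latex
\mathrm{SE}(\hat{\beta}_1)
  = \sqrt{\frac{\hat{\sigma}^2}{\sum_{i=1}^{N} (X_i - \bar{X})^2}}
  = \frac{\hat{\sigma}}{s_X \sqrt{N - 1}}
```

A larger N or a larger s_X increases the denominator, and less unexplained variation in Y shrinks \hat{\sigma}, so each one reduces the standard error.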
What does a small p-value indicate in hypothesis testing?
It suggests that an estimate this far from the hypothesized value would be unlikely to arise by chance if the null hypothesis were true, which leads to rejecting the null hypothesis.
What are the steps in hypothesis testing?
1. Define the hypotheses. 2. Collect data. 3. Calculate the test statistic. 4. Determine the p-value. 5. Make a decision regarding the null hypothesis.
What is the difference between good and bad variation in regression analysis?
Good variation in the key X is due to factors not correlated with the dependent variable, while bad variation is due to factors that are correlated with the dependent variable and can confound results.
What is a quadratic model in regression?
A model that includes the explanatory variable and its square (X and X²) to capture non-linear relationships.
What is a spline in regression modeling?
A piecewise linear model that allows for changes in slope at specified points.
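A minimal sketch of a linear spline with one knot, assuming the common construction in which a second regressor, max(0, X - knot), lets the slope change at the knot. The function names and coefficient values are hypothetical.

```python
def spline_terms(x, knot):
    """Return the two regressors for a one-knot linear spline."""
    return x, max(0.0, x - knot)

def spline_predict(x, knot, b0, b1, b2):
    """Predicted outcome: slope is b1 below the knot and b1 + b2 above it."""
    base, beyond = spline_terms(x, knot)
    return b0 + b1 * base + b2 * beyond

# Hypothetical coefficients: slope 2.0 up to the knot, 0.5 after it.
knot, b0, b1, b2 = 10.0, 5.0, 2.0, -1.5
for x in (5.0, 10.0, 15.0):
    print(x, spline_predict(x, knot, b0, b1, b2))
```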
What does it mean to standardize coefficient estimates?
It puts the estimates on a common scale, typically the change in the outcome associated with a one-standard-deviation increase in each factor, so that the relative contributions of the factors can be compared.
Why should mediating factors not be controlled for in regression?
Controlling for them can distort the estimate of the key independent variable's effect on the outcome.
What is the role of control variables in regression analysis?
They are included to account for confounding factors that may affect the relationship between the key independent variable and the outcome.
What is the relationship between p-values and statistical significance?
A smaller p-value indicates stronger evidence against the null hypothesis, with thresholds for significance at 0.10, 0.05, and 0.01.
What is the purpose of using logarithmic forms of variables in regression?
To interpret changes as percentage changes rather than absolute changes.
What is the impact of sample size on the standard error?
A larger sample size generally leads to a smaller standard error.
What does it mean when a regression model cannot compare linear and logarithmic forms?
It indicates that the interpretations of the coefficients differ significantly between the two forms.
What is the significance of the coefficient estimates in regression output?
They represent the expected change in the dependent variable for a one-unit change in the independent variable, holding the other explanatory variables constant.
What is the importance of understanding how dummies are set up in regression?
It ensures that the reference category is correctly identified for accurate interpretation of coefficients.
What does it mean for a regression model to have no intercept?
It indicates that the model does not account for a baseline level of the dependent variable when all independent variables are zero.
What is the standard error for an estimate?
It is a measure of the precision of an estimate, typically reported by the statistical program alongside the coefficient estimate.
What is the null hypothesis in hypothesis testing?
It is the assumption that there is no effect or difference, typically stated as H0: the parameter equals a specific value.
What does the t statistic measure?
It measures how far the sample estimate is from the hypothesized value, relative to the standard error.
What is the formula for degrees of freedom in hypothesis testing?
Degrees of freedom = N - K - 1, where N is the sample size and K is the number of explanatory variables.
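A short sketch tying the t statistic to the degrees of freedom, using the definitions on these cards. The function names and the numbers are hypothetical; converting t to a p-value needs the t distribution, which is left to statistical software here.

```python
def t_statistic(estimate, std_error, hypothesized=0.0):
    """Distance of the estimate from the hypothesized value, in standard errors."""
    return (estimate - hypothesized) / std_error

def degrees_of_freedom(n, k):
    """N - K - 1, with N observations and K explanatory variables."""
    return n - k - 1

# Hypothetical coefficient estimate and standard error, testing H0: parameter = 0.
t = t_statistic(estimate=1.5, std_error=0.5)
df = degrees_of_freedom(n=100, k=4)
print(f"t={t}, df={df}")
```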
What is the alternative hypothesis?
It is the hypothesis that contradicts the null, indicating that there is an effect or difference (H1: the parameter is not equal to a specific value).
What does a p-value indicate?
It indicates the probability of observing an estimate at least as far from the hypothesized value as the one obtained, if the null hypothesis were true.
What does a p-value less than 0.05 signify?
It indicates statistical significance at the 5% level, suggesting strong evidence against the null hypothesis.
What is multicollinearity?
It occurs when an explanatory variable is correlated with another explanatory variable, potentially inflating standard errors.
What is heteroskedasticity?
It refers to the situation where the variance of the error terms is not constant across all values of an explanatory variable.
What is clustering in the context of statistical analysis?
It refers to observations being grouped in a way that introduces systematic differences between groups, affecting the outcome.
What is the significance of a p-value of 0.049?
It is often considered statistically significant, while a p-value of 0.051 may be viewed as weakly significant or insignificant.
What are the two main reasons regression results may be incorrect?
Inaccuracy (moving the coefficient estimate away from the true causal effect) and imprecision (affecting standard errors).
What does it mean if an estimate is statistically insignificant?
It may indicate no effect, a modeling problem, inadequate power, or varying effects in the population.
What is the relationship between sample size and p-values?
With a large enough sample size, p-values are more likely to indicate significance, leading to rejection of the null hypothesis.
What does it imply if nearly all null hypotheses are false?
It suggests that most relationships are statistically significant, as almost everything is related by a non-zero amount.
What is the impact of bias in standard errors?
While bias in standard errors is less serious than bias in coefficient estimates, it still needs to be corrected to avoid misleading conclusions.
What is the main twist regarding p-values?
P-values do not accurately indicate whether an estimate is different from the hypothesized value, as they depend on sample size and randomness.
What does a p-value less than 0.10 indicate?
It indicates statistical significance at the 10% level.
What is the consequence of controlling for a correlated variable in a model?
It may reduce the operative variation in the key variable, potentially leading to higher standard errors and biased estimates.
What does it mean if the t-statistic is far from zero?
It suggests a lower p-value, indicating stronger evidence against the null hypothesis.
What is the implication of a wide confidence interval?
It may indicate imprecision, making an estimate practically insignificant despite being statistically significant.
What is the role of prior knowledge in interpreting p-values?
Prior knowledge can affect the interpretation of p-values, as it may change the perceived certainty of a relationship.
What is the primary objective when estimating causal effects?
To determine how certain we can be that the coefficient estimate represents the causal effect of an X variable on the Y variable.
What are the two general considerations when estimating causal effects?
1) Is there a bias moving the coefficient estimate away from the true causal effect? 2) Is there an alternative explanation for the coefficient estimate?
How is ATE calculated?
ATE = (Average outcome in treatment group) - (Average outcome in control group).
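The difference in group averages can be sketched in a few lines. The outcome values are hypothetical, and the result equals the causal effect only if treatment is unrelated to other factors that affect the outcome (for example, under random assignment).

```python
from statistics import mean

def average_treatment_effect(treated, control):
    """Average outcome in the treatment group minus that in the control group."""
    return mean(treated) - mean(control)

# Hypothetical outcomes for treated and control subjects.
treated = [12.0, 15.0, 14.0, 17.0]
control = [10.0, 11.0, 13.0, 12.0]
ate = average_treatment_effect(treated, control)
print(ate)  # 14.5 - 11.5 = 3.0
```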
What is the difference between good variation and bad variation in key-X?
Good variation is due to factors not correlated with the dependent variable, while bad variation is due to factors that could be correlated with the dependent variable.
What is reverse causality in regression analysis?
When the dependent variable Y affects the key-explanatory variable X, leading to biased estimates.
What is omitted-factors bias?
When the estimated causal effect picks up the effects of other factors correlated with the treatment but not accounted for in the model.
What is self-selection bias?
When an individual's characteristics influence whether they receive treatment or the extent of treatment they receive.
What causes measurement error in regression analysis?
When an explanatory variable is inaccurately measured, leading to biased estimates, often towards zero.
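A small simulation sketch of the bias toward zero, under stated assumptions: the true slope is 2, and the measurement error is random noise unrelated to X and to the error term, with the same variance as X. Everything here (names, seed, sample size) is illustrative rather than taken from the cards.

```python
import random
from statistics import mean

def slope(x, y):
    """Least-squares slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    return s_xy / s_xx

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
true_x = [rng.gauss(0, 1) for _ in range(n)]
y = [2.0 * x + rng.gauss(0, 1) for x in true_x]
# Mismeasured X: the true value plus random noise.
noisy_x = [x + rng.gauss(0, 1) for x in true_x]

slope_true = slope(true_x, y)
slope_noisy = slope(noisy_x, y)
print(f"accurate X: {slope_true:.2f}, mismeasured X: {slope_noisy:.2f}")
```

With noise as variable as X itself, the estimate should land near half of the true slope, which is the attenuation the card describes.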
Why should mediating factors not be used as control variables?
Because they are products of the key-X and controlling for them removes part of the effect the researcher aims to capture.
What is the impact of using an improper reference group in regression analysis?
Results can change, or be manipulated, depending on which reference group is chosen, which affects the interpretation of treatment effects.
What is spurious correlation?
A false relationship between the key-explanatory variable and the dependent variable due to omitted factors.
What is incidental correlation?
When factors affecting the dependent variable are not fully held constant and are correlated with the key-X variable.