Introduction to Applied Econometrics

125 Terms

1

What is a regression?

An equation that represents how a set of factors explains an outcome and how the outcome moves with each factor.

2

What are the four main objectives of regression analysis?

1. Quantify how one factor causally affects another. 2. Forecast or predict an outcome. 3. Determine the predictors of a factor. 4. Adjust an outcome for various factors.

3

How can regression analysis help in policy intervention?

It can assess the impact of actions, such as whether legalizing lane filtering reduced motorbike accidents.

4

What is a dependent variable in regression analysis?

The outcome variable, often denoted as Y.

5

What is an explanatory variable in regression analysis?

The variable that explains changes in the dependent variable, often denoted as X.

6

What does the coefficient of the explanatory variable represent?

It indicates how the outcome is estimated to move with a one-unit change in the explanatory variable.

7

What does the intercept term in a regression model signify?

The expected value of Y when X equals zero.

8

What is the error term in regression analysis?

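It measures how far an individual data point is from the true regression line: the difference between the actual Y value and the value the true line implies. Its sample counterpart, the actual Y minus the predicted Y from the estimated line, is the residual.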

9

What is the purpose of calculating the slope in regression?

To indicate how the dependent variable and the explanatory variable move together (their covariance) relative to the variation in the explanatory variable (its variance).
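
A minimal sketch of that calculation for a one-variable regression, using only the Python standard library. The function name `ols_simple` and the schooling and wage figures are hypothetical, chosen for illustration only.

```python
from statistics import mean

def ols_simple(x, y):
    """Return (intercept, slope) for the one-variable regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    # Numerator: how X and Y move together (sum of cross-deviations)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: variation in X (sum of squared deviations)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line passes through the point of means
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: years of schooling and hourly wage
schooling = [10, 12, 12, 14, 16, 18]
wage = [14.0, 18.0, 17.0, 21.0, 26.0, 30.0]
b0, b1 = ols_simple(schooling, wage)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

Dividing both sums by N - 1 would turn them into the sample covariance and variance; the ratio, and so the slope, is unchanged.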

10

What does it mean to adjust an outcome for various factors in regression?

It involves gauging relative performance or value added by factoring out environmental and situational variables.

11

How does regression analysis help in forecasting outcomes?

It allows for the assessment of likelihoods, such as predicting extreme weather events or expected time on market for a new house.

12

What is the significance of determining predictors in regression analysis?

It identifies factors that correlate with an outcome, such as education and experience affecting wage.

13

What is an example of using regression to assess behavior?

Analyzing whether watching TikTok in class negatively impacts performance on the final exam.

14

What is the difference between theoretical and estimated regression equations?

Theoretical regression represents the true relationship, while estimated regression is based on sample data and may include errors.
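
Written out for a single explanatory variable, the two equations might look like the following. The notation (hats for estimates, epsilon for the error term) is a common convention assumed here, not taken from the cards.

```latex
\begin{align*}
% Theoretical regression: true coefficients plus the unobserved error term
Y_i &= \beta_0 + \beta_1 X_i + \varepsilon_i \\
% Estimated regression: coefficients estimated from the sample
\hat{Y}_i &= \hat{\beta}_0 + \hat{\beta}_1 X_i \\
% Residual: the observable sample counterpart of the error term
\hat{\varepsilon}_i &= Y_i - \hat{Y}_i
\end{align*}
```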

15

What is the role of the subscript 'i' in regression analysis?

It refers to the individual i in the sample of N individuals.

16

How can regression analysis detect anomalies?

By gauging performance relative to expected outcomes and identifying deviations from the norm.

17

What factors might determine the selling price of a house in regression analysis?

Characteristics such as age, number of rooms, baths, garages, and size.

18

What is a practical application of regression analysis in insurance?

To assess the likelihood of extreme weather events affecting premiums.

19

What is a common use of regression analysis in real estate?

To evaluate an agent's revenue while accounting for market price increases and supply and demand.

20

What is total variation in regression analysis?

Total variation (total sum of squares, TSS) is the sum of the squared deviations of Y from its mean.

21

What does r squared (r²) represent in regression?

R squared represents the proportion of total variation in the dependent variable Y that is explained by the independent variable(s) X.
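
As a rough illustration, R squared can be computed as one minus the ratio of unexplained variation (residual sum of squares) to total variation (TSS). The sketch below fits a one-variable regression by hand; the function name `r_squared` and the data are hypothetical.

```python
from statistics import mean

def r_squared(x, y):
    """R squared for the one-variable regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    # Total variation: squared deviations of Y from its mean
    tss = sum((yi - y_bar) ** 2 for yi in y)
    # Unexplained variation: squared residuals
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1 - rss / tss

# Hypothetical data: years of schooling and hourly wage
schooling = [10, 12, 12, 14, 16, 18]
wage = [14.0, 18.0, 17.0, 21.0, 26.0, 30.0]
r2 = r_squared(schooling, wage)
print(f"R squared = {r2:.3f}")
```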

22

What does a high r squared value indicate?

A high r squared value indicates that the data points are close to the regression line, meaning the model explains a large portion of the variation in the dependent variable.

23

What is adjusted r squared used for?

Adjusted r squared is used to evaluate whether adding a new explanatory variable increases the explanatory power of a model, accounting for the number of predictors.
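
A small sketch of the standard adjustment, 1 - (1 - R²)(N - 1)/(N - K - 1), where N is the sample size and K the number of explanatory variables. The R squared values below are made up to show that a tiny gain in R squared can still lower adjusted R squared.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R squared for a model with k explanatory variables and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical comparison: adding a third variable raises R squared only slightly
before = adjusted_r_squared(0.500, n=50, k=2)
after = adjusted_r_squared(0.501, n=50, k=3)
print(f"before = {before:.4f}, after = {after:.4f}")
```

Here `after` is smaller than `before`, which would suggest the extra variable does not add enough explanatory power to justify its inclusion.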

24

What does MSE stand for and what does it measure?

MSE stands for Mean Squared Error and measures the estimated variance of the error, or the variance of the residuals.

25

What is the relationship between MSE and RMSE?

The square root of MSE is called the Root Mean Square Error (RMSE), which provides a measure of the standard error of the estimate.
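
One way to sketch both quantities from a list of residuals. Dividing by N - K - 1 (the degrees of freedom) is an assumption made here so that MSE estimates the error variance; some sources divide by N instead. The residual values are hypothetical.

```python
from math import sqrt

def mse(residuals, k):
    """Mean squared error: sum of squared residuals over degrees of freedom (n - k - 1)."""
    n = len(residuals)
    return sum(e ** 2 for e in residuals) / (n - k - 1)

def rmse(residuals, k):
    """Root mean squared error: the square root of MSE, in the units of Y."""
    return sqrt(mse(residuals, k))

# Hypothetical residuals from a model with one explanatory variable
residuals = [1.0, -1.0, 1.0, -1.0, 0.0]
print(mse(residuals, k=1), rmse(residuals, k=1))
```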

26

What does correlation indicate?

Correlation indicates the degree to which two variables move together, either positively or negatively, but does not imply causation.

27

What is the difference between correlation and causation?

Causation implies that one variable has a direct effect on another, while correlation only indicates that two variables move together without implying a direct effect.

28

What is a Key X variable in regression analysis?

Key X is the variable that is being analyzed for its causal effect on the dependent variable.

29

What are control variables in regression analysis?

Control variables are factors suspected of having an association with the dependent variable and are included to isolate the effect of the Key X variable.

30

What are the conditions necessary for regression models?

The conditions include: average error term equals 0, error terms are independently and identically distributed, normally distributed, homoskedastic, and Key X variables are uncorrelated with the error term.

31

What does it mean if residuals are small?

Small residuals indicate that the model's predictions are close to the actual outcomes, suggesting good model performance.

32

What can large positive residuals indicate?

Large positive residuals indicate that the model underpredicted the outcome.

33

What can large negative residuals indicate?

Large negative residuals indicate that the model overpredicted the outcome.

34

What is a dummy variable?

A dummy variable is a qualitative variable that takes on values of 1 or 0 to indicate the presence or absence of a categorical effect.

35

What is a reference category in regression analysis?

A reference category is a category that is excluded from the model to serve as a baseline for comparison.
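
A minimal sketch of how a categorical variable could be turned into dummy variables with one category left out as the reference. The helper `make_dummies` and the region labels are hypothetical.

```python
def make_dummies(values, reference):
    """Build 0/1 dummies for every category except the reference category."""
    categories = sorted(set(values) - {reference})
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]

# Hypothetical categorical variable with "north" as the reference category
regions = ["north", "south", "west", "south"]
for row in make_dummies(regions, reference="north"):
    print(row)
```

An observation in the reference category has zeros on every dummy, so each dummy's coefficient is read as a comparison to that group.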

36

What is the role of mediating factors in regression analysis?

Mediating factors describe how the Key X variable affects the outcome Y, illustrating the mechanism of the effect.

37

What are potential issues that can lead to incorrect regression results?

Potential issues include small sample size, randomness in Y, and omitted or confounding factors.

38

What is the importance of assessing regression model conditions?

Assessing conditions helps ensure the model is reliable and identifies potential problems that may affect the conclusions drawn from the analysis.

39

What does it mean if Key X is uncorrelated with the error term?

It means that the Key X variable does not have a systematic relationship with the error term, which is crucial for valid regression results.

40

What is the significance of the average error term being zero?

An average error term of zero is an automatic property of regression models with an intercept, indicating that the model's predictions are unbiased on average.

41

What does homoskedasticity refer to in regression analysis?

Homoskedasticity refers to the condition where the variability of the errors is constant across all levels of the independent variable(s).

42

What is the purpose of regression flowcharts?

Regression flowcharts visually represent the relationships between variables, including causal effects and unobservable factors.

43

What is the difference between cross-sectional and time series data?

Cross-sectional data refers to observations collected at a single point in time, while time series data refers to observations collected over multiple time periods.

44

What is the interpretation of category coefficients in regression with dummy variables?

They are comparisons to the reference group.

45

What are interaction effects in regression analysis?

They occur when the effect of a factor differs across categories, acting as a moderating factor.

46

How can non-linear relationships be modeled in regression?

Using logarithmic transformations, quadratic terms, or spline functions.

47

What is the purpose of using weighted regression models?

To account for unequal importance of observations, especially in oversampled subpopulations.

48

What does the term 'holding other factors constant' mean in regression?

It refers to controlling for confounding variables to isolate the effect of the key independent variable.

49

What is the Average Treatment Effect (ATE)?

It measures how much the outcome would change on average if all subjects received one more unit of the key independent variable.

50

What is the significance of standard error in regression analysis?

It measures the precision of the coefficient estimates.

51

What factors can reduce the size of the standard error?

Larger sample size, less unexplained variation in Y, and greater standard deviation in X.
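
All three factors appear in the textbook formula for the standard error of the slope in a one-variable regression with homoskedastic errors. The symbols below are assumptions for illustration: \hat{\sigma} is the RMSE (unexplained variation in Y) and s_X is the sample standard deviation of X.

```latex
\[
\mathrm{SE}(\hat{\beta}_1)
  = \frac{\hat{\sigma}}{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2}}
  = \frac{\hat{\sigma}}{s_X \sqrt{N - 1}}
\]
```

A larger N or a larger s_X increases the denominator, and less unexplained variation shrinks the numerator; each change reduces the standard error.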

52

What does a small p-value indicate in hypothesis testing?

It means that results at least as extreme as those observed would be unlikely if the null hypothesis were true, which leads to rejecting the null hypothesis.

53

What are the steps in hypothesis testing?

1. Define the hypotheses. 2. Collect data. 3. Calculate the test statistic. 4. Determine the p-value. 5. Make a decision regarding the null hypothesis.

54

What is the difference between good and bad variation in regression analysis?

Good variation is due to factors not correlated with the dependent variable, while bad variation is due to factors that are correlated and can confound results.

55

What is a quadratic model in regression?

A model that includes the square of the explanatory variable (X²) alongside X, so that the slope can change with the level of X. Adding still higher powers gives a more general polynomial model.

56

What is a spline in regression modeling?

A piecewise linear model that allows for changes in slope at specified points.
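
A sketch of a one-knot linear spline, assuming the common construction in which the extra regressor is max(0, X - knot). The function names, the knot, and the coefficients are all hypothetical.

```python
def spline_terms(x, knot):
    """Regressors for a one-knot linear spline: x itself and the amount by which x exceeds the knot."""
    return x, max(0.0, x - knot)

def spline_predict(x, knot, b0, b1, b2):
    """Prediction from a one-knot spline; the slope is b1 below the knot and b1 + b2 above it."""
    base, beyond = spline_terms(x, knot)
    return b0 + b1 * base + b2 * beyond

# Hypothetical coefficients: slope 2.0 up to x = 5, then 2.0 - 1.5 = 0.5 afterwards
for x in (4, 5, 6, 7):
    print(x, spline_predict(x, knot=5, b0=10.0, b1=2.0, b2=-1.5))
```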

57

What does it mean to standardize coefficient estimates?

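It rescales the variables into standard-deviation units, so each estimate is interpreted as the change in Y (in standard deviations) for a one-standard-deviation increase in X. This allows the relative contributions of factors to be compared.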

58

Why should mediating factors not be controlled for in regression?

Controlling for them can distort the estimate of the key independent variable's effect on the outcome.

59

What is the role of control variables in regression analysis?

They are included to account for confounding factors that may affect the relationship between the key independent variable and the outcome.

60

What is the relationship between p-values and statistical significance?

A smaller p-value indicates stronger evidence against the null hypothesis, with thresholds for significance at 0.10, 0.05, and 0.01.

61

What is the purpose of using logarithmic forms of variables in regression?

To interpret changes as percentage changes rather than absolute changes.
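
The usual approximate readings of the three logarithmic forms can be summarized as follows. They hold for small changes, and the notation is assumed for illustration.

```latex
\begin{align*}
% Log-linear: a one-unit increase in X is associated with roughly a 100*beta_1 percent change in Y
\ln Y_i &= \beta_0 + \beta_1 X_i + \varepsilon_i \\
% Linear-log: a 1 percent increase in X is associated with roughly a beta_1/100 unit change in Y
Y_i &= \beta_0 + \beta_1 \ln X_i + \varepsilon_i \\
% Log-log: a 1 percent increase in X is associated with roughly a beta_1 percent change in Y
\ln Y_i &= \beta_0 + \beta_1 \ln X_i + \varepsilon_i
\end{align*}
```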

62

What is the impact of sample size on the standard error?

A larger sample size generally leads to a smaller standard error.

63

What does it mean when a regression model cannot compare linear and logarithmic forms?

It indicates that the interpretations of the coefficients differ significantly between the two forms.

64

What is the significance of the coefficient estimates in regression output?

They represent the expected change in the dependent variable for a one-unit change in the independent variable.

65

What is the importance of understanding how dummies are set up in regression?

It ensures that the reference category is correctly identified for accurate interpretation of coefficients.

66

What does it mean for a regression model to have no intercept?

It indicates that the model does not account for a baseline level of the dependent variable when all independent variables are zero.

67

What is the standard error for an estimate?

It is typically produced by a statistical program to assess the accuracy of the estimate.

68

What is the null hypothesis in hypothesis testing?

It is the assumption that there is no effect or difference, typically stated as H0: the parameter equals a specific value.

69

What does the t statistic measure?

It measures how far the sample estimate is from the hypothesized value, relative to the standard error.

70

What is the formula for degrees of freedom in hypothesis testing?

Degrees of freedom = N - K - 1, where N is the sample size and K is the number of explanatory variables.
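
A short sketch of the t statistic together with the degrees of freedom it is compared against. The function names and the numbers (an estimate of 0.8 with a standard error of 0.25, from 100 observations and 3 explanatory variables) are hypothetical.

```python
def t_statistic(estimate, std_error, hypothesized=0.0):
    """Distance of the estimate from the hypothesized value, in standard-error units."""
    return (estimate - hypothesized) / std_error

def degrees_of_freedom(n, k):
    """Degrees of freedom for a regression with n observations and k explanatory variables."""
    return n - k - 1

# Hypothetical regression output
t = t_statistic(estimate=0.8, std_error=0.25)
df = degrees_of_freedom(n=100, k=3)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The p-value would then come from the t distribution with `df` degrees of freedom, which a statistical package normally reports directly.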

71

What is the alternative hypothesis?

It is the hypothesis that contradicts the null, indicating that there is an effect or difference (H1: the parameter is not equal to a specific value).

72

What does a p-value indicate?

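It is the probability of obtaining an estimate at least as far from the hypothesized value as the one observed, if the null hypothesis were true. It is not the probability that the null hypothesis is true.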

73

What does a p-value less than 0.05 signify?

It indicates statistical significance at the 5% level, suggesting strong evidence against the null hypothesis.

74

What is multicollinearity?

It occurs when an explanatory variable is correlated with another explanatory variable, potentially inflating standard errors.

75

What is heteroskedasticity?

It refers to the situation where the variance of the error terms is not constant across all values of an explanatory variable.

76

What is clustering in the context of statistical analysis?

It refers to observations being grouped in a way that introduces systematic differences between groups, affecting the outcome.

77

What is the significance of a p-value of 0.049?

It is often considered statistically significant, while a p-value of 0.051 may be viewed as weakly significant or insignificant.

78

What are the two main reasons regression results may be incorrect?

Inaccuracy (moving the coefficient estimate away from the true causal effect) and imprecision (affecting standard errors).

79

What does it mean if an estimate is statistically insignificant?

It may indicate no effect, a modeling problem, inadequate power, or varying effects in the population.

80

What is the relationship between sample size and p-values?

With a large enough sample size, p-values are more likely to indicate significance, leading to rejection of the null hypothesis.

81

What does it imply if nearly all null hypotheses are false?

It suggests that most relationships are statistically significant, as almost everything is related by a non-zero amount.

82

What is the impact of bias in standard errors?

While bias in standard errors is less serious than bias in coefficient estimates, it still needs to be corrected to avoid misleading conclusions.

83

What is the main twist regarding p-values?

P-values do not accurately indicate whether an estimate is different from the hypothesized value, as they depend on sample size and randomness.

84

What does a p-value less than 0.10 indicate?

It indicates statistical significance at the 10% level.

85

What is the consequence of controlling for a correlated variable in a model?

It may reduce the operative variation in the key variable, potentially leading to higher standard errors and biased estimates.

86

What does it mean if the t-statistic is far from zero?

It suggests a lower p-value, indicating stronger evidence against the null hypothesis.

87

What is the implication of a wide confidence interval?

It may indicate imprecision, making an estimate practically insignificant despite being statistically significant.

88

What is the role of prior knowledge in interpreting p-values?

Prior knowledge can affect the interpretation of p-values, as it may change the perceived certainty of a relationship.

89

What is the primary objective when estimating causal effects?

To determine how certain we can be that the coefficient estimate represents the causal effect of an X variable on the Y variable.

90

What are the two general considerations when estimating causal effects?

1) Is there a bias moving the coefficient estimate away from the true causal effect? 2) Is there an alternative explanation for the coefficient estimate?

91

How is ATE calculated?

ATE = (Average outcome in treatment group) - (Average outcome in control group).
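
The difference in group means can be sketched as below. The function name, the exam scores, and the treatment flags are hypothetical. The difference only estimates the ATE when treatment is as good as randomly assigned.

```python
from statistics import mean

def average_treatment_effect(outcomes, treated):
    """Difference between the mean outcome of the treatment group and that of the control group."""
    treatment_group = [y for y, t in zip(outcomes, treated) if t == 1]
    control_group = [y for y, t in zip(outcomes, treated) if t == 0]
    return mean(treatment_group) - mean(control_group)

# Hypothetical exam scores; treated = 1 marks students who received the intervention
scores = [72, 80, 68, 90, 85, 77]
treated = [0, 1, 0, 1, 1, 0]
ate = average_treatment_effect(scores, treated)
print(f"Estimated ATE = {ate:.2f}")
```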

92

What is the difference between good variation and bad variation in key-X?

Good variation is due to factors not correlated with the dependent variable, while bad variation is due to factors that could be correlated with the dependent variable.

93

What is reverse causality in regression analysis?

When the dependent variable Y affects the key-explanatory variable X, leading to biased estimates.

94

What is omitted-factors bias?

When the estimated causal effect picks up the effects of other factors correlated with the treatment but not accounted for in the model.
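
A small simulation can make this concrete. In the sketch below, the data-generating process is entirely made up: the true effect of x on y is 1.0, and a confounder z affects both x and y. Leaving z out inflates the estimate. Controlling for z is done here by residualizing both variables on z (the Frisch-Waugh-Lovell approach), chosen only to keep the example in the standard library.

```python
import random
from statistics import mean

def slope(x, y):
    """Slope from the one-variable regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    return (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
            / sum((a - x_bar) ** 2 for a in x))

def residualize(v, z):
    """Remove from v the part that is linearly explained by z."""
    b = slope(z, v)
    a = mean(v) - b * mean(z)
    return [vi - (a + b * zi) for vi, zi in zip(v, z)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]        # omitted factor (confounder)
x = [zi + rng.gauss(0, 1) for zi in z]         # key X, correlated with z
y = [1.0 * xi + 2.0 * zi + rng.gauss(0, 1)     # true effect of x is 1.0
     for xi, zi in zip(x, z)]

naive = slope(x, y)                                       # z omitted
controlled = slope(residualize(x, z), residualize(y, z))  # z held constant
print(f"omitting z: {naive:.2f}, controlling for z: {controlled:.2f}")
```

Under these assumptions the naive estimate lands near 2.0 rather than 1.0, because it also picks up the effect of z.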

95

What is self-selection bias?

When an individual's characteristics influence whether they receive treatment or the extent of treatment they receive.

96

What causes measurement error in regression analysis?

Measurement error occurs when an explanatory variable is inaccurately measured. It biases the coefficient estimate, often toward zero.
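
The pull toward zero can be illustrated with a simulation. Everything below is an assumed setup: the true slope is 2.0, and the observed X equals the true X plus random noise of equal variance, which in this classical case roughly halves the estimated slope.

```python
import random
from statistics import mean

def slope(x, y):
    """Slope from the one-variable regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    return (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
            / sum((a - x_bar) ** 2 for a in x))

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 5000
x_true = [rng.gauss(0, 1) for _ in range(n)]
y = [2.0 * xi + rng.gauss(0, 1) for xi in x_true]        # true slope is 2.0
x_observed = [xi + rng.gauss(0, 1) for xi in x_true]     # X recorded with random error

slope_clean = slope(x_true, y)       # uses the accurately measured X
slope_noisy = slope(x_observed, y)   # uses the mismeasured X
print(f"accurate X: {slope_clean:.2f}, mismeasured X: {slope_noisy:.2f}")
```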

97

Why should mediating factors not be used as control variables?

Because they are products of the key-X and controlling for them removes part of the effect the researcher aims to capture.

98

What is the impact of using an improper reference group in regression analysis?

It can manipulate results based on which reference group is used, affecting the interpretation of treatment effects.

99

What is spurious correlation?

A false relationship between the key-explanatory variable and the dependent variable due to omitted factors.

100

What is incidental correlation?

When factors affecting the dependent variable are not fully held constant and are correlated with the key-X variable.