Eco 231 - Exam 2 Questions

Last updated 4:05 PM on 4/17/26
48 Terms

1
New cards

Prediction Equation (3 independent variables)

y(hat) = b0 + b1X1 + b2X2 + b3X3

2
New cards

F-test for overall fit in multiple regression

  • checks whether the model explains variation in the dependent variable

  • F = MSR / MSE

3
New cards

Implication of the F-test for a multiple regression

  • Reject H₀:
    → Statistically significant
    → At least one predictor variable is useful in explaining Y

  • Fail to reject H₀:
    → The model predicts no better than the mean of Y alone

4
New cards

Null and alternate hypothesis for regression with 4 independent variables

Equation:

Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + εi

Null hypothesis (H₀):

H₀: β1 = β2 = β3 = β4 = 0

→ None of the independent variables explain Y

Alternative hypothesis (H₁):

At least one βi≠0

→ At least one independent variable helps explain Y

5
New cards

F-stat formula explanation

  • MSR (Mean Square Regression)
    Measures how much variation in the dependent variable is explained by the model.

  • MSE (Mean Square Error)
    Measures the unexplained variation (the residual/error).

  • If F is large → the model explains significantly more variation than noise → predictors are useful

  • If F is close to 1 → model is not much better than random noise

  • A large F-statistic (with a small p-value) leads you to reject H₀, meaning the model is statistically significant overall.

  • Look at the p-value: if it is below the significance level, reject H₀
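As a quick check of the F = MSR/MSE idea above, here is a minimal Python sketch. The sums of squares, sample size, and function name are made-up illustrations, not numbers from the course:

```python
# Hypothetical sums of squares from a regression printout (illustrative numbers).
def f_statistic(ssr: float, sse: float, k: int, n: int) -> float:
    """Overall F-test: F = MSR / MSE."""
    msr = ssr / k            # mean square regression (k = number of predictors)
    mse = sse / (n - k - 1)  # mean square error (n - k - 1 residual df)
    return msr / mse

# Example: SSR = 180, SSE = 60, k = 3 predictors, n = 24 observations
f = f_statistic(180.0, 60.0, 3, 24)
print(round(f, 2))  # 20.0
```

A large value like this (relative to the F critical value) would lead you to reject H₀.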

6
New cards

Polynomial Regression

A regression that includes powers of an independent variable, e.g. Y = β0 + β1X + β2X² + ε, to capture a curved relationship
7
New cards

How do we interpret the coefficient on a quadratic variable in a regression?

  1. Curve direction

  • b2 > 0 → upward (U-shaped)

  • b2 < 0 → downward (inverted U)

  2. Marginal effect is not constant

  3. Turning point

    • If b2 > 0 → minimum

    • If b2 < 0 → maximum
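The turning point follows from setting the derivative b1 + 2·b2·x to zero. A minimal Python sketch with made-up coefficients (the function name and numbers are illustrative):

```python
# Turning point of a quadratic fit y = b0 + b1*x + b2*x^2 (illustrative coefficients).
def turning_point(b1: float, b2: float) -> tuple:
    """x* where dy/dx = b1 + 2*b2*x = 0; a minimum if b2 > 0, a maximum if b2 < 0."""
    x_star = -b1 / (2 * b2)
    kind = "min" if b2 > 0 else "max"
    return x_star, kind

print(turning_point(-4.0, 0.5))  # (4.0, 'min')
```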

8
New cards

What is the logarithm transformation of the x and y variables?

The % change in y for every 1% change in x

9
New cards

What does the coefficient on the x variable tell us after a log-log transformation of both the x and y variables?

the elasticity

10
New cards

What is an indicator/dummy variable

  • variable that replaces a qualitative value

  • difference in the predicted outcome between the group coded as 1 and the group coded as 0

11
New cards

When there are m different groups in the data for an independent variable, how many dummy variables are needed to indicate the m groups?

m − 1 (one less than m); the omitted group serves as the baseline
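The m − 1 rule can be sketched in plain Python. The group labels, baseline choice, and helper name below are made up for illustration:

```python
# Encoding m groups with m - 1 dummies (the baseline group gets all zeros).
def make_dummies(values, baseline):
    """Return a dict mapping each non-baseline group to its 0/1 dummy column."""
    groups = sorted(set(values))
    groups.remove(baseline)  # the baseline needs no dummy of its own
    return {g: [1 if v == g else 0 for v in values] for g in groups}

regions = ["North", "South", "West", "North", "West"]  # m = 3 groups
dummies = make_dummies(regions, baseline="North")      # -> 2 dummy columns
print(dummies)  # {'South': [0, 1, 0, 0, 0], 'West': [0, 0, 1, 0, 1]}
```

An observation from the baseline group ("North" here) is identified by all dummies equal to 0.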

12
New cards

What does the coefficient on a dummy variable tell us in a regression model?

  • The average difference in Y between the group with D=1 and the group with D=0

    • If β1>0 → the “1” group has a higher outcome than the baseline

    • If β1<0 → the “1” group has a lower outcome

    • If β1=0 → no difference between groups

13
New cards

Can you show the difference using graphs?

  • Lower line = baseline group d=0

  • Upper line = dummy variable d=1

  • Vertical distance between lines = b1

14
New cards

What is an interaction term in a regression?

  • The effect of one variable on the outcome depends on the level of another variable.

    • The impact of X changes when Z changes

15
New cards

What does the coefficient on the interaction of a dummy variable and an independent variable (x) tell us?

Y = β0 + β1X + β2Z + β3(X⋅Z) + ϵ

  • X⋅Z is the interaction term

  • β3​ tells you how the effect of X changes with Z

16
New cards

Can you show the difference using graphs?

  • When Z=0:

    • Y=β0+β1X

  • When Z=1:

    • Y=(β0+β2)+(β1+β3)X

  • The intercept is different (shift up/down)

  • The slope is also different (lines are not parallel)

    • The difference in slopes is caused by the interaction term β3
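The two lines above can be verified numerically. A minimal Python sketch with made-up coefficients (b0 through b3 are illustrative values, not course data):

```python
# Predicted lines for Z = 0 and Z = 1 under Y = b0 + b1*X + b2*Z + b3*X*Z
# (illustrative coefficients; b3 is the interaction term).
b0, b1, b2, b3 = 2.0, 1.5, 1.0, 0.5

def predict(x: float, z: int) -> float:
    return b0 + b1 * x + b2 * z + b3 * x * z

# Z = 0 line: intercept b0, slope b1; Z = 1 line: intercept b0 + b2, slope b1 + b3
print(predict(0, 0), predict(1, 0))  # 2.0 3.5  -> slope 1.5
print(predict(0, 1), predict(1, 1))  # 3.0 5.0  -> slope 2.0
```

The slopes differ by b3 = 0.5, which is exactly why the lines are not parallel.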

17
New cards

How is adjusted R-square different from R-square?

R-squared

  • Measures the percentage of variation in Y explained by the model

  • R2 always increases (or stays the same) when you add more variables

Adjusted R-Squared

  • Used when you have to compare 2 regressions

  • Adjusts R2 for the # of independent variables
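The standard adjustment formula is R̄² = 1 − (1 − R²)(n − 1)/(n − k − 1). A minimal Python sketch with made-up R² values showing why the adjustment matters when comparing models:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Illustrative comparison: adding a weak 4th variable nudges R^2 up
# (0.80 -> 0.805) but adjusted R^2 goes DOWN, so the extra variable
# is not worth its degree of freedom.
print(round(adjusted_r2(0.800, 30, 3), 4))  # 0.7769
print(round(adjusted_r2(0.805, 30, 4), 4))  # 0.7738
```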

18
New cards

How to use adjusted R-square to compare reduced vs. full model in regression analysis?

  • SSEr and SSEf are the error sum of squares for the reduced and full model

  • K and L are the number of independent variables in the full and reduced models

19
New cards

What other criteria could you use for model selection?

  • stepwise regression

    • begin with a simple regression then add more independent variables as controls

  • backward elimination method

    • begin with full model (F) then eliminate insignificant independent variables in reduced model (R)

20
New cards

How to assess linear assumptions?

Plot the residuals against each independent variable (or the fitted values); if the points scatter around a straight-line trend with no curvature, the linearity assumption holds

21
New cards

How to correct for violations of linearity assumption?

  • Transform the variables (e.g., logs) or add polynomial terms to capture the curvature

22
New cards

What are the ZINE assumptions for multiple regression models?

  • Zero as the expected value for ei

  • Independence of errors​

    • Error values are statistically independent​

  • Normality of errors​

    • Error values are normally distributed for any given set of X values​

  • Equal Variance (also called Homoscedasticity)​

    • The probability distribution of the errors has constant variance​

23
New cards

How to assess the assumption of constant variance?

Plot the residuals against the fitted values; if the spread of points stays roughly the same across the range (no fan shape), the constant-variance assumption holds

24
New cards

Heteroscedasticity

  • variance of the error terms is not constant across observations

  • “fan shape” in residual plots

  • makes standard errors wrong

    • messes up hypothesis tests and confidence intervals

25
New cards

How to correct heteroscedasticity?

  • robust standard errors

  • Transform dependent or independent variables

26
New cards

How to assess the assumption of normality for disturbance distribution?

  • normal probability plot should be a straight line

  • 68% of the standardized residuals should be between -1 and 1

  • 95% should be between -2 and 2

  • 99% should be between -3 and 3​

27
New cards

How to correct for violations of normality?

  • If the sample size is greater than 30, the normality assumption is not a concern​

  • For small samples, use a Box-Cox transformation of y: yᵖ where 0 < p < 1​

28
New cards

How to assess the assumption that the disturbances are independent?

  • Autocorrelation of residuals in time-series data​

εt = ρεt-1 + ut, where 0 < ρ < 1

  • Test for first-order autocorrelation​

    • residual analysis

    • Durbin-Watson test ​ ​

29
New cards

Durbin Watson Test

  • Residual analysis for time series data​

  • The d statistic for Durbin-Watson test​

    • If residuals are uncorrelated, d = 2​

    • If residuals are positively correlated, d < 2​

    • If residuals are negatively correlated, d > 2
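The d statistic can be computed directly from the residual series as d = Σ(eₜ − eₜ₋₁)² / Σ eₜ². A minimal Python sketch with made-up residuals (not course data):

```python
# Durbin-Watson statistic from a residual series:
# d = sum((e_t - e_{t-1})^2) / sum(e_t^2); d near 2 means no first-order autocorrelation.
def durbin_watson(resid):
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0 (d > 2: negative autocorrelation)
print(durbin_watson([1.0, 1.0, -1.0, -1.0]))  # 1.0 (d < 2: positive autocorrelation)
```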

30
New cards

How to correct violations of independence or autoregressive errors?

Add a lagged value of the dependent variable as an explanatory variable

31
New cards

What is correlation matrix?

A table of the pairwise correlations between all the explanatory variables; check whether any pairwise correlation is greater than 0.5 (a warning sign of multicollinearity)

32
New cards

How to deal with collinearity problem in regression?

  • The presence of a high degree of multicollinearity among explanatory variables can cause:​

    • insignificant coefficients on independent variables

    • unstable regression coefficients ​

  • Detecting multicollinearity​

    • Correlation matrix: are pairwise correlations greater than 0.5?​

    • Large F statistic for the regression but small t statistics (high p-values) for the estimated coefficients​

  • Correction for multicollinearity​

    • Remove those variables that are highly correlated with others​

33
New cards

How to detect outliers?

use standardized residuals, focusing on those with an absolute value greater than 2​
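As a sketch of this rule in plain Python, with a made-up residual list and a hypothetical helper name (the sample standard deviation is used to standardize):

```python
# Flag outliers: observations whose standardized residual exceeds 2 in absolute value.
def flag_outliers(residuals, threshold=2.0):
    n = len(residuals)
    mean = sum(residuals) / n
    sd = (sum((e - mean) ** 2 for e in residuals) / (n - 1)) ** 0.5  # sample s.d.
    return [i for i, e in enumerate(residuals) if abs((e - mean) / sd) > threshold]

resid = [0.2, -0.5, 0.1, 9.0, -0.3, 0.4, -0.2, 0.1]  # illustrative residuals
print(flag_outliers(resid))  # [3]  -> the 9.0 observation stands out
```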

34
New cards

How to correct outlier problem?

  • Transformations:

    • log(X) or log(Y)

  • Robust regression

35
New cards

What are the four components in a time series dataset?

  • Trend component (T)

  • Seasonal component (S)

  • Cyclical component (C)

  • Irregular component (R)

36
New cards

How do you identify them?

Trend

  • overall pattern

Seasonal

  • part of the variation that fluctuates in a stable way over time

Cyclical

  • regular cycles in the data with periods longer than one year

Irregular

  • part of the data not explained by the model

37
New cards

Smoothing Method

  • “smooth away” the rapid fluctuations and capture the underlying behavior

  • recent behavior is a good indicator of behavior in the near future

  • Smoothing out fluctuations is generally accomplished by averaging adjacent values in the series

38
New cards

Simple Moving Average (SMA(4))

Yt(hat) = (Yt + Yt-1 + Yt-2 + Yt-3) / 4

Forecast: Yt+1(hat) = Yt(hat)
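A minimal Python sketch of SMA(4) on a made-up series (the data and function name are illustrative):

```python
# Simple moving average with window 4 (SMA(4)): each smoothed value is the
# average of the current observation and the previous three.
def sma(series, window=4):
    return [sum(series[t - window + 1 : t + 1]) / window
            for t in range(window - 1, len(series))]

sales = [10, 12, 11, 13, 15, 14, 16, 18]  # illustrative series
print(sma(sales))  # [11.5, 12.75, 13.25, 14.5, 15.75]
```

Note the output starts at the 4th observation; the first three periods have no full window.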

39
New cards

Exponential Smoothing

EMA(0.7): Yt(hat) = 0.7Yt + 0.3Yt-1(hat)

Y1(hat) = Y1
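The recursion above (with the first smoothed value initialized to the first observation) can be sketched in plain Python; the series below is made up:

```python
# Exponential smoothing with alpha = 0.7:
# Y_hat[t] = 0.7 * Y[t] + 0.3 * Y_hat[t-1], initialized with Y_hat[0] = Y[0].
def exp_smooth(series, alpha=0.7):
    smoothed = [series[0]]
    for y in series[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed

print([round(v, 3) for v in exp_smooth([10, 12, 11, 15])])  # [10, 11.4, 11.12, 13.836]
```

A larger alpha weights recent observations more heavily; a smaller alpha smooths more aggressively.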

40
New cards

Autoregressive model

Yt(hat) = b0 + b1Yt-1 + b2Yt-2 + b3Yt-3

41
New cards

 How is autoregressive model different from smoothing method?

Autoregressive

  • uses past values of the series itself

  • depends on lagged values of Y

  • statistical model to analyze structure

Smoothing

  • uses past observations and forecasts

  • depends on weighted averages of past values

  • forecasting technique

42
New cards

Estimation equation for the autoregressive model with 3 lags

y(hat) = b0 + b1Ylag1 + b2Ylag2 + b3Ylag3

43
New cards

What is a multiple regression-based model for time series data with seasonal factors?

Yt = β0 + β1t + β2D1 + β3D2 + ⋯ + βkDk + ϵt

44
New cards

How do you represent the 4 quarters in a regression model?

Step 1: create dummy variables (3)

  • D1 = 1 if Q1, else 0

  • D2 = 1 if Q2, else 0

  • D3 = 1 if Q3, else 0

Step 2: Write regression model

Yt​=β0​+β1​D1​+β2​D2​+β3​D3​+ϵt​

Step 3: Interpret the coefficients

  • β0 → mean value in Q4 (baseline)

  • β1 → difference between Q1 and Q4

  • β2 → difference between Q2 and Q4

  • β3 → difference between Q3 and Q4

Step 4: What the model implies

  • Q4: Y = β0

  • Q1: Y = β0 + β1

  • Q2: Y = β0 + β2

  • Q3: Y = β0 + β3

45
New cards

MSE

  • mean squared error

  • MSE = (1/n) Σ (Yt − Y(hat)t)²

  • how well a model’s predictions match the actual data

  • Yt​: actual value

  • Y(hat)t​: predicted (forecasted) value

  • n: number of observations

  • (Yt − Y(hat)t): error (residual)

46
New cards

MAD

  • mean absolute deviation

  • MAD = (1/n) Σ |Yt − Y(hat)t|

  • measure of the average size of forecast errors, without squaring them

  • Yt​: actual value

  • Y(hat)t: predicted (forecasted) value

  • n: number of observations

  • |Yt − Y(hat)t|: absolute error (ignores sign)

47
New cards

MAPE

  • mean absolute percentage error

  • MAPE = (100/n) Σ |Yt − Y(hat)t| / |Yt|

  • measures forecast accuracy as a percentage

    • makes it easy to interpret across different scales

  • Yt: actual value

  • Y(hat)t​: forecasted value

  • n: number of observations

  • The fraction |Yt − Y(hat)t| / |Yt| = percentage error for each observation

48
New cards

How do you choose the best forecasting model?

  • MAD → average absolute error

  • MSE → penalizes large errors more

  • MAPE → percentage error

The lower, the better
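The three criteria can be computed side by side. A minimal Python sketch with made-up actuals and forecasts (names and numbers are illustrative):

```python
# Comparing a forecast against actuals with MSE, MAD, and MAPE.
def mse(actual, forecast):
    return sum((y - f) ** 2 for y, f in zip(actual, forecast)) / len(actual)

def mad(actual, forecast):
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    return 100 * sum(abs(y - f) / abs(y) for y, f in zip(actual, forecast)) / len(actual)

y = [100.0, 110.0, 120.0, 130.0]  # actual values
f = [ 98.0, 112.0, 115.0, 133.0]  # forecasts
print(mse(y, f), mad(y, f), round(mape(y, f), 3))  # 10.5 3.0 2.573
```

MSE penalizes the single 5-unit miss most heavily, while MAPE rescales each error by the actual value; whichever criterion you use, the model with the lower value wins.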