Extensions of the Two-Variable Linear Regression Model — Detailed Study Notes

Regression Through the Origin (No‐Intercept Models)

Definition
- Intercept parameter \beta_1 is constrained to be 0.
- Estimated equation: \hat Y_i = \hat\beta_2 X_i.
Rationale / When used
- Theoretical considerations dictate that Y=0 when X=0.
- Example: risk–premium version of the Capital Asset Pricing Model (CAPM).
CAPM risk-premium form
- E(R_i) - r_f = \beta_i\,[E(R_m) - r_f] (6.1.2)
- E(R_i) = expected return on security i
- E(R_m) = expected return on market portfolio (e.g., S&P 500)
- r_f = risk-free rate (≈ 90-day T-bill)
- \beta_i = systematic-risk (volatility) coefficient
  - \beta_i > 1 → aggressive; \beta_i < 1 → defensive.
Empirical market model (allowing non-zero \alpha_i)
- R_i - r_f = \alpha_i + \beta_i (R_m - r_f) + u_i (6.1.4)
- If CAPM holds, \alpha_i = 0 ⇒ regression through origin.
OLS estimation (origin model)
- Population regression function (PRF): Y_i = \beta_2 X_i + u_i.
- Sample regression function (SRF): \hat Y_i = \hat\beta_2 X_i.
- Estimator and variance
  - \hat\beta_2 = \dfrac{\sum X_i Y_i}{\sum X_i^2} (6.1.5)
  - \operatorname{Var}(\hat\beta_2) = \dfrac{\sigma^2}{\sum X_i^2}.
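
The estimator and its variance can be checked numerically. The following is a minimal sketch on synthetic data, not the notes' dataset: the true slope of 1.2, the noise level, and the function name `ols_through_origin` are assumptions made for illustration. It estimates \sigma^2 with n - 1 degrees of freedom because only one parameter is fitted.

```python
import numpy as np

def ols_through_origin(x, y):
    """Slope, standard error and residuals for Y = b2*X + u (no intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sxx = np.sum(x ** 2)
    b2 = np.sum(x * y) / sxx                       # formula (6.1.5)
    resid = y - b2 * x
    sigma2_hat = np.sum(resid ** 2) / (len(x) - 1) # one parameter -> n - 1 df
    se_b2 = np.sqrt(sigma2_hat / sxx)              # sqrt of estimated Var(b2)
    return b2, se_b2, resid

# Synthetic data (assumption: true slope 1.2, zero intercept)
rng = np.random.default_rng(0)
x = rng.normal(0.5, 5.0, size=240)
y = 1.2 * x + rng.normal(0.0, 5.0, size=240)

b2, se_b2, resid = ols_through_origin(x, y)
print(f"slope = {b2:.4f}, se = {se_b2:.4f}, t = {b2 / se_b2:.2f}")
print(f"sum of residuals = {resid.sum():.4f} (not forced to zero)")
```

The residuals are orthogonal to X by construction, but their plain sum is generally non-zero, which is the first drawback listed below.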
Practical drawbacks
1. Residuals are no longer guaranteed to sum to 0 (without an intercept, \sum \hat u_i need not vanish) → diagnostic issues.
2. The conventional R^2 can be negative. Use the raw R^2 (not mean-corrected), which always satisfies 0\le R^2_{raw}\le1.
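
To see the second drawback concretely, the sketch below compares the conventional (mean-corrected) R^2 with the raw, non-mean-corrected version, computed as (\sum X_i Y_i)^2 / (\sum X_i^2 \sum Y_i^2). The five data points are invented for illustration and were chosen so that an origin model fits badly (Y is roughly constant around 10).

```python
import numpy as np

def r2_conventional(y, y_hat):
    """1 - RSS/TSS, with TSS measured around the mean of y."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def r2_raw(x, y):
    """Raw r^2: sums of squares and cross-products are not mean-corrected."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum(x * y) ** 2 / (np.sum(x ** 2) * np.sum(y ** 2))

# Hypothetical data: large intercept, almost no slope
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.0, 9.5, 10.5, 10.0, 10.2])

b2 = np.sum(x * y) / np.sum(x ** 2)   # origin-model slope
print("conventional R^2:", r2_conventional(y, b2 * x))  # negative here
print("raw r^2:         ", r2_raw(x, y))                # between 0 and 1
```

The raw measure is bounded by the Cauchy–Schwarz inequality, but it is not comparable with the conventional R^2 of an intercept model.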
Use‐cases in Economics
- Friedman’s permanent-income hypothesis (permanent C ∝ permanent Y).
- Cost theory: variable cost ∝ output.
- Monetarist models: inflation ∝ growth of money supply.

Illustration: Excess‐Return Data (Example 6.1)

Data: 240 UK monthly observations (1980–1999) on sector index excess return Y_t and market excess return X_t.
Regression through origin
- \hat Y_t = 1.1555\,X_t, SE(\hat\beta_2)=0.0744 ⇒ t=15.53 (p≈0).
- R^2=0.5003, SER = 5.549, DW ≈ 1.97.
Regression with intercept
- \hat Y_t = -0.4475 + 1.1711\,X_t.
- Intercept not significant (p≈0.219) ⇒ origin model plausible.
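
The comparison of the two fits can be sketched as follows. The first lines reproduce the reported slope t ratio from the numbers above; the rest fits an intercept model to simulated excess returns, since the original data are not reproduced in these notes. The simulation parameters (alpha = 0, beta = 1.15, the noise scale) and the helper name `ols_with_intercept` are assumptions.

```python
import numpy as np

def ols_with_intercept(x, y):
    """Coefficients (b1, b2) and standard errors for Y = b1 + b2*X + u."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    X = np.column_stack([np.ones(n), x])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2_hat = resid @ resid / (n - 2)          # two parameters -> n - 2 df
    se = np.sqrt(sigma2_hat * np.diag(xtx_inv))
    return beta, se

# t ratio of the origin-model slope, from the reported estimate and SE
t_slope_notes = 1.1555 / 0.0744
print(round(t_slope_notes, 2))

# Simulated excess returns (assumption: CAPM holds, so alpha = 0)
rng = np.random.default_rng(1)
x = rng.normal(0.5, 4.8, size=240)
y = 1.15 * x + rng.normal(0.0, 5.5, size=240)

beta, se = ols_with_intercept(x, y)
t_alpha = beta[0] / se[0]
print(f"alpha = {beta[0]:.4f}, t = {t_alpha:.2f}; beta = {beta[1]:.4f}")
```

A small |t| on the intercept is the evidence that makes the origin model plausible; it does not prove that the intercept is exactly zero.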

Rescaling & Units of Measurement (Section 6.2)

Suppose original model: Y_i = \beta_1 + \beta_2 X_i + u_i.
Define scaled variables
- Y_i^* = w_1 Y_i, X_i^* = w_2 X_i (6.2.2–6.2.3)
- u_i^* = w_1 u_i.
Regression in scaled units
- Y_i^* = \beta_1^* + \beta_2^* X_i^* + u_i^* (6.2.4)
Relationships between original and scaled estimates (6.2.15–6.2.20)
- \hat\beta_2^* = \dfrac{w_1}{w_2} \hat\beta_2
- \hat\beta_1^* = w_1 \hat\beta_1
- \hat\sigma^{*2} = w_1^2 \hat\sigma^2
- \operatorname{Var}(\hat\beta_1^*) = w_1^2 \operatorname{Var}(\hat\beta_1)
- \operatorname{Var}(\hat\beta_2^*) = \Bigl(\tfrac{w_1}{w_2}\Bigr)^2 \operatorname{Var}(\hat\beta_2)
- R^2 is unit-invariant.
Empirical confirmation (GPDI & GDP, 1990–2005)
1. Both in billions: \text{GPDI} = -926.09 + 0.2535\,\text{GDP} (6.2.21).
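2. Both in millions (w_1 = w_2 = 1000): intercept and its SE ×1000; slope and its SE unchanged (6.2.22).
3. Y billions, X millions (w_1 = 1, w_2 = 1000): slope and its SE ÷1000; intercept unchanged (6.2.23).
4. Y millions, X billions (w_1 = 1000, w_2 = 1): slope, intercept, and their SEs ×1000 (6.2.24).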
5. R^2=0.9648 throughout.
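
The scaling relationships can be verified on any dataset. The sketch below uses invented "billions" series loosely shaped like the GPDI/GDP example (the numbers are hypothetical, not the 1990–2005 data) and re-expresses Y in millions, the fourth case in the list.

```python
import numpy as np

def ols(x, y):
    """Two-variable OLS with intercept; returns estimates and fit statistics."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    dx = x - x.mean()
    b2 = np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)
    b1 = y.mean() - b2 * x.mean()
    resid = y - b1 - b2 * x
    sigma2 = np.sum(resid ** 2) / (n - 2)
    return {
        "b1": b1,
        "b2": b2,
        "sigma2": sigma2,
        "var_b1": sigma2 * np.sum(x ** 2) / (n * np.sum(dx ** 2)),
        "var_b2": sigma2 / np.sum(dx ** 2),
        "r2": 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2),
    }

# Hypothetical series in billions
rng = np.random.default_rng(2)
gdp = np.linspace(5800.0, 12400.0, 16)
gpdi = -900.0 + 0.25 * gdp + rng.normal(0.0, 80.0, size=16)

w1, w2 = 1000.0, 1.0                  # Y in millions, X still in billions
base = ols(gdp, gpdi)
scaled = ols(w2 * gdp, w1 * gpdi)

print("slope ratio:    ", scaled["b2"] / base["b2"])   # w1 / w2
print("intercept ratio:", scaled["b1"] / base["b1"])   # w1
print("R^2:", base["r2"], scaled["r2"])                # identical
```

Changing `w1` and `w2` reproduces the other three cases.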

Regression on Standardized Variables ("Beta Regression")

Standardization
- Y_i^{std}=\dfrac{Y_i-\bar Y}{s_Y}, X_i^{std}=\dfrac{X_i-\bar X}{s_X}.
- Means = 0, SDs = 1.
Standardized regression
- Y_i^{std}=\beta_2^* X_i^{std}+u_i^{std} (intercept = 0).
Interpretation
- \beta_2^* = change (in SDs) of Y for one SD increase in X.
- Facilitates comparison when variables measured in different units.
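
A short numerical sketch of the standardized regression, on synthetic data with invented parameters. In the two-variable case the standardized slope equals the sample correlation coefficient between X and Y, and it equals the ordinary slope multiplied by s_X / s_Y; the code checks both.

```python
import numpy as np

def standardize(v):
    """Subtract the mean and divide by the sample standard deviation."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std(ddof=1)

# Synthetic data (assumption: arbitrary units and parameters)
rng = np.random.default_rng(3)
x = rng.normal(100.0, 15.0, size=200)
y = 3.0 + 0.04 * x + rng.normal(0.0, 1.0, size=200)

xs, ys = standardize(x), standardize(y)
beta_star = np.sum(xs * ys) / np.sum(xs ** 2)   # regression through the origin

r = np.corrcoef(x, y)[0, 1]
b2 = np.polyfit(x, y, 1)[0]                     # ordinary slope, original units

print("standardized slope:", beta_star)
print("correlation r:     ", r)
print("b2 * s_X / s_Y:    ", b2 * x.std(ddof=1) / y.std(ddof=1))
```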

Functional Forms of Two-Variable Models

(Remember: “linear model” means linear in parameters, not necessarily in variables.)

1. Log–Log (Double-Log) Model

Specification: \ln Y = \beta_1 + \beta_2 \ln X + u.
\beta_2 measures elasticity: \beta_2 = \dfrac{\partial \ln Y}{\partial \ln X}=\dfrac{\partial Y}{\partial X}\,\dfrac{X}{Y}.
Example (6.5.5): Durable‐goods expenditure
- \ln(\text{EXPDUR}) = -7.5417 + 1.6266\,\ln(\text{PCEX}).
- Elasticity ≈ 1.63 → 1 % ↑ in PCEX → 1.63 % ↑ in EXPDUR.
- R^2=0.9695, both coefficients highly significant.
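
As an illustration of how the elasticity is recovered, the sketch below simulates a constant-elasticity relation and regresses ln Y on ln X. The data are synthetic: the range of `pcex`, the true elasticity of 1.6, and the noise level are assumptions, not the EXPDUR/PCEX series.

```python
import numpy as np

# Synthetic constant-elasticity data: Y = A * X^b * exp(noise)
rng = np.random.default_rng(4)
pcex = rng.uniform(4000.0, 9000.0, size=60)
true_elasticity = 1.6
expdur = np.exp(-7.5) * pcex ** true_elasticity * np.exp(rng.normal(0.0, 0.02, size=60))

# OLS of ln Y on ln X; polyfit returns [slope, intercept] for degree 1
slope, intercept = np.polyfit(np.log(pcex), np.log(expdur), 1)
print(f"estimated elasticity = {slope:.3f}")

# The reported elasticity 1.6266 as an approximation vs. the exact effect
approx_pct = 1.6266                            # % change in Y per 1 % change in X
exact_pct = 100.0 * (1.01 ** 1.6266 - 1.0)     # exact effect of a 1 % rise in X
print(f"approx {approx_pct:.4f} %, exact {exact_pct:.4f} %")
```

For small percentage changes the two figures are nearly the same; the gap widens for larger changes in X.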

2. Semi-Log Models

a. Log–Lin (Growth) Model

\ln Y_t = \beta_1 + \beta_2 t + u_t.
Interpretation: \beta_2 = constant proportionate (instantaneous) growth rate per time period; 100\,\beta_2 ≈ % change in Y per period. The compound rate per period is e^{\beta_2} - 1.
Example (6.6.8): Services expenditure
- \ln(\text{EXS}_t)=8.3226+0.00705 t ⇒ 0.705 % quarterly ≈ 2.82 % annual growth.
- R^2=0.9919.
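
The growth-rate arithmetic for this example can be written out as below. The slope 0.00705 is taken from the estimate above; the compound figures use e^{\beta_2} - 1, which is a standard refinement of the instantaneous rate and differs only slightly at growth rates this small.

```python
import math

b2 = 0.00705                                   # slope from the services-expenditure fit

instant_q = 100.0 * b2                         # instantaneous quarterly rate, in %
compound_q = 100.0 * (math.exp(b2) - 1.0)      # compound quarterly rate, in %
annual_simple = 4.0 * instant_q                # quarterly rate times four
annual_compound = 100.0 * (math.exp(4.0 * b2) - 1.0)

print(f"quarterly: instantaneous {instant_q:.4f} %, compound {compound_q:.4f} %")
print(f"annual:    simple {annual_simple:.2f} %, compound {annual_compound:.2f} %")
```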

b. Lin–Log Model

Y = \beta_1 + \beta_2 \ln X + u.
Interpretation (6.6.12–6.6.13): \beta_2 = \dfrac{\Delta Y}{\Delta \ln X}=\dfrac{\Delta Y}{\Delta X/X} ⇒ a 1 % change in X changes Y by about \beta_2/100 in absolute terms.
Example (6.6.14): Indian food expenditure
- \text{FoodExp}_i = -1283.912 + 257.270\,\ln(\text{TotalExp}_i).
- 1 % ↑ in total expenditure → ≈ 2.57 rupee ↑ in food spending (257.270/100).
- R^2=0.3769.
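
The 2.57-rupee figure is the slope divided by 100. A small sketch comparing that approximation with the exact change implied by the fitted equation; the starting expenditure level of 600 rupees is an arbitrary assumption, and in a lin-log model the result does not depend on it.

```python
import math

b1, b2 = -1283.912, 257.270          # estimates from the food-expenditure fit

def food_exp(total_exp):
    """Fitted food expenditure for a given total expenditure (rupees)."""
    return b1 + b2 * math.log(total_exp)

total = 600.0                         # hypothetical starting level
approx_change = b2 / 100.0            # rule of thumb for a 1 % rise in X
exact_change = food_exp(1.01 * total) - food_exp(total)   # equals b2 * ln(1.01)

print(f"approximate: {approx_change:.4f} rupees")
print(f"exact:       {exact_change:.4f} rupees")
```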

3. Reciprocal Model

Y = \beta_1 + \beta_2 \dfrac{1}{X} + u (6.7.1).
Linear in parameters; nonlinear in variable.
As X \to \infty, second term → 0, so Y \to \beta_1 (asymptote).
Example (6.7.2): Child mortality vs per-capita GNP
- \text{CM}_i = 81.794 + 27{,}237.17\,(1/\text{PCGNP}_i).
- As PCGNP ↑ without bound, CM → ≈ 82 deaths per 1000 (the asymptote \beta_1).
- R^2=0.4590, coefficients highly significant.
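
To make the asymptote tangible, the sketch below evaluates the fitted equation and its slope, -\beta_2 / X^2, at a few per-capita GNP levels. The levels are chosen only for illustration.

```python
import numpy as np

b1, b2 = 81.794, 27237.17             # estimates from the child-mortality fit

def child_mortality(pcgnp):
    """Fitted child mortality for per-capita GNP (scalar or array)."""
    return b1 + b2 / np.asarray(pcgnp, dtype=float)

levels = np.array([500.0, 1000.0, 5000.0, 20000.0, 1e6])   # illustrative levels
fitted = child_mortality(levels)
slopes = -b2 / levels ** 2            # dY/dX: negative and shrinking toward zero

for x, cm, s in zip(levels, fitted, slopes):
    print(f"PCGNP = {x:>9.0f}  CM = {cm:8.3f}  slope = {s:.6f}")
```

The fitted values fall quickly at low incomes and flatten out near \beta_1, which is why the reciprocal form suits relationships with a floor or ceiling.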

4. Log-Reciprocal Model

\ln Y = \beta_1 - \beta_2 \dfrac{1}{X} + u.
Slope \dfrac{\partial Y}{\partial X} = +\beta_2 \dfrac{1}{X^2} e^{\beta_1-\beta_2/X} = \beta_2 \dfrac{Y}{X^2};
elasticity = \beta_2/X, so it varies with X (see summary table, slide 31).
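
The slope formula can be checked against a numerical derivative. In this sketch the parameter values \beta_1 = 2 and \beta_2 = 4 are hypothetical. It also shows two properties that follow from the formula: the slope peaks at X = \beta_2 / 2, and Y approaches e^{\beta_1} as X grows.

```python
import numpy as np

b1, b2 = 2.0, 4.0                     # hypothetical parameter values

def y(x):
    """Deterministic part of the log-reciprocal model, in levels."""
    return np.exp(b1 - b2 / x)

def slope(x):
    """Analytic dY/dX = b2 * Y / X^2."""
    return b2 / x ** 2 * y(x)

def elasticity(x):
    """(dY/dX) * (X/Y) = b2 / X."""
    return b2 / x

x0, h = 3.0, 1e-6
numeric = (y(x0 + h) - y(x0 - h)) / (2.0 * h)   # central difference
print("analytic slope:", slope(x0), " numeric slope:", numeric)
print("elasticity at x0:", elasticity(x0))
print("slope near X = b2/2:", slope(1.9), slope(2.0), slope(2.1))
print("upper asymptote e^b1:", np.exp(b1), " Y at X = 1e6:", y(1e6))
```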

Comparative Interpretation of Coefficients (Slide 28)

Example using salary & sales:
- Linear: \beta_2=0.0155 ⇒ salary ↑ \$15.5 per \$1 m sales (salary in \$ thousands, as the lin-log line implies).
- Log-Log: \beta_2=0.257 ⇒ 1 % sales ↑ → ≈ 0.257 % salary ↑.
- Log-Lin: \beta_2=1.5\times10^{-5} ⇒ \$1 m sales ↑ → ≈ 0.0015 % salary ↑.
- Lin-Log: \beta_2=262.9 ⇒ 1 % sales ↑ → ≈ \$2.629k salary ↑.

Choosing an Appropriate Functional Form (Slide 32)

Economic theory guidance (e.g., Phillips curve suggests nonlinear trade-off).
Desired slope interpretation (absolute vs relative changes vs elasticities).
Signs & magnitudes should align with a-priori expectations.
Multiple forms may fit; compare via theory, statistical diagnostics, and interpretability, not just R^2.
Do not over-emphasize goodness-of-fit; a higher R^2 does not guarantee correct specification.
When ambiguous, apply Box–Cox transformations to empirically choose power/exponential forms.
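
One way a Box–Cox comparison might look in practice is sketched below. It transforms only the dependent variable, scans a grid of \lambda values, and picks the one with the highest concentrated log-likelihood (\lambda = 1 corresponds to a linear Y, \lambda = 0 to ln Y). The data are simulated from a log-lin model, and the function names, grid, and parameters are all assumptions for illustration.

```python
import numpy as np

def boxcox(y, lam):
    """Box-Cox transform of a positive series; lam = 0 gives the log."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if abs(lam) < 1e-12 else (y ** lam - 1.0) / lam

def profile_loglik(x, y, lam):
    """Concentrated log-likelihood (up to a constant) of regressing
    the Box-Cox transform of y on a constant and x."""
    z = boxcox(y, lam)
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    rss = np.sum((z - X @ beta) ** 2)
    n = len(y)
    # The second term is the Jacobian of the transformation
    return -0.5 * n * np.log(rss / n) + (lam - 1.0) * np.sum(np.log(y))

# Simulated data where the log-lin form is the true one (assumption)
rng = np.random.default_rng(5)
x = rng.uniform(1.0, 10.0, size=500)
y = np.exp(0.5 + 0.3 * x + rng.normal(0.0, 0.1, size=500))

grid = np.round(np.arange(-1.0, 2.01, 0.1), 1)
loglik = np.array([profile_loglik(x, y, lam) for lam in grid])
best_lambda = grid[np.argmax(loglik)]
print("lambda with highest likelihood:", best_lambda)
```

The Jacobian term is what makes likelihoods comparable across transformations of Y; comparing R^2 values across a linear and a logged dependent variable would not be valid.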

Practical & Ethical Considerations

Mis-specifying functional form can bias policy conclusions (e.g., elasticity over-/under-estimation).
Scaling and standardization choices alter numerical values; always report original units for clarity.
When forcing regression through origin, ensure theoretical justification; if the true intercept is non-zero, the slope estimate is biased (a specification error).
For income/poverty studies, reciprocal models may exaggerate asymptotes—interpret with socioeconomic context.

Connections to Previous Material

Builds on OLS assumptions (Gauss–Markov) discussed earlier: unbiasedness, variance formulas, hypothesis tests.
Extends two-variable linear regression by:
- Dropping intercept (origin models).
- Applying linear transformations (scaling, standardization).
- Allowing nonlinear variable transformations while retaining linearity in parameters.
Same inferential machinery (t, F, R^2, DW) applies after transformation, but interpretation changes.

Summary Checklist for Exam

✓ Derive \hat\beta_2 for a no-intercept model.
✓ Explain why R^2 may be negative and how raw R^2 fixes it.
✓ Convert coefficients when data are re-expressed (billions ↔ millions).
✓ Interpret standardized-regression slope.
✓ Compute and interpret elasticity from log-log model.
✓ Distinguish growth (log-lin) vs Engel-type (lin-log) interpretations.
✓ Recognize asymptote property of reciprocal models.
✓ Choose functional form based on theory + diagnostics rather than maximal R^2.