Extensions of the Two-Variable Linear Regression Model — Detailed Study Notes
Regression Through the Origin (No‐Intercept Models)
Definition
Intercept parameter \beta_1 is constrained to be 0.
Estimated equation: \hat Y_i = \hat\beta_2 X_i.
Rationale / When used
Theoretical considerations dictate that Y=0 when X=0.
Example: risk–premium version of the Capital Asset Pricing Model (CAPM).
CAPM risk-premium form
E(R_i) - r_f = \beta_i\,[E(R_m) - r_f] (6.1.2)
E(R_i) = expected return on security i
E(R_m) = expected return on market portfolio (e.g., S&P 500)
r_f = risk-free rate (≈ 90-day T-bill)
\beta_i = systematic-risk (volatility) coefficient
\beta_i > 1 → aggressive; \beta_i < 1 → defensive.
Empirical market model (allowing non-zero \alpha_i)
R_i - r_f = \alpha_i + \beta_i (R_m - r_f) + u_i (6.1.4)
If CAPM holds, \alpha_i = 0 ⇒ regression through origin.
OLS estimation (origin model)
Population regression function (PRF): Y_i = \beta_2 X_i + u_i.
Sample regression function (SRF): \hat Y_i = \hat\beta_2 X_i.
Estimator and variance
\hat\beta_2 = \dfrac{\sum X_i Y_i}{\sum X_i^2} (6.1.5)
\operatorname{Var}(\hat\beta_2) = \dfrac{\sigma^2}{\sum X_i^2}.
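The estimator and variance formulas above can be sketched in a few lines of plain Python. The data are hypothetical, invented only for illustration, and the function name `ols_through_origin` is an assumption of this sketch. Because only one parameter is estimated, \sigma^2 is estimated with n - 1 degrees of freedom.

```python
import math

def ols_through_origin(x, y):
    """Return (slope, estimated variance of slope, residuals) for Y = b2*X + u."""
    sxx = sum(xi * xi for xi in x)                 # raw sum of squares of X
    sxy = sum(xi * yi for xi, yi in zip(x, y))     # raw sum of cross products
    b2 = sxy / sxx
    resid = [yi - b2 * xi for xi, yi in zip(x, y)]
    # One estimated parameter, so the degrees of freedom are n - 1.
    sigma2_hat = sum(e * e for e in resid) / (len(x) - 1)
    var_b2 = sigma2_hat / sxx
    return b2, var_b2, resid

# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]

b2, var_b2, resid = ols_through_origin(x, y)
print(f"slope = {b2:.4f}, se = {math.sqrt(var_b2):.4f}")
print(f"sum of residuals      = {sum(resid):.4f}")
print(f"sum of X * residuals  = {sum(xi * e for xi, e in zip(x, resid)):.4f}")
```

The last two lines show what the first-order condition does and does not guarantee: \sum X_i \hat u_i is zero by construction, while \sum \hat u_i generally is not.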
Practical drawbacks
Residuals no longer guaranteed to sum to 0 → diagnostic issues.
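Conventional R^2 can be negative for this model. Use the raw R^2 (not mean-corrected), R^2_{raw} = \dfrac{(\sum X_i Y_i)^2}{\sum X_i^2 \sum Y_i^2}, which satisfies 0\le R^2_{raw}\le1.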
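A small numerical sketch of this drawback, using hypothetical data chosen so that the true relationship clearly has a large intercept. The conventional measure 1 - RSS/TSS uses the mean-corrected total sum of squares; the raw measure uses uncorrected sums of squares and cross products.

```python
# Hypothetical data: Y hovers around 10 regardless of X, so a line
# forced through the origin fits worse than the sample mean of Y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 9.5, 10.5, 10.0, 10.2]

sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)
syy_raw = sum(b * b for b in y)

b2 = sxy / sxx                                   # slope through the origin
rss = sum((b - b2 * a) ** 2 for a, b in zip(x, y))

ybar = sum(y) / len(y)
tss = sum((b - ybar) ** 2 for b in y)            # mean-corrected total SS

conventional_r2 = 1 - rss / tss                  # can fall below zero
raw_r2 = sxy ** 2 / (sxx * syy_raw)              # bounded by Cauchy-Schwarz

print(f"conventional R^2 = {conventional_r2:.3f}")
print(f"raw R^2          = {raw_r2:.3f}")
```

The raw value is not comparable with the conventional R^2 of an intercept model, so it should only be compared across origin models.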
Use‐cases in Economics
Friedman’s permanent-income hypothesis (permanent C ∝ permanent Y).
Cost theory: variable cost ∝ output.
Monetarist models: inflation ∝ growth of money supply.
Illustration: Excess‐Return Data (Example 6.1)
Data: 240 UK monthly observations (1980–1999) on sector index excess return Y_t and market excess return X_t.
Regression through origin
\hat Y_t = 1.1555\,X_t, SE(\hat\beta_2)=0.0744 ⇒ t = 1.1555/0.0744 ≈ 15.53 (p≈0).
R^2=0.5003, SER = 5.549, DW ≈ 1.97.
Regression with intercept
\hat Y_t = -0.4475 + 1.1711 X_t.
Intercept not significant (p≈0.219) ⇒ origin model plausible.
Rescaling & Units of Measurement (Section 6.2)
Suppose original model: Y_i = \beta_1 + \beta_2 X_i + u_i.
Define scaled variables
Y_i^* = w_1 Y_i, X_i^* = w_2 X_i (6.2.2–6.2.3)
u_i^* = w_1 u_i.
Regression in scaled units
Y_i^* = \beta_1^* + \beta_2^* X_i^* + u_i^* (6.2.4)
Relationships between original and scaled estimates (6.2.15–6.2.20)
\hat\beta_2^* = \dfrac{w_1}{w_2} \hat\beta_2
\hat\beta_1^* = w_1 \hat\beta_1
\hat\sigma^{*2} = w_1^2 \hat\sigma^2
\operatorname{Var}(\hat\beta_1^*) = w_1^2 \operatorname{Var}(\hat\beta_1)
\operatorname{Var}(\hat\beta_2^*) = \Bigl(\tfrac{w_1}{w_2}\Bigr)^2 \operatorname{Var}(\hat\beta_2)
R^2 is unit-invariant.
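The scaling relationships can be checked numerically. The sketch below fits the same hypothetical data twice, once in original units and once after multiplying Y by w_1 and X by w_2, and compares the results with the formulas above. The data, the scale factors, and the helper name `ols` are assumptions made for illustration.

```python
import math

def ols(x, y):
    """Two-variable OLS with intercept; returns the usual statistics."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    b2 = sxy / sxx
    b1 = ybar - b2 * xbar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)                       # two estimated parameters
    return {
        "b1": b1, "b2": b2, "sigma2": sigma2,
        "var_b1": sigma2 * sum(xi * xi for xi in x) / (n * sxx),
        "var_b2": sigma2 / sxx,
        "r2": 1 - rss / syy,
    }

# Hypothetical data and arbitrary scale factors, for illustration only.
x = [2.0, 3.5, 5.0, 6.5, 8.0, 9.5]
y = [1.1, 1.9, 2.4, 3.6, 3.9, 5.2]
w1, w2 = 1000.0, 10.0

orig = ols(x, y)
scaled = ols([w2 * xi for xi in x], [w1 * yi for yi in y])

def close(a, b):
    return math.isclose(a, b, rel_tol=1e-9)

checks = {
    "slope scales by w1/w2": close(scaled["b2"], (w1 / w2) * orig["b2"]),
    "intercept scales by w1": close(scaled["b1"], w1 * orig["b1"]),
    "sigma^2 scales by w1^2": close(scaled["sigma2"], w1 ** 2 * orig["sigma2"]),
    "Var(b1) scales by w1^2": close(scaled["var_b1"], w1 ** 2 * orig["var_b1"]),
    "Var(b2) scales by (w1/w2)^2": close(scaled["var_b2"], (w1 / w2) ** 2 * orig["var_b2"]),
    "R^2 unchanged": close(scaled["r2"], orig["r2"]),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

Since the slope and its standard error scale by the same factor, the t ratio is also unaffected by the choice of units.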
Empirical confirmation (GPDI & GDP, 1990–2005)
Both in billions: \text{GPDI} = -926.09 + 0.2535\,\text{GDP} (6.2.21).
Both in millions: intercept and its SE ×1000; slope and its SE unchanged (6.2.22).
Y billions, X millions: slope ÷1000 (6.2.23).
Y millions, X billions: slope ×1000 (6.2.24).
R^2=0.9648 throughout.
Regression on Standardized Variables ("Beta Regression")
Standardization
Y_i^{std}=\dfrac{Y_i-\bar Y}{s_Y}, X_i^{std}=\dfrac{X_i-\bar X}{s_X}.
Means = 0, SDs = 1.
Standardized regression
Y_i^{std}=\beta_2^* X_i^{std}+u_i^{std} (intercept = 0).
Interpretation
\beta_2^* = change (in SDs) of Y for one SD increase in X.
Facilitates comparison when variables measured in different units.
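As a sketch of the standardized regression, the code below standardizes two hypothetical series, fits the no-intercept regression, and compares the slope with \hat\beta_2\,s_X/s_Y from the unstandardized fit. In the two-variable case this slope also equals the sample correlation coefficient r. The data and the helper name `standardize` are assumptions for illustration.

```python
import math

def standardize(values):
    """Return z-scores and the sample standard deviation (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values], sd

# Hypothetical data in very different units, for illustration only.
x = [1200.0, 1500.0, 1700.0, 2100.0, 2600.0, 3000.0]
y = [3.1, 3.9, 4.2, 5.5, 6.1, 7.4]

zx, s_x = standardize(x)
zy, s_y = standardize(y)

# Standardized regression: no intercept, because both means are zero.
beta_star = sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)

# Ordinary slope and correlation coefficient from the original data.
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
sxx = sum((a - xbar) ** 2 for a in x)
syy = sum((b - ybar) ** 2 for b in y)
b2 = sxy / sxx
r = sxy / math.sqrt(sxx * syy)

print(f"standardized slope = {beta_star:.6f}")
print(f"b2 * s_x / s_y     = {b2 * s_x / s_y:.6f}")
print(f"correlation r      = {r:.6f}")
```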
Functional Forms of Two-Variable Models
(Remember: “linear model” means linear in parameters, not necessarily in variables.)
1. Log–Log (Double-Log) Model
Specification: \ln Y = \beta_1 + \beta_2 \ln X + u.
\beta_2 measures elasticity: \beta_2 = \dfrac{\partial \ln Y}{\partial \ln X}=\dfrac{\partial Y}{\partial X}\,\dfrac{X}{Y}.
Example (6.5.5): Durable‐goods expenditure
\ln(\text{EXPDUR}) = -7.5417 + 1.6266\,\ln(\text{PCEX}).
Elasticity ≈ 1.63 → 1 % ↑ in PCEX → 1.63 % ↑ in EXPDUR.
R^2=0.9695, both coefficients highly significant.
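The elasticity reading is a small-change approximation. A quick sketch using the coefficients reported in (6.5.5) computes the exact percentage change in predicted EXPDUR for a 1 % rise in PCEX. The base level of PCEX is a hypothetical value; the result does not depend on it, because the elasticity is constant in a log-log model.

```python
import math

B1, B2 = -7.5417, 1.6266   # coefficients reported in (6.5.5)

def predicted_expdur(pcex):
    """Predicted EXPDUR implied by the fitted log-log equation."""
    return math.exp(B1 + B2 * math.log(pcex))

def pct_change(base, pct_rise=1.0):
    """Exact % change in predicted EXPDUR when PCEX rises by pct_rise %."""
    new = predicted_expdur(base * (1 + pct_rise / 100))
    return 100 * (new / predicted_expdur(base) - 1)

# Hypothetical base levels of PCEX, for illustration only.
for base in (1000.0, 5000.0):
    print(f"PCEX = {base:.0f}: 1% rise -> {pct_change(base):.4f}% rise in EXPDUR")
```

Both base levels give roughly 1.63 %, matching the coefficient to two decimals.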
2. Semi-Log Models
a. Log–Lin (Growth) Model
\ln Y_t = \beta_1 + \beta_2 t + u_t.
Interpretation: \beta_2 is the constant instantaneous (proportionate) growth rate per time period; 100\,\beta_2 ≈ % change in Y per period.
Example (6.6.8): Services expenditure
\ln(\text{EXS}_t)=8.3226+0.00705 t ⇒ 0.705 % quarterly ≈ 2.82 % annual growth.
R^2=0.9919.
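The slope of a log-lin model is an instantaneous growth rate; the corresponding compound rate is e^{\beta_2} - 1. The sketch below works through both for the slope reported in (6.6.8), treating one period as a quarter as the notes do.

```python
import math

B2 = 0.00705   # slope on t reported in (6.6.8), per quarter

instant_quarterly = 100 * B2                        # 100 * beta_2
compound_quarterly = 100 * (math.exp(B2) - 1)       # 100 * (e^beta_2 - 1)
instant_annual = 100 * 4 * B2                       # four quarters, simple sum
compound_annual = 100 * (math.exp(4 * B2) - 1)      # compounded over four quarters

print(f"instantaneous quarterly: {instant_quarterly:.4f}%")
print(f"compound quarterly:      {compound_quarterly:.4f}%")
print(f"instantaneous annual:    {instant_annual:.4f}%")
print(f"compound annual:         {compound_annual:.4f}%")
```

For a slope this small the two rates are close (about 2.82 % versus 2.86 % a year), but the gap widens as \beta_2 grows.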
b. Lin–Log Model
Y = \beta1 + \beta2 \ln X + u.
Interpretation (6.6.12–6.6.13): \beta_2 = \dfrac{\Delta Y}{\Delta \ln X}=\dfrac{\Delta Y}{\Delta X/X} ⇒ absolute change in Y for a given % change in X.
Example (6.6.14): Indian food expenditure
\text{FoodExp}_i = -1283.912 + 257.270\,\ln(\text{TotalExp}_i).
1 % ↑ in total expenditure → ≈ 257.270/100 ≈ 2.57 rupees ↑ in food spending.
R^2=0.3769.
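The rule "divide \beta_2 by 100" is a first-order approximation of \beta_2 \ln(1 + p/100). A short sketch with the slope reported in (6.6.14) compares the approximate and exact changes in predicted food expenditure; the function names are assumptions of this sketch.

```python
import math

B2 = 257.270   # slope on ln(TotalExp) reported in (6.6.14)

def food_change_approx(pct):
    """Approximate rupee change in food spending for a pct % rise in total expenditure."""
    return B2 * pct / 100

def food_change_exact(pct):
    """Exact change implied by the fitted equation: B2 * ln(1 + pct/100)."""
    return B2 * math.log(1 + pct / 100)

for pct in (1, 10, 50):
    print(f"{pct:>2}% rise: approx {food_change_approx(pct):8.3f}, "
          f"exact {food_change_exact(pct):8.3f} rupees")
```

The approximation is tight for a 1 % change and overstates the effect progressively for larger changes.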
3. Reciprocal Model
Y = \beta1 + \beta2 \dfrac{1}{X} + u (6.7.1).
Linear in parameters; nonlinear in variable.
As X \to \infty, second term → 0, so Y \to \beta_1 (asymptote).
Example (6.7.2): Child mortality vs per-capita GNP
\text{CM}_i = 81.794 + 27{,}237.17\,(1/\text{PCGNP}_i).
As PCGNP ↑ without bound, CM approaches its floor of ≈ 82 deaths per 1000 (\hat\beta_1 = 81.794).
R^2=0.4590, coefficients highly significant.
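To see the asymptote at work, the sketch below evaluates the fitted equation (6.7.2) at a few PCGNP levels. The levels are hypothetical round numbers chosen for illustration, not observations from the data set.

```python
B1, B2 = 81.794, 27237.17   # coefficients reported in (6.7.2)

def predicted_cm(pcgnp):
    """Predicted child mortality (per 1000) at a given per-capita GNP."""
    return B1 + B2 / pcgnp

# Hypothetical PCGNP levels, for illustration only.
levels = [100, 1_000, 10_000, 100_000]
predictions = [predicted_cm(p) for p in levels]

for p, cm in zip(levels, predictions):
    print(f"PCGNP = {p:>7}: predicted CM = {cm:8.2f}, gap to asymptote = {cm - B1:8.2f}")
```

Each tenfold rise in PCGNP cuts the gap to the asymptote by a factor of ten, so predicted mortality falls quickly at low incomes and barely moves at high ones.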
4. Log-Reciprocal Model
\ln Y = \beta_1 - \beta_2 \dfrac{1}{X} + u.
Slope \dfrac{\partial Y}{\partial X} = +\beta_2 \dfrac{1}{X^2} e^{\beta_1-\beta_2/X} = \beta_2 \dfrac{Y}{X^2};
elasticity \dfrac{\partial Y}{\partial X}\,\dfrac{X}{Y} = \dfrac{\beta_2}{X} varies with X (see summary table, slide 31).
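The slope and elasticity of the log-reciprocal model follow from the chain rule. A short derivation, setting the disturbance u aside and using the same symbols as the specification above:

```latex
\begin{aligned}
\ln Y &= \beta_1 - \frac{\beta_2}{X}
  \;\Longrightarrow\; Y = e^{\beta_1 - \beta_2/X},\\[4pt]
\frac{\partial Y}{\partial X}
  &= \frac{\beta_2}{X^2}\,e^{\beta_1 - \beta_2/X}
   = \beta_2\,\frac{Y}{X^2},\\[4pt]
\frac{\partial Y}{\partial X}\cdot\frac{X}{Y}
  &= \frac{\beta_2}{X}.
\end{aligned}
```

The slope depends on both X and Y, while the elasticity depends on X alone and shrinks toward zero as X grows.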
Comparative Interpretation of Coefficients (Slide 28)
Example using salary & sales:
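Linear: \beta_2=0.0155 ⇒ \$1 m sales ↑ → ≈ \$15.50 salary ↑ (0.0155 thousand dollars, taking salary in \$ thousands and sales in \$ millions, as the lin-log and log-lin lines imply).
Log-Log: \beta_2=0.257 ⇒ 1 % sales ↑ → ≈ 0.257 % salary ↑.
Log-Lin: \beta_2=1.5\times10^{-5} ⇒ \$1 m sales ↑ → ≈ 0.0015 % salary ↑.
Lin-Log: \beta_2=262.9 ⇒ 1 % sales ↑ → ≈ \$2.629k salary ↑.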
Choosing an Appropriate Functional Form (Slide 32)
Economic theory guidance (e.g., Phillips curve suggests nonlinear trade-off).
Desired slope interpretation (absolute vs relative changes vs elasticities).
Signs & magnitudes should align with a-priori expectations.
Multiple forms may fit; compare via theory, statistical diagnostics, and interpretability, not just R^2.
Do not over-emphasize goodness-of-fit; a higher R^2 does not guarantee correct specification.
When ambiguous, apply Box–Cox transformations to let the data choose among power transformations (the log form is the limiting case as the power → 0).
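For reference, the Box–Cox transform itself is simple to write down. The sketch below implements only the transform for a single positive value; choosing the power \lambda (normally by maximum likelihood) is not shown, and the function name `box_cox` is an assumption of this sketch.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value y with power parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)          # limiting case as lam -> 0
    return (y ** lam - 1) / lam

# lam = 1 leaves Y linear (shifted by 1); lam near 0 approaches ln Y.
for lam in (1.0, 0.5, 1e-6, 0.0):
    print(f"lambda = {lam:<8}: box_cox(2.0) = {box_cox(2.0, lam):.6f}")
```

A fitted \lambda close to 1 points toward a linear dependent variable, while a value close to 0 points toward a log specification.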
Practical & Ethical Considerations
Mis-specifying functional form can bias policy conclusions (e.g., elasticity over-/under-estimation).
Scaling and standardization choices alter numerical values; always report original units for clarity.
When forcing regression through origin, ensure theoretical justification; otherwise risk omitted-variable bias.
For income/poverty studies, reciprocal models may exaggerate asymptotes—interpret with socioeconomic context.
Connections to Previous Material
Builds on OLS assumptions (Gauss–Markov) discussed earlier: unbiasedness, variance formulas, hypothesis tests.
Extends two-variable linear regression by:
Dropping intercept (origin models).
Applying linear transformations (scaling, standardization).
Allowing nonlinear variable transformations while retaining linearity in parameters.
Same inferential machinery (t, F, R^2, DW) applies after transformation, but interpretation changes.
Summary Checklist for Exam
✓ Can you derive \hat\beta_2 for a no-intercept model?
✓ Explain why R^2 may be negative and how raw R^2 fixes it.
✓ Convert coefficients when data are re-expressed (billions ↔ millions).
✓ Interpret standardized-regression slope.
✓ Compute and interpret elasticity from log-log model.
✓ Distinguish growth (log-lin) vs Engel-type (lin-log) interpretations.
✓ Recognize asymptote property of reciprocal models.
✓ Choose functional form based on theory + diagnostics rather than maximal R^2.