Data Transformation Flashcards

Univariate Data Transformation

  • Focus on transformations involving a single variable.

Natural Logs

  • Natural logs are frequently used in business and economics for data transformation to simplify analysis and obtain quick answers.
  • Natural Logarithm and Exponential Function: Quick Refresh
    • If a^b = x, then log_a(x) = b.

Natural Log

  • It is the logarithm to the base e, where e is an irrational number approximately equal to 2.71828.
  • Notation: ln(x) = log_e(x), where x must be positive.

Exponential Function

  • The exponential function of x is commonly written as e^x.
  • If y = e^x, then x = ln(y).
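The inverse relationship between the two functions can be checked numerically. The snippet below is a minimal sketch using Python's standard math module; the values of x, a, and b are arbitrary illustrations, not taken from the notes.

```python
import math

x = 2.5                        # arbitrary positive value (assumption)
y = math.exp(x)                # y = e^x
x_back = math.log(y)           # ln(y) recovers x, up to rounding

# General base: if a^b = x, then log_a(x) = b.
a, b = 2.0, 10.0               # arbitrary base and exponent (assumption)
log_a_x = math.log(a ** b, a)  # logarithm of a^b to the base a

print(round(math.e, 5))        # 2.71828
print(x_back, log_a_x)
```

Floating-point rounding means the recovered values may differ from 2.5 and 10 in the last digits, so compare with a tolerance rather than with ==.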

Approximating Natural Log and Exponential Functions

  • Approximations are used to simplify formulas and get quick answers, but they only work if x is small (close to 0).
  • For small x, ln(1 + x) ≈ x.
  • Exponential approximation: e^x ≈ 1 + x.
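To see how "small" x needs to be, one can tabulate the error of each approximation. This is a rough sketch; the function name approx_errors and the chosen values of x are illustrative assumptions.

```python
import math

def approx_errors(x):
    """Return (x - ln(1 + x), e^x - (1 + x)): the error of each approximation."""
    log_error = x - math.log1p(x)        # log1p(x) computes ln(1 + x)
    exp_error = math.exp(x) - (1 + x)
    return log_error, exp_error

for x in (0.01, 0.05, 0.10, 0.50):
    log_err, exp_err = approx_errors(x)
    print(f"x={x:.2f}  ln error={log_err:.5f}  exp error={exp_err:.5f}")
```

Both errors grow roughly like x^2 / 2, so the approximations are tight at x = 0.01 and visibly off at x = 0.5.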

Investment Growth Example

  • Assume an investment grows at rate r per period, starting from an initial value x_0.
  • The value of the investment at time t is denoted x(t).
  • At time 0, the investment value is x_0.
  • After one time period, the value is x_0(1 + r).
  • After two time periods, the value is x_0(1 + r)^2.
  • After t time periods, the value is x_0(1 + r)^t.
  • If r > 0, there is exponential growth.
  • If r < 0, there is exponential decay.

Numerical Example

  • Investing $100 for 10 years at 3% interest yields $100 * (1 + 0.03)^{10} = $134.39.
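The growth formula is easy to reproduce in code. A minimal sketch, where the function name future_value is an assumption for illustration:

```python
def future_value(x0, r, t):
    """Value after t periods of an initial amount x0 growing at rate r per period."""
    return x0 * (1 + r) ** t

value = future_value(100, 0.03, 10)
print(f"{value:.2f}")   # 134.39

# A negative rate gives exponential decay: the value shrinks each period.
decayed = future_value(100, -0.03, 10)
print(f"{decayed:.2f}")
```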

Application of Natural Logs to Economics Data

  • Economic and financial data often resemble exponential functions when graphed.
  • Natural logs are helpful in linearizing exponential growth for regression analysis.

Linearizing Exponential Growth

  • Start with the exponential growth formula: x(t) = x_0(1 + r)^t.
  • Apply the log transformation to both sides: ln(x(t)) = ln(x_0(1 + r)^t).
  • Using log rules, rewrite the right side: ln(x(t)) = ln(x_0) + ln((1 + r)^t).
  • Further simplification: ln(x(t)) = ln(x_0) + t * ln(1 + r).
  • Approximation for small r: ln(1 + r) ≈ r.
  • Final linearized equation: ln(x(t)) ≈ ln(x_0) + t * r.
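The derivation can be checked numerically by comparing the exact log of the investment value with the linear approximation. The sketch below reuses x_0 = 100 and r = 0.03 from the investment example; the function names are illustrative assumptions.

```python
import math

X0, R = 100.0, 0.03   # initial value and growth rate from the investment example

def log_value_exact(t):
    """ln(x(t)) computed directly from x(t) = x0 * (1 + r)^t."""
    return math.log(X0 * (1 + R) ** t)

def log_value_linear(t):
    """Linearized version: ln(x0) + t * r."""
    return math.log(X0) + t * R

for t in (0, 1, 5, 10):
    print(t, round(log_value_exact(t), 4), round(log_value_linear(t), 4))

# The exact log series rises by ln(1 + r) each period, which is close to r.
step = log_value_exact(1) - log_value_exact(0)
print(round(step, 5), R)
```

The gap between the two columns widens with t because the small per-period error, r - ln(1 + r), accumulates.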

Proportionate Changes

  • Proportionate change in x is defined as \frac{\Delta x}{x_0} = \frac{x_1 - x_0}{x_0}.
  • It represents the new value minus the old value, divided by the old value.
  • Example: Price inflation calculation.

Stata Demonstration

  • Opening the real GDP per capita dataset in Stata.
  • Using the describe command to understand the file.
  • Using square brackets to refer to specific observations in Stata (e.g., year[_n]).

Calculating Price Inflation in Stata

  • Inflation in 1930 is calculated as (Price in 1930 - Price in 1929) / Price in 1929.
  • Stata command: generate inflation1 = (price[_n] - price[_n-1]) / price[_n-1].
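  • Alternative approach using the time-series operators D. (difference) and L. (lag): generate inflation2 = D.price / L.price. These operators require the data to be declared as time series with tsset first.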
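For readers without Stata at hand, the same lagged calculation can be sketched in Python. This is an analogue, not Stata syntax, and the price values are made up for illustration.

```python
prices = [100.0, 102.0, 99.96]   # hypothetical price index, one value per year

# Analogue of (price[_n] - price[_n-1]) / price[_n-1].
# The first observation has no previous value, so its inflation is missing (None),
# just as Stata returns a missing value for the first row.
inflation = [None] + [
    (curr - prev) / prev for prev, curr in zip(prices, prices[1:])
]

print(inflation)
```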

Poll Everywhere Questions

  • Question about calculating the yearly inflation based on monthly data.
  • Question about approximating e^x for small x.
  • The correct formula for approximating e^x is 1 + x.

Approximating Proportionate Changes with Logs

  • For small proportionate changes, the change in the log of x is approximately equal to the proportionate change in x: \Delta ln(x) ≈ \frac{\Delta x}{x}.
  • Example: If x_0 = 40 and x_1 = 40.4, then ln(40.4) - ln(40) ≈ 0.00995, which is close to the proportionate change in x, (40.4 - 40) / 40 = 0.01.
  • Good approximation for series with relatively small changes.
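A short comparison shows how the quality of the approximation depends on the size of the change. The first two pairs come from these notes; the third (40 to 60) is an added illustration of a large change. Function names are assumptions.

```python
import math

def proportionate_change(x0, x1):
    """(new - old) / old."""
    return (x1 - x0) / x0

def log_change(x0, x1):
    """Change in the natural log: ln(new) - ln(old)."""
    return math.log(x1) - math.log(x0)

for x0, x1 in ((40, 40.4), (500, 520), (40, 60)):
    print(x0, x1,
          round(proportionate_change(x0, x1), 4),
          round(log_change(x0, x1), 4))
```

For the 1% and 4% changes the two measures nearly agree; for the 50% change the log difference is noticeably smaller than the proportionate change.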

Compounding and the Rule of 72

  • The rule of 72 estimates how long it takes for an investment to double.
  • If you invest $1 at interest rate R, after N time periods you'll have (1 + R)^N.
  • To find when the investment doubles (becomes $2), solve 2 = (1 + R)^N for N.
  • Applying logs: ln(2) = N * ln(1 + R).
  • Solving for N: N = \frac{ln(2)}{ln(1 + R)} ≈ \frac{ln(2)}{R}.
  • Using the approximation ln(2) ≈ 0.69 and expressing R in percent gives N ≈ \frac{69}{R}, the rule of 69.
  • Since 72 is more convenient (divisible by 3, 4, 6, 8, 9), we use it instead, giving the rule of 72.
  • Formula: N ≈ \frac{72}{R}, where R is in percent.
  • Example: At an 8% interest rate, an investment doubles in approximately 9 years.
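The rule can be compared against the exact doubling time from the formula N = ln(2) / ln(1 + R). A minimal sketch, with function names chosen for illustration and rates passed in percent:

```python
import math

def doubling_time_exact(rate_percent):
    """Exact periods to double: ln(2) / ln(1 + R), with R converted from percent."""
    return math.log(2) / math.log(1 + rate_percent / 100)

def doubling_time_rule72(rate_percent):
    """Rule-of-72 estimate: 72 divided by the rate in percent."""
    return 72 / rate_percent

for rate in (2, 4, 8, 12):
    print(rate,
          round(doubling_time_exact(rate), 2),
          round(doubling_time_rule72(rate), 2))
```

At 8% the rule gives 9 years and the exact answer is only slightly above 9, so the shortcut is very close at typical interest rates.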

Skewness and Log Transformation

  • Economic data are often skewed to the right: a few very high values (outliers) stretch the upper tail.
  • Example: Income data.
  • Log transformation makes the data more symmetrical.
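The effect can be illustrated with a small made-up income sample. The sketch below measures skewness as the average cubed standardized deviation, which is one common definition and an assumption here, since the notes do not specify a measure. The income values are hypothetical.

```python
import math
from statistics import mean, pstdev

def skewness(values):
    """Average cubed z-score (population moment coefficient of skewness)."""
    m = mean(values)
    s = pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# Hypothetical incomes with two very high values in the upper tail.
incomes = [20_000, 25_000, 30_000, 35_000, 40_000,
           50_000, 60_000, 80_000, 150_000, 500_000]
log_incomes = [math.log(v) for v in incomes]

print(round(skewness(incomes), 2), round(skewness(log_incomes), 2))
```

For this sample the skewness of the logged values is well below that of the raw values, although it does not drop all the way to zero.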

Compound Interest Rates

  • If the nominal interest rate is R and there are N compounding periods per year, the effective interest rate is given by:
    • R_{effective} = (1 + \frac{R}{N})^N - 1
  • R is the Annual Percentage Rate (APR).
  • R_{effective} is the Annual Percentage Yield (APY).
  • For continuous compounding, (1 + \frac{R}{N})^N converges to e^R as N approaches infinity, so R_{effective} = e^R - 1.
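The conversion from APR to APY, and its continuous-compounding limit e^R - 1, can be sketched as follows. The 6% rate and the function names are illustrative assumptions.

```python
import math

def effective_rate(apr, periods_per_year):
    """APY from APR: (1 + R/N)^N - 1."""
    return (1 + apr / periods_per_year) ** periods_per_year - 1

def effective_rate_continuous(apr):
    """Limit of the formula above as N grows without bound: e^R - 1."""
    return math.exp(apr) - 1

apr = 0.06   # hypothetical 6% nominal rate
for n in (1, 4, 12, 365):
    print(n, round(effective_rate(apr, n), 6))
print("continuous", round(effective_rate_continuous(apr), 6))
```

More frequent compounding raises the APY, but with diminishing gains as it approaches the continuous limit.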

Other Transformations

  • Standardized Scores (Z-scores)
    • Z_i = \frac{x_i - \bar{x}}{s}, where \bar{x} is the sample mean and s is the sample standard deviation.
  • Moving Averages
    • Average of observations in several successive time periods.
    • Simple Moving Average: Average of current and immediate past observations.
      • Three-period moving average: \frac{x_t + x_{t-1} + x_{t-2}}{3}.
    • Centered Moving Average: Current observation is in the middle, with one observation before and one after.
      • \frac{x_{t-1} + x_t + x_{t+1}}{3}.
    • Reasons for Using Moving Averages
      • Reduces random noise in the data.
      • Smooths business cycle variations and seasonal variations.
  • Seasonal Adjustments
    • Adjusting for seasonal variation.
  • Real vs. Nominal Data
  • Per Capita Data
    • Divide by the size of population.
  • Growth Rates and Percentage Changes
    • One-period percentage change: \frac{X_t - X_{t-1}}{X_{t-1}} * 100.
    • Often converted to annualized rates.
  • Percent vs. Percentage Point
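    • A percentage point is the arithmetic difference between two percentages; a percent change is measured relative to the starting value. Example: a rate rising from 5% to 6% increases by one percentage point, but by 20 percent.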
    • Basis Point: 1/100 of a percentage point.
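Several of the transformations listed above reduce to one-line formulas. The sketch below covers z-scores, one-period percentage changes, and basis points; the function names and the sample data are illustrative assumptions.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize using the sample mean and sample standard deviation (n - 1)."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

def percent_change(prev, curr):
    """One-period percentage change: (X_t - X_{t-1}) / X_{t-1} * 100."""
    return (curr - prev) / prev * 100

def basis_points(old_rate_pct, new_rate_pct):
    """Change between two rates (both in percent), expressed in basis points."""
    return (new_rate_pct - old_rate_pct) * 100

data = [2.0, 4.0, 4.0, 5.0, 10.0]        # hypothetical sample
print([round(z, 3) for z in z_scores(data)])
print(round(percent_change(500, 520), 2))
print(round(basis_points(5.05, 5.06), 2))
```

Standardized scores always have mean 0 and sample standard deviation 1, which makes series measured in different units comparable.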

Practice Questions

  • If the interest rate is 4%, how many years will it take for the investment to double?
    • Answer: 72 / 4 = 18 years.
  • If x increases from 500 to 520, what is the proportionate change in x?
    • Answer: (520 - 500) / 500 = 0.04.
  • If x increases from 500 to 520, what is the absolute change in the log of x?
    • Answer: ln(520) - ln(500) = 0.0392.
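  • What is the three-period simple moving average at time t = 4, given x_2 = 10, x_3 = 15, x_4 = 20, and x_5 = 25?
    • Answer: (20 + 15 + 10) / 3 = 15.
  • What is the three-period centered moving average at time t = 4 for the same series?
    • Answer: (15 + 20 + 25) / 3 = 20.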
  • The interest rate increases from 5.05% to 5.06%. How large is the increase?
    • Answer: 0.01 percentage points, which is one basis point.

Stata Demonstration of Moving Averages

  • Calculating simple and centered moving averages in Stata.
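As a language-neutral illustration of the two definitions, here is a minimal Python sketch (an analogue, not the Stata commands from the demonstration). The series is hypothetical, and t is 1-based to match the x_t notation in the notes.

```python
def simple_moving_average(series, t, window=3):
    """Average of x_t and the (window - 1) observations before it; t is 1-based."""
    start = t - window                 # 0-based index of the oldest observation used
    if start < 0 or t > len(series):
        return None                    # not enough past observations
    return sum(series[start:t]) / window

def centered_moving_average(series, t):
    """Average of x_{t-1}, x_t, x_{t+1}; t is 1-based."""
    if t < 2 or t > len(series) - 1:
        return None                    # no neighbour on one side
    return sum(series[t - 2:t + 1]) / 3

series = [5, 10, 15, 20, 25]           # hypothetical x_1 .. x_5

print(simple_moving_average(series, 4))     # 15.0
print(centered_moving_average(series, 4))   # 20.0
```

The simple version loses observations at the start of the series, while the centered version loses one at each end, which is why both functions return None there.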

Date Formats in Stata

  • Demonstration of importing and formatting date variables in Stata.
  • Using the date function to convert string variables to date variables.
  • Formatting dates using the format command.