Computer Applications in Economics Study Guide

GENERAL INTRODUCTION TO COMPUTER APPLICATIONS IN ECONOMICS

Importance of Computer-Based Analysis: Statisticians and econometricians rely increasingly on computer programs. Students of economics must therefore learn to analyze data with specialized software in order to conduct data-based analysis adequately.
Course Structure: The course covers seven units:
- Unit One: Introduction to computer-based analysis and major economic software.
- Unit Two: Data management, operation of software, data entry, transformation, and pre-estimation tests.
- Unit Three: Statistical estimation and graphical analysis (concepts, computation, and result analysis).
- Unit Four: Econometric estimation and analysis using EViews and Stata, including interpretation of results.
- Unit Five: Diagnostic tests, including identifying regression model problems (sources, detection, solutions) using EViews and Stata.
- Unit Six: Introduction to SPSS software.
- Unit Seven: Introduction to PCGIVE and LIMDEP programs.
Prerequisites: Familiarity with MS Windows programs, particularly Excel.
Key Objectives:
- Formulate regression models and estimate parameters.
- Compute simple, multiple, and partial correlation coefficients.
- Conduct significance tests and construct confidence intervals ( $95\%$ ).
- Perform diagnostic checking on regression results and handle non-linear models.

UNIT ONE: INTRODUCTION TO DATA AND SOFTWARE

Fundamental Economic Problems:
1. What goods and services to produce and in what amounts?
2. How to produce them?
3. For whom to produce?
Branches of Economic Analysis:
- Microeconomics: Behavior of individual firms, industries, and consumers; resource allocation; income distribution; relative prices.
- Macroeconomics: Large aggregates (national output, employment, general price level, total spending/saving, imports/exports, money supply).
Significance of Computer-Based Analysis:
- Ensures efficient results.
- Facilitates simple and comprehensive analysis in short timeframes.
- Produces correct, readable outcomes.
Major Economic Software Packages:
- EViews: Developed by economists for sophisticated analysis, regression, and forecasting on Windows. Useful for scientific data, financial analysis, and macro forecasting. Handles cross-section and time series.
- Stata: Modern, powerful program for data management, statistics, econometrics, and graphics. Primarily command-driven.
- SPSS (Statistical Package for the Social Sciences): User-friendly interface with menus and dialog boxes. Intuitive for all levels; features a spreadsheet-like Data Editor.
- PcGive: Written by David Hendry and Jurgen Doornik (University of Oxford). Well-suited for multivariate and univariate autoregressive processes.
- LIMDEP: Developed by William H. Greene. Specialized in LIMited DEPendent models (linear/nonlinear regression, qualitative models) for cross-section, time series, and panel data.

THE NATURE, TYPES, AND SOURCES OF DATA

Data Types:
- Qualitative Data: Categorical variables (e.g., sex, education level) whose values are labels for classes or categories; arithmetic on them is not meaningful.
- Quantitative Data: Numerical measures (means and standard deviations are meaningful).
  - Discrete Data: Countable (e.g., number of defective items).
  - Continuous Data: Measurable on a scale (e.g., height).
Measurement Scales:
- Nominal: Category labels.
- Ordinal: Ordered categories.
- Interval: Arbitrary zero point and arbitrary unit of measurement.
- Ratio: Natural zero point, arbitrary unit of measurement.
Data Structures:
- Time Series Data: Values of a variable at different points in time (daily, weekly, monthly, quarterly, annually).
  - Example: Table 1.1 shows Ethiopia's GDP from 1984 EC ( $20792.00$ ) to 1995 EC ( $54585.90$ ).
- Cross-Sectional Data: Information on variables at a single point in time across different units (e.g., census, consumer surveys).
  - Example: Table 1.2 shows population and death rates across Michigan, Minnesota, etc., for a specific year.
- Pooled Data: Combines time series and cross-sectional characteristics (e.g., GDP for 20 countries over 10 years).
- Panel Data: A special type of pooled data where the same cross-sectional unit (e.g., same household) is surveyed repeatedly over time.
Sources of Data:
- Primary: Collected directly by the analyst for the specific investigation.
- Secondary: Collected by others (e.g., governmental/international agencies).
- Ethiopian Sources: MoFED (Ministry of Finance and Economic Development), CSA (Central Statistical Authority), NBE (National Bank of Ethiopia).
- International Sources: IMF (International Monetary Fund), WB (World Bank).
Data Quality Issues:
- Observational errors in non-experimental social science data.
- Measurement errors (approximations/round-offs).
- Non-response in surveys.
- Aggregation problems (macro data like GNP excludes firm-level specifics due to confidentiality).

UNIT TWO: DATA MANAGEMENT AND PRE-ESTIMATION

Excel Operations:
- Entering Data: Use TAB to move across the cells in a row; at the end of the row, press ENTER to move to the start of the next row, provided the "Move selection after Enter" option is set to "Down."
- Fixed Decimals: Tools -> Options -> Edit -> Fixed Decimal (Positive for right of decimal, Negative for left).
- Bulk Entry: Select cells -> Type data -> CTRL+ENTER.
- Series Filling: Drag the fill handle; drag right/down for increasing, left/up for decreasing.
EViews Management:
- Workfile Creation: Specify frequency (Annual, Quarterly, Monthly, Weekly, Daily) and range (Start/End dates).
- EViews Frequency Notation: Quarterly ( $1994:2$ ); Monthly ( $1984:1$ ); Weekly/Daily ( $mm/dd/yyyy$ ).
- Reserved Names: ABS, C, CON, D, LOG, RESID, SIN, SQR, etc.
- Entry Methods: Keyboard via "Quick -> Empty Group"; Copy/Paste via Clipboard; Spreadsheet Import (Procs -> Import -> Read Text-Lotus-Excel).
Stata Management:
- Windows: Command Window (type instructions), Result Window (output/errors in red), Review Window (history of commands), Variable Window (current dataset variables).
- Workflow Buttons: Log (start/stop session recording), Viewer, Results, Graph, Data Editor (edit), Data Browser (view only).
- Basic Commands:
  - compare varname1 varname2: Reports the differences and similarities between two variables.
  - describe: Summary of data in memory.
  - drop varlist / keep varlist: Eliminate or retain variables.
  - gsort: Sorts observations in ascending (+) or descending (-) order.
  - generate LINV=log(INV): Create new variables.

DATA TRANSFORMATION AND OUTLIER DETECTION

Transformation in EViews: Use GENR button. Example: LY = Log(Y) for log form; DY = Y(-1) for first-order lag.
Transformation in Stata: replace GDP=100*GDP to scale data; sample 50 to draw a $50\%$ random sample.
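The transformations above can be sketched outside EViews and Stata as well. The Python fragment below is only an illustration under stated assumptions: the series `y` is hypothetical (not course data), and the variable names are invented for the example.

```python
import math
import random

# Hypothetical illustrative series (not data from the course tables).
y = [100.0, 110.0, 121.0, 133.1]

# Log transformation, analogous to LY = Log(Y) in EViews
# or generate LINV=log(INV) in Stata.
ly = [math.log(v) for v in y]

# First-order lag, analogous to Y(-1): the first observation has no predecessor.
y_lag1 = [None] + y[:-1]

# Rescaling, analogous to Stata's replace GDP=100*GDP.
y_scaled = [100 * v for v in y]

# A 50% random sample without replacement, analogous to Stata's sample 50.
rng = random.Random(0)  # fixed seed so the draw is reproducible
half_sample = rng.sample(y, k=len(y) // 2)
```

Note that a lag always costs one observation at the start of the series, which is why the first entry of `y_lag1` is missing.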
Outlier Detection:
- Outlier: An extreme value relative to the mean.
- Scatter Plot (Stata): graph IMPO, oneway. Most values cluster together, while extreme points stand far apart (e.g., a value of $21,550$ against a mean of $260$ ).
- Box Plot (Stata): graph IMPO, box. Points plotted individually at the top of the plot denote outliers, which distort the mean and standard deviation.
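A box plot flags outliers by a numerical rule, which can be reproduced by hand. The sketch below assumes the common convention that points beyond 1.5 times the interquartile range from the quartiles are outliers; the data are hypothetical except for the extreme value $21,550$ quoted above, so the resulting means will not match the course dataset.

```python
import statistics

# Hypothetical import values; only the extreme point 21550 is taken from the text.
impo = [180, 210, 240, 250, 260, 275, 290, 310, 21550]

# Quartiles and interquartile range (IQR).
q1, _median, q3 = statistics.quantiles(impo, n=4)
iqr = q3 - q1

# Fences under the assumed 1.5 * IQR convention.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [v for v in impo if v < lower_fence or v > upper_fence]
kept = [v for v in impo if lower_fence <= v <= upper_fence]

# The single extreme value pulls the mean far above the bulk of the data.
mean_with_outliers = statistics.mean(impo)
mean_without_outliers = statistics.mean(kept)
```

Comparing the two means shows why outliers should be inspected before estimation: one observation is enough to shift the average well away from where most values lie.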

UNIT THREE: STATISTICAL ESTIMATION AND GRAPHING

Sequence of Knowledge: Data (Crude) -> Information (Relevant to problem) -> Facts (Supported by data) -> Knowledge (Applied to decision) -> Wisdom (Application with limits).
Central Tendency and Variation:
- Mean ( $\bar{X}$ ): Arithmetic average. $\bar{X} = \frac{\sum X_i}{n}$ .
- Standard Deviation ( $s$ ): Measures dispersion around the mean and hence the reliability of the average. High dispersion = low reliability.
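As a worked illustration of the two measures, the sketch below computes the mean and the standard deviation for a small hypothetical sample. It assumes the usual sample formula with divisor $n - 1$, which the guide does not state explicitly.

```python
import math

x = [4.0, 8.0, 6.0, 2.0]  # hypothetical observations
n = len(x)

# Mean: X-bar = sum(X_i) / n
mean = sum(x) / n

# Sample standard deviation, assuming the n - 1 divisor.
sum_sq_dev = sum((v - mean) ** 2 for v in x)
s = math.sqrt(sum_sq_dev / (n - 1))
```

Here the deviations from the mean are $-1, 3, 1, -3$, so the sum of squared deviations is $20$ and $s = \sqrt{20/3}$.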
Relationship Measures:
- Covariance: Direction of co-movement (Positive = together, Negative = opposite).
- Correlation Coefficient ( $r$ ): Scaled measure between $-1$ and $+1$ .
  - $r = 1$ : Perfect positive.
  - $r = -1$ : Perfect negative.
  - $r = 0$ : Uncorrelated.
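The relationship between covariance and correlation can be sketched as follows. The function names and the data are hypothetical; the covariance uses the $n - 1$ divisor as an assumption, although the divisor cancels in $r$.

```python
import math

def covariance(x, y):
    """Sample covariance: sign shows the direction of co-movement."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Covariance scaled by both standard deviations, so -1 <= r <= 1."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

x = [1.0, 2.0, 3.0, 4.0]
y_up = [2.0, 4.0, 6.0, 8.0]    # moves exactly with x
y_down = [8.0, 6.0, 4.0, 2.0]  # moves exactly against x

r_up = correlation(x, y_up)
r_down = correlation(x, y_down)
```

Because `y_up` and `y_down` are exact linear functions of `x`, the two coefficients sit at the limits $+1$ and $-1$ (up to floating-point rounding).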
Hypothesis Testing Steps:
1. Formulate $H_0$ (Null) and $H_1$ (Alternate).
2. Set Level of Significance ( $\alpha$ ).
3. Select Test Statistic (compare to critical/table value).
4. Decision: Reject $H_0$ if test statistic is significant or P-value < $\alpha$ .
P-Value Interpretation:
- P < 0.01: Very strong evidence against $H_0$ .
- 0.01 < P < 0.05: Moderate evidence.
- 0.05 < P < 0.10: Suggestive evidence.
- P > 0.10: Little to no evidence.
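The decision rule and the interpretation bands above can be written as two small functions. This is a sketch only: the function names are hypothetical, and the treatment of values falling exactly on a boundary (e.g., P = 0.05) is an assumption, since the bands above leave it open.

```python
def evidence_against_null(p_value):
    """Map a P-value to the verbal evidence bands listed above."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a P-value must lie between 0 and 1")
    if p_value < 0.01:
        return "very strong"
    if p_value < 0.05:   # boundary handling is an assumption
        return "moderate"
    if p_value < 0.10:
        return "suggestive"
    return "little to none"

def reject_null(p_value, alpha=0.05):
    """Step 4 of the procedure: reject H0 when the P-value is below alpha."""
    return p_value < alpha

label = evidence_against_null(0.03)
decision = reject_null(0.03, alpha=0.05)
```

With a P-value of 0.03 the evidence is moderate and $H_0$ is rejected at the 5% level, but the same result would not be rejected at the 1% level.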

UNIT FOUR: ECONOMETRIC ESTIMATION (OLS)

Econometrics Definition: Amalgam of economic theory, mathematical economics, economic statistics, and mathematical statistics.
The Stochastic Model: Econometrics adds a disturbance term ( $U$ ) to mathematical models to account for random errors.
Ordinary Least Squares (OLS) Assumptions:
1. Zero mean of $U$ .
2. Homoscedasticity (constant variance of $U$ ).
3. No autocorrelation ( $E(U_i U_j) = 0$ ).
4. Explanatory variables are independent of $U$ .
5. Normality of $U \sim N(0, \sigma^2)$ .
6. No perfect multicollinearity.
Regression Metrics:
- Coefficient of Determination ( $R^2$ ): Percentage of total variation in $Y$ explained by $X$ . Range: $0$ to $1$ .
- Adjusted $R^2$ ( $\bar{R}^2$ ): Penalizes for adding irrelevant regressors. $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k}$ .
- t-statistic: Tests individual parameter significance ( $\frac{\hat{\beta}}{se(\hat{\beta})}$ ).
- F-statistic: Tests overall significance of the regression model.
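To show how these metrics fit together, the sketch below estimates a two-variable regression by OLS in plain Python. It is an illustration under assumptions: the data are hypothetical, the function name is invented, and $k = 2$ counts the intercept and the slope.

```python
import math

def simple_ols(x, y):
    """OLS for Y = b0 + b1*X + U with the summary metrics listed above."""
    n = len(x)
    k = 2  # estimated parameters: intercept and slope
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    rss = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y))  # unexplained variation
    tss = sum((b - mean_y) ** 2 for b in y)                  # total variation
    r2 = 1 - rss / tss
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    se_b1 = math.sqrt(rss / (n - k) / sxx)   # standard error of the slope
    t_b1 = b1 / se_b1
    f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))
    return {"b0": b0, "b1": b1, "r2": r2, "adj_r2": adj_r2,
            "se_b1": se_b1, "t_b1": t_b1, "f": f_stat}

# Hypothetical data, roughly Y = 2X.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = simple_ols(x, y)
```

In a regression with a single explanatory variable, the F-statistic equals the square of the slope's t-statistic, which is a convenient check on the output of any package.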
Functional Forms:
- Double-Log (Log-Log): $\ln(Y) = \beta_0 + \beta_1 \ln(X) + U$ . Coefficients ( $\beta_1$ ) represent elasticities.
- Semi-Log (Log-Lin): $\ln(Y) = \beta_0 + \beta_1 X + U$ . Measures relative change in $Y$ for absolute change in $X$ .
- Lin-Log: $Y = \alpha_0 + \alpha_1 \ln(X) + U$ . Measures absolute change in $Y$ for relative change in $X$ .
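The elasticity interpretation of the double-log form can be checked numerically. The sketch below builds hypothetical data from $Y = 3X^{0.5}$ with no disturbance term (an assumption made so the answer is known in advance) and regresses $\ln(Y)$ on $\ln(X)$.

```python
import math

# Hypothetical constant-elasticity data: Y = 3 * X^0.5, no disturbance term.
x = [1.0, 4.0, 9.0, 16.0, 25.0]
y = [3.0 * v ** 0.5 for v in x]

# Transform both variables to logs, as in the double-log model.
lx = [math.log(v) for v in x]
ly = [math.log(v) for v in y]

# OLS slope and intercept of ln(Y) on ln(X).
n = len(lx)
mean_lx, mean_ly = sum(lx) / n, sum(ly) / n
beta1 = (sum((a - mean_lx) * (b - mean_ly) for a, b in zip(lx, ly))
         / sum((a - mean_lx) ** 2 for a in lx))
beta0 = mean_ly - beta1 * mean_lx
```

The slope recovers the exponent 0.5, meaning a 1% rise in $X$ goes with a 0.5% rise in $Y$, and $e^{\beta_0}$ recovers the scale factor 3.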
Wald Test: Used in EViews to test coefficient restrictions (e.g., testing constant returns to scale: $C(2) + C(3) = 1$ ).

UNIT FIVE: DIAGNOSTIC TESTS

Autocorrelation:
- Causes: Omitted variables, misspecification of form, or interpolation.
- Detection: Durbin-Watson ( $d$ ) statistic. Range $0$ to $4$ . $d \approx 2$ suggests no autocorrelation; $d < d_L$ suggests positive autocorrelation.
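The Durbin-Watson statistic is computed from the regression residuals as $d = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$ . The sketch below applies this formula to two hypothetical residual series; the function name and the numbers are illustrative only.

```python
def durbin_watson(residuals):
    """d = sum of squared successive differences / sum of squared residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# Hypothetical residuals that keep their sign for long runs (positive autocorrelation).
persistent = [1.0, 0.9, 0.8, 0.7, -0.7, -0.8, -0.9, -1.0]
# Hypothetical residuals that flip sign every period (negative autocorrelation).
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

d_persistent = durbin_watson(persistent)
d_alternating = durbin_watson(alternating)
```

The persistent series gives a value close to 0 and the alternating series a value close to 4, the two ends of the range; in practice $d$ is compared with the tabulated bounds $d_L$ and $d_U$ .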
Heteroscedasticity:
- Causes: Income disparities (discretionary income growth), outliers, or specification errors.
- Detection (Stata): hettest (Cook-Weisberg/Breusch-Pagan test) or rvfplot (residual-versus-fitted plot).
Multicollinearity:
- Impact: High $R^2$ with few significant t-ratios; standard errors become large.
- Detection: Pairwise correlations > $0.8$ .
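The pairwise-correlation check can be sketched as a loop over all pairs of regressors. The data and names below are hypothetical, and the sketch assumes that strongly negative correlations (below $-0.8$ ) should be flagged as well.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Simple correlation coefficient between two series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def highly_correlated_pairs(regressors, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(regressors.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Hypothetical regressors: wealth is roughly ten times income.
regressors = {
    "income": [10.0, 12.0, 15.0, 19.0, 24.0],
    "wealth": [101.0, 119.0, 152.0, 188.0, 242.0],
    "rate": [5.0, 3.0, 6.0, 2.0, 4.0],
}
flagged = highly_correlated_pairs(regressors)
```

Only the income and wealth pair is flagged here, which suggests that including both in one regression would inflate their standard errors.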
Specification/Normality:
- Ramsey RESET Test (ovtest): Checks for omitted variables.
- Normality (sktest / qnorm): Checks for skewness/kurtosis to ensure residuals follow a normal distribution.
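The quantities behind a skewness/kurtosis check can be computed directly from the residuals. The sketch below uses the plain moment definitions (skewness $0$ and kurtosis $3$ for a normal distribution); it is not a reproduction of Stata's sktest, which applies its own adjustments, and the residual series are hypothetical.

```python
def skewness_kurtosis(residuals):
    """Moment-based skewness (m3 / m2^1.5) and kurtosis (m4 / m2^2)."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]     # hypothetical, balanced around zero
right_skewed = [0.0, 0.0, 0.0, 1.0, 9.0]    # hypothetical, one large positive value

skew_sym, kurt_sym = skewness_kurtosis(symmetric)
skew_right, kurt_right = skewness_kurtosis(right_skewed)
```

A skewness far from 0 or a kurtosis far from 3 is a warning sign that the normality assumption on $U$ may not hold.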

UNIT SIX: INTRODUCTION TO SPSS

Interface Windows:
- Data Editor: For creating and modifying data files (.sav).
- Viewer: Displays statistical results and pivot tables.
- Syntax Editor: For pasting/editing command syntax.
Variable View: Variables must begin with a letter, max 8 characters (classic), no blanks, no special characters except @, #, _, $.
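The naming rules above can be expressed as a single pattern. The sketch below is a hypothetical checker, not part of SPSS; it assumes that digits are allowed after the first letter, which the rule above implies but does not state.

```python
import re

# First character a letter, then up to 7 more characters drawn from
# letters, digits (assumed allowed), and the permitted symbols @ # _ $.
_CLASSIC_NAME = re.compile(r"[A-Za-z][A-Za-z0-9@#_$]{0,7}")

def is_valid_classic_name(name):
    """True if the name satisfies the classic 8-character naming rules."""
    return _CLASSIC_NAME.fullmatch(name) is not None

checks = {
    "income": is_valid_classic_name("income"),         # valid
    "inc_95": is_valid_classic_name("inc_95"),         # valid, underscore allowed
    "1income": is_valid_classic_name("1income"),       # starts with a digit
    "my var": is_valid_classic_name("my var"),         # contains a blank
    "householdsize": is_valid_classic_name("householdsize"),  # longer than 8
}
```

Checking names before importing a spreadsheet avoids having the software silently rename or truncate column headings.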
Procedures:
- Analyze -> Descriptive Statistics -> Frequencies / Descriptives.
- Analyze -> Compare Means -> Means (groups data by variables like sex).
- Analyze -> Regression -> Linear.
Pivot Tables: Interactively rearrange rows/columns by double-clicking in the Viewer.

UNIT SEVEN: PCGIVE AND LIMDEP

PcGive: Operates through the "front end" program GiveWin. Uses a Calculator tool for transformations. Regression is done by choosing "Start PcGive" under Modules.
LIMDEP: Focuses on "Limited Dependent" models.
- Data files have the extension .lpj.
- Reads Excel $3.0$ or $4.0$ formats.
- Uses a Project pull-down for importing variables and running models.

Econometrics Definition: Amalgam of economic theory, mathematical economics, economic statistics, and mathematical statistics. Econometrics uses statistical methods to analyze economic data and test economic theories.
The Stochastic Model: In econometric modeling, a disturbance term ( $U$ ) is added to mathematical models to account for random errors. This acknowledges that not all variations in the dependent variable can be explained by the independent variables.
Ordinary Least Squares (OLS) Assumptions: To ensure the validity of the estimates from OLS regression, the following assumptions must hold:
1. Zero mean of $U$ : The error term should average out to zero.
2. Homoscedasticity: The variance of the error term must be constant across all levels of the independent variable(s).
3. No autocorrelation: This means that the error terms should not be correlated with each other; specifically, $E(U_i U_j) = 0$ for all $i \neq j$ .
4. Explanatory variables independence: The independent variables must not be correlated with the error term, ensuring that the estimates are unbiased.
5. Normality of $U$ : The error term should be normally distributed, $U \sim N(0, \sigma^2)$ .
6. No perfect multicollinearity: The explanatory variables should not be perfectly correlated with each other.
Regression Metrics: The following metrics are important when interpreting the results of OLS regression:
- Coefficient of Determination ( $R^2$ ): This metric measures the percentage of total variation in the dependent variable $Y$ that is explained by the independent variable(s) $X$ . The value of $R^2$ ranges from $0$ to $1$ , where values closer to $1$ indicate a better fit.
- Adjusted $R^2$ ( $\bar{R}^2$ ): This metric adjusts the $R^2$ value for the number of parameters in the model. It penalizes for adding irrelevant regressors, calculated as $\bar{R}^2 = 1 - (1 - R^2) \frac{n-1}{n-k}$ , where $n$ is the number of observations and $k$ is the number of estimated parameters, including the intercept.
- t-statistic: This tests the significance of individual parameters in the model. It is calculated as $\frac{\hat{\beta}}{se(\hat{\beta})}$ , the estimate divided by its standard error. A larger absolute value is stronger evidence that the coefficient differs from zero.
- F-statistic: This tests the overall significance of the regression model. It assesses whether at least one predictor variable has a non-zero coefficient.
Functional Forms: Various functional forms can be used in econometric models, including:
- Double-Log (Log-Log) Model: Given by $\ln(Y) = \beta_0 + \beta_1 \ln(X) + U$ . In this model, coefficients ( $\beta_1$ ) represent elasticities, which measure the percentage change in the dependent variable for a one-percent change in the independent variable.
- Semi-Log (Log-Lin) Model: This is represented as $\ln(Y) = \beta_0 + \beta_1 X + U$ . It measures the relative change in $Y$ for an absolute change in $X$ .
- Lin-Log Model: Formulated as $Y = \beta_0 + \beta_1 \ln(X) + U$ . It measures the absolute change in $Y$ for a relative change in $X$ .
Wald Test: This test is utilized in EViews to assess whether certain coefficients are equal to zero or meet specific restrictions. An example includes testing for constant returns to scale represented mathematically as $C(2) + C(3) = 1$ . The Wald test examines if the restrictions imposed on the model are valid based on the estimates.