Endogeneity, Exogeneity, and Instrumental Variables in Monitoring and Evaluation

Determinants of Causality and the Role of Instrumental Variables

In the field of Monitoring and Evaluation (M&E), the ultimate objective is frequently to establish causality: to show that a specific program, denoted $X$, is the cause of an observed outcome, denoted $Y$. While the gold standard for achieving this is the Randomised Controlled Trial (RCT), real-world public policy settings often present ethical or practical constraints that make RCTs infeasible. For example, researchers cannot ethically assign individuals to unemployment for the sake of a study. In such non-randomised settings, Instrumental Variables (IV) analysis is a valuable quasi-experimental technique for estimating causal impact. The method relies on a variable known as the instrument, which shifts the likelihood of receiving the treatment in a way that is as good as random with respect to the outcome. This allows evaluators to isolate the effect of the intervention from confounding factors.

The Nature and Sources of Endogeneity in Monitoring and Evaluation

Endogeneity is a significant hurdle in evaluation. It occurs when an explanatory variable, such as participation in a program or intervention, is correlated with the error term in a statistical model. When endogeneity is present, the estimated relationship between the intervention and the outcome is biased, meaning it no longer reflects the true causal effect. In simpler terms, endogeneity typically implies that hidden factors influence both participation in a program and the outcomes being measured, which makes it difficult to conclude whether the observed results were actually caused by the program itself. Mathematically, econometric theory defines endogeneity as a nonzero covariance between the explanatory variable and the error term: $Cov(X, \epsilon) \neq 0$.
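
The size of the resulting bias can be made explicit. As a sketch, assuming the simple linear model $Y = \beta_0 + \beta_1 X + \epsilon$ and a large sample, the OLS slope converges to the true coefficient plus a term driven by the covariance between $X$ and the error:

$$
\hat{\beta}_1^{OLS} = \frac{\widehat{Cov}(X, Y)}{\widehat{Var}(X)} \;\xrightarrow{p}\; \frac{Cov(X, \beta_0 + \beta_1 X + \epsilon)}{Var(X)} = \beta_1 + \frac{Cov(X, \epsilon)}{Var(X)}
$$

The second term vanishes only when $Cov(X, \epsilon) = 0$. Its sign determines whether the program effect is overstated or understated.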

There are several primary sources of endogeneity that evaluators must recognize. Omitted Variable Bias occurs when factors that affect the outcome and are also correlated with the explanatory variable are excluded from the analysis. For instance, if an education program aims to boost student performance but the evaluation fails to measure parental support, the estimated effect of the program may be biased because parental support influences both program participation and academic success. Reverse Causality, or simultaneity, occurs when the outcome influences the explanatory variable instead of the other way around; for example, while a business support program is intended to increase income, higher-income households might be more likely to participate in the program in the first place. Selection Bias arises when participants are not randomly selected into a program. A classic example involves more motivated farmers voluntarily joining agricultural training; their subsequent higher productivity might be a result of their inherent motivation rather than the training. Finally, Measurement Error involves inaccurate variable recording, such as the underreporting of household income in surveys. Random error in an explanatory variable typically biases its estimated coefficient toward zero (attenuation bias), while systematic misreporting can bias estimates in either direction.
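
The omitted-variable case can be illustrated with a small simulation. This is a minimal sketch on made-up data: the variable names, the true effect of 5 points, and every coefficient are hypothetical assumptions chosen for illustration, not results from any evaluation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical data-generating process (all numbers are illustrative assumptions).
parental_support = rng.normal(size=n)
# Students with more parental support are more likely to join the program.
participation = (parental_support + rng.normal(size=n) > 0).astype(float)
true_effect = 5.0
score = (50 + true_effect * participation
         + 4.0 * parental_support          # parental support also raises scores
         + rng.normal(scale=5, size=n))


def ols(y, *regressors):
    """Return OLS coefficients [intercept, slope_1, ...] via least squares."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Omitting parental support pushes its influence into the error term.
naive = ols(score, participation)[1]
# Including it removes the bias in this simulated setting.
adjusted = ols(score, participation, parental_support)[1]

print(f"naive: {naive:.2f}  adjusted: {adjusted:.2f}  true: {true_effect}")
```

In this setup the naive estimate is well above the true effect, because participants have more parental support on average. In real evaluations the omitted factor is usually unobserved, so it cannot simply be added as a control.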

Significance of Endogeneity and Methods for Mitigation

Endogeneity is a threat to the internal validity of an evaluation because it prevents a credible causal interpretation of the relationship between an intervention and its outcomes. The consequences of failing to address endogeneity include the overestimation or underestimation of program impacts, misleading policy decisions based on flawed findings, and the inefficient allocation of resources. To combat these issues, evaluators employ several techniques. Randomized Controlled Trials (RCTs) use random assignment to ensure comparability between groups. Instrumental Variables (IV) analysis uses an external variable that affects program participation but not the outcome directly. Difference-in-Differences (DiD) compares changes over time between treatment and comparison groups. Propensity Score Matching (PSM) matches participants and non-participants who share similar observed characteristics, so it does not by itself remove bias from unobserved factors. Finally, Fixed Effects Models are used to control for unobserved characteristics that remain constant over time.

An illustrative example of endogeneity can be seen in an NGO evaluating a girls' scholarship program. If the NGO finds that beneficiaries score $20\%$ higher on exams than non-beneficiaries, but those beneficiaries were selected because they were already high-performing students, the comparison is contaminated by selection bias. Consequently, the evaluator cannot confidently attribute the $20\%$ difference solely to the scholarship program.
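
The scholarship example can be written as a standard decomposition. The potential-outcomes notation here is not part of the original text and is introduced only for illustration: let $D = 1$ for beneficiaries and $D = 0$ for non-beneficiaries, and let $Y_1$ and $Y_0$ denote a student's exam score with and without the scholarship.

$$
E[Y \mid D=1] - E[Y \mid D=0] = \underbrace{E[Y_1 - Y_0 \mid D=1]}_{\text{effect on beneficiaries}} + \underbrace{E[Y_0 \mid D=1] - E[Y_0 \mid D=0]}_{\text{selection bias}}
$$

If beneficiaries were chosen because they were already high performers, the second term is positive, so the observed gap overstates the effect of the scholarship.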

Understanding Exogeneity in Evaluation Research

Exogeneity is the opposite of endogeneity and is the desired condition in impact evaluations. It refers to a situation where the explanatory variable, such as treatment assignment, is not correlated with the error term in a statistical model. Mathematically, this is expressed as $Cov(X, \epsilon) = 0$. This condition means that the explanatory variable is linearly unrelated to the unobserved factors affecting the outcome; it is weaker than full statistical independence, although random assignment delivers both. Exogeneity is what allows estimates of program effects to be consistent (and, under the stronger condition $E[\epsilon \mid X] = 0$, unbiased), which permits a causal interpretation of the data. In simple terms, exogeneity means that participation in a program is not influenced by hidden factors that also affect the outcome, making any observed changes more confidently attributable to the intervention.

There are two main types of exogeneity relevant to statistical modeling. Strict Exogeneity holds when the explanatory variable is uncorrelated with the error term in all time periods, an assumption frequently used in panel data models. Weak or Contemporaneous Exogeneity holds when the explanatory variable is uncorrelated with the current error term but may still be related to past or future errors. Strict exogeneity is the stronger, more demanding assumption; contemporaneous exogeneity is sufficient for some specific estimation techniques. In randomized evaluations like RCTs, random assignment creates exogeneity by ensuring that the treatment and control groups are, on average, statistically similar across both observed and unobserved characteristics. Therefore, any systematic difference in outcomes, such as that between Group A (receiving a scholarship) and Group B (not receiving one), can be attributed to the program.
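
The contrast between self-selection and random assignment can be sketched in a short simulation. The data are entirely hypothetical: "motivation" stands in for any unobserved characteristic, and the true effect of 2.0 and all coefficients are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
true_effect = 2.0

# Unobserved motivation raises outcomes regardless of treatment (hypothetical).
motivation = rng.normal(size=n)

# Self-selection: more motivated individuals are more likely to opt in.
self_selected = (motivation + rng.normal(size=n) > 0).astype(int)
# Random assignment: a coin flip that is unrelated to motivation.
randomized = rng.integers(0, 2, size=n)


def diff_in_means(treated):
    """Simulate outcomes under the given assignment and compare group means."""
    outcome = 10 + true_effect * treated + 3.0 * motivation + rng.normal(size=n)
    return outcome[treated == 1].mean() - outcome[treated == 0].mean()


est_self = diff_in_means(self_selected)
est_rct = diff_in_means(randomized)

# Correlation between assignment and the unobserved factor in each design.
corr_self = np.corrcoef(self_selected, motivation)[0, 1]
corr_rct = np.corrcoef(randomized, motivation)[0, 1]

print(f"self-selected: estimate={est_self:.2f}, corr with motivation={corr_self:.2f}")
print(f"randomized:    estimate={est_rct:.2f}, corr with motivation={corr_rct:.2f}")
```

Under random assignment the correlation between treatment and motivation is close to zero, and the simple difference in means lands near the true effect. Under self-selection the same comparison is inflated by the motivation gap between groups.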

The Mechanics of Instrumental Variables (IV) Analysis

When endogeneity exists, OLS (Ordinary Least Squares) estimates become biased and inconsistent, which motivates the use of Instrumental Variables. The goal of IV analysis is to isolate the part of the treatment variable ( $X$ ) that is free from endogeneity by using a third variable ( $Z$ ). To be considered a valid instrument, $Z$ must satisfy two essential conditions. The first is Relevance, meaning the instrument must be correlated with the endogenous variable. The second is Exogeneity, also known as the Exclusion Restriction, which stipulates that the instrument must affect the outcome only through the treatment variable and not directly, and must be uncorrelated with the error term. For example, if evaluating the impact of education on income, OLS might be biased because factors like motivation or family background influence both. An instrument like "distance to school" might be used: it is relevant because closer proximity increases the likelihood of attending school, and it satisfies the exclusion restriction if distance itself does not directly affect one's future income. Relevance can be checked in the data, but the exclusion restriction cannot be tested directly and must be defended with contextual knowledge; it would fail, for instance, if households living near schools also had better access to jobs.
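
The two conditions translate directly into an estimator. As a sketch, assuming the simple model $Y = \beta_0 + \beta_1 X + \epsilon$ with a single instrument $Z$, taking the covariance of both sides with $Z$ gives:

$$
Cov(Z, Y) = \beta_1 \, Cov(Z, X) + Cov(Z, \epsilon)
$$

Exogeneity sets $Cov(Z, \epsilon) = 0$, and relevance guarantees $Cov(Z, X) \neq 0$, so the equation can be solved for the causal effect:

$$
\beta_1 = \frac{Cov(Z, Y)}{Cov(Z, X)}
$$

When the instrument is only weakly related to $X$, the denominator is close to zero and the estimate becomes unstable.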

Conceptually, IV estimation is performed in two stages, often referred to as 2SLS (Two-Stage Least Squares). In the First Stage, researchers predict program participation using the instrument, extracting the portion of participation explained by the instrument alone. In the Second Stage, the predicted values from the first stage are used to estimate the impact on the outcome. Because these predicted values are based only on the instrument, they are no longer contaminated by unobserved factors like motivation. In software like Stata, a typical IV regression command would be ivregress 2sls income (education = distance_to_school), where income is the outcome, education is the endogenous variable, and distance_to_school is the instrument.
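
The two stages can be sketched by hand on simulated data. This is a minimal illustration, not a replacement for a dedicated IV routine: the data-generating process, the true effect of 1.5, and all coefficients are hypothetical assumptions, and the variable names simply mirror the education example.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
true_effect = 1.5

# Hypothetical data: 'motivation' is unobserved and drives both X and Y.
motivation = rng.normal(size=n)
distance = rng.normal(size=n)                                   # instrument Z
education = 1.0 - 0.8 * distance + motivation + rng.normal(size=n)             # X
income = 2.0 + true_effect * education + 2.0 * motivation + rng.normal(size=n)  # Y


def fit(y, x):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


# Naive OLS: biased because education is correlated with motivation.
ols_slope = fit(income, education)[1]

# First stage: predict education from the instrument alone.
a0, a1 = fit(education, distance)
education_hat = a0 + a1 * distance

# Second stage: regress the outcome on the predicted values.
iv_slope = fit(income, education_hat)[1]

# With one instrument, 2SLS equals the ratio Cov(Z, Y) / Cov(Z, X).
wald = np.cov(distance, income)[0, 1] / np.cov(distance, education)[0, 1]

print(f"OLS: {ols_slope:.3f}  2SLS: {iv_slope:.3f}  ratio: {wald:.3f}  true: {true_effect}")
```

The point estimate from the manual second stage is correct, but the standard errors reported by an ordinary second-stage regression are not, because they ignore that the predicted values were themselves estimated. Dedicated commands such as ivregress apply the appropriate correction.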

The Error Term and Its Role in Regression Models

In the context of regression analysis and impact evaluation, the error term, also known as the disturbance term ( $\epsilon$ ), represents the collection of all factors that affect the outcome variable but are not explicitly included in the model. A simple regression model is represented as $Y = \beta_0 + \beta_1 X + \epsilon$. The error term is the difference between the actual observed outcome and the outcome implied by the true relationship $\beta_0 + \beta_1 X$; its estimated counterpart, the residual, is the difference between the observed outcome and the value predicted by the fitted model. The error term is a measure of unexplained variation and matters for the precision of estimates; generally, a smaller error variance indicates a model that explains more of the variation in outcomes.
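
One practical consequence is worth illustrating: the residuals that OLS computes are, by construction, uncorrelated with $X$ in the sample, even when the underlying error term is not. The sketch below uses simulated data in which the true error is known (which is never the case in practice); all numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20_000

# Hypothetical endogenous setup: the true error is correlated with X.
error = rng.normal(size=n)
x = 0.7 * error + rng.normal(size=n)
y = 1.0 + 2.0 * x + error            # true slope is 2.0

# Fit OLS and compute residuals from the fitted model.
A = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
residuals = y - A @ coef

corr_true_error = np.corrcoef(x, error)[0, 1]
corr_residual = np.corrcoef(x, residuals)[0, 1]

print(f"estimated slope: {coef[1]:.2f} (true slope: 2.0)")
print(f"corr(X, true error): {corr_true_error:.3f}")
print(f"corr(X, residuals):  {corr_residual:.1e}")
```

Because the residuals absorb the bias, inspecting them cannot reveal endogeneity. Whether $X$ is exogenous has to be argued from the study design rather than read off the fitted model.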

There are four main components contained within the error term. First are Omitted Variables, which are factors like motivation, ability, or family background that affect outcomes but remain unmeasured. Second are Measurement Errors, which stem from mistakes in data collection or recording. Third are Random Events, such as unexpected weather shocks that might affect crop yields in an agricultural study. Fourth are Model Imperfections, representing the reality that no model can fully capture the complexity of the real world. In M&E, variables in the error term often include beneficiary motivation, leadership quality, household characteristics, political influences, environmental shocks, and cultural factors. Because evaluators cannot measure every single influence, the error term is unavoidable, and the design of the study must aim to ensure that program participation ( $X$ ) remains uncorrelated with this "everything else" factor to avoid bias.