Interpreting Bivariate Regression and Statistical Significance

The Concept and Interpretation of $R^2$ (R-Squared)

Definition and Function: $R^2$ is a statistical measure that tells the researcher how much of the variation in the dependent variable ( $y$ ) is explained by the regression model (the independent variables).
Explaining Variation:
* If a model has an $R^2$ of $0.25$ , it means the equation explains about a quarter ( $25\%$ ) of the variation in the outcome of interest.
* If the $R^2$ is $0.5$ , then half of the variation in the dependent variable is explained by the independent variable. This also implies that the other half of the variation ( $50\%$ ) remains unexplained.
Contextualizing Unexplained Variation:
* It is rare for one single thing to explain an entire outcome. Many factors usually affect the phenomenon being studied.
* Unexplained variation suggests there are other factors that could be incorporated into the study to better explain the outcome. Examples of such "control variables" include gender, educational background, or income levels.
Measurement Target: $R^2$ specifically measures the effectiveness of a particular regression line in explaining the outcome. It represents how well the equation fits the data.
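
As a rough illustration of "variation explained," the sketch below computes $R^2$ as one minus the ratio of unexplained (residual) variation to total variation. The function name `r_squared` and the four data points are made up for this example and do not come from any of the studies discussed here.

```python
def r_squared(y, y_hat):
    """Share of the variation in y accounted for by the fitted values y_hat."""
    mean_y = sum(y) / len(y)
    # Total variation: squared distances of each outcome from the mean outcome.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    # Unexplained variation: squared distances of each outcome from the line.
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - ss_residual / ss_total


# Hypothetical observed outcomes and the values a fitted line predicts for them.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 6.0, 7.0]

r2 = r_squared(observed, predicted)
print(f"R^2 = {r2:.2f}")
```

With these invented numbers the total variation is 20 and the unexplained variation is 2, so the line accounts for 90% of the variation.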

Case Study: Foreign Aid and Public Opinion (Goldsmith et al.)

Source Material: Ben Goldsmith and colleagues published an article titled "Doing Well by Doing Good: The Impact of Foreign Aid on Public Opinion" in 2010.
Research Question: Does foreign aid extended by one country improve that country's image among the populations of the recipient countries?
Methodology:
* The researchers used a multinational survey (data from many countries).
* They focused on the USAID program targeted to address HIV and AIDS, specifically the PEPFAR program.
Analysis Approach ("First Cut"):
* The authors plotted the outcome variable in simple scatter plots.
* Time Periods Observed:
  * Entire period: $2002 \text{--} 2010$ .
  * The Bush administration: $2007 \text{--} 2008$ .
  * The early Obama administration: $2009 \text{--} 2010$ .
* They identified an OLS (Ordinary Least Squares) regression line, which is the "best fit" line that minimizes the sum of the squared vertical distances between the data points (the dots) and the line.
Results and Interpretation:
* Slope Trend: The regression lines have a positive slope, meaning that as targeted aid programs increase, positive perceptions of the U.S. also increase.
* Numerical Coefficients: For the period $2007 \text{--} 2010$ , the slope coefficient was reported as $0.26$ .
* Meaning of the Slope ( $b$ ): A one-unit increase in the independent variable (HIV/AIDS programs) is associated with a $0.26$ unit increase in the dependent variable (the percentage of the public with a positive perception).
* Substantive Significance: While statistically significant, a $0.26$ increase is essentially a quarter of a percentage point increase in positive perception. This is a real effect, but not necessarily a "dramatic" one (it is not a $10$ percentage point jump).
* Level of Confidence: The results were cited as "highly significant at the $99\%$ level." This means the probability ( $p$ ) of finding results at least this strong if the null hypothesis were true is less than $0.01$ (p < 0.01).
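
To make the "best fit" idea concrete, here is a minimal sketch of how a bivariate OLS line can be computed by hand: the slope is the co-variation of $x$ and $y$ divided by the variation in $x$, and the intercept makes the line pass through the two means. The helper `ols_fit` and the five data points are hypothetical; they are not the survey data from the article.

```python
def ols_fit(x, y):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # How x and y move together around their means.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # How much x varies around its own mean.
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b


# Made-up data points, purely for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = ols_fit(x, y)
print(f"y = {a:.1f} + {b:.1f}x")
```

For these invented points the slope is 0.6, read the same way as the $0.26$ above: each one-unit increase in $x$ goes with a 0.6 unit increase in the predicted $y$.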

Case Study: HIV/AIDS Treatment and Aid Lag (Margo Day's IA)

Research Focus: Investigating whether aid given in $2003$ affected the number of people being treated for HIV/AIDS in the recipient country in $2004$ .
Strategy of Lagging Data: Data is staggered (lagged) by one year so that the cause (aid) is measured before the effect (treatment). This time ordering is a necessary condition for a causal claim, although it does not establish causality on its own.
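
The lagging step can be sketched as pairing each year's aid figure with the following year's treatment figure. Everything below is hypothetical: the function `lag_pairs`, the dictionary layout, and the numbers are invented to show the mechanics and are not the data set used in the IA.

```python
# Hypothetical yearly records for one recipient country (made-up numbers).
aid_by_year = {2002: 120.0, 2003: 150.0, 2004: 180.0}
treated_by_year = {2003: 900, 2004: 1100, 2005: 1300}


def lag_pairs(x_by_year, y_by_year, lag=1):
    """Pair x from year t with y from year t + lag, skipping unmatched years."""
    return [
        (year, x_by_year[year], y_by_year[year + lag])
        for year in sorted(x_by_year)
        if year + lag in y_by_year
    ]


pairs = lag_pairs(aid_by_year, treated_by_year)
for aid_year, aid, treated in pairs:
    print(f"aid in {aid_year}: {aid} -> treated in {aid_year + 1}: {treated}")
```

Each resulting pair puts aid from one year next to treatment counts from the next, so the regression compares a cause with an outcome measured afterwards.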
Model 1: Total Aid vs. Treatment:
* Relationship: Statistically significant (p < 0.05).
* Direction: Positive coefficient.
* Magnitude: The coefficient is approximately $0.3$ .
* Substantive Interpretation: If the independent variable ( $x$ ) is measured in thousands of dollars, a one-unit increase ( $\$1{,}000$ in aid) is associated with $0.3$ more people being treated (about a third of a person). Essentially, it costs approximately $\$3{,}000$ in aid to treat one person on average.
* Variation explained ( $R^2$ ): The total aid explains about $31.5\%$ of the variation in treatment numbers. This is a substantial share for a single variable, though $68.5\%$ remains unexplained and may reflect factors like medical infrastructure or social norms.
Model 2: Bilateral Aid:
* Significance: Statistically significant (p < 0.05).
* Magnitude: The coefficient is $189.4$ .
* Interpretation: A one-unit increase ( $\$1{,}000$ in aid) is associated with approximately $189$ additional people being treated. This suggests bilateral aid has a much larger substantive effect than the general aid category.
* Variation explained ( $R^2$ ): Only about $5.5\%$ of the variation was explained by bilateral aid alone.
Model 3: Multilateral Aid (UN, WHO, IMF):
* Significance: The $p$-value is $0.397$ .
* Interpretation: Because p > 0.05, the result is not statistically significant. The coefficient (approximately $79$ ) should not be interpreted as an effect, because we cannot reject the null hypothesis that the true coefficient is zero.
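
The arithmetic behind these three readings can be sketched as follows. The code assumes, as the notes do, that one unit of $x$ is $\$1{,}000$ of aid and that the significance threshold is $0.05$; the function names are invented for this illustration.

```python
ALPHA = 0.05          # conventional significance threshold used in the notes
UNIT_DOLLARS = 1_000  # assumption: x is measured in thousands of dollars


def dollars_per_person(coefficient, unit_dollars=UNIT_DOLLARS):
    """Aid dollars associated with one additional person treated."""
    return unit_dollars / coefficient


def is_significant(p_value, alpha=ALPHA):
    """True when the p-value falls below the chosen threshold."""
    return p_value < alpha


total_aid_cost = dollars_per_person(0.3)      # Model 1 coefficient
bilateral_cost = dollars_per_person(189.4)    # Model 2 coefficient
multilateral_ok = is_significant(0.397)       # Model 3 p-value

print(f"Total aid: about ${total_aid_cost:,.0f} per person treated")
print(f"Bilateral aid: about ${bilateral_cost:,.2f} per person treated")
print(f"Multilateral coefficient interpretable? {multilateral_ok}")
```

Dividing the unit size by the coefficient turns "people per $\$1{,}000$" into "dollars per person." A coefficient of exactly $0.3$ gives roughly $\$3{,}333$, which the notes round to about $\$3{,}000$. The multilateral coefficient is left out of the cost calculation because it is not statistically significant.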

Case Study: Terrorism and Female Leaders (Holman et al.)

Research Focus: Do female leaders get a "rally around the flag" effect (a public opinion boost) following a terrorist attack?
Subject: Theresa May (UK Prime Minister) during the Manchester attack.
Hypothesis: The authors predicted female leaders would be punished or see a decline in favorability, unlike the traditional rally effect seen for male leaders.
Variables:
* Dependent Variable ( $y$ ): Perceived favorability of Theresa May.
* Independent Variable ( $x$ ): Being surveyed before the attack vs. being surveyed after the attack.
Numerical Analysis of the Results Table:
* Coefficient: $-0.332$ (negative).
* Interpretation: Being surveyed after the attack (a one-unit change in the binary variable) is associated with a $0.332$ decrease (about a third of a percentage point) in favorability.
* Significance: Indicated by stars in the table legend (typically one star for p < 0.05).
* Variation explained ( $R^2$ ): The timing of the survey (before/after attack) explains about $14.3\%$ of the variation in favorability.
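
With a binary independent variable coded $0$ (before) and $1$ (after), the bivariate OLS slope is simply the difference between the two group means, and the intercept is the mean of the "before" group. The sketch below checks this on six invented favorability scores; the data and the helper `ols_fit` are hypothetical and unrelated to the actual survey.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


# Made-up favorability scores: three respondents before, three after.
before = [5.0, 6.0, 7.0]
after = [4.0, 5.0, 6.0]

x = [0] * len(before) + [1] * len(after)   # 0 = surveyed before, 1 = after
y = before + after

intercept, slope = ols_fit(x, y)
mean_difference = sum(after) / len(after) - sum(before) / len(before)

print(f"slope = {slope:.3f}, difference in means = {mean_difference:.3f}")
```

Here both the slope and the difference in means come out to $-1$, which is why a "one-unit change" in a before/after variable reads as the gap between the two groups.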

Practical Math in Regression Models

Hypothetical Model: Unemployment and Approval Ratings:
* Variables: Both measured in percentages ( $\%$ ).
* Equation: $y = 65 - 2.5x$
* Y-Intercept ( $a$ ): $65$ . This is the predicted approval rating if unemployment ( $x$ ) is zero.
* Slope/Coefficient ( $b$ ): $-2.5$ .
* Significance: $p = 0.03$ . This is statistically significant (p < 0.05), so we reject the null hypothesis.
* Calculating Expected Outcomes:
  * Scenario 1 (5% unemployment): $y = 65 - (2.5 \times 5) = 65 - 12.5 = 52.5$ . Expected approval is $52.5\%$ .
  * Scenario 2 (10% unemployment): $y = 65 - (2.5 \times 10) = 65 - 25 = 40$ . Expected approval is $40\%$ .
* Summary of Effect: A one percentage point increase in the unemployment rate is associated with a $2.5$ percentage point decrease in the leader's approval rating.
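
The same calculation can be written as a small function, which makes it easy to try other unemployment rates. The function name `predicted_approval` is invented here; the intercept and slope are the ones from the hypothetical equation $y = 65 - 2.5x$ .

```python
def predicted_approval(unemployment_pct, intercept=65.0, slope=-2.5):
    """Expected approval (%) from the hypothetical model y = 65 - 2.5x."""
    return intercept + slope * unemployment_pct


for rate in (0, 5, 10):
    print(f"unemployment {rate}% -> expected approval {predicted_approval(rate)}%")
```

Plugging in $0$ returns the intercept, and each extra point of unemployment lowers the prediction by $2.5$ points.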

Questions & Discussion

Question (Nora): Could you clarify if the hypothesis explains the phenomenon?
* Response: Technically, the regression model (specifically the equation/line) explains a certain percentage of the variation in the dependent variable. It is better to focus the statistics on how much the equation explains the outcome rather than the abstract hypothesis.
Comment (Marissa): Is the coefficient the percentage of variation explained?
* Response: No, the percentage of variation explained is the $R^2$ . The coefficient (slope) tells you the magnitude of the effect that a one-unit change in $x$ has on $y$ .
Question (Norm): Is a higher $R^2$ better or does it make a study more valid?
* Response: Not necessarily. High $R^2$ just means more variation is accounted for. Unethical researchers might "pack" a model with 30 variables just to inflate $R^2$ , but this obscures which variables are actually important. A lower $R^2$ (like $14.3\%$ ) can still be very meaningful if it shows a strong substantive effect from a key variable.
