Regression Part 2

Course: BNAN 562
Instructor: Tamar Kugler, Associate Professor of Management and Organizations
Institution: University of Arizona Eller MBA

Definition: Standard error of the estimate (denoted as $s_{y.x}$) is described as the "standard deviation of the errors". It measures the dispersion of data points above and below the regression line.
Formula: $s_{y.x} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2}}$ where,
- $SSE$ = Sum of Squared Errors
- $n$ = number of observations
Interpretation:
- Represents the degree of scatter around the regression line.
- If $s_{y.x} = 0$, this indicates perfect prediction where $R^2 = 1.0$.
- A larger standard error indicates a worse fit and less reliable predictions from the regression.
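A minimal sketch of the formula above, assuming a tiny made-up data set (`x_demo`, `y_demo`) and two hypothetical helper functions; it is meant only to show how $SSE$ and $n-2$ combine, not to reproduce any course output.

```python
import math

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for a single predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def standard_error_of_estimate(x, y):
    """s_{y.x} = sqrt(SSE / (n - 2)), using residuals from the fitted line."""
    b0, b1 = fit_line(x, y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))

# Hypothetical data, for illustration only.
x_demo = [1, 2, 3, 4]
y_demo = [2, 4, 5, 7]
s_yx = standard_error_of_estimate(x_demo, y_demo)

# Points that fall exactly on a line give zero scatter (perfect prediction).
s_perfect = standard_error_of_estimate([1, 2, 3, 4], [1, 2, 3, 4])

print(round(s_yx, 4))  # 0.3162
print(s_perfect)       # 0.0
```

Here $SSE = 0.2$ and $n - 2 = 2$, so $s_{y.x} = \sqrt{0.1} \approx 0.316$.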
Example Equation: $\hat{y} = 1676 - 265x$
- This illustrates how the predicted value is calculated using the regression equation.
Testing Significance: Conduct a t-test to determine if the predictor variable (denoted as $x$) significantly predicts the outcome variable (denoted as $y$).

Main Questions:
1. What is the regression equation?
2. Is the slope significantly different from zero?

Based on output coefficients, the regression equation is formulated as follows:
$\hat{y} = 1{,}676 - 265 \times x$
Variable Interpretation:
- Here, $x$ represents minority status (coded as 1 for minority, 0 for non-minority).
- Predicted salary for non-minorities ($x=0$) is $1,676, while for minorities ($x=1$) it is $1,411 (calculated as $1676 - 265 = 1411$).

Null Hypothesis ($H_0$): $\beta_1 = 0$ (slope is equal to zero; no relationship between $x$ and $y$).
Alternative Hypothesis ($H_a$): $\beta_1 \neq 0$ (slope is different from zero; a relationship exists).
Sampling Distribution:
- The sampling distribution of $b_1$ can be pictured as a histogram of the slope estimates that theoretical repeated samples would produce, showing how likely each value of $b_1$ is.
Example Output:
- Coefficients:
- Intercept ($b_0$): $1,675.58$
- Slope ($b_1$): $-265.27$
- Standard Error of Intercept: $49.31$
- Standard Error of Slope: $73.50$
- t-statistic for slope: $-3.61$
- p-value for slope: $0.00043$
Interpretation of t-statistic:
- The t-statistic is the number of standard errors by which the estimated slope differs from the value expected under the null hypothesis (zero): $t = b_1 / SE(b_1)$.
- In this case, it indicates significance since the p-value is less than the alpha level of 0.05, leading to the rejection of the null hypothesis.
- Conclusion: The minority coefficient is significantly different from zero, indicating that minority employees earn less on average than their non-minority counterparts.
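The t-statistic in the example output can be checked by hand from the slope and its standard error. The sketch below uses only the reported values; the variable names are illustrative, and the p-value is taken from the output rather than recomputed, since the notes do not state the degrees of freedom.

```python
# Values copied from the example output above.
b1 = -265.27       # estimated slope
se_b1 = 73.50      # standard error of the slope
p_value = 0.00043  # reported two-tailed p-value for the slope
alpha = 0.05       # significance level used in the notes

# t = (estimate - hypothesized value) / standard error, with H0: beta_1 = 0.
t_stat = (b1 - 0.0) / se_b1
reject_null = p_value < alpha

print(f"t = {t_stat:.2f}, reject H0: {reject_null}")
# t = -3.61, reject H0: True
```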

Size vs. Importance:
- The size of the coefficient $b$ does not necessarily indicate its importance; it varies with the scale of $x$ and $y$.
- Example:
- Small scale for $y$: $b_1 = 0.01$ may be statistically significant.
- Large scale for $y$: $b_1 = 10,000$ may look large, yet still be statistically insignificant.
Statistical Testing:
- Always conduct a t-test to determine if the b weight is significantly different from zero; don't rely on visual estimates.
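To illustrate why the raw size of $b$ is not a guide to significance, the sketch below fits the same made-up data twice, once with $y$ in dollars and once in thousands of dollars. The data and the helper `slope_and_t` are hypothetical; the point is only that rescaling $y$ changes the slope by the same factor while leaving the t-statistic unchanged.

```python
import math

def slope_and_t(x, y):
    """Return (b1, t) for a simple least-squares regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return b1, b1 / se_b1

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5, 6]
y_dollars = [52000, 48500, 55000, 51000, 57500, 54000]
y_thousands = [v / 1000 for v in y_dollars]

b_dollars, t_dollars = slope_and_t(x, y_dollars)
b_thousands, t_thousands = slope_and_t(x, y_thousands)

print(b_dollars, b_thousands)  # slopes differ by a factor of 1000
print(t_dollars, t_thousands)  # t-statistics agree up to rounding
```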
Effect of Range Truncation:
- Truncating the range of $x$ can lead to misleading conclusions about its impact on $y$.
- Example case: Predicting first-year GPA using GMAT scores that have been truncated to a narrow range may yield non-significant results even if a significant relationship exists in a broader dataset.
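The effect of range truncation can be seen in a small simulation. Everything below is assumed for illustration: the simulated GMAT and GPA values, the true slope of 0.003, the noise level, and the 680 to 720 cut-off are invented, not course data. The sketch compares the slope's standard error and t-statistic on the full range with those on the narrow slice.

```python
import math
import random

def slope_se_t(x, y):
    """Return (b1, SE(b1), t) for a simple least-squares regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return b1, se_b1, b1 / se_b1

random.seed(0)  # fixed seed so the run is repeatable

# Simulated applicants: GPA rises with GMAT, plus random noise (assumed model).
gmat = [random.uniform(400, 800) for _ in range(400)]
gpa = [1.0 + 0.003 * g + random.gauss(0, 0.3) for g in gmat]

# Keep only applicants inside a narrow GMAT band (the truncated range).
narrow = [(g, p) for g, p in zip(gmat, gpa) if 680 <= g <= 720]
gmat_narrow = [g for g, _ in narrow]
gpa_narrow = [p for _, p in narrow]

b_full, se_full, t_full = slope_se_t(gmat, gpa)
b_nar, se_nar, t_nar = slope_se_t(gmat_narrow, gpa_narrow)

print(f"full range:   n={len(gmat)}, SE={se_full:.5f}, t={t_full:.2f}")
print(f"narrow range: n={len(narrow)}, SE={se_nar:.5f}, t={t_nar:.2f}")
```

Because the narrow slice has far less spread in $x$, the slope's standard error grows and the t-statistic shrinks, even though the same underlying relationship generated both samples.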

Y-Intercept:
- Because $x=0$ is meaningful here (non-minority), $b_0$ is interpretable: it is the predicted monthly salary for non-minority employees ($1,676).
Slope Interpretation:
- A slope of $-265$ implies that for every increase of $1$ unit in $x$ (moving from non-minority to minority), there is a predicted decrease of $265 in salary.
- Example Calculation:
- For a minority employee ($x=1$):
  $\hat{y} = 1676 - 265(1) = 1411$
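The same arithmetic as a short sketch, using the rounded coefficients from the notes. The function name `predict_salary` is hypothetical.

```python
def predict_salary(x, b0=1676.0, b1=-265.0):
    """Predicted monthly salary from y-hat = b0 + b1 * x (x is 0 or 1)."""
    return b0 + b1 * x

non_minority = predict_salary(0)  # 1676.0
minority = predict_salary(1)      # 1411.0

# With a 0/1 predictor, the slope equals the gap between the two predictions.
gap = minority - non_minority     # -265.0
print(non_minority, minority, gap)
```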

T-Test for Intercept:
- Rarely utilized; often not useful when $x=0$ is not within data range.
- Tests whether the intercept's true value differs from zero.
- Any significant result here is often less interesting.
P-Values:
- All p-values for coefficients are for two-tailed tests despite what Excel indicates.
- Always ascertain the direction of the slope when interpreting p-values.
ANOVA & T-Test:
- For simple regression with only one predictor, $F = t^2$, thus the p-values are the same for both tests.
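For the example output, $t = -3.61$ gives $F = t^2 \approx 13.03$. The sketch below checks the identity numerically on a small made-up data set with a 0/1 predictor; the salary values are hypothetical and serve only to show that the ANOVA $F$ and the squared slope t-statistic coincide when there is a single predictor.

```python
import math

# Hypothetical data: x is a 0/1 indicator, y is a monthly salary.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1700, 1650, 1720, 1630, 1400, 1450, 1380, 1420]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # error sum of squares
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # regression sum of squares
mse = sse / (n - 2)                                    # error mean square

f_stat = (ssr / 1) / mse                               # 1 regression df
t_stat = b1 / (math.sqrt(mse) / math.sqrt(sxx))        # slope t-statistic

print(f_stat, t_stat ** 2)  # the two values agree up to rounding
```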
Confidence Intervals:
- Separate from prediction intervals; interpretation requires caution.
Suggestions for Reporting:
- Simplify outputs, removing unnecessary information that may overwhelm readers.
Cautions regarding Forecasting:
- Predicting beyond the observed range of $x$ can lead to inaccurate forecasts. The relationship estimated from the data may not hold outside that range, so validate the model's assumptions carefully before extrapolating.