Regression Part 2

Applied Business Statistics Regression Part 2

Instructor Information

  • Course: BNAN 562

  • Instructor: Tamar Kugler, Associate Professor of Management and Organizations

  • Institution: University of Arizona Eller MBA

Standard Error

  • Definition: Standard error of the estimate (denoted as $s_{y.x}$) is described as the "standard deviation of the errors". It measures the dispersion of data points above and below the regression line.

  • Formula: $s_{y.x} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}}$, where

    • $SSE$ = Sum of Squared Errors

    • $n$ = number of observations

  • Interpretation:

    • Represents the degree of scatter around the regression line.

    • If $s_{y.x} = 0$, this indicates perfect prediction where $R^2 = 1.0$.

    • A larger standard error indicates a worse fit and less reliable predictions from the regression.

  • Example Equation: $\hat{y} = 1676 - 265x$

    • This illustrates how the predicted value is calculated using the regression equation.

  • Testing Significance: Conduct a t-test to determine if the predictor variable (denoted as $x$) significantly predicts the outcome variable (denoted as $y$).
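As a minimal sketch, the standard error of the estimate, $s_{y.x} = \sqrt{SSE/(n-2)}$, can be computed directly from observed and predicted values. The function name `standard_error_of_estimate` and the five data points are hypothetical, invented for illustration; they are not course data, and the predictions are not from an actual least-squares fit.

```python
import math

def standard_error_of_estimate(y, y_hat):
    """Return s_{y.x} = sqrt(SSE / (n - 2)) for a one-predictor regression."""
    n = len(y)
    if n <= 2:
        raise ValueError("need more than 2 observations")
    # SSE: sum of squared differences between observed and predicted values
    sse = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
    return math.sqrt(sse / (n - 2))

# Hypothetical observed values and predictions (illustration only)
y     = [10.0, 12.0, 15.0, 19.0, 20.0]
y_hat = [10.0, 13.0, 15.0, 18.0, 21.0]

s_yx = standard_error_of_estimate(y, y_hat)   # SSE = 3 and n - 2 = 3, so s_yx = 1.0
perfect = standard_error_of_estimate(y, y)    # predictions equal the data, so s_yx = 0.0
```

The second call shows the perfect-prediction case: when every point lies on the line, SSE is zero and so is the standard error.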

Regression Interpretation

  • Main Questions:

    1. What is the regression equation?

    2. Is the slope significantly different from zero?

Regression Equation
  • Based on output coefficients, the regression equation is formulated as follows:
    $\hat{y} = 1{,}676 - 265 \times x$

  • Variable Interpretation:

    • Here, $x$ represents minority status (coded as 1 for minority, 0 for non-minority).

    • Predicted salary for non-minority employees ($x=0$) is $1,676; for minority employees ($x=1$) it is $1,411 ($1676 - 265 = 1411$).
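The two predictions can be reproduced with a short sketch that plugs each value of the indicator into $\hat{y} = 1676 - 265x$. The function name `predict_salary` and the constants `B0` and `B1` are hypothetical names; the coefficients are the rounded values from the notes.

```python
B0 = 1676   # intercept, rounded as in the notes
B1 = -265   # slope on the minority indicator, rounded as in the notes

def predict_salary(x):
    """Predicted salary from y-hat = 1676 - 265x (x = 1 minority, x = 0 non-minority)."""
    if x not in (0, 1):
        raise ValueError("x is a 0/1 indicator")
    return B0 + B1 * x

non_minority = predict_salary(0)   # 1676
minority = predict_salary(1)       # 1411
gap = minority - non_minority      # -265, which is the slope itself
```

Because $x$ only takes the values 0 and 1, the gap between the two predictions is exactly the slope.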

Hypothesis Testing for the Slope
  • Null Hypothesis ($H_0$): $\beta_1 = 0$ (the slope is zero; no linear relationship between $x$ and $y$).

  • Alternative Hypothesis ($H_a$): $\beta_1 \neq 0$ (the slope is different from zero; a relationship exists).

  • Sampling Distribution:

    • The sampling distribution of $b_1$ describes how the estimated slope would vary across theoretical repeated samples; it can be pictured as a histogram of the $b_1$ values those samples would produce.

  • Example Output:

    • Coefficients:

    • Intercept ($b_0$): $1,675.58$

    • Slope ($b_1$): $-265.27$

    • Standard Error of Intercept: $49.31$

    • Standard Error of Slope: $73.50$

    • t-statistic for slope: $-3.61$

    • p-value for slope: $0.00043$

  • Interpretation of t-statistic:

    • The t-statistic is the number of standard errors by which the estimated slope lies from the value expected under the null hypothesis (zero): $t = b_1 / s_{b_1}$.

    • In this case the result is significant: the p-value ($0.00043$) is less than the alpha level of 0.05, so the null hypothesis is rejected.

    • Conclusion: The minority coefficient is significantly different from zero, and its negative sign indicates that minority employees are predicted to earn less than their non-minority counterparts.
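The t-statistic in the example output can be checked by dividing the slope by its standard error. The sketch below uses only the numbers listed above; the variable names are hypothetical. The p-value is taken as reported rather than recomputed, because recomputing it would require the sample size $n$, which the notes do not state.

```python
b1 = -265.27        # estimated slope from the example output
se_b1 = 73.50       # standard error of the slope from the example output
p_value = 0.00043   # two-tailed p-value as reported (not recomputed here)
alpha = 0.05        # significance level used in the notes

# t = (estimate - value under H0) / standard error, with H0: slope = 0
t_stat = (b1 - 0) / se_b1          # about -3.61
reject_null = p_value < alpha      # True: the slope differs significantly from zero
```

The negative sign of `t_stat` carries the direction of the effect; the p-value alone does not.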

Interpretation of b Weight (Coefficient)

  • Size vs. Importance:

    • The size of the coefficient $b$ does not necessarily indicate its importance; it varies with the scale of $x$ and $y$.

    • Example:

    • Small scale for $y$: $b_1 = 0.01$ looks tiny but may be statistically significant.

    • Large scale for $y$: $b_1 = 10,000$ looks large but may be statistically non-significant.

  • Statistical Testing:

    • Always conduct a t-test to determine if the b weight is significantly different from zero; don't rely on visual estimates.

  • Effect of Range Truncation:

    • Truncating the range of $x$ can lead to misleading conclusions about its impact on $y$.

    • Example case: Predicting first-year GPA using GMAT scores that have been truncated to a narrow range may yield non-significant results even if a significant relationship exists in a broader dataset.
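One way to see why truncation hurts is through the standard error of the slope, $s_{b_1} = s_{y.x} / \sqrt{\sum (x_i - \bar{x})^2}$: a narrower spread of $x$ shrinks the denominator and inflates $s_{b_1}$. The sketch below holds the slope and $s_{y.x}$ fixed and changes only the range of $x$. All numbers (the GMAT ranges, the slope of 0.002, and $s_{y.x} = 0.4$) are hypothetical, chosen for illustration, and `slope_t_stat` is a hypothetical helper name.

```python
import math

def slope_t_stat(xs, slope, s_yx):
    """t = b1 / s_b1, where s_b1 = s_yx / sqrt(sum((x - mean_x)^2))."""
    mean_x = sum(xs) / len(xs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    se_slope = s_yx / math.sqrt(sxx)
    return slope / se_slope

# Hypothetical values, for illustration only
slope = 0.002   # assumed change in GPA per GMAT point
s_yx = 0.4      # assumed standard error of the estimate, identical in both cases

full_range = list(range(400, 801, 10))   # GMAT 400 to 800, 41 scores
truncated = list(range(700, 801, 10))    # GMAT 700 to 800 only, 11 scores

t_full = slope_t_stat(full_range, slope, s_yx)    # roughly 3.8
t_trunc = slope_t_stat(truncated, slope, s_yx)    # roughly 0.5
```

With the same underlying slope, the full range gives a t-statistic well above the rough benchmark of 2, while the truncated range does not. Note that truncation here also reduces the number of observations, which contributes to the drop.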

Interpretation of the Equation

  • Y-Intercept:

    • $b_0$ is the predicted value of $y$ when $x=0$. Here $x=0$ is meaningful (non-minority employees), so $b_0$ is their predicted monthly salary: $1,676.

  • Slope Interpretation:

    • A slope of $-265$ implies that for every increase of $1$ unit in $x$ (moving from non-minority to minority), there is a predicted decrease of $265 in salary.

    • Example Calculation:

    • For a minority employee ($x=1$):
      $\hat{y} = 1676 - 265(1) = 1411$
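With a single 0/1 predictor, least squares puts the intercept at the mean of the $x=0$ group and the slope at the difference between the two group means. The sketch below illustrates this with six hypothetical salaries, invented so that the group means equal 1676 and 1411; they are not the course data, and `fit_simple_regression` is a hypothetical helper name.

```python
def fit_simple_regression(xs, ys):
    """Ordinary least squares for one predictor; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx                # slope
    b0 = mean_y - b1 * mean_x     # intercept
    return b0, b1

# Hypothetical salaries (illustration only): first three x = 0, last three x = 1
x = [0, 0, 0, 1, 1, 1]
y = [1600, 1700, 1728, 1400, 1411, 1422]

b0, b1 = fit_simple_regression(x, y)
mean_non_minority = sum(y[:3]) / 3   # mean of the x = 0 group
mean_minority = sum(y[3:]) / 3       # mean of the x = 1 group
```

Here `b0` matches `mean_non_minority` and `b0 + b1` matches `mean_minority`, which is why the slope can be read as the predicted salary gap between the groups.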

Additional Notes

  • T-Test for Intercept:

    • Rarely used; not useful when $x=0$ lies outside the range of the data.

    • Tests whether the intercept's true value differs from zero.

    • Any significant result here is often less interesting.

  • P-Values:

    • All p-values for coefficients are for two-tailed tests despite what Excel indicates.

    • Always ascertain the direction of the slope when interpreting p-values.

  • ANOVA & T-Test:

    • For simple regression with only one predictor, $F = t^2$, thus the p-values are the same for both tests.

  • Confidence Intervals:

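    • A confidence interval (for a coefficient, or for the mean of $y$ at a given $x$) is not the same as a prediction interval for an individual observation, which is wider; interpret each with caution.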

  • Suggestions for Reporting:

    • Simplify outputs, removing unnecessary information that may overwhelm readers.

  • Cautions regarding Forecasting:

    • Predicting beyond the observed range of $x$ (extrapolation) can be inaccurate. The fitted relationship may not hold outside the observed data, so validate the assumptions carefully before making such forecasts.
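One simple safeguard, sketched below, is to refuse predictions for values of $x$ outside the range seen in the data. The function `predict_within_range` and its parameters are hypothetical; the coefficients are the rounded salary equation from these notes, whose observed $x$ values are 0 and 1.

```python
def predict_within_range(x_new, b0, b1, x_min, x_max):
    """Return b0 + b1 * x_new, refusing to extrapolate outside [x_min, x_max]."""
    if not (x_min <= x_new <= x_max):
        raise ValueError(
            f"x = {x_new} is outside the observed range [{x_min}, {x_max}]"
        )
    return b0 + b1 * x_new

# Salary equation from the notes; the indicator x was observed only at 0 and 1
inside = predict_within_range(1, 1676, -265, 0, 1)   # 1411

try:
    predict_within_range(2, 1676, -265, 0, 1)        # x = 2 was never observed
    refused = False
except ValueError:
    refused = True
```

A range check like this does not validate the model inside the range; it only flags forecasts the data cannot support.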