Regression Part 2
Applied Business Statistics Regression Part 2
Instructor Information
Course: BNAN 562
Instructor: Tamar Kugler, Associate Professor of Management and Organizations
Institution: University of Arizona Eller MBA
Standard Error
Definition: Standard error of the estimate (denoted as $s_{y.x}$) is described as the "standard deviation of the errors". It measures the dispersion of data points above and below the regression line.
Formula: $s_{y.x} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}}$ where,
$SSE$ = Sum of Squared Errors
$n$ = number of observations
Interpretation:
Represents the degree of scatter around the regression line.
If $s_{y.x} = 0$, this indicates perfect prediction where $R^2 = 1.0$.
A larger standard error indicates a worse fit and less reliable predictions from the regression.
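The standard error of the estimate can be computed directly from its definition. The sketch below is a minimal illustration in plain Python; the data values and the helper names `fit_line` and `standard_error_of_estimate` are made up for this example and do not come from the course dataset.

```python
import math

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for a single predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def standard_error_of_estimate(x, y):
    """s_{y.x} = sqrt(SSE / (n - 2)), where SSE sums the squared residuals."""
    b0, b1 = fit_line(x, y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))

# Hypothetical data with some scatter around the fitted line.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
s_yx = standard_error_of_estimate(x, y)

# Points that fall exactly on a line: no scatter, so s_{y.x} is 0.
s_perfect = standard_error_of_estimate([1, 2, 3, 4], [3, 5, 7, 9])

print(f"s_yx (scattered data) = {s_yx:.4f}")
print(f"s_yx (perfect line)   = {s_perfect:.4f}")
```

The second call shows the perfect-prediction case from the interpretation list: when every point sits on the line, SSE is zero and so is the standard error.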
Example Equation:
This illustrates how the predicted value is calculated using the regression equation.
Testing Significance: Conduct a t-test to determine if the predictor variable (denoted as $x$) significantly predicts the outcome variable (denoted as $y$).
Regression Interpretation
Main Questions:
What is the regression equation?
Is the slope significantly different from zero?
Regression Equation
Based on the output coefficients, the regression equation is $\hat{y} = 1675.58 - 265.27x$, or approximately $\hat{y} = 1676 - 265x$.
Variable Interpretation:
Here, $x$ represents minority status (coded as 1 for minority, 0 for non-minority).
Predicted monthly salary for non-minorities ($x=0$) is \$1,676; for minorities ($x=1$) it is \$1,411 ($1676 - 265 = 1411$).
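The two predictions can be reproduced by plugging each value of $x$ into the fitted equation. This is a small sketch using the intercept and slope reported in the example output in these notes; the function name `predicted_salary` is only illustrative.

```python
# Coefficients as reported in the example regression output in these notes.
B0 = 1675.58   # intercept: predicted monthly salary when x = 0
B1 = -265.27   # slope: change in predicted salary when x goes from 0 to 1

def predicted_salary(x):
    """Predicted monthly salary for minority status x (0 or 1)."""
    return B0 + B1 * x

non_minority = predicted_salary(0)
minority = predicted_salary(1)
gap = minority - non_minority   # equals the slope for a 0/1 predictor

print(f"non-minority: {non_minority:.2f}")
print(f"minority:     {minority:.2f}")
print(f"difference:   {gap:.2f}")
```

With the unrounded coefficients the minority prediction is 1410.31; the figure of 1,411 in the notes comes from rounding the coefficients to 1676 and 265 before subtracting.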
Hypothesis Testing for the Slope
Null Hypothesis ($H_0$): $\beta_1 = 0$ (slope is equal to zero; no relationship between $x$ and $y$).
Alternative Hypothesis ($H_a$): $\beta_1 \neq 0$ (slope is different from zero; a relationship exists).
Sampling Distribution:
The sampling distribution of $b_1$ can be pictured as a histogram showing how likely each value of the slope estimate would be across theoretical repeated samples.
Example Output:
Coefficients:
Intercept ($b_0$): $1,675.58$
Slope ($b_1$): $-265.27$
Standard Error of Intercept: $49.31$
Standard Error of Slope: $73.50$
t-statistic for slope: $-3.61$
p-value for slope: $0.00043$
Interpretation of t-statistic:
The t-statistic is the number of standard errors that the estimated slope lies from the value expected under the null hypothesis (zero): $t = b_1 / SE(b_1)$.
In this case, it indicates significance since the p-value is less than the alpha level of 0.05, leading to the rejection of the null hypothesis.
Conclusion: The minority coefficient is significantly different from zero, indicating that minority employees earn less on average than their non-minority counterparts.
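The t-statistic in the example output can be checked by dividing the slope by its standard error. The sketch below does that, then approximates the two-tailed p-value with the standard normal distribution. That approximation is an assumption made here because the notes do not give the sample size, so the exact t-distribution with $n-2$ degrees of freedom cannot be used.

```python
import math

b1 = -265.27     # slope from the example output
se_b1 = 73.50    # standard error of the slope from the example output
alpha = 0.05

# Under H0 the slope is 0, so t = (b1 - 0) / SE(b1).
t_stat = (b1 - 0) / se_b1

# ASSUMPTION: normal approximation to the t distribution (n is not given).
# Two-tailed p-value = 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
p_approx = math.erfc(abs(t_stat) / math.sqrt(2))

reject_null = p_approx < alpha

print(f"t = {t_stat:.2f}")
print(f"approximate two-tailed p = {p_approx:.5f}")
print(f"reject H0 at alpha = {alpha}: {reject_null}")
```

Because the normal distribution has thinner tails than the t distribution, the approximate p-value is somewhat smaller than the 0.00043 reported in the output, but the decision to reject the null hypothesis is the same.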
Interpretation of b Weight (Coefficient)
Size vs. Importance:
The size of the coefficient $b$ does not necessarily indicate its importance; it varies with the scale of $x$ and $y$.
Example:
Small scale for $y$: $b_1 = 0.01$ may be statistically significant.
Large scale for $y$: $b_1 = 10,000$ may look large, but could be statistically insignificant.
Statistical Testing:
Always conduct a t-test to determine if the b weight is significantly different from zero; don't rely on visual estimates.
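To see why the size of $b$ depends on scale while the t-test does not, the sketch below fits the same hypothetical data twice: once with $y$ in thousands of dollars and once with $y$ in dollars. The data and the helper name `slope_and_t` are invented for illustration.

```python
import math

def slope_and_t(x, y):
    """Return the least-squares slope b1 and its t-statistic b1 / SE(b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))    # standard error of the estimate
    se_b1 = s_yx / math.sqrt(sxx)      # standard error of the slope
    return b1, b1 / se_b1

# Hypothetical data: the same outcome recorded in two different units.
x = [1, 2, 3, 4, 5]
y_thousands = [2, 4, 5, 4, 5]                  # y in thousands of dollars
y_dollars = [v * 1000 for v in y_thousands]    # the same y in dollars

b_small, t_small = slope_and_t(x, y_thousands)
b_large, t_large = slope_and_t(x, y_dollars)

print(f"y in thousands: b1 = {b_small:.3f}, t = {t_small:.3f}")
print(f"y in dollars:   b1 = {b_large:.3f}, t = {t_large:.3f}")
```

The slope grows by a factor of 1,000 when the units change, but the t-statistic is identical, which is why the test and not the raw size of $b$ is the basis for judging significance.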
Effect of Range Truncation:
Truncating the range of $x$ can lead to misleading conclusions about its impact on $y$.
Example case: Predicting first-year GPA using GMAT scores that have been truncated to a narrow range may yield non-significant results even if a significant relationship exists in a broader dataset.
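The effect of range truncation can be illustrated with a small simulation. Everything in this sketch is hypothetical: the GMAT range, the assumed true slope of 0.002 GPA points per GMAT point, the noise level, and the cutoff of 700 are chosen only to show the mechanism, not to describe real admissions data.

```python
import math
import random

def slope_t(x, y):
    """t-statistic for the slope in a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return b1 / se_b1

random.seed(0)  # fixed seed so the simulation is repeatable

# Simulated applicants across a wide GMAT range (all values hypothetical).
gmat = [random.uniform(400, 800) for _ in range(200)]
gpa = [2.0 + 0.002 * g + random.gauss(0, 0.4) for g in gmat]

# Keep only high scorers, mimicking a sample with a truncated x range.
truncated = [(g, p) for g, p in zip(gmat, gpa) if g >= 700]
gmat_trunc = [g for g, _ in truncated]
gpa_trunc = [p for _, p in truncated]

t_full = slope_t(gmat, gpa)
t_trunc = slope_t(gmat_trunc, gpa_trunc)

print(f"full range:      n = {len(gmat)}, t = {t_full:.2f}")
print(f"truncated range: n = {len(gmat_trunc)}, t = {t_trunc:.2f}")
```

Truncation shrinks the spread of $x$, which inflates the standard error of the slope, so the same underlying relationship produces a much weaker t-statistic in the truncated sample.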
Interpretation of the Equation
Y-Intercept:
$b_0$ is the predicted value of $y$ when $x=0$. Here it is meaningful: it is the predicted monthly salary for non-minority employees (\$1,676).
Slope Interpretation:
A slope of $-265$ means that a one-unit increase in $x$ (moving from non-minority to minority) lowers the predicted monthly salary by \$265.
Example Calculation:
For a minority employee ($x=1$): $\hat{y} = 1676 - 265(1) = 1411$.
Additional Notes
T-Test for Intercept:
Rarely utilized; often not useful when $x=0$ is not within data range.
Tests whether the intercept's true value differs from zero.
Any significant result here is often less interesting.
P-Values:
All p-values for coefficients are for two-tailed tests despite what Excel indicates.
Always ascertain the direction of the slope when interpreting p-values.
ANOVA & T-Test:
For simple regression with only one predictor, $F = t^2$, thus the p-values are the same for both tests.
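The identity $F = t^2$ can be verified numerically. The sketch below computes both statistics by hand on a small made-up dataset; the data values are illustrative only.

```python
import math

# Hypothetical data for a one-predictor regression.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# ANOVA pieces: regression and error sums of squares.
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
msr = ssr / 1          # one predictor, so 1 degree of freedom
mse = sse / (n - 2)
f_stat = msr / mse

# t-test for the slope.
se_b1 = math.sqrt(mse) / math.sqrt(sxx)
t_stat = b1 / se_b1

print(f"F   = {f_stat:.4f}")
print(f"t^2 = {t_stat ** 2:.4f}")
```

Both values agree up to floating-point rounding, which is why the ANOVA F-test and the slope t-test give the same p-value when there is a single predictor.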
Confidence Intervals:
Confidence intervals for the coefficients are separate from prediction intervals for individual observations; interpret each with caution.
Suggestions for Reporting:
Simplify outputs, removing unnecessary information that may overwhelm readers.
Cautions regarding Forecasting:
Predicting beyond the observed range of $x$ can lead to inaccurate forecasts, because the fitted relationship may not hold outside the data. Check the model's assumptions carefully before extrapolating.