Regression Analysis Notes
Regression Analysis
Definition of Regression Equation
- Represents the algebraic relationship between two sets of data (variables).
- Specifically, it represents the best overall algebraic relationship.
- Variables are typically labeled x and y.
Independent and Dependent Variables
- x variable: Independent variable.
- Often used to make predictions using the regression equation.
- y variable: Dependent variable.
- Its value depends on the x value.
- Alternative names:
- Explanatory variable (x).
- Response variable (y).
- Caution: These names can be misleading, suggesting a causal connection when one may not exist.
- Rearrangement of the slope-intercept form (y=mx+b).
- Statisticians typically present it as: y=b<em>0+b</em>1x, where:
- b0: y-intercept.
- b1: Coefficient of the x variable (slope).
- The order (y-intercept first) allows for extensions to multiple independent variables.
- Regression process determines the values of b<em>0 and b</em>1.
Regression Line (Best Fit Line)
- A graph of the regression equation.
- Visual representation of the trend on a scatter plot.
Excel Implementation: Adding Regression Line and Equation
- Steps to add a trendline:
- Select the scatter plot.
- Go to "Chart Tools" > "Design" > "Add Chart Element" > "Trendline" > "Linear".
- The trendline provides a visual representation of the data's overall pattern.
- Points close to the line are accurately predicted, while those farther have larger prediction errors.
- A regression line represents a prediction -- for a given x value, the line predicts a corresponding y value.
Displaying the Regression Equation
- Steps to display equation:
- Go to "Add Chart Element" > "Trendline" > "More Trendline Options".
- Check "Display Equation on Chart".
- Excel displays the equation in slope-intercept form, with a default of four decimal places for coefficients.
- Equation allows for prediction of y (dependent variable) given x (independent variable).
Interpretation of the Equation
- The coefficient of the x value in the equation represents the relationship between the check total and the tip.
- Example: A coefficient of 0.1973 suggests an average tip rate of approximately 19.7%.
- The y-intercept adjusts this percentage (e.g it looks like it's about 55¢ less than 19.7% based on the other value in the equation).
- Activating the Toolpack:
- Go to "File" > "Options" > "Add-ins".
- In the "Manage" dropdown, select "Excel Add-ins" and click "Go".
- Check "Analysis ToolPak" and click "OK".
- Accessing Regression:
- Go to the "Data" tab > "Analysis" > "Data Analysis".
- Select "Regression" from the list.
- Specifying Input Ranges:
- Input Y Range: Select the range of the dependent variable (tips).
- Input X Range: Select the range of the independent variable (check totals).
- Check "Labels" if you include column headings in your selection. this indicates that the first row contains labels.
- Output Options:
- Choose "New Worksheet Ply" to create a new tab for the output; give it a descriptive name.
- Check "Line Fit Plots" to generate a scatter plot with the regression line.
Regression Output Analysis
- Correlation Coefficient: The Toolpack calculates Pearson's Correlation Coefficient, same as the
CORREL function. - Regression Equation: Provides the intercept and coefficients for the regression equation.
- Line Fit Plots : A scatter plot with data points (actual tips) and predicted tips. Notice the axes are labeled for checks and tips.
Residual Output Table
- Observation: Numbers each check-tip pair.
- Predicted tip: The tip value predicted by the regression line for a given check total.
- Residuals: The difference between the actual tip and the predicted tip (Actual - Predicted). A negative residual means the prediction was too high; a positive residual means the prediction was too low.
- Residuals=Observed−Predicted