Regression Analysis Notes

Regression Analysis

Definition of Regression Equation

  • Represents the algebraic relationship between two sets of data (variables).
  • Specifically, it represents the best overall algebraic relationship.
  • Variables are typically labeled xx and yy.

Independent and Dependent Variables

  • x variable: Independent variable.
    • Often used to make predictions using the regression equation.
  • y variable: Dependent variable.
    • Its value depends on the xx value.
  • Alternative names:
    • Explanatory variable (xx).
    • Response variable (yy).
    • Caution: These names can be misleading, suggesting a causal connection when one may not exist.

Linear Regression Equation Form

  • Rearrangement of the slope-intercept form (y=mx+by = mx + b).
  • Statisticians typically present it as: y=b<em>0+b</em>1xy = b<em>0 + b</em>1x, where:
    • b0b_0: y-intercept.
    • b1b_1: Coefficient of the xx variable (slope).
  • The order (y-intercept first) allows for extensions to multiple independent variables.
  • Regression process determines the values of b<em>0b<em>0 and b</em>1b</em>1.

Regression Line (Best Fit Line)

  • A graph of the regression equation.
  • Visual representation of the trend on a scatter plot.

Excel Implementation: Adding Regression Line and Equation

  • Steps to add a trendline:
    • Select the scatter plot.
    • Go to "Chart Tools" > "Design" > "Add Chart Element" > "Trendline" > "Linear".
  • The trendline provides a visual representation of the data's overall pattern.
  • Points close to the line are accurately predicted, while those farther have larger prediction errors.
  • A regression line represents a prediction -- for a given xx value, the line predicts a corresponding yy value.

Displaying the Regression Equation

  • Steps to display equation:
    • Go to "Add Chart Element" > "Trendline" > "More Trendline Options".
    • Check "Display Equation on Chart".
  • Excel displays the equation in slope-intercept form, with a default of four decimal places for coefficients.
  • Equation allows for prediction of yy (dependent variable) given xx (independent variable).

Interpretation of the Equation

  • The coefficient of the xx value in the equation represents the relationship between the check total and the tip.
  • Example: A coefficient of 0.1973 suggests an average tip rate of approximately 19.7%.
  • The y-intercept adjusts this percentage (e.g it looks like it's about 55¢ less than 19.7% based on the other value in the equation).

Alternative Method: Excel's Data Analysis Toolpack

  • Activating the Toolpack:
    • Go to "File" > "Options" > "Add-ins".
    • In the "Manage" dropdown, select "Excel Add-ins" and click "Go".
    • Check "Analysis ToolPak" and click "OK".
  • Accessing Regression:
    • Go to the "Data" tab > "Analysis" > "Data Analysis".
    • Select "Regression" from the list.
  • Specifying Input Ranges:
    • Input Y Range: Select the range of the dependent variable (tips).
    • Input X Range: Select the range of the independent variable (check totals).
    • Check "Labels" if you include column headings in your selection. this indicates that the first row contains labels.
  • Output Options:
    • Choose "New Worksheet Ply" to create a new tab for the output; give it a descriptive name.
    • Check "Line Fit Plots" to generate a scatter plot with the regression line.

Regression Output Analysis

  • Correlation Coefficient: The Toolpack calculates Pearson's Correlation Coefficient, same as the CORREL function.
  • Regression Equation: Provides the intercept and coefficients for the regression equation.
  • Line Fit Plots : A scatter plot with data points (actual tips) and predicted tips. Notice the axes are labeled for checks and tips.

Residual Output Table

  • Observation: Numbers each check-tip pair.
  • Predicted tip: The tip value predicted by the regression line for a given check total.
  • Residuals: The difference between the actual tip and the predicted tip (Actual - Predicted). A negative residual means the prediction was too high; a positive residual means the prediction was too low.
  • Residuals=ObservedPredictedResiduals = Observed - Predicted