Regression Analysis Notes
Regression Analysis
Regression analysis explores the functional relationship between two or more correlated variables, often derived empirically from data. It's primarily used to predict the value of one variable based on the known values of others.
Definition
Regression is a functional relationship between two or more correlated variables that is often empirically determined from data and is used especially to predict values of one variable when given values of the others.
Regression Equation
Given paired sample data, the regression equation algebraically expresses the relationship between two variables:
$$\hat{y} = a + bx$$
Where:
$\hat{y}$ is the predicted value of the dependent variable.
$x$ is the independent variable.
$a$ is the y-intercept.
$b$ is the slope.
The graph of this equation is the regression line, also known as the line of best fit or least squares line.
Guidelines for Using the Regression Equation
Correlation Requirement: Only use the regression equation if a linear correlation exists between the variables.
Scope Limitation: When making predictions, stay within the range of the available sample data.
Population Consistency: Avoid making predictions about populations different from the one from which the sample data was drawn.
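The first two guidelines can be checked mechanically before a prediction is made. The sketch below is a minimal illustration, not a standard routine. `pearson_r` and `guarded_predict` are hypothetical helper names, and the `min_abs_r` cutoff of 0.5 is an arbitrary assumption (a formal check would compare r with a critical value for the sample size). The population-consistency guideline depends on how the data were collected, so it cannot be checked from the numbers alone.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from the raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def guarded_predict(a, b, xs, ys, x_new, min_abs_r=0.5):
    """Return y-hat = a + b * x_new, refusing when a guideline is violated.

    min_abs_r is a hypothetical cutoff chosen only for illustration.
    """
    r = pearson_r(xs, ys)
    # Correlation requirement: require a reasonably strong linear relationship
    if abs(r) < min_abs_r:
        raise ValueError(f"weak linear correlation (r = {r:.3f})")
    # Scope limitation: refuse to extrapolate beyond the sampled x values
    if not min(xs) <= x_new <= max(xs):
        raise ValueError(f"x = {x_new} is outside the sample range [{min(xs)}, {max(xs)}]")
    return a + b * x_new
```

For example, `guarded_predict(1, 2, [0, 1, 2, 3], [1, 3, 5, 7], 2.5)` returns `6.0`, while `x_new = 5` raises `ValueError` because it lies outside the sampled range 0 to 3.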
Scatter Diagram
In estimation and forecasting, a scatter diagram offers a graphical approach. It involves plotting points corresponding to paired scores of the dependent (y) and independent (x) variables on a Cartesian coordinate system.
Best Fitting Line: A line that minimizes the distances between the data points and the line; for the least squares line, the quantity minimized is the sum of the squared vertical distances.
Trend Line: A line on a scatter plot that helps to visualize the correlation between the data plotted.
Regression Line: A line of best fit for the scatter plot.
Least Squares Linear Regression Equation
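Formula:
$$\hat{y} = a + bx$$
Where:
$x$ = predictor variable (independent variable)
$\hat{y}$ = predicted value of the criterion variable (dependent variable)
$a$ & $b$ = constants
$b$ = beta coefficient (the slope of the line)
For every 1-unit increase in $x$, the predicted value of $y$ changes by $b$ units (an increase when $b > 0$, a decrease when $b < 0$). This is why $y$ is called the dependent variable: its predicted value depends on the value of $x$.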
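The slope interpretation follows directly from the equation $\hat{y} = a + bx$: subtracting the prediction at $x$ from the prediction at $x + 1$ leaves only $b$.

$$\hat{y}(x+1) - \hat{y}(x) = \big[a + b(x+1)\big] - \big[a + bx\big] = b$$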
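Calculation of b:
$$b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$$
Calculation of a:
$$a = \frac{\sum y - b\sum x}{n} = \bar{y} - b\bar{x}$$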
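The two computational formulas translate directly into code. This is a minimal sketch in plain Python; the function name `least_squares` is hypothetical, not a library API. It works from the raw sums Σx, Σy, Σxy and Σx², exactly as the formulas do, and returns a and b.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b * x.

    b = (n*Sxy - Sx*Sy) / (n*Sxx - Sx**2)
    a = (Sy - b*Sx) / n
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    denom = n * sxx - sx ** 2
    if denom == 0:
        # Every x is identical, so no unique line can be fitted
        raise ValueError("all x values are equal; the slope is undefined")
    b = (n * sxy - sx * sy) / denom
    a = (sy - b * sx) / n
    return a, b
```

For points that lie exactly on y = 1 + 2x, `least_squares([0, 1, 2, 3], [1, 3, 5, 7])` returns `(1.0, 2.0)`. Python 3.10 and later also provide `statistics.linear_regression`, which returns the slope and intercept directly.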
Example
Data from eight employees:
| Employee | Years of Working Experience (x) | Income in Thousands of Pesos (y) |
|---|---|---|
| A | 2 | 8 |
| B | 8 | 10 |
| C | 4 | 11 |
| D | 11 | 13 |
| E | 5 | 9 |
| F | 13 | 17 |
| G | 4 | 8 |
| H | 15 | 14 |
Required:
a) Draw the scatter diagram and draw the trend line.
b) Find the equation of the least squares regression line.
c) Predict the income when x = 16 years.
Solution:
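b) From the table, n = 8, $\sum x = 62$, $\sum y = 90$, $\sum xy = 791$ and $\sum x^2 = 640$, so $\bar{x} = 7.75$ and $\bar{y} = 11.25$.
$$b = \frac{8(791) - (62)(90)}{8(640) - 62^2} = \frac{748}{1276} \approx 0.5862$$
$$a = \bar{y} - b\bar{x} = 11.25 - \frac{748}{1276}(7.75) \approx 6.7069$$
The equation of the least squares regression line is $\hat{y} = 6.7069 + 0.5862x$.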
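c) Predict the income when x = 16 years.
$$\hat{y} = 6.7069 + 0.5862(16) \approx 16.09$$
The predicted income is about 16.09 thousand pesos. Because x = 16 lies just outside the sampled range of 2 to 15 years, this is a mild extrapolation and should be treated with the caution noted in the scope guideline.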
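As an arithmetic check on parts (b) and (c), this sketch recomputes the sums and coefficients from the employee table. Using `fractions.Fraction` keeps every step exact, so rounding happens only when results are displayed; the variable names are illustrative.

```python
from fractions import Fraction

# Employee data: years of experience (x) and income in thousands of pesos (y)
xs = [2, 8, 4, 11, 5, 13, 4, 15]
ys = [8, 10, 11, 13, 9, 17, 8, 14]

n = len(xs)
sx, sy = sum(xs), sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)

# Exact least squares coefficients from the computational formulas
b = Fraction(n * sxy - sx * sy, n * sxx - sx ** 2)
a = (sy - b * sx) / n
y_16 = a + b * 16  # predicted income at 16 years of experience

print(f"n={n}, sum_x={sx}, sum_y={sy}, sum_xy={sxy}, sum_x2={sxx}")
print(f"b = {b} ~ {float(b):.4f}")
print(f"a = {a} ~ {float(a):.4f}")
print(f"y-hat(16) = {y_16} ~ {float(y_16):.2f}")
```

Run as is, it reports b = 17/29 ≈ 0.5862, a = 389/58 ≈ 6.7069 and a predicted income at x = 16 of 933/58 ≈ 16.09 thousand pesos.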
Residuals
The regression equation represents the straight line that fits the data "best." The criterion used to decide that this line is better than all others is based on the vertical distances between the observed data points and the regression line.
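Such distances are called residuals. The equation for the residual is:
$$\text{residual} = y - \hat{y}$$
that is, the observed value of y minus the value of y predicted by the regression equation. The least squares line is the line for which the sum of the squared residuals, $\sum (y - \hat{y})^2$, is as small as possible.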
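The sketch below computes each residual y − ŷ for the employee data and illustrates the least squares criterion: the residuals sum to zero, and raising the line by even 0.1 increases the sum of squared residuals. The coefficients are recomputed inside the block so it runs on its own; exact `Fraction` arithmetic is an implementation choice for illustration.

```python
from fractions import Fraction

xs = [2, 8, 4, 11, 5, 13, 4, 15]
ys = [8, 10, 11, 13, 9, 17, 8, 14]

# Least squares coefficients for this data, computed exactly
n = len(xs)
sx, sy = sum(xs), sum(ys)
b = Fraction(n * sum(x * y for x, y in zip(xs, ys)) - sx * sy,
             n * sum(x * x for x in xs) - sx ** 2)
a = (sy - b * sx) / n

# Residual = observed y minus predicted y-hat
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
sse = sum(e * e for e in residuals)

for x, y, e in zip(xs, ys, residuals):
    print(f"x={x:2d}  y={y:2d}  residual={float(e):+.3f}")
print("sum of residuals:", sum(residuals))  # exactly 0 for a least squares line with an intercept
print("sum of squared residuals:", float(sse))

# Any other line does worse: shift the intercept up by 0.1 and recompute
shifted = sum((y - (a + Fraction(1, 10) + b * x)) ** 2 for x, y in zip(xs, ys))
print("after raising the line by 0.1:", float(shifted))  # larger than sse
```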