4.2: Least Squares Regression Line

Learning Objectives:
1. Find the least-squares regression line and use the line to make predictions
2. Interpret the slope and the y-intercept of the least-squares regression line
3. Compute the sum of squared residuals

1

Regression uses one or more explanatory variables (x) to predict one response variable (y). So, what does ‘linear’ regression imply?

The "linear" part means that we use a straight line to predict the response variable from the explanatory variable(s).

2

What is the equation for a straight line?

y = mx + b

3

What is ‘Slope-intercept form’?

  • Slope-intercept form, y = mx + b, is a way to write the equation of a straight line.

      • 𝑚: the slope of the line, which indicates how steep the line is

      • 𝑏: the y-intercept, the value of y where the line crosses the y-axis

      • 𝑥: the horizontal coordinate (x-coordinate) of a point on the line

      • 𝑦: the vertical coordinate (y-coordinate) of a point on the line

4

What is the y-intercept?

The y-intercept is the point where the line crosses the y-axis.

  • The y-intercept is found by letting x = 0 in the equation and solving for y.

    • When interpreting the y-intercept, ask whether 0 is a reasonable value for the explanatory variable and whether any observations near x = 0 exist in the data set. If the answer to either question is "no," do not interpret the y-intercept.
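Here is a minimal Python sketch of the "let x = 0" rule: evaluating y = mx + b at x = 0 leaves only b. The line y = 2x + 3 is hypothetical and chosen only for illustration.

```python
def line(x: float, m: float, b: float) -> float:
    """Evaluate the slope-intercept form y = mx + b at a given x."""
    return m * x + b


# Hypothetical line y = 2x + 3 (values chosen only for illustration).
m, b = 2.0, 3.0

# Letting x = 0 makes the m*x term vanish, so y equals the y-intercept b.
y_intercept = line(0, m, b)
print(y_intercept)  # 3.0
```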

5

What is the equation for slope (m)?

  • Slope tells us how steep a line is, like how steep a hill is.

    • Rise (y) over run (x): we find the slope by seeing how much we go up or down (vertical change) for each step to the right (horizontal change).

For a line through two points (x₁, y₁) and (x₂, y₂):

m = (y₂ − y₁) / (x₂ − x₁)

This version looks more complicated than the usual rise/run equation because we are finding the line from two chosen points instead of reading the rise and run off a graph.
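The two-point slope formula can be sketched in Python as follows. The points (1, 2) and (5, 10) are hypothetical. Swapping the two points gives the same slope, because the rise and the run both change sign.

```python
def slope(p1: tuple[float, float], p2: tuple[float, float]) -> float:
    """Slope m = (y2 - y1) / (x2 - x1) of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # A vertical line has no run, so its slope is undefined.
        raise ValueError("x1 == x2: slope is undefined for a vertical line")
    return (y2 - y1) / (x2 - x1)


# Hypothetical points, for illustration only: rise = 8, run = 4.
print(slope((1, 2), (5, 10)))  # 2.0
print(slope((5, 10), (1, 2)))  # 2.0 (the order of the points does not matter)
```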

6

For every one-unit increase in x, the predicted value of y changes by the value of what?

  • the slope

7

What is the “predicted” y?

The value of y estimated by the regression line for the corresponding x value.

  • Symbol: Usually represented as "ŷ" (y-hat).

  • Calculation: Determined using the regression equation, which takes the form of "ŷ = a + bx" where "a" is the y-intercept and "b" is the slope. 

  • Interpretation: For a given x value, the "predicted y" represents the "best guess" for what the corresponding y value would be based on the relationship observed in the data. 
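Here is a small Python sketch of ŷ = a + bx. The intercept a = 1.5 and slope b = 0.5 are hypothetical. The sketch also checks the previous card: raising x by one unit changes ŷ by exactly the slope.

```python
def predict(x: float, a: float, b: float) -> float:
    """Predicted y (y-hat) from the regression equation y-hat = a + b*x."""
    return a + b * x


# Hypothetical intercept and slope, for illustration only.
a, b = 1.5, 0.5

y_hat_4 = predict(4, a, b)
y_hat_5 = predict(5, a, b)
print(y_hat_4, y_hat_5)   # 3.5 4.0

# A one-unit increase in x changes the prediction by the slope b.
print(y_hat_5 - y_hat_4)  # 0.5
```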

8

What is the “observed” y?

  • The actual measured value of the response (dependent) variable y in a data set. Essentially, it is the "true" value of y for a given x in your data set.

10

What is the ‘Equation of Line’?

  • It is used to find the equation of the line that passes through two given points.

11

Equations Summary

[Image: summary of the equations]
12

EX: Using Equations to Describe Linearly Related Data

First, find the slope between two points using the slope formula. Then, plug m and one of the points into the equation of the line and solve for y to get the line in the form y = mx + b.
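Here is a Python sketch of those two steps. It assumes the "equation of line" card refers to point-slope form, y − y₁ = m(x − x₁), which rearranges to b = y₁ − m·x₁. The points (2, 7) and (4, 11) are hypothetical.

```python
def line_through(p1: tuple[float, float], p2: tuple[float, float]) -> tuple[float, float]:
    """Return (m, b) for the line y = mx + b through two points (x1 != x2 assumed)."""
    (x1, y1), (x2, y2) = p1, p2
    # Step 1: slope formula.
    m = (y2 - y1) / (x2 - x1)
    # Step 2: plug m and (x1, y1) into y - y1 = m(x - x1); solving for y gives b = y1 - m*x1.
    b = y1 - m * x1
    return m, b


# Hypothetical points, for illustration only.
m, b = line_through((2, 7), (4, 11))
print(f"y = {m}x + {b}")  # y = 2.0x + 3.0
```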

13

What is a residual?

  • The difference between the observed value of y and the predicted value of y

    • That is, ‘observed y − predicted y = residual’

      • If the residual is positive, the observed value is greater than the predicted value (the point lies above the line).

      • If the residual is negative, the observed value is less than the predicted value (the point lies below the line).
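Here is a minimal Python sketch of the residual definition. It uses the numbers from the x = 3 example on the next card (observed y = 5.2, predicted y = 4.75). Rounding is applied only because floating-point subtraction does not give exactly 0.45.

```python
def residual(observed_y: float, predicted_y: float) -> float:
    """Residual = observed y - predicted y."""
    return observed_y - predicted_y


res = residual(5.2, 4.75)
print(round(res, 2))  # 0.45

# The sign of the residual tells us where the point sits relative to the line.
if res > 0:
    print("observed is above the line")
elif res < 0:
    print("observed is below the line")
else:
    print("observed is on the line")
```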

14

Part 2: Find the residual at x = 3 by subtracting the predicted y (from the line) from the observed y.

  • Plug the numbers into the residual formula:

    • = observed y − predicted y

    • = 5.2 − 4.75

    • = 0.45


!! NOTICE !! We just picked two points from our data set and based our equation on those two points.

- How do we know we shouldn’t have used two other points?

- How do we know which line would have been the “best line”?

  • There is a method that guarantees the line that best fits our data. It uses all of our data points to find the line and its equation, choosing the line that makes the sum of squared residuals as small as possible. This method is known as “Least Squares Regression”, and we use Minitab to find it.

15

Least Squares Regression Line

The response variable is on the y-axis, and the explanatory variable is on the x-axis.

16

Least Squares Regression Line

ŷ = b1x + b0, where

b1 = slope

b0 = y-intercept
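For reference, here are the standard textbook formulas for the least-squares estimates, written in this card's notation. Here r is the sample correlation coefficient, s_x and s_y are the sample standard deviations of x and y, and x̄ and ȳ are the sample means. These symbols are not defined elsewhere in this set.

```latex
\hat{y} = b_1 x + b_0, \qquad
b_1 = r \, \frac{s_y}{s_x}, \qquad
b_0 = \bar{y} - b_1 \bar{x}
```

The intercept formula means the least-squares line always passes through the point (x̄, ȳ).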

17

EXAMPLE: Finding the Least-Squares Regression Line

Using the drilling data:

  • a) Find the least-squares regression line (round the estimates of slope and intercept to four decimal places)

  • b) Predict the drilling time if drilling starts at 130 feet.

  • c) Is the observed drilling time at 130 feet above, or below, what we would predict?

  • d) Draw the least-squares regression line on the scatter diagram of the data.

  • e) Interpret the slope

  • f) Interpret the y-intercept

Depth at Which Drilling Begins, x (in feet) | Time to Drill 5 Feet, y (in minutes)
--- | ---
35 | 5.88
50 | 5.99
75 | 6.74
95 | 6.1
120 | 7.47
130 | 6.93
145 | 6.42
155 | 7.97
160 | 7.92
175 | 7.62
185 | 6.89
190 | 7.9

Use Minitab to get the equation of the regression line

  • Stat > Regression > Fitted Line Plot

  • Input your variables and click OK.

  • The regression equation appears at the very top of the output.

If you scroll down, you will see a scatterplot with the regression line overlaid on it.
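As a cross-check on the Minitab output, here is a minimal Python sketch of the least-squares computation on the drilling data from the table. It uses the standard formulas b1 = Sxy / Sxx and b0 = ȳ − b1·x̄, which are equivalent to b1 = r·s_y/s_x. The function name `least_squares` is hypothetical and is not part of Minitab.

```python
# Drilling data: x = depth at which drilling begins (feet),
# y = time to drill 5 feet (minutes).
depth = [35, 50, 75, 95, 120, 130, 145, 155, 160, 175, 185, 190]
drill_time = [5.88, 5.99, 6.74, 6.1, 7.47, 6.93, 6.42, 7.97, 7.92, 7.62, 6.89, 7.9]


def least_squares(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (b1, b0) for the least-squares line y-hat = b1*x + b0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # The least-squares line passes through (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b1, b0


b1, b0 = least_squares(depth, drill_time)
print(f"y-hat = {b1:.4f}x + {b0:.4f}")  # y-hat = 0.0116x + 5.5273
```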


a) ŷ = 0.0116x + 5.5273

b) ŷ = 0.0116x + 5.5273, so at x = 130: ŷ = 0.0116(130) + 5.5273 = 7.0353 ≈ 7.035 minutes

c) The observed drilling time at 130 feet is 6.93 minutes, and the predicted drilling time is 7.035 minutes. So the observed drilling time of 6.93 minutes is below what we would predict.

d) **See the picture**: the scatter diagram of the data with the least-squares regression line drawn on it (Minitab's fitted line plot).

e) The slope of the regression line is 0.0116. If the depth at which drilling begins increases by 1 foot, the time to drill five feet increases by 0.0116 minute, on average.

f) The y-intercept of the regression line is 5.5273. To interpret it we must first ask two questions:

  1. Is 0 a reasonable value for the explanatory variable? 

  2. Do any observations near x = 0 exist in the data set? 

A value of 0 is reasonable for the drilling data (it means that drilling begins at the surface of Earth). The smallest observation in the data set is x = 35 feet, which is reasonably close to 0. So, interpreting the y-intercept is reasonable.

When drilling begins at the surface of Earth (x = 0), the predicted time to drill five feet is 5.5273 minutes.
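Here is a Python sketch connecting parts (b) and (c) to Learning Objective 3 (sum of squared residuals). It uses the rounded equation from part (a). The comparison line through the data points (35, 5.88) and (190, 7.9) is a hypothetical choice that imitates the two-point method from earlier cards. Least squares minimizes the sum of squared residuals, so the fitted line should give a much smaller total.

```python
depth = [35, 50, 75, 95, 120, 130, 145, 155, 160, 175, 185, 190]
drill_time = [5.88, 5.99, 6.74, 6.1, 7.47, 6.93, 6.42, 7.97, 7.92, 7.62, 6.89, 7.9]

# Rounded least-squares line from part (a): y-hat = 0.0116x + 5.5273
B1, B0 = 0.0116, 5.5273


def predict(x: float, b1: float = B1, b0: float = B0) -> float:
    """Predicted drilling time (minutes) for a starting depth x (feet)."""
    return b1 * x + b0


def sum_squared_residuals(x: list[float], y: list[float], b1: float, b0: float) -> float:
    """Sum of (observed y - predicted y)^2 over every data point."""
    return sum((yi - (b1 * xi + b0)) ** 2 for xi, yi in zip(x, y))


# Parts (b) and (c): predict at 130 feet and compare with the observed 6.93 minutes.
y_hat = predict(130)
res = 6.93 - y_hat
print(round(y_hat, 4), round(res, 4))  # 7.0353 -0.1053 (negative: observed is below)

# Objective 3: least-squares line vs. a line through two hand-picked data points.
m = (7.9 - 5.88) / (190 - 35)
b = 5.88 - m * 35
ssr_ls = sum_squared_residuals(depth, drill_time, B1, B0)
ssr_two = sum_squared_residuals(depth, drill_time, m, b)
print(ssr_ls < ssr_two)  # True
```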

18

What does it mean when a researcher uses extrapolation?

  • It means they are working outside the scope of the model.

    • That is, the least-squares regression line is being used to make predictions for values of the explanatory variable that are much larger or much smaller than the observed values.


Never use a least-squares regression line to make predictions outside the scope of the model (that is, never extrapolate)!! We can’t be sure the linear relation continues to exist outside the range of the observed data.
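Here is a minimal Python guard against extrapolation, using the drilling line from the example. It makes a simplifying assumption: the "scope of the model" is treated as the observed range [min x, max x]. In practice, deciding what counts as "much larger or much smaller" is a judgment call, as part (f) of the drilling example shows. The function names are hypothetical.

```python
# Observed starting depths (feet) from the drilling data.
depth = [35, 50, 75, 95, 120, 130, 145, 155, 160, 175, 185, 190]


def is_extrapolation(x_new: float, observed_x: list[float]) -> bool:
    """True if x_new lies outside the observed range of the explanatory variable."""
    return x_new < min(observed_x) or x_new > max(observed_x)


def safe_predict(x_new: float, observed_x: list[float],
                 b1: float = 0.0116, b0: float = 5.5273) -> float:
    """Predict drilling time, refusing to extrapolate beyond the observed depths."""
    if is_extrapolation(x_new, observed_x):
        raise ValueError(
            f"x = {x_new} is outside the observed range "
            f"[{min(observed_x)}, {max(observed_x)}]; refusing to extrapolate"
        )
    return b1 * x_new + b0


print(round(safe_predict(130, depth), 4))  # 7.0353
try:
    safe_predict(500, depth)
except ValueError as err:
    print(err)  # x = 500 is outside the observed range [35, 190]; refusing to extrapolate
```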