Linear Regression Notes
Biostatistics and Algebra - FHMS 103: Linear Regression
Learning Objectives
Describe the Linear Regression Model
State the Regression Modeling Steps
Explain Ordinary Least Squares
Compute Regression Coefficients
Understand and check model assumptions
Predict Response Variable
Comment on R Output
Correlation Models
Link between a correlation model and a regression model
Test of the Correlation Coefficient
Models
What is a Model?
Representation of Some Phenomenon (Non-Math/Stats Model)
What is a Math/Stats Model?
Often Describes Relationship between Variables
Types:
Deterministic Models (no randomness)
Probabilistic Models (with randomness)
Deterministic Models
Hypothesize Exact Relationships
Suitable When Prediction Error is Negligible
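Example: Body mass index (BMI) is a measure of body fat based on weight and height:
Metric Formula: $BMI = \frac{\text{Weight (kilograms)}}{(\text{Height (meters)})^2}$
Non-metric Formula: $BMI = \frac{\text{Weight (pounds)} \times 703}{(\text{Height (inches)})^2}$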
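As a small illustration of a deterministic model, the sketch below codes the two BMI formulas in R. It assumes the standard definitions (kilograms over meters squared; pounds times 703 over inches squared). The function names `bmi_metric` and `bmi_imperial` and the example inputs are made up for illustration.

```r
# Deterministic model: the same inputs always give exactly the same output.
# Function names and example values are hypothetical.
bmi_metric <- function(weight_kg, height_m) {
  weight_kg / height_m^2
}

bmi_imperial <- function(weight_lb, height_in) {
  weight_lb * 703 / height_in^2
}

bmi_metric(70, 1.75)    # a 70 kg person who is 1.75 m tall
bmi_imperial(154, 69)   # a 154 lb person who is 69 in tall
```

Calling either function twice with the same arguments returns the same value, which is what "no randomness" means here.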
Probabilistic Models
Hypothesize 2 Components:
Deterministic
Random Error
Example: Systolic blood pressure (SBP) of newborns is 6 times the age in days plus random error: $SBP = 6 \times \text{Age (days)} + \varepsilon$
Random error may be due to factors other than age in days (e.g., birthweight)
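The two components of a probabilistic model can be seen by simulating the newborn example. This is only a sketch: the ages, the seed, and the normal error with standard deviation 5 are assumed values chosen for illustration, not figures from the notes.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
age_days <- 1:10                     # hypothetical ages in days

deterministic <- 6 * age_days        # deterministic component: 6 times age
error <- rnorm(length(age_days),     # random error component
               mean = 0, sd = 5)     # sd = 5 is an assumed value
sbp <- deterministic + error         # simulated systolic blood pressure

data.frame(age_days, deterministic, error = round(error, 2), sbp = round(sbp, 2))
```

Two newborns of the same age get the same deterministic part but different simulated pressures, because the error term differs from draw to draw.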
Types of Probabilistic Models
Regression Models
Correlation Models
Other Models
Regression Models
Relationship between one dependent variable and explanatory variable(s)
Use equation to set up relationship
Numerical Dependent (Response) Variable
1 or More Numerical or Categorical Independent (Explanatory) Variables
Used Mainly for Prediction & Estimation
Regression Modeling Steps
Hypothesize Deterministic Component
Estimate Unknown Parameters
Specify Probability Distribution of Random Error Term
Estimate Standard Deviation of Error
Evaluate the fitted Model
Use Model for Prediction & Estimation
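The steps above can be sketched in R with the estriol and birthweight data from the worked example later in these notes. This is one possible mapping of the steps onto base R functions, not the only one; the prediction point `el = 3` is an arbitrary choice.

```r
# Estriol / birthweight data from the worked example in these notes
el <- c(1, 2, 3, 4, 5)
bw <- c(1, 1, 2, 2, 4)

# Steps 1-2: hypothesize bw = b0 + b1 * el + error, then estimate b0 and b1
mod <- lm(bw ~ el)
coef(mod)

# Steps 3-4: the usual inference assumes independent normal errors with
# constant variance; sigma() returns the estimated error standard deviation
s <- sigma(mod)

# Step 5: evaluate the fitted model (R-squared and residuals)
r2 <- summary(mod)$r.squared
resid(mod)

# Step 6: use the model for prediction at an estriol level of 3
pred <- predict(mod, newdata = data.frame(el = 3))
```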
Model Specification
Specifying the deterministic component
Define the dependent variable and independent variable
Hypothesize Nature of Relationship
Expected Effects (i.e., Coefficients’ Signs)
Functional Form (Linear or Non-Linear)
Interactions
Model Specification Is Based on Theory
Theory of Field (e.g., Epidemiology)
Mathematical Theory
Previous Research
‘Common Sense’
Types of Regression Models
Simple
Multiple
Linear
Non-Linear
Linear Regression Model
Linear Equations: $Y = \beta_0 + \beta_1 X$ where:
$\beta_0$ = Y-intercept
$\beta_1$ = Slope
Linear Regression Model
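Relationship Between Variables Is a Linear Function: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
$Y_i$ = Dependent (Response) Variable (e.g., CD+ c.)
$X_i$ = Independent (Explanatory) Variable (e.g., Years s. serocon.)
$\beta_1$ = Population Slope
$\beta_0$ = Population Y-Intercept
$\varepsilon_i$ = Random Error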
Population & Sample Regression Models
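Population: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ (Unknown Relationship)
Random Sample: $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i$
Population Linear Regression Model
Observed value: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
$\varepsilon_i$ = Random error
Sample Linear Regression Model
Observed value: $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i$
$\hat{\varepsilon}_i$ = Random error (residual)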
Estimating Parameters: Least Squares Method
Scatter plot:
Plot of All $(X_i, Y_i)$ Pairs
Suggests How Well Model Will Fit
Least Squares
‘Best Fit’ Means the Differences Between Actual Y Values and Predicted Y Values Are a Minimum. But positive differences offset negative ones, so the errors are squared.
LS Minimizes the Sum of the Squared Differences (Errors) (SSE): $SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^2$
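To make the least squares idea concrete, the sketch below computes the SSE for a few candidate lines using the estriol and birthweight data from the worked example later in these notes. The helper name `sse`, the two alternative lines, and the use of `optim()` as a numerical check are illustrative choices, not part of the original notes.

```r
# Estriol / birthweight data from the worked example in these notes
el <- c(1, 2, 3, 4, 5)
bw <- c(1, 1, 2, 2, 4)

# Sum of squared errors for the line y = b0 + b1 * x (hypothetical helper)
sse <- function(b0, b1, x, y) sum((y - (b0 + b1 * x))^2)

sse_ls   <- sse(-0.1, 0.7, el, bw)   # the least squares line from the notes
sse_alt1 <- sse( 0.0, 0.7, el, bw)   # same slope, different intercept
sse_alt2 <- sse(-0.1, 0.8, el, bw)   # same intercept, different slope

# Raw (unsquared) errors of the least squares line cancel out,
# which is why the errors are squared before summing
raw_sum <- sum(bw - (-0.1 + 0.7 * el))

# Numerical check: minimize the SSE directly, starting from (0, 0)
fit <- optim(c(0, 0), function(p) sse(p[1], p[2], el, bw))
fit$par   # should land close to the least squares intercept and slope
```

Both alternative lines give a larger SSE than the least squares line, and the numerical minimizer ends up near the same coefficients.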
Least Squares Graphically
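LS minimizes $\sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \hat{\varepsilon}_1^2 + \hat{\varepsilon}_2^2 + \dots + \hat{\varepsilon}_n^2$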
Coefficient Equations
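Prediction equation: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
Sample slope: $\hat{\beta}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum X_i Y_i - \frac{(\sum X_i)(\sum Y_i)}{n}}{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}$
Sample Y-intercept: $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$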
Derivation of Parameters (1)
Least Squares (L-S): Minimize squared error
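A short sketch of the standard derivation, writing $x_i$, $y_i$ for the observed pairs and $\beta_0$, $\beta_1$ for the intercept and slope (this notation is assumed here). The quantity to minimize is

$$S(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 .$$

Setting both partial derivatives to zero gives the normal equations:

$$\frac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0 \;\Rightarrow\; \sum y_i = n \beta_0 + \beta_1 \sum x_i$$

$$\frac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0 \;\Rightarrow\; \sum x_i y_i = \beta_0 \sum x_i + \beta_1 \sum x_i^2$$

Solving the first equation for $\beta_0$ and substituting into the second yields

$$\hat{\beta}_1 = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} .$$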
Computation Table
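$X_i$ | $Y_i$ | $X_i^2$ | $Y_i^2$ | $X_i Y_i$ |
|---|---|---|---|---|
$X_1$ | $Y_1$ | $X_1^2$ | $Y_1^2$ | $X_1 Y_1$ |
$X_2$ | $Y_2$ | $X_2^2$ | $Y_2^2$ | $X_2 Y_2$ |
: | : | : | : | : |
$X_n$ | $Y_n$ | $X_n^2$ | $Y_n^2$ | $X_n Y_n$ |
$\sum X_i$ | $\sum Y_i$ | $\sum X_i^2$ | $\sum Y_i^2$ | $\sum X_i Y_i$ |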
Interpretation of Coefficients
Slope ($\hat{\beta}_1$)
Estimated Y Changes by $\hat{\beta}_1$ for Each 1 Unit Increase in X
If $\hat{\beta}_1$ = 2, then Y Is Expected to Increase by 2 for Each 1 Unit Increase in X
Y-Intercept ($\hat{\beta}_0$)
Average Value of Y When X = 0
If $\hat{\beta}_0$ = 4, then Average Y Is Expected to Be 4 When X Is 0
Parameter Estimation Example
Obstetrics: What is the relationship between Mother’s Estriol level & Birthweight using the following data?
Estriol (mg/24h) | Birthweight (g/1000) |
|---|---|
1 | 1 |
2 | 1 |
3 | 2 |
4 | 2 |
5 | 4 |
Parameter Estimation Solution Table
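$X_i$ | $Y_i$ | $X_i^2$ | $Y_i^2$ | $X_i Y_i$ |
|---|---|---|---|---|
1 | 1 | 1 | 1 | 1 |
2 | 1 | 4 | 1 | 2 |
3 | 2 | 9 | 4 | 6 |
4 | 2 | 16 | 4 | 8 |
5 | 4 | 25 | 16 | 20 |
15 | 10 | 55 | 26 | 37 |
The last row gives the column totals ($n = 5$).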
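Parameter Estimation Solution
$\hat{\beta}_1 = \frac{\sum X_i Y_i - \frac{(\sum X_i)(\sum Y_i)}{n}}{\sum X_i^2 - \frac{(\sum X_i)^2}{n}} = \frac{37 - \frac{(15)(10)}{5}}{55 - \frac{(15)^2}{5}} = \frac{7}{10} = 0.70$
$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 2 - (0.70)(3) = -0.10$
Prediction equation: $\hat{Y} = -0.10 + 0.70X$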
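The hand calculation for the estriol data can be reproduced in R by building the same five columns and applying the computational formulas for the slope and intercept. The variable names (`tab`, `totals`, `b0`, `b1`) are illustrative only.

```r
x <- c(1, 2, 3, 4, 5)   # estriol level
y <- c(1, 1, 2, 2, 4)   # birthweight
n <- length(x)

# Computation table: X, Y, X^2, Y^2, XY
tab <- data.frame(x = x, y = y, x2 = x^2, y2 = y^2, xy = x * y)
totals <- colSums(tab)
totals

# Slope and intercept from the column totals
b1 <- (totals[["xy"]] - totals[["x"]] * totals[["y"]] / n) /
      (totals[["x2"]] - totals[["x"]]^2 / n)
b0 <- mean(y) - b1 * mean(x)

c(intercept = b0, slope = b1)
```

The totals should match the last row of the solution table, and the coefficients should match the slope of 0.7 and intercept of -0.10 interpreted in the notes.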
Coefficient Interpretation Solution
Slope ($\hat{\beta}_1$)
Birthweight (Y) Is Expected to Increase by 0.7 Units for Each 1 Unit Increase in Estriol (X)
Intercept ($\hat{\beta}_0$)
Average Birthweight (Y) Is -0.10 Units When Estriol Level (X) Is 0
Difficult to explain
The birthweight should always be positive, and X = 0 lies outside the observed estriol range (1 to 5)
Parameter Estimation R Code
# Linear regression
# Birthweight and mother's estriol level example data
el <- c(1, 2, 3, 4, 5)  # mother's estriol level (mg/24h)
bw <- c(1, 1, 2, 2, 4)  # child's birthweight (g/1000)
mod <- lm(bw ~ el)      # fit the linear regression model of bw on el
summary(mod)            # print the results from the fitted model
Parameter Estimation R Computer Output
Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
Intercept | 1 | -0.10000 | 0.63509 | -0.16 | 0.8849 |
Estriol | 1 | 0.70000 | 0.19149 | 3.66 | 0.0354 |
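The figures in this table can be pulled out of the fitted model directly. The sketch below uses `coef(summary(mod))`, whose column names in R are `Estimate`, `Std. Error`, `t value` and `Pr(>|t|)`. It then recomputes the t values and two-sided p-values by hand as a check; the names `t_by_hand` and `p_by_hand` are illustrative.

```r
el <- c(1, 2, 3, 4, 5)
bw <- c(1, 1, 2, 2, 4)
mod <- lm(bw ~ el)

# Coefficient table as a numeric matrix
coef_table <- coef(summary(mod))
round(coef_table, 5)

# t value = estimate / standard error
t_by_hand <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
df_resid <- df.residual(mod)
p_by_hand <- 2 * pt(abs(t_by_hand), df = df_resid, lower.tail = FALSE)
round(p_by_hand, 4)
```

With five observations and two estimated coefficients, the residual degrees of freedom are 3, which is the reference distribution behind both p-values.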
Parameter Estimation Thinking Challenge
You’re a veterinary epidemiologist for the county cooperative. You gather the following data:
Food (lb.) | Milk yield (lb.) |
|---|---|
4 | 3.0 |
6 | 5.5 |
10 | 6.5 |
12 | 9.0 |
What is the relationship between cows’ food intake and milk yield?
Parameter Estimation Solution Table
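$X_i$ | $Y_i$ | $X_i^2$ | $Y_i^2$ | $X_i Y_i$ |
|---|---|---|---|---|
4 | 3.0 | 16 | 9.00 | 12 |
6 | 5.5 | 36 | 30.25 | 33 |
10 | 6.5 | 100 | 42.25 | 65 |
12 | 9.0 | 144 | 81.00 | 108 |
32 | 24.0 | 296 | 162.50 | 218 |
The last row gives the column totals ($n = 4$).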
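Parameter Estimation Solution
$\hat{\beta}_1 = \frac{\sum X_i Y_i - \frac{(\sum X_i)(\sum Y_i)}{n}}{\sum X_i^2 - \frac{(\sum X_i)^2}{n}} = \frac{218 - \frac{(32)(24)}{4}}{296 - \frac{(32)^2}{4}} = \frac{26}{40} = 0.65$
$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 6 - (0.65)(8) = 0.80$
Prediction equation: $\hat{Y} = 0.80 + 0.65X$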
Coefficient Interpretation Solution
Slope ($\hat{\beta}_1$)
Milk Yield (Y) Is Expected to Increase by 0.65 lb. for Each 1 lb. Increase in Food Intake (X)
Y-Intercept ($\hat{\beta}_0$)
Average Milk Yield (Y) Is Expected to Be 0.8 lb. When Food Intake (X) Is 0
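The milk yield example can be checked in R in the same way as the estriol example. The variable names and the prediction at 8 lb. of food (the sample mean of X) are illustrative choices.

```r
food <- c(4, 6, 10, 12)         # food intake (lb.)
milk <- c(3.0, 5.5, 6.5, 9.0)   # milk yield (lb.)

mod_milk <- lm(milk ~ food)     # fit milk = b0 + b1 * food + error
b <- coef(mod_milk)
b                               # intercept and slope

# Predicted milk yield for a cow eating 8 lb. of food
pred_8 <- predict(mod_milk, newdata = data.frame(food = 8))
pred_8
```

Because 8 lb. is the mean food intake, the prediction there equals the mean milk yield, a general property of the least squares line.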