Biostatistics exam 3 - Linear Regression


Last updated 6:57 AM on 12/14/25


Regression

Regression is a method that predicts values of one numerical variable from values of another numerical variable

• Fits a line through the data

- used for prediction

- measures how steeply one variable changes with another


Correlation versus regression

• correlation measures the strength and direction of the linear relationship between two numerical variables

- measures the association between X and Y

• regression predicts values of Y given X
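To make the contrast concrete, here is a minimal sketch in plain Python on a made-up toy dataset (the `x` and `y` values are hypothetical, chosen only for illustration). Correlation and the regression slope share the same sum of products, but r is unitless while b is in units of Y per unit of X; for a least-squares line they are linked by $b = r \, s_Y / s_X$.

```python
import math

# Hypothetical toy data (not from any real study)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and the sum of products of deviations
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)
sp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Correlation: unitless measure of association, between -1 and 1
r = sp_xy / math.sqrt(ss_x * ss_y)

# Regression slope: predicted change in Y per unit of X
b = sp_xy / ss_x

# Sample standard deviations of X and Y
s_x = math.sqrt(ss_x / (n - 1))
s_y = math.sqrt(ss_y / (n - 1))

print(f"r = {r:.4f}, b = {b:.4f}, r * s_y / s_x = {r * s_y / s_x:.4f}")
```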


Linear regression

• the most common type of regression (there are also nonlinear models)

• draws a straight line through the data to predict the response variable (Y) from the explanatory variable (X)


least squares regression

The line for which the sum of all the squared deviations in Y is smallest

• deviations: the vertical distance (in Y) between each data point and the line
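A brute-force way to see the least-squares idea: try many candidate lines and keep the one with the smallest sum of squared deviations in Y. This is an illustrative sketch only, assuming a hypothetical toy dataset and an arbitrary grid of intercepts and slopes in steps of 0.1; in practice the slope and intercept come directly from the closed-form formulas.

```python
# Hypothetical toy data (not from any real study)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def sum_sq_dev(a, b, xs, ys):
    """Sum of squared vertical deviations of the points from the line Y = a + bX."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))

best = None
# Candidate intercepts 0.0..4.0 and slopes -1.0..2.0, both in steps of 0.1
for ia in range(0, 41):
    for ib in range(-10, 21):
        a, b = ia / 10, ib / 10
        sse = sum_sq_dev(a, b, x, y)
        if best is None or sse < best[0]:
            best = (sse, a, b)

best_sse, best_a, best_b = best
print(f"best line on grid: Y = {best_a:.1f} + {best_b:.1f}X, sum of squares = {best_sse:.2f}")
```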


Formula for the linear regression

$Y = a + bX$

• a is the Y-intercept, b is the slope


Slope of a linear regression

• the slope of a linear regression is the rate of change in Y per unit of X (rise over run)

• its sign also indicates the direction of the relationship

- positive: as X increases, Y increases

- negative: as X increases, Y decreases


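How to calculate the slope (linear regression) - equation

$b = \dfrac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}$

• the numerator (sum of products) measures how the deviations in X and Y vary together (can be positive or negative)

• the denominator is the sum of squares for X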
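The slope formula translates directly into code. A minimal sketch, assuming plain Python lists of equal length and a hypothetical toy dataset; the function name `least_squares_slope` is illustrative, not from any library.

```python
def least_squares_slope(x, y):
    """Slope b = sum of products of deviations / sum of squares of X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: how deviations in X and Y vary together (can be + or -)
    sum_products = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: sum of squares for X (positive whenever X varies)
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    return sum_products / ss_x

# Hypothetical toy data (not from any real study)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b = least_squares_slope(x, y)
print(f"b = {b:.2f}")
```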


How to calculate the intercept (linear regression)

• once the slope is calculated, getting the intercept is straightforward because the least squares regression line always goes through (Xbar, Ybar)

• plug the mean values into the line formula → rearrange to solve for the intercept: $a = \bar{Y} - b\bar{X}$
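A minimal sketch of the intercept calculation, assuming a hypothetical toy dataset; `least_squares_line` is an illustrative helper name. It computes b first, then uses the fact that the line passes through (Xbar, Ybar).

```python
def least_squares_line(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    # The line passes through (Xbar, Ybar): Ybar = a + b * Xbar, so a = Ybar - b * Xbar
    a = y_bar - b * x_bar
    return a, b

# Hypothetical toy data (not from any real study)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
print(f"Y = {a:.2f} + {b:.2f}X")
```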


estimates / statistics and parameters for a linear regression

• estimates/statistics: slope (b) and intercept (a)

- estimated from a sample of measurements

• Parameters: slope (β) and intercept (α)

- from the true population


regression assumption

Regression assumes that there is a population for every value of X, and the mean Y for each of these populations lies on the regression line

• assumes the spread is the same in each subpopulation (you don't want a funnel)


Predicting values with a linear regression

• can predict values of Y for any specified value of X

- you can't use the line to predict X from Y, because (in the study) the explanatory variable X is used to predict Y, not the other way around

• predictions estimate the mean Y for all individuals with a given value of X

• designated Y^ "Y-hat"

• plug a value of X into the linear regression formula and solve for Y-hat
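Once a and b are known, prediction is a single line of arithmetic. In this sketch, a = 2.2 and b = 0.6 are the least-squares estimates for a hypothetical toy dataset (x = 1 to 5, y = 2, 4, 5, 4, 5); `predict` is an illustrative helper name.

```python
def predict(a, b, x_new):
    """Y-hat: the predicted mean Y for individuals with X = x_new."""
    return a + b * x_new

# Least-squares intercept and slope for the hypothetical toy data
# x = [1, 2, 3, 4, 5], y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6
y_hat = predict(a, b, 3.5)
print(f"Y-hat at X = 3.5: {y_hat:.2f}")
```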


Residual

The residual of a point is the difference between its measured Y value and the value of Y predicted by the regression line: $Y_i - \hat{Y}_i$
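A short sketch computing residuals for a hypothetical toy dataset (x = 1 to 5, y = 2, 4, 5, 4, 5), whose least-squares line is Y = 2.2 + 0.6X:

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6  # least-squares intercept and slope for this toy data

y_hat = [a + b * xi for xi in x]
# Residual = measured Y minus predicted Y (positive: point lies above the line)
residuals = [yi - yhi for yi, yhi in zip(y, y_hat)]
print([round(r, 2) for r in residuals])

# Residuals from a least-squares line sum to zero (up to rounding error)
print(abs(sum(residuals)) < 1e-9)
```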


How do you measure how well the data fits the line?

• residuals measure the scatter of points above and below the least squares regression line

- can be positive or negative

• the variance of the residuals (MSresidual) quantifies the spread of the scatter

- residual mean square

- analogous to the error mean square (MSerror) in ANOVA

- used to quantify the uncertainty of the slope


Residual mean square equation

$MS_{residual} = \dfrac{\sum_i (Y_i - \hat{Y}_i)^2}{n - 2}$
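The residual mean square is the sum of squared residuals divided by n - 2 (two parameters, the intercept and the slope, are estimated from the data). A minimal sketch with a hypothetical toy dataset and its least-squares line (a = 2.2, b = 0.6); the helper name is illustrative:

```python
def residual_mean_square(x, y, a, b):
    """MS_residual = sum of squared residuals / (n - 2)."""
    n = len(x)
    ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # n - 2 degrees of freedom: intercept and slope were both estimated
    return ss_residual / (n - 2)

# Hypothetical toy data (not from any real study)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
ms_res = residual_mean_square(x, y, 2.2, 0.6)
print(f"MS_residual = {ms_res:.3f}")
```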


standard error of the slope (equation)

$SE_b = \sqrt{\dfrac{MS_{residual}}{\sum_i (X_i - \bar{X})^2}}$

• measures the uncertainty (precision) of the sample estimate (b) of the population slope (β)

• the sum of squares of X in the denominator grows as you add more data points (or spread out the X values), which makes the standard error smaller

• the numerator is the spread of the residuals (MSresidual)
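A sketch of the standard error calculation, assuming a hypothetical toy dataset and its least-squares line (a = 2.2, b = 0.6). The second call duplicates every point, which leaves the fitted line unchanged but doubles the sum of squares of X, to show the standard error shrinking as data points are added:

```python
import math

def slope_standard_error(x, y, a, b):
    """SE_b = sqrt(MS_residual / SS_X) for a fitted line Y = a + bX."""
    n = len(x)
    x_bar = sum(x) / n
    ms_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    return math.sqrt(ms_residual / ss_x)

# Hypothetical toy data (not from any real study)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
se_small = slope_standard_error(x, y, 2.2, 0.6)

# Every point measured twice: same fitted line, SS_X doubles, MS_residual gains df
se_doubled = slope_standard_error(x * 2, y * 2, 2.2, 0.6)
print(f"n = 5: SE_b = {se_small:.3f}; n = 10: SE_b = {se_doubled:.3f}")
```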


Confidence interval of the slope

$b - t_{\alpha(2),\,df}\, SE_b < \beta < b + t_{\alpha(2),\,df}\, SE_b$, with $df = n - 2$
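A worked numeric example, assuming hypothetical toy-data values b = 0.6, MS_residual = 0.8, sum of squares of X = 10 and n = 5. The critical value 3.182 is the two-sided 5% value of t with 3 df, taken from a standard t table:

```python
import math

b = 0.6                  # slope estimate (hypothetical toy fit)
se_b = math.sqrt(0.8 / 10)  # SE_b = sqrt(MS_residual / SS_X)
df = 5 - 2               # n - 2 with n = 5 points
t_crit = 3.182           # t_{0.05(2), 3} from a t table

lower = b - t_crit * se_b
upper = b + t_crit * se_b
print(f"95% CI for beta: {lower:.3f} < beta < {upper:.3f}")
```

With these toy numbers the interval includes 0, so a population slope of zero cannot be ruled out at the 5% level.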


The two types of predictions

1. predict mean Y for a given X

- e.g., what is the mean age of all male lions whose noses are 60% black?

2. predict single Y for a given X

- e.g., how old is that lion over there with a 60% black nose?

• both predictions give the same value of Y-hat, but they differ in precision

• can predict the mean with more certainty than a single value


Confidence bands

measure the precision of the predicted mean Y for each given value of X

• curved because the predictions become less precise (the band gets wider) farther from the mean of X, where there are fewer data

• the band is narrowest at Xbar, where the line passes through (Xbar, Ybar)


Prediction intervals

Measure the precision of the predicted single Y values for each X

• wider than confidence bands because predicting a single Y value is less precise than predicting a mean Y
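Both bands come from standard errors of Y-hat at a chosen X (written `x_star` here). The formulas in this sketch are the usual textbook ones, not stated on these cards, and the numbers are hypothetical toy-data summaries (n = 5, Xbar = 3, sum of squares of X = 10, MS_residual = 0.8). The single-value version adds an extra 1 inside the square root for the scatter of individuals around the mean, which is why prediction intervals are wider; both grow as `x_star` moves away from Xbar.

```python
import math

# Hypothetical toy-data summaries
n = 5
x_bar = 3.0
ss_x = 10.0
ms_residual = 0.8

def se_mean_prediction(x_star):
    """SE of the predicted mean Y at X = x_star (sets the width of confidence bands)."""
    return math.sqrt(ms_residual * (1 / n + (x_star - x_bar) ** 2 / ss_x))

def se_single_prediction(x_star):
    """SE of a single new Y at X = x_star (sets the width of prediction intervals)."""
    # The extra 1 accounts for individual scatter around the mean
    return math.sqrt(ms_residual * (1 + 1 / n + (x_star - x_bar) ** 2 / ss_x))

for x_star in (1.0, 3.0, 5.0):
    print(f"X = {x_star}: SE mean = {se_mean_prediction(x_star):.3f}, "
          f"SE single = {se_single_prediction(x_star):.3f}")
```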


Interpolation

The prediction of Y for a value of X lying between the smallest and largest values of X in the data (the range where regression predictions are appropriate)


Extrapolation

The prediction of the value of a response variable (Y) outside the range of X values in the data

• extends the prediction beyond the range where you sampled

• not recommended because there's no way to ensure the relationship continues to be linear beyond the range of the data
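One simple safeguard, sketched here as a hypothetical helper (`predict_within_range` is not from any library), is to refuse predictions outside the observed range of X. The intercept and slope are the least-squares values for a hypothetical toy dataset.

```python
def predict_within_range(a, b, x_data, x_new):
    """Return Y-hat only for interpolation; raise an error instead of extrapolating."""
    lo, hi = min(x_data), max(x_data)
    if not lo <= x_new <= hi:
        raise ValueError(
            f"X = {x_new} is outside the sampled range [{lo}, {hi}]; "
            "extrapolation is not recommended"
        )
    return a + b * x_new

x_data = [1, 2, 3, 4, 5]  # hypothetical sampled X values
print(predict_within_range(2.2, 0.6, x_data, 3.5))  # interpolation: allowed
try:
    predict_within_range(2.2, 0.6, x_data, 10)     # extrapolation: refused
except ValueError as err:
    print(err)
```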


Hypotheses for testing a slope

H₀: β = 0

Ha: β ≠ 0


test statistic for regression slope

t-statistic → measures how well our data fit the expectation under the null hypothesis


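t-statistic equation for regression slope

$t = \dfrac{b - \beta_0}{SE_b}$

• SEb = standard error of the slope (measures uncertainty)

• β₀ = the slope under the null hypothesis (0 when H₀: β = 0)

• df = n - 2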
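Plugging in hypothetical toy-data values (b = 0.6, SE_b = √(0.8 / 10), n = 5):

```python
import math

b = 0.6                      # slope estimate (hypothetical toy fit)
se_b = math.sqrt(0.8 / 10)   # standard error of the slope
beta_0 = 0.0                 # slope under H0: beta = 0
n = 5

t = (b - beta_0) / se_b
df = n - 2
print(f"t = {t:.3f} with df = {df}")
```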


how to get a p-value from the test statistic

compare the t-statistic to the t-distribution with n - 2 df: use critical values from a stats table, or calculate P exactly with a computer
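If no stats package is at hand, the two-sided P-value can be approximated by numerically integrating the t density. This is a sketch only: it uses Simpson's rule (a swapped-in numerical method, not something from the course) on a hypothetical toy-data t-statistic; in practice a table or stats software would be used.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=2000):
    """P = 2 * Pr(T >= |t|), via Simpson's rule on [0, |t|] and symmetry."""
    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 == 1 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3  # Pr(0 <= T <= |t|)
    return 1 - 2 * area_0_to_t

t_stat = 0.6 / math.sqrt(0.08)  # hypothetical toy-data slope t-statistic, df = 3
print(f"P = {two_sided_p(t_stat, 3):.3f}")
```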


ANOVA (F) approach

In regression framework:

• deviations between the predicted values Yi-hat and Ybar

- analogous to MSgroups

• deviations between each Yi and its predicted value Yi-hat (residuals)

- analogous to MSerror

• using the ANOVA approach will generate the same P-value as the t-test approach (F = t²)

• can be used to calculate R² = SSregression / SStotal: the fraction of the variation in Y that is "explained" by X
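A sketch of the ANOVA decomposition on a hypothetical toy dataset and its least-squares line (a = 2.2, b = 0.6). It checks that SStotal = SSregression + SSresidual, computes F with 1 and n - 2 df, and computes R² as SSregression / SStotal:

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6  # least-squares fit for this toy data
n = len(y)
y_bar = sum(y) / n
y_hat = [a + b * xi for xi in x]

ss_regression = sum((yh - y_bar) ** 2 for yh in y_hat)         # Y-hat vs Ybar
ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # Y vs Y-hat
ss_total = sum((yi - y_bar) ** 2 for yi in y)                   # Y vs Ybar

ms_regression = ss_regression / 1      # df = 1
ms_residual = ss_residual / (n - 2)    # df = n - 2
F = ms_regression / ms_residual
r_squared = ss_regression / ss_total
print(f"F = {F:.2f}, R^2 = {r_squared:.2f}")
```

For these numbers F = 4.5, which equals the square of the slope t-statistic (t ≈ 2.121), so both approaches give the same P-value.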


Regression toward the mean

Results when two variables measured on a sample of individuals have a correlation less than one. Individuals that are far from the mean for one of the measurements will, on average, lie closer to the mean for the other measurement

• in the figure: solid line = linear regression, dashed line = one-to-one line with slope of 1

• e.g., are patients improving because the drug is working, or are they just regressing to the mean?
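A small simulation illustrates why this happens even with no treatment at all. It assumes a hypothetical model (each individual has a stable true value, and each measurement adds independent normal noise of equal size); the numbers are made up, and the seed is fixed only for reproducibility:

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical model: stable true value + independent measurement noise
n = 10_000
true_vals = [rng.gauss(100, 10) for _ in range(n)]
first = [t + rng.gauss(0, 10) for t in true_vals]   # e.g., baseline measurement
second = [t + rng.gauss(0, 10) for t in true_vals]  # e.g., follow-up, no treatment

mean_first = sum(first) / n
mean_second = sum(second) / n

# Select individuals who were extreme (top 10%) on the first measurement
cutoff = sorted(first)[int(0.9 * n)]
extreme = [i for i in range(n) if first[i] >= cutoff]
gap_first = sum(first[i] for i in extreme) / len(extreme) - mean_first
gap_second = sum(second[i] for i in extreme) / len(extreme) - mean_second
print(f"top 10% on first measurement: {gap_first:.1f} above the mean")
print(f"same individuals on second measurement: {gap_second:.1f} above the mean")
```

Individuals selected for being extreme on the first measurement are, on average, much less extreme on the second, purely because of measurement noise.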


Assumptions of linear regression

At each value of X:

• there is a population of Y-values whose mean lies on the regression line

• the distribution of possible Y-values is normal (with the same variance)

• The variance of Y-values is the same at all values of X

• the Y measurements represent a random sample from the possible Y-values


3 possible issues when trying to do a linear regression

1. outliers

2. nonlinearity

3. non-normality and unequal variances


How to deal with outliers

If there is only one outlier (or a few), it may be reasonable to report the regression both with and without the outlier(s)


How to detect nonlinearity

Can be detected by inspecting graphs


How to detect non-normality and unequal variances

Residual plot


Residual plot

Residual of every data point ($Y_i - \hat{Y}_i$) is plotted against $X_i$

• if the assumptions of normality and equal variances are met, there should be a roughly symmetric cloud of points above and below the horizontal line at zero

- you don't want a funnel (it signals unequal variances across the subpopulations)
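Plotting libraries are the usual tool, but a text-only sketch shows the idea without dependencies. This hypothetical helper (`ascii_residual_plot`, not from any library) draws the plot sideways: one row per point sorted by X, with the `|` column marking a residual of zero. With a funnel, the `*` marks would spread farther from the `|` as you move down the rows. The data and fitted line (a = 2.2, b = 0.6) are a hypothetical toy example.

```python
def ascii_residual_plot(x, residuals, half_width=20):
    """Sideways text residual plot: rows sorted by X, '|' marks residual = 0."""
    scale = max(abs(r) for r in residuals) or 1.0
    rows = []
    for xi, ri in sorted(zip(x, residuals)):
        # Map the residual to a column offset between -half_width and +half_width
        offset = round(ri / scale * half_width)
        line = [" "] * (2 * half_width + 1)
        line[half_width] = "|"
        line[half_width + offset] = "*"
        rows.append(f"{xi:6.2f} " + "".join(line))
    return rows

# Hypothetical toy data and its least-squares line
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
for row in ascii_residual_plot(x, residuals):
    print(row)
```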
