Simple Linear Regression: Key Concepts and Formulas for Students

75 Terms

1
New cards

What is a simple linear regression model?

Y_i = β_0 + β_1*X_i + ε_i, where Y_i is the response, X_i is the predictor, β_0 is the intercept, β_1 is the slope, and ε_i is the random error term

2
New cards

What are the assumptions about the error terms in simple linear regression?

The errors are independent, normally distributed with mean 0 and constant variance σ²: ε_i ~ N(0, σ²)

3
New cards

What is the least squares criterion?

Minimize the sum of squared deviations: Q = Σ(Y_i - β_0 - β_1*X_i)²

4
New cards

What is the formula for the least squares estimate of the slope b_1?

b_1 = Σ(X_i - X̄)(Y_i - Ȳ) / Σ(X_i - X̄)²

5
New cards

What is the formula for the least squares estimate of the intercept b_0?

b_0 = Ȳ - b_1*X̄
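
A minimal sketch of these two formulas in NumPy, using made-up x/y values (not from the deck), with a cross-check against NumPy's own fit:

```python
import numpy as np

# Hypothetical toy data; any paired X, Y values work the same way
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2])

xbar, ybar = x.mean(), y.mean()

# Slope: b1 = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2)
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)

# Intercept: b0 = Ybar - b1 * Xbar
b0 = ybar - b1 * xbar
print(b0, b1)

# Cross-check: np.polyfit returns [slope, intercept] for degree 1
print(np.polyfit(x, y, 1))
```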

6
New cards

What does BLUE stand for?

Best Linear Unbiased Estimator; the Gauss-Markov theorem states that the least squares estimators b_0 and b_1 are BLUE

7
New cards

What is SSE (Error Sum of Squares)?

Σ(Y_i - Ŷ_i)² = Σe_i²

8
New cards

What is the formula for MSE (Mean Square Error)?

SSE/(n-2), which is an unbiased estimate of σ²

9
New cards

What is a residual?

e_i = Y_i - Ŷ_i, the difference between the observed value and the fitted value

10
New cards

What property do residuals have regarding their sum?

When the model includes an intercept, the residuals always sum to zero: Σe_i = 0

11
New cards

What is SSTO (Total Sum of Squares)?

Σ(Y_i - Ȳ)², measures total variation in Y

12
New cards

What is SSR (Regression Sum of Squares)?

Σ(Ŷ_i - Ȳ)², measures variation explained by the regression

13
New cards

What is the relationship between SSTO, SSR, and SSE?

SSTO = SSR + SSE (total variation = explained + unexplained)

14
New cards

What is the coefficient of determination R²?

SSR/SSTO = 1 - SSE/SSTO, the proportion of variation in Y explained by X
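
A short sketch (same made-up data as above) verifying SSTO = SSR + SSE and computing R² both ways:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2])

xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar
yhat = b0 + b1 * x                 # fitted values

ssto = np.sum((y - ybar) ** 2)     # total variation in Y
ssr  = np.sum((yhat - ybar) ** 2)  # variation explained by the regression
sse  = np.sum((y - yhat) ** 2)     # unexplained (residual) variation

print(np.isclose(ssto, ssr + sse)) # SSTO = SSR + SSE
print(ssr / ssto, 1 - sse / ssto)  # two equivalent forms of R^2
```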

15
New cards

What range can R² take?

0 ≤ R² ≤ 1, with values closer to 1 indicating better fit

16
New cards

What is the sampling distribution of b_1?

b_1 ~ N(β_1, σ²/Σ(X_i - X̄)²) under the normal error model

17
New cards

What is the estimated standard error of b_1?

s{b_1} = √[MSE/Σ(X_i - X̄)²]

18
New cards

What is the test statistic for testing H_0: β_1 = 0?

t* = b_1/s{b_1}, which follows a t-distribution with n-2 degrees of freedom

19
New cards

What is the formula for a confidence interval for β_1?

b_1 ± t(n-2; α/2) * s{b_1}
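
A sketch of the t-test and confidence interval for β_1 with SciPy (made-up data; `scipy.stats.t` supplies the t quantiles):

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2])
n = len(x)

xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * (y - ybar)) / Sxx
b0 = ybar - b1 * xbar
resid = y - (b0 + b1 * x)

mse  = np.sum(resid ** 2) / (n - 2)   # unbiased estimate of sigma^2
s_b1 = np.sqrt(mse / Sxx)             # estimated standard error of b1

t_star = b1 / s_b1                    # test statistic for H0: beta1 = 0
p_val  = 2 * stats.t.sf(abs(t_star), df=n - 2)

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)   # 95% CI for beta1
print(t_star, p_val, ci)
```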

20
New cards

What is the variance of b_0?

σ²{b_0} = σ²[1/n + X̄²/Σ(X_i - X̄)²]

21
New cards

What is the estimated variance of the mean response at X_h?

s²{Ŷ_h} = MSE[1/n + (X_h - X̄)²/Σ(X_i - X̄)²]

22
New cards

What is the confidence interval for E{Y_h}?

Ŷ_h ± t(n-2; α/2) * s{Ŷ_h}

23
New cards

What is the prediction interval for a new observation at X_h?

Ŷ_h ± t(n-2; α/2) * s{pred}, where s²{pred} = MSE[1 + 1/n + (X_h - X̄)²/Σ(X_i - X̄)²]
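
A sketch comparing the two intervals at a hypothetical level X_h = 4.5 (made-up data; the prediction interval comes out wider):

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2])
n = len(x)
xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * (y - ybar)) / Sxx
b0 = ybar - b1 * xbar
mse = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)

xh = 4.5                                # hypothetical new X level
yh = b0 + b1 * xh                       # point estimate Y_hat_h

s2_mean = mse * (1 / n + (xh - xbar) ** 2 / Sxx)      # variance for E{Y_h}
s2_pred = mse * (1 + 1 / n + (xh - xbar) ** 2 / Sxx)  # variance for a new Y at X_h

t_crit = stats.t.ppf(0.975, df=n - 2)
ci = (yh - t_crit * np.sqrt(s2_mean), yh + t_crit * np.sqrt(s2_mean))
pi = (yh - t_crit * np.sqrt(s2_pred), yh + t_crit * np.sqrt(s2_pred))
print(ci, pi)   # the prediction interval is always the wider of the two
```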

24
New cards

Why is the prediction interval wider than the confidence interval?

The prediction interval accounts for both the uncertainty in estimating the mean response AND the variability of individual observations around that mean

25
New cards

What is the F-test in ANOVA for regression testing?

F* = MSR/MSE, tests H_0: β_1 = 0 vs H_A: β_1 ≠ 0

26
New cards

What are the degrees of freedom for the F-test in simple linear regression?

df_regression = 1, df_error = n-2

27
New cards

What is the relationship between the F-test and t-test for β_1 = 0?

F* = (t*)²; they are equivalent tests
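
A sketch of the ANOVA F-test that also checks F* = (t*)² numerically (made-up data):

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2])
n = len(x)
xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * (y - ybar)) / Sxx
b0 = ybar - b1 * xbar
yhat = b0 + b1 * x

ssr = np.sum((yhat - ybar) ** 2)
sse = np.sum((y - yhat) ** 2)
msr = ssr / 1                 # df_regression = 1
mse = sse / (n - 2)           # df_error = n - 2

f_star = msr / mse
p_val  = stats.f.sf(f_star, dfn=1, dfd=n - 2)
print(f_star, p_val)

t_star = b1 / np.sqrt(mse / Sxx)
print(np.isclose(f_star, t_star ** 2))   # F* = (t*)^2
```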

28
New cards

What is a semistudentized residual?

e_i* = e_i/√MSE, an approximation to a standardized residual

29
New cards

What does a plot of residuals vs fitted values check for?

Linearity of the relationship and constant variance (homoscedasticity)

30
New cards

What pattern in residuals vs fitted values suggests non-constant variance?

A funnel shape (increasing or decreasing spread)

31
New cards

What does a plot of residuals vs time order check for?

Independence of errors over time

32
New cards

What does a normal probability plot (QQ plot) assess?

Whether the residuals follow a normal distribution
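
A sketch of the three standard diagnostic plots with Matplotlib and SciPy (made-up data; `scipy.stats.probplot` draws the normal QQ plot):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2])
xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar
yhat = b0 + b1 * x
resid = y - yhat

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].scatter(yhat, resid)            # residuals vs fitted: linearity, constant variance
axes[0].axhline(0)
axes[0].set(xlabel="fitted", ylabel="residual")
axes[1].plot(resid, marker="o")         # residuals vs order: independence over time
axes[1].set(xlabel="order", ylabel="residual")
stats.probplot(resid, plot=axes[2])     # normal QQ plot: normality of errors
plt.tight_layout()
plt.show()
```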

33
New cards

What does a U-shaped pattern in residuals vs X suggest?

The relationship may be nonlinear (quadratic)

34
New cards

What is the correlation test for normality?

Correlate the ordered residuals with their expected values under normality (the normal scores); reject normality if the correlation coefficient falls below the tabled critical value

35
New cards

What is the Brown-Forsythe (Modified Levene) test?

Tests for constant variance by comparing absolute deviations from group medians

36
New cards

What is the Durbin-Watson test?

Tests for autocorrelation (serial correlation) in residuals when data are collected over time

37
New cards

What is the formula for the Durbin-Watson statistic?

D = Σ(e_i - e_{i-1})²/Σe_i², where the numerator sum runs over i = 2, ..., n
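
A sketch computing D from a made-up residual sequence (assumed to be in time order):

```python
import numpy as np

# Hypothetical residuals in time order
resid = np.array([0.3, 0.1, -0.2, -0.4, -0.1, 0.2, 0.4, -0.3])

# D = sum_{i=2..n} (e_i - e_{i-1})^2 / sum_{i=1..n} e_i^2
d = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
print(d)   # D near 2: no autocorrelation; D well below 2: positive autocorrelation
```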

38
New cards

What is the lack of fit test?

Tests whether the true regression function is linear; requires replicate observations at some X levels

39
New cards

What is Pure Error in the lack of fit test?

SSPE = ΣΣ(Y_ij - Ȳ_j)², variation within groups at the same X level

40
New cards

What is the Lack of Fit sum of squares?

SSLF = SSE - SSPE, measures deviation of group means from the fitted line

41
New cards

What is the F-statistic for the lack of fit test?

F* = (SSLF/(c-2))/(SSPE/(n-c)), where c is the number of distinct X levels
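
A sketch of the lack of fit test on made-up data with replicate Y values at each X level:

```python
import numpy as np
from scipy import stats

# Hypothetical data with replicate Y observations at each X level
x = np.array([1, 1, 2, 2, 3, 3, 4, 4], dtype=float)
y = np.array([2.0, 2.3, 3.1, 2.8, 4.2, 3.9, 4.8, 5.3])
n = len(x)

xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar
sse = np.sum((y - (b0 + b1 * x)) ** 2)

levels = np.unique(x)
c = len(levels)                           # number of distinct X levels

# Pure error: variation of Y around its group mean at each X level
sspe = sum(np.sum((y[x == lv] - y[x == lv].mean()) ** 2) for lv in levels)
sslf = sse - sspe                         # lack of fit sum of squares

f_star = (sslf / (c - 2)) / (sspe / (n - c))
p_val  = stats.f.sf(f_star, dfn=c - 2, dfd=n - c)
print(f_star, p_val)
```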

42
New cards

What does rejecting the null hypothesis in a lack of fit test indicate?

The linear model is inadequate; the true relationship may be nonlinear

43
New cards

What is the Box-Cox transformation?

y' = (y^λ - 1)/λ for λ ≠ 0, or y' = log(y) for λ = 0; used to stabilize variance and normalize errors
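
A sketch using SciPy's Box-Cox utility on made-up positive data (`scipy.stats.boxcox` chooses λ by maximum likelihood when none is supplied):

```python
import numpy as np
from scipy import stats

# Hypothetical positive, right-skewed response values
y = np.array([1.2, 1.8, 2.5, 3.9, 6.0, 9.5, 15.2, 24.0])

# SciPy estimates lambda by maximum likelihood and applies (y^lam - 1)/lam
y_transformed, lam = stats.boxcox(y)
print(lam)

# The lambda = 0 case corresponds to the log transform
print(np.allclose(stats.boxcox(y, lmbda=0.0), np.log(y)))
```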

44
New cards

What does regression through the origin mean?

Fitting the model Y_i = β_1*X_i + ε_i with no intercept term

45
New cards

What is the least squares estimate for regression through the origin?

b_1 = ΣX_i*Y_i/ΣX_i²

46
New cards

What problem can occur with R² in regression through the origin?

R² can be negative and loses its interpretation as proportion of variance explained

47
New cards

What are Bonferroni joint confidence intervals?

Simultaneous confidence intervals for multiple parameters using individual confidence level 1-α/g for g intervals

48
New cards

What is the Working-Hotelling confidence band?

Simultaneous confidence band for the entire regression line: Ŷ ± W*s{Ŷ}, where W = √(2F(2, n-2; α))
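
A sketch computing the Working-Hotelling multiplier W alongside the Bonferroni multiplier for comparison (n, α, and g here are made-up values):

```python
import numpy as np
from scipy import stats

n, alpha, g = 20, 0.05, 4   # hypothetical sample size, family alpha, number of estimates

# Working-Hotelling multiplier for a simultaneous band over the whole line
W = np.sqrt(2 * stats.f.ppf(1 - alpha, dfn=2, dfd=n - 2))

# Bonferroni multiplier for g simultaneous intervals
B = stats.t.ppf(1 - alpha / (2 * g), df=n - 2)

print(W, B)   # in practice one may use whichever multiplier is smaller
```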

49
New cards

What is inverse prediction?

Estimating the X value that corresponds to an observed or desired Y value

50
New cards

What is a matrix?

A rectangular array of elements arranged in rows and columns

51
New cards

What is the dimension of a matrix?

r × c, where r is the number of rows and c is the number of columns

52
New cards

What is a vector?

A special case of a matrix that is either r×1 (column vector) or 1×c (row vector)

53
New cards

What is the transpose of a matrix A?

A', obtained by interchanging the rows and columns of A

54
New cards

What is a symmetric matrix?

A square matrix where A = A'

55
New cards

What is a diagonal matrix?

A square matrix with all off-diagonal elements equal to 0

56
New cards

What is an identity matrix I?

A diagonal matrix with 1's on the diagonal

57
New cards

What is the simple linear regression model in matrix form?

Y = Xβ + ε, where Y is n×1, X is n×2, β is 2×1, and ε is n×1

58
New cards

What is the design matrix X in simple linear regression?

X = [1 X_1; 1 X_2; ...; 1 X_n], a matrix with first column all 1's and second column the X values

59
New cards

What is the solution for b in matrix form?

b = (X'X)^{-1}X'Y, the least squares estimate
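
A sketch of the matrix solution on made-up data (solving the normal equations X'Xb = X'Y rather than forming the inverse explicitly):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2])

X = np.column_stack([np.ones_like(x), x])   # design matrix: column of 1's, then X

XtX = X.T @ X                               # [[n, sum X], [sum X, sum X^2]]
XtY = X.T @ y                               # [sum Y, sum XY]

b = np.linalg.solve(XtX, XtY)               # same as (X'X)^{-1} X'Y, numerically stabler
print(b)                                    # [b0, b1]
```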

60
New cards

What is X'X for simple linear regression?

X'X = [n ΣX_i; ΣX_i ΣX_i²], a 2×2 matrix

61
New cards

What is X'Y for simple linear regression?

X'Y = [ΣY_i; ΣX_i*Y_i], a 2×1 vector

62
New cards

What is the hat matrix H?

H = X(X'X)^{-1}X', projects Y onto Ŷ

63
New cards

What property does the hat matrix have?

H is symmetric and idempotent: HH = H

64
New cards

How are fitted values expressed using the hat matrix?

Ŷ = HY, the hat matrix "puts the hat on Y"

65
New cards

How are residuals expressed in matrix form?

e = Y - Ŷ = (I - H)Y
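
A sketch verifying the hat-matrix identities on made-up data (symmetry, idempotence, Ŷ = HY, e = (I - H)Y, SSE = e'e, and s²{b} = MSE(X'X)^{-1}):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2])
n = len(x)

X = np.column_stack([np.ones_like(x), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix

print(np.allclose(H, H.T))                  # symmetric
print(np.allclose(H @ H, H))                # idempotent: HH = H

yhat = H @ y                                # fitted values: Y_hat = H Y
e = (np.eye(n) - H) @ y                     # residuals: e = (I - H) Y

sse = e @ e                                 # SSE = e'e
mse = sse / (n - 2)
cov_b = mse * np.linalg.inv(X.T @ X)        # estimated covariance matrix of b
print(sse, cov_b)
```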

66
New cards

What is the covariance matrix of b?

σ²{b} = σ²(X'X)^{-1}

67
New cards

What is the estimated covariance matrix of b?

s²{b} = MSE(X'X)^{-1}

68
New cards

What is SSE in matrix form?

SSE = Y'Y - b'X'Y = e'e

69
New cards

What is SSR in matrix form?

SSR = b'X'Y - (ΣY_i)²/n

70
New cards

What does linear dependence of columns mean?

Columns are linearly dependent if there exist constants (not all zero) such that a linear combination equals the zero vector

71
New cards

What is the rank of a matrix?

The maximum number of linearly independent columns (or rows) in the matrix

72
New cards

What is the inverse of a matrix A?

A^{-1} is the matrix such that AA^{-1} = A^{-1}A = I

73
New cards

When does a matrix inverse exist?

Only for square matrices that have full rank

74
New cards

What is the general linear test approach?

Compares a full model (H_A) to a reduced model (H_0) using F* = [(SSE_R - SSE_F)/(df_R - df_F)]/[SSE_F/df_F]
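
A sketch of the general linear test for H_0: β_1 = 0, comparing full and reduced models on made-up data (it reproduces the ANOVA F*):

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2])
n = len(x)

# Full model: Y = b0 + b1*X  (df_F = n - 2)
X = np.column_stack([np.ones_like(x), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
sse_f, df_f = np.sum((y - X @ b) ** 2), n - 2

# Reduced model under H0: beta1 = 0, i.e. Y = b0  (df_R = n - 1)
sse_r, df_r = np.sum((y - y.mean()) ** 2), n - 1

f_star = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
p_val  = stats.f.sf(f_star, dfn=df_r - df_f, dfd=df_f)
print(f_star, p_val)    # matches the ANOVA F test for H0: beta1 = 0
```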

75
New cards

What are the degrees of freedom for the full and reduced models in simple linear regression?

Full model: df_F = n-2; reduced model: df_R = n-1