What is a simple linear regression model?
Y_i = β_0 + β_1*X_i + ε_i, where Y_i is the response, X_i is the predictor, β_0 is the intercept, β_1 is the slope, and ε_i is the random error term
What are the assumptions about the error terms in simple linear regression?
The errors are independent, normally distributed with mean 0 and constant variance σ²: ε_i ~ N(0, σ²)
What is the least squares criterion?
Minimize the sum of squared deviations: Q = Σ(Y_i - β_0 - β_1*X_i)²
What is the formula for the least squares estimate of the slope b_1?
b_1 = Σ(X_i - X̄)(Y_i - Ȳ) / Σ(X_i - X̄)²
What is the formula for the least squares estimate of the intercept b_0?
b_0 = Ȳ - b_1*X̄
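The two formulas above can be checked by hand on a small data set. The sketch below is only illustrative: the five (X, Y) pairs are hypothetical toy values chosen so the arithmetic is easy to follow, and the variable names are not from any particular library.

```python
# Hypothetical toy data, chosen only so the arithmetic is easy to follow.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of cross-products and squares about the means.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

b1 = s_xy / s_xx          # slope estimate
b0 = y_bar - b1 * x_bar   # intercept estimate

print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")  # prints: b1 = 0.6000, b0 = 2.2000
```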
What does BLUE stand for?
Best Linear Unbiased Estimator. The Gauss-Markov theorem states that the least squares estimators b_0 and b_1 are BLUE: they are unbiased and have minimum variance among all linear unbiased estimators
What is SSE (Error Sum of Squares)?
Σ(Y_i - Ŷ_i)² = Σe_i²
What is the formula for MSE (Mean Square Error)?
SSE/(n-2), which is an unbiased estimate of σ²
What is a residual?
e_i = Y_i - Ŷ_i, the difference between the observed value and the fitted value
What property do residuals have regarding their sum?
For a least squares fit that includes an intercept, the residuals always sum to zero: Σe_i = 0
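Residuals, SSE, and MSE can be computed directly from the definitions. This is a minimal sketch on hypothetical toy data; `fit_line` is an illustrative helper written for this example, not a library function.

```python
def fit_line(x, y):
    """Return (b0, b1) from the least squares formulas for a line with intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]                  # Y-hat_i
residuals = [yi - fi for yi, fi in zip(y, fitted)]   # e_i = Y_i - Y-hat_i

sse = sum(e ** 2 for e in residuals)   # error sum of squares
mse = sse / (len(x) - 2)               # divides by n - 2 degrees of freedom

print(f"sum of residuals = {sum(residuals):.6f}")
print(f"SSE = {sse:.4f}, MSE = {mse:.4f}")
```

With these values the residual sum is zero up to floating-point rounding, SSE is 2.4, and MSE is 0.8.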
What is SSTO (Total Sum of Squares)?
Σ(Y_i - Ȳ)², measures total variation in Y
What is SSR (Regression Sum of Squares)?
Σ(Ŷ_i - Ȳ)², measures variation explained by the regression
What is the relationship between SSTO, SSR, and SSE?
SSTO = SSR + SSE (total variation = explained + unexplained)
What is the coefficient of determination R²?
SSR/SSTO = 1 - SSE/SSTO, the proportion of variation in Y explained by X
What range can R² take?
0 ≤ R² ≤ 1, with values closer to 1 indicating better fit
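The decomposition SSTO = SSR + SSE and the two equivalent expressions for R² can be verified numerically. The sketch below assumes the same kind of hypothetical toy data used for illustration; nothing in it comes from a real study.

```python
# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

ssto = sum((yi - y_bar) ** 2 for yi in y)               # total variation in Y
ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # explained by the regression
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # left unexplained

r2 = ssr / ssto

print(f"SSTO = {ssto:.4f}, SSR + SSE = {ssr + sse:.4f}")
print(f"R^2 = {r2:.4f}, 1 - SSE/SSTO = {1 - sse / ssto:.4f}")
```

Here SSTO is 6, SSR is 3.6, SSE is 2.4, and R² is 0.6 by either formula.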
What is the sampling distribution of b_1?
b_1 ~ N(β_1, σ²/Σ(X_i - X̄)²) under the normal error model
What is the estimated standard error of b_1?
s{b_1} = √[MSE/Σ(X_i - X̄)²]
What is the test statistic for testing H_0: β_1 = 0?
t* = b_1/s{b_1}, which follows a t-distribution with n-2 degrees of freedom when H_0 is true
What is the formula for a confidence interval for β_1?
b_1 ± t(n-2; α/2) * s{b_1}
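The standard error, test statistic, and confidence interval for the slope fit together as sketched below. The data are hypothetical, and the critical value `T_CRIT` is an assumption: it is the rounded table value for a 95% interval with n - 2 = 3 degrees of freedom, entered by hand rather than computed.

```python
import math

# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)

s_b1 = math.sqrt(mse / s_xx)   # estimated standard error of b1
t_star = b1 / s_b1             # test statistic for H0: beta_1 = 0

# Assumed critical value for alpha = 0.05 with 3 degrees of freedom,
# taken (rounded) from a standard t table.
T_CRIT = 3.182
lower = b1 - T_CRIT * s_b1
upper = b1 + T_CRIT * s_b1

print(f"s(b1) = {s_b1:.4f}, t* = {t_star:.4f}")
print(f"95% CI for beta_1: ({lower:.3f}, {upper:.3f})")
```

For this toy data |t*| is below the critical value and the interval contains 0, so both the test and the interval lead to the same conclusion: H_0 is not rejected at the 5% level.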
What is the variance of b_0?
σ²{b_0} = σ²[1/n + X̄²/Σ(X_i - X̄)²]
What is the estimated variance of the mean response at X_h?
s²{Ŷ_h} = MSE[1/n + (X_h - X̄)²/Σ(X_i - X̄)²]
What is the confidence interval for E{Y_h}?
Ŷ_h ± t(n-2; α/2) * s{Ŷ_h}
What is the prediction interval for a new observation at X_h?
Ŷ_h ± t(n-2; α/2) * s{pred}, where s²{pred} = MSE[1 + 1/n + (X_h - X̄)²/Σ(X_i - X̄)²]
Why is the prediction interval wider than the confidence interval?
The prediction interval accounts for both the uncertainty in estimating the mean response AND the variability of individual observations around that mean
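The extra "1" inside the bracket of s²{pred} is exactly one additional MSE. The sketch below, on hypothetical toy data with an arbitrarily chosen X_h, computes both variances so the difference is visible; the function name is illustrative only.

```python
def interval_variances(x, y, x_h):
    """Return (s2_mean, s2_pred, mse) at predictor level x_h."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)

    leverage_term = 1 / n + (x_h - x_bar) ** 2 / s_xx
    s2_mean = mse * leverage_term        # variance of the estimated mean response
    s2_pred = mse * (1 + leverage_term)  # adds the variance of a new observation
    return s2_mean, s2_pred, mse

# Hypothetical toy data and an arbitrary X_h.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
s2_mean, s2_pred, mse = interval_variances(x, y, x_h=4.0)

print(f"s2(mean) = {s2_mean:.4f}, s2(pred) = {s2_pred:.4f}, MSE = {mse:.4f}")
```

Both variances grow as X_h moves away from X̄, but s²{pred} never falls below MSE, no matter how large n is.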
What is the F-test in ANOVA for regression testing?
F* = MSR/MSE, tests H_0: β_1 = 0 vs H_A: β_1 ≠ 0
What are the degrees of freedom for the F-test in simple linear regression?
df_regression = 1, df_error = n-2
What is the relationship between the F-test and t-test for β_1 = 0?
F* = (t*)², so the F-test and the two-sided t-test are equivalent
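The identity between the two statistics can be confirmed numerically. A minimal sketch on hypothetical toy data follows; MSR is SSR divided by its single degree of freedom.

```python
import math

# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

ssr = sum((fi - y_bar) ** 2 for fi in fitted)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

msr = ssr / 1          # regression df = 1
mse = sse / (n - 2)    # error df = n - 2

f_star = msr / mse
t_star = b1 / math.sqrt(mse / s_xx)

print(f"F* = {f_star:.4f}, (t*)^2 = {t_star ** 2:.4f}")
```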
What is a semistudentized residual?
e_i* = e_i/√MSE, an approximation to a standardized residual
What does a plot of residuals vs fitted values check for?
Linearity of the relationship and constant variance (homoscedasticity)
What pattern in residuals vs fitted values suggests non-constant variance?
A funnel shape (increasing or decreasing spread)
What does a plot of residuals vs time order check for?
Independence of errors over time
What does a normal probability plot (QQ plot) assess?
Whether the residuals follow a normal distribution
What does a U-shaped pattern in residuals vs X suggest?
The relationship may be nonlinear (quadratic)
What is the correlation test for normality?
Correlate observed residuals with expected normal residuals; reject normality if correlation is too low
What is the Brown-Forsythe (Modified Levene) test?
Tests for constant variance by comparing absolute deviations from group medians
What is the Durbin-Watson test?
Tests for autocorrelation (serial correlation) in residuals when data are collected over time
What is the formula for the Durbin-Watson statistic?
D = Σ_{i=2}^{n}(e_i - e_{i-1})² / Σ_{i=1}^{n} e_i², where the residuals are in time order
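The statistic is a ratio of two sums over time-ordered residuals: the numerator starts at the second residual, the denominator uses all of them. The sketch below is an illustrative implementation with made-up residual sequences; comparing D to its tabulated bounds is not shown.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Squared successive differences, i = 2, ..., n.
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    # Squared residuals, i = 1, ..., n.
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Made-up residual patterns for illustration.
alternating = [1.0, -1.0, 1.0, -1.0]   # sign flips every step
constant = [1.0, 1.0, 1.0, 1.0]        # every residual equals the previous one

print(durbin_watson(alternating))  # 3.0
print(durbin_watson(constant))     # 0.0
```

Small values of D indicate positive autocorrelation (neighboring residuals are similar), while values near 2 are consistent with uncorrelated errors.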
What is the lack of fit test?
Tests whether the true regression function is linear; requires replicate observations at some X levels
What is Pure Error in the lack of fit test?
SSPE = ΣΣ(Y_ij - Ȳ_j)², variation within groups at the same X level
What is the Lack of Fit sum of squares?
SSLF = SSE - SSPE, measures deviation of group means from the fitted line
What is the F-statistic for the lack of fit test?
F* = (SSLF/(c-2))/(SSPE/(n-c)), where c is the number of distinct X levels
What does rejecting the null hypothesis in a lack of fit test indicate?
The linear model is inadequate; the true relationship may be nonlinear
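The pieces of the lack of fit test (SSE, SSPE, SSLF, and F*) can be assembled as in the sketch below. The data are hypothetical, with two replicates at each of c = 3 distinct X levels, and `lack_of_fit_f` is an illustrative helper, not a library function. Comparing F* to the F(c-2, n-c) critical value is left out.

```python
from collections import defaultdict

def lack_of_fit_f(x, y):
    """Return (sspe, sslf, f_star) for a straight-line fit with replicates."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

    # Group responses by distinct X level.
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)

    # Pure error: variation of Y around its own group mean.
    sspe = 0.0
    for ys in groups.values():
        mean_j = sum(ys) / len(ys)
        sspe += sum((yij - mean_j) ** 2 for yij in ys)

    c = len(groups)        # number of distinct X levels
    sslf = sse - sspe      # lack of fit
    f_star = (sslf / (c - 2)) / (sspe / (n - c))
    return sspe, sslf, f_star

# Hypothetical data: two replicates at each of three X levels.
x = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
y = [1.0, 3.0, 2.0, 4.0, 6.0, 8.0]

sspe, sslf, f_star = lack_of_fit_f(x, y)
print(f"SSPE = {sspe:.4f}, SSLF = {sslf:.4f}, F* = {f_star:.4f}")
```

For this data SSE is 9, SSPE is 6, SSLF is 3, and F* is 1.5.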
What is the Box-Cox transformation?
y' = (y^λ - 1)/λ for λ ≠ 0, or y' = log(y) for λ = 0; used to stabilize variance and normalize errors
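The transformation is a short piecewise function. The sketch below is a minimal version for a single positive value, using the natural log for the λ = 0 case; choosing λ from data (for example by maximum likelihood) is not shown, and the function name is illustrative.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a single positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)          # limiting case as lam -> 0
    return (y ** lam - 1) / lam

print(box_cox(4.0, 0.5))   # square-root-type transform: (2 - 1) / 0.5 = 2.0
print(box_cox(3.0, 1.0))   # lam = 1 only shifts the data: 3 - 1 = 2.0
print(box_cox(1.0, 0.0))   # log(1) = 0.0
```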
What does regression through the origin mean?
Fitting the model Y_i = β_1*X_i + ε_i with no intercept term
What is the least squares estimate for regression through the origin?
b_1 = ΣX_i*Y_i/ΣX_i²
What problem can occur with R² in regression through the origin?
R² computed as 1 - SSE/SSTO (with SSTO measured about Ȳ) can be negative, so it loses its interpretation as the proportion of variation explained
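A negative R² appears whenever forcing the line through the origin fits worse than the horizontal line at Ȳ. The sketch below uses hypothetical data with a large true intercept to make this happen; R² is computed as 1 - SSE/SSTO with SSTO taken about Ȳ.

```python
# Hypothetical data whose trend clearly does not pass through the origin.
x = [1.0, 2.0, 3.0]
y = [10.0, 11.0, 12.0]

# Least squares slope for the no-intercept model.
b1 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

fitted = [b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

y_bar = sum(y) / len(y)
sse = sum(e ** 2 for e in residuals)
ssto = sum((yi - y_bar) ** 2 for yi in y)   # measured about the mean of Y

r2 = 1 - sse / ssto

print(f"b1 = {b1:.4f}, SSE = {sse:.4f}, SSTO = {ssto:.4f}, R^2 = {r2:.4f}")
print(f"sum of residuals = {sum(residuals):.4f}")
```

Here SSE far exceeds SSTO, so R² is well below zero, and the residuals no longer sum to zero because the model has no intercept.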
What are Bonferroni joint confidence intervals?
Simultaneous confidence intervals for g parameters, each built at individual confidence level 1-α/g so that the family confidence level is at least 1-α
What is the Working-Hotelling confidence band?
Simultaneous confidence band for the entire regression line: Ŷ ± W*s{Ŷ}, where W = √(2F(2, n-2; α))
What is inverse prediction?
Estimating the X value that corresponds to an observed or desired Y value
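For a fitted line, the usual point estimate is obtained by solving Ŷ = b_0 + b_1*X for X, which gives X̂_h = (Y_h - b_0)/b_1. That formula is not stated on the card above and is an assumption of this sketch, as are the coefficient values; the interval estimate for X_h is not shown.

```python
def inverse_predict(y_h, b0, b1):
    """Point estimate of X for a given Y, from the fitted line Y = b0 + b1*X."""
    if b1 == 0:
        raise ValueError("slope is zero; X cannot be recovered from Y")
    return (y_h - b0) / b1

# Hypothetical fitted coefficients and target response.
x_hat = inverse_predict(y_h=4.0, b0=2.2, b1=0.6)
print(f"estimated X = {x_hat:.4f}")  # prints: estimated X = 3.0000
```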
What is a matrix?
A rectangular array of elements arranged in rows and columns
What is the dimension of a matrix?
r × c, where r is the number of rows and c is the number of columns
What is a vector?
A special case of a matrix that is either r×1 (column vector) or 1×c (row vector)
What is the transpose of a matrix A?
A', obtained by interchanging the rows and columns of A
What is a symmetric matrix?
A square matrix where A = A'
What is a diagonal matrix?
A square matrix with all off-diagonal elements equal to 0
What is an identity matrix I?
A diagonal matrix with 1's on the diagonal
What is the simple linear regression model in matrix form?
Y = Xβ + ε, where Y is n×1, X is n×2, β is 2×1, and ε is n×1
What is the design matrix X in simple linear regression?
X = [1 X_1; 1 X_2; ...; 1 X_n], a matrix with first column all 1's and second column the X values
What is the solution for b in matrix form?
b = (X'X)^{-1}X'Y, the least squares estimate
What is X'X for simple linear regression?
X'X = [n ΣX_i; ΣX_i ΣX_i²], a 2×2 matrix
What is X'Y for simple linear regression?
X'Y = [ΣY_i; ΣX_i*Y_i], a 2×1 vector
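Because X'X is only 2×2 in simple linear regression, the normal equations can be solved by hand with the closed-form 2×2 inverse. The sketch below builds X'X and X'Y from the sums shown above on hypothetical toy data and recovers b.

```python
# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

# X'X = [[n, sum x], [sum x, sum x^2]]
sum_x = sum(x)
sum_x2 = sum(xi ** 2 for xi in x)
xtx = [[n, sum_x], [sum_x, sum_x2]]

# X'Y = [sum y, sum x*y]
xty = [sum(y), sum(xi * yi for xi, yi in zip(x, y))]

# Closed-form inverse of a 2x2 matrix [[a, b], [c, d]].
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
xtx_inv = [[xtx[1][1] / det, -xtx[0][1] / det],
           [-xtx[1][0] / det, xtx[0][0] / det]]

# b = (X'X)^{-1} X'Y
b0 = xtx_inv[0][0] * xty[0] + xtx_inv[0][1] * xty[1]
b1 = xtx_inv[1][0] * xty[0] + xtx_inv[1][1] * xty[1]

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")  # prints: b0 = 2.2000, b1 = 0.6000
```

The result agrees with the scalar formulas for b_1 and b_0, as it must.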
What is the hat matrix H?
H = X(X'X)^{-1}X', projects Y onto Ŷ
What property does the hat matrix have?
H is symmetric and idempotent: HH = H
How are fitted values expressed using the hat matrix?
Ŷ = HY, the hat matrix "puts the hat on Y"
How are residuals expressed in matrix form?
e = Y - Ŷ = (I - H)Y
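The hat matrix properties can be checked numerically even without a linear algebra library. The sketch below uses small hand-written helpers (`transpose`, `matmul`, `inverse_2x2`, all illustrative) and hypothetical toy data to build H, then confirms symmetry, idempotence, and Ŷ = HY.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

X = [[1.0, xi] for xi in x]   # n x 2 design matrix
Y = [[yi] for yi in y]        # n x 1 response vector

Xt = transpose(X)
H = matmul(matmul(X, inverse_2x2(matmul(Xt, X))), Xt)   # n x n hat matrix

fitted = [row[0] for row in matmul(H, Y)]              # Y-hat = H Y
residuals = [yi - fi for yi, fi in zip(y, fitted)]     # e = (I - H) Y

HH = matmul(H, H)
n = len(x)
is_symmetric = all(abs(H[i][j] - H[j][i]) < 1e-12 for i in range(n) for j in range(n))
is_idempotent = all(abs(HH[i][j] - H[i][j]) < 1e-12 for i in range(n) for j in range(n))

print("symmetric:", is_symmetric, "idempotent:", is_idempotent)
print("fitted:", [round(f, 4) for f in fitted])
```

The diagonal of H sums to 2 here, the number of regression parameters.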
What is the covariance matrix of b?
σ²{b} = σ²(X'X)^{-1}
What is the estimated covariance matrix of b?
s²{b} = MSE(X'X)^{-1}
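The diagonal of s²{b} holds the estimated variances of b_0 and b_1, and the off-diagonal entry is their estimated covariance. A minimal sketch on hypothetical toy data, using the closed-form 2×2 inverse:

```python
import math

# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

sum_x, sum_x2 = sum(x), sum(xi ** 2 for xi in x)
sum_y, sum_xy = sum(y), sum(xi * yi for xi, yi in zip(x, y))

# (X'X)^{-1} for X'X = [[n, sum_x], [sum_x, sum_x2]]
det = n * sum_x2 - sum_x ** 2
xtx_inv = [[sum_x2 / det, -sum_x / det],
           [-sum_x / det, n / det]]

# b = (X'X)^{-1} X'Y
b0 = xtx_inv[0][0] * sum_y + xtx_inv[0][1] * sum_xy
b1 = xtx_inv[1][0] * sum_y + xtx_inv[1][1] * sum_xy

mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)

# Estimated covariance matrix: s2{b} = MSE * (X'X)^{-1}
cov_b = [[mse * v for v in row] for row in xtx_inv]
se_b0 = math.sqrt(cov_b[0][0])
se_b1 = math.sqrt(cov_b[1][1])

print("s2{b} =", [[round(v, 4) for v in row] for row in cov_b])
print(f"s(b0) = {se_b0:.4f}, s(b1) = {se_b1:.4f}")
```

The lower-right entry equals MSE/Σ(X_i - X̄)², matching the scalar formula for the estimated variance of b_1.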
What is SSE in matrix form?
SSE = Y'Y - b'X'Y = e'e
What is SSR in matrix form?
SSR = b'X'Y - (ΣY_i)²/n
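In simple linear regression the matrix products reduce to ordinary sums: Y'Y = ΣY_i² and b'X'Y = b_0*ΣY_i + b_1*ΣX_i*Y_i. The sketch below evaluates both matrix-form expressions on hypothetical toy data.

```python
# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

yty = sum(yi ** 2 for yi in y)        # Y'Y
b_xty = b0 * sum_y + b1 * sum_xy      # b'X'Y

sse = yty - b_xty                     # SSE = Y'Y - b'X'Y
ssr = b_xty - sum_y ** 2 / n          # SSR = b'X'Y - (sum Y)^2 / n

print(f"SSE = {sse:.4f}, SSR = {ssr:.4f}")  # prints: SSE = 2.4000, SSR = 3.6000
```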
What does linear dependence of columns mean?
Columns are linearly dependent if there exist constants (not all zero) such that a linear combination equals the zero vector
What is the rank of a matrix?
The maximum number of linearly independent columns (or rows) in the matrix
What is the inverse of a matrix A?
A^{-1} is the matrix such that AA^{-1} = A^{-1}A = I
When does a matrix inverse exist?
Only for square matrices that have full rank
What is the general linear test approach?
Compares a full model (H_A) to a reduced model (H_0) using F* = [(SSE_R - SSE_F)/(df_R - df_F)]/[SSE_F/df_F]
What are the degrees of freedom for the full and reduced models in simple linear regression?
Full model: df_F = n-2; reduced model: df_R = n-1
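For testing β_1 = 0 in simple linear regression, the reduced model is Y_i = β_0 + ε_i, whose error sum of squares is SSTO with n-1 degrees of freedom. The sketch below applies the general linear test formula to hypothetical toy data; `general_linear_f` is an illustrative helper.

```python
def general_linear_f(sse_r, df_r, sse_f, df_f):
    """F* for comparing a reduced model (H0) against a full model (HA)."""
    return ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)

# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# Full model: straight line with intercept and slope.
sse_f = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
df_f = n - 2

# Reduced model: intercept only, fitted by the mean of Y.
sse_r = sum((yi - y_bar) ** 2 for yi in y)
df_r = n - 1

f_star = general_linear_f(sse_r, df_r, sse_f, df_f)
print(f"F* = {f_star:.4f}")  # prints: F* = 4.5000
```

Since SSE_R - SSE_F equals SSR here, this F* is the same as MSR/MSE from the ANOVA table.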