STAT 3400 - Regression

Description and Tags

Exam 1

47 Terms

1
New cards

RMSE

sqrt( sum of (yi - yi_hat)² / n ), the root mean squared error; measures how far the model's predictions fall from the actual responses in a test set, in the same units as the response
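A minimal Python sketch of the RMSE computation, using made-up values for y (actual) and y_hat (predicted):

```python
import numpy as np

# Made-up actual and predicted responses, for illustration only
y = np.array([3.1, 2.4, 5.0, 4.2])
y_hat = np.array([2.9, 2.8, 4.6, 4.5])

# RMSE: square root of the mean squared prediction error
rmse = np.sqrt(np.mean((y - y_hat) ** 2))
print(rmse)
```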
2
New cards

Flexible Model

adapts to the shape of the data better, non-parametric, needs more data, prone to higher variance error

3
New cards

Inflexible Model

less adaptable, parametric, needs less data, prone to higher bias error

4
New cards

Bias

error from approximating a real world system with a model, worse in inflexible models

5
New cards

Variance

error from fitting the model using training data that only represents a small subset of the population, too much leads to overfitting, worse in flexible models

6
New cards

CER

(sum of I(yi =/= yi_hat)) / n
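A quick Python sketch of this formula, with hypothetical 0/1 labels:

```python
import numpy as np

# Hypothetical true and predicted class labels
y = np.array([1, 0, 1, 1, 0])
y_hat = np.array([1, 1, 1, 0, 0])

# CER: proportion of observations where the prediction misses the true label
cer = np.mean(y != y_hat)
print(cer)  # 0.4 -> 2 of 5 observations misclassified
```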

7
New cards

RSS

E^T(E) = (y-XB)^T(y-XB), least squares seeks to minimize RSS, total squared error of the training model

8
New cards

Minimized RSS

B_hat = (X^TX)^-1(X^Ty)
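A small numpy sketch of the least squares solution and the resulting RSS; the data values are invented, and np.linalg.solve is used rather than forming the inverse explicitly:

```python
import numpy as np

# Invented training data: one predictor, n = 5 observations
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix with an intercept column of ones
X = np.column_stack([np.ones_like(x), x])

# B_hat = (X^T X)^-1 (X^T y), solved as a linear system
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# RSS = (y - X B_hat)^T (y - X B_hat)
residuals = y - X @ beta_hat
rss = residuals @ residuals
print(beta_hat, rss)
```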

9
New cards

RSE

sqrt(RSS/(n-2)), measure of how much the training data differs from the line of best fit

10
New cards

Overfitting

training RSE is small but testing RMSE is large; the model is too specific to the training data to be useful

11
New cards

Multiple R²

1-RSS/TSS, the proportion of variance explained by the model, simple linear regression

12
New cards

Adjusted R²

1-(RSS/TSS)((n-1)/(n-1-p)), the proportion of variance explained by the model, multiple linear regression
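A short sketch putting the two R² formulas side by side; n, p, RSS, and TSS are placeholder values, not course data:

```python
# Placeholder values: n observations, p predictors, and sums of squares from a fit
n, p = 250, 5
rss, tss = 180.0, 600.0

multiple_r2 = 1 - rss / tss
adjusted_r2 = 1 - (rss / tss) * ((n - 1) / (n - 1 - p))
print(multiple_r2, adjusted_r2)  # adjusted R^2 comes out slightly smaller
```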

13
New cards

TSS

sum of (yi - y_bar)², sum of the squared differences between the training data and the mean (null)

14
New cards

MSE

RSE² = RSS/(n-1-p), estimate of the variance of the model's error term

15
New cards

MSG

(TSS - RSS)/p, compares the performance of the null and fitted models, larger MSG values indicate that the model outperforms the null

16
New cards

F

MSG/MSE, used in ANOVA tests; values near 1 indicate the model does not improve on the null, while values greater than 1 indicate that it does. The larger the ratio, the more likely it is that at least one of the predictors is useful.
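A sketch tying TSS, RSS, MSE, MSG, and F together; the numbers are invented, and the scipy p-value line is an optional extra:

```python
from scipy import stats

# Invented quantities from a fitted model with n observations and p predictors
n, p = 100, 3
rss, tss = 220.0, 500.0

mse = rss / (n - 1 - p)        # estimated error variance of the fitted model
msg = (tss - rss) / p          # improvement over the null model, per predictor
f_stat = msg / mse

# p-value: upper-tail area under the F distribution with (p, n - 1 - p) df
p_value = stats.f.sf(f_stat, p, n - 1 - p)
print(f_stat, p_value)
```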

17
New cards

Explain one of the uses of an unsupervised statistical learning model.

To find associations among the predictor variables when there is no response variable to predict. For example, cluster analysis groups similar observations together to reveal structure in the data. (add some more stuff for the exam)

18
New cards

Multiple linear regression with 5 predictors, test set of 250 observations. If yi is the actual response for the i-th observation and yi-hat is the predicted response, which equation gives the RMSE?

  • sqrt( (1/250) · sum from i=1 to 250 of (yi - yi-hat)² )

19
New cards

Model to predict fish weight with RMSE of 0.75 lbs. In order to determine benefit we should compare 0.75 lbs to the error obtained from the model that always predicts _____.

  • The mean fish weight (null model)

20
New cards

Explain the benefit of using RMSE rather than MAE (mean absolute error)

  • Both have the correct units. Squared error penalizes really big errors more than absolute error, and really tries to avoid models with massive errors. RMSE doesn’t have the kink where MAE isn’t differentiable.
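A tiny sketch of that penalty difference: two invented error vectors with the same MAE, where the one containing a single large error gets a larger RMSE:

```python
import numpy as np

errors_steady = np.array([1.0, 1.0, 1.0, 1.0])   # four small errors
errors_spiky = np.array([0.0, 0.0, 0.0, 4.0])    # one large error

for e in (errors_steady, errors_spiky):
    mae = np.mean(np.abs(e))
    rmse = np.sqrt(np.mean(e ** 2))
    print(mae, rmse)
# Both have MAE = 1.0, but the spiky errors give RMSE = 2.0
```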

21
New cards

Which of the following is NOT a component of testing error? Bias error, variance error, flexibility error, irreducible error.

  • Flexibility error (does not exist)

22
New cards

Two numerical variables are observed to have a strong, increasing, nonlinear relationship. If we train a simple linear regression model we can expect the ______ component of testing error to be relatively large. Bias or variance?

  • Bias

23
New cards

Two variables with a strong, decreasing, linear relationship. Train and test a 5th-order polynomial function and discover we have overfit the data. The ___ error was small but the ____ error was large.

  • Training, testing

24
New cards

Explain the bias-variance trade-off inherent in the flexibility of a statistical model.

  • Three types of error: bias, variance, and irreducible. If a model is too inflexible there will be more bias. If a model is too flexible there will be more variance. If bias decreases, variance increases. One has to try to balance the two types of error.

25
New cards

Training set of 1000 emails and a testing set of 100. Binary response spam/not spam. 600 labeled as spam in training, 55 labeled as spam in testing. What is the upper bound on the CER against a testing set for any new model to be useful? 40% 45% 55% 60%

  • 45% - null model accuracy
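The arithmetic behind the 45% answer, written out as a tiny sketch:

```python
# Testing set: 55 of 100 emails are spam, so the null model predicts the
# majority class ("spam") for everything and misses the 45 non-spam emails
spam, total = 55, 100
null_cer = (total - spam) / total
print(null_cer)  # 0.45 -> a new model must beat a CER of 45% to be useful
```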

26
New cards

After training a multiple logistic model with 6 predictors we want to assess accuracy against a test set of 125 observations. Which of the following is the CER?

  • (1/125) · sum from i=1 to 125 of I(yi =/= yi-hat)

27
New cards

After estimating a simple logistic model you obtain a logistic (S-shaped) function. For a given input the output of the function is ____.

  • P(y=1)

28
New cards

Explain how to compute the CER for a given dataset

  • Decide on a threshold for success/failure, convert the predicted probabilities (p-hats) to y-hat values, compare them to the true y responses, and take the proportion of observations the model gets wrong (the mean of the misclassification indicators).
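A sketch of those steps with invented probabilities and a 0.5 threshold (both are placeholders, not from the course):

```python
import numpy as np

# Invented predicted probabilities from a logistic model and the true labels
p_hat = np.array([0.91, 0.35, 0.62, 0.08, 0.55])
y = np.array([1, 0, 0, 0, 1])

threshold = 0.5                            # chosen success/failure cutoff
y_hat = (p_hat >= threshold).astype(int)   # convert p-hats to y-hat values
cer = np.mean(y != y_hat)                  # proportion of misclassifications
print(y_hat, cer)
```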

29
New cards

A linear regression model is considered simple if it has only one ______.

  • Predictor

30
New cards

Suppose yi is the true response and yi-hat is the predicted response based on the model B0 + B1xi. Assuming there are n observations, the least squares algorithm seeks to minimize which quantity (what is the RSS)?

  • Sum from i=1 to n ( (yi - yi-hat)^2 )

31
New cards

Estimate a simple linear regression using a training set with 250 observations and the least squares method in matrix notation (B-hat = (X^T X)^-1 X^T y). What are the dimensions of B-hat?

  • 2 rows by 1 column (B0 and B1 column vector)
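A sketch confirming those dimensions with invented data (250 observations, one predictor plus an intercept column):

```python
import numpy as np

# Invented data: 250 observations of one predictor and a column-vector response
rng = np.random.default_rng(0)
x = rng.normal(size=250)
y = (1.0 + 2.0 * x + rng.normal(size=250)).reshape(-1, 1)   # 250 x 1

X = np.column_stack([np.ones(250), x])                      # 250 x 2 design matrix
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)                # (X^T X)^-1 X^T y
print(beta_hat.shape)  # (2, 1): a column vector holding B0 and B1
```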

32
New cards

Describe the shape, direction, and strength of association between these two variables

  • Linear, negative/decreasing, strong/moderate

33
New cards

Fit a linear regression and test the hypothesis H0: B1 = 0, HA: B1 =/= 0 based on a significance level of alpha = 0.1. What is the interpretation of alpha?

  • We accept a 10% chance of finding an association between the predictor and response when in fact there is none

34
New cards

In an inferential analysis of a simple linear regression model we might construct the confidence interval B1-hat +- z_a/2 * SE_B1. In the formula, z_a/2 represents the number of standard errors on the ______ distribution.

  • Standard normal
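A small sketch of that interval with a made-up slope estimate and standard error; scipy's norm.ppf supplies z_a/2:

```python
from scipy import stats

# Made-up slope estimate and standard error from a fitted model
b1_hat, se_b1 = 0.42, 0.15
alpha = 0.05

z = stats.norm.ppf(1 - alpha / 2)   # z_a/2 on the standard normal distribution
ci = (b1_hat - z * se_b1, b1_hat + z * se_b1)
print(ci)  # roughly (0.126, 0.714)
```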

35
New cards

Null distribution with a test statistic inside the 0.05 significance level bounds. What is the appropriate conclusion?

  • No evidence to suggest that B1 =/= 0; we fail to reject the null hypothesis because the test statistic is not beyond the rejection threshold

36
New cards

Given a sampling distribution for the slope relating father's age (x) to birth weight (y), with a 95% confidence interval

  • We are 95% confident that the true slope parameter is between [] and []. Equivalently, we are 95% confident that for every one-year increase in a father's age, the baby's birth weight will increase by between [] and []. Note whether zero is in the interval; if it is, it is possible that there is no relationship.

37
New cards

Summary output with RSE 11.79. What does the RSE mean?

  • On average, the margins of victory in the training data differ from the model by about 11.79 points

38
New cards

The RSS measures error when using the fitted linear regression model. The TSS measures the error when using the ______.

  • Mean response value (the null)

39
New cards

Explain how R^2 is related to the variance in the response variable

  • R^2 is the proportion of variance eliminated/explained in the fitted model as compared to the null model

40
New cards

Using 300 observations to train an MLR model to predict penguin body weight using 3 predictors. What are the values of n and p?

  • n=300, p=3

41
New cards

What formula would compute RSS? Given RSE is 11.79 and there are 999 observations and 4 predictors

  • RSE = sqrt(RSS/(n-1-p)), so RSS = 11.79² × (999 - 1 - 4) = 11.79² × 994
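The same arithmetic as a quick sketch, using the numbers stated in the question:

```python
# RSE = sqrt(RSS / (n - 1 - p))  =>  RSS = RSE^2 * (n - 1 - p)
rse, n, p = 11.79, 999, 4
rss = rse ** 2 * (n - 1 - p)
print(rss)  # about 138,170
```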

42
New cards

Which of the following is false regarding adjusted (multiple) R^2?

R^2 represents the proportion of variability in the response explained by its relationship with the predictors.

R^2 cannot be less than adjusted R^2.

R^2 will always decrease when we add predictors to the model.

  • R^2 will always decrease when we add predictors to the model - false: it never decreases, and typically increases, when predictors are added

43
New cards

Estimate two different LR models using a training set with 400 observations. The first has 3 predictors, the second has 14. Given Adjusted R^2 = 1 - (RSS/TSS)((n-1)/(n-1-p)), which quantities in the formula are the same for both models and which differ?

  • TSS and number of observations are the same. p and RSS will be different.

44
New cards

Conduct an ANOVA with 3 predictor variables. One type of variance is MSG, the formula of which is (TSS-RSS)/p. The TSS measures the error in a model that employs ______ predictor variables

  • 0 (null model)

45
New cards

Using ANOVA, the F test statistic falls to the left of the rejection threshold on the null F-distribution. What does this mean?

  • We fail to reject the null, so there is no evidence that any of the predictor variables is associated with the response

46
New cards

To explain the margin of victory perform ANOVA. What are the hypotheses associated?

  •  H_0: B1=B2=B3=0, H_A: at least one Bj=/=0

47
New cards

Explain the test statistic on the F distribution (F= MSG/MSE)

  • MSE is the amount of error in the fitted model (want it to be small). MSG is how much error the model removes relative to the null (want it to be big). If the ratio is big, the model is more accurate; the larger the ratio, the more likely it is that at least one of the predictors is useful.