MAS-1

MAS-1 Flashcards

Last updated 3:07 PM on 4/11/26

83 Terms

1
New cards

Local regression should not be used in a (pick one) low dimensional setting/high dimensional setting.

High-dimensional setting. Local regression can perform poorly when p is much larger than about 3 or 4, because very few training observations lie close to any given target point.

2
New cards

How many degrees of freedom does a chi square goodness of fit test have if we have k categories and r estimated parameters?

DOF = k - (r+1) or k - r - 1

3
New cards

What is the formula for the variance of the severity of a Poisson Process?

4
New cards

At the MLE, the score function is _____ because _____

5
New cards

Fisher information is defined as…

6
New cards

Nice shortcut: The information for n observations is ____

n times the information for one observation
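
As a hedged reminder for cards 4 to 6, the standard definitions can be written as follows. The symbols \ell (log-likelihood), U (score), and I (Fisher information) are notation assumed here, not taken from the cards, and the usual regularity conditions are assumed.

```latex
\begin{align*}
U(\theta) &= \frac{\partial \ell(\theta)}{\partial \theta},
  \qquad U(\hat{\theta}) = 0 \ \text{ at an interior maximum of } \ell, \\
I(\theta) &= \mathrm{E}\!\left[ U(\theta)^2 \right]
  = -\,\mathrm{E}\!\left[ \frac{\partial^2 \ell(\theta)}{\partial \theta^2} \right], \\
I_n(\theta) &= n \, I_1(\theta) \quad \text{for } n \text{ i.i.d. observations.}
\end{align*}
```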

7
New cards

High leverage point is ____ times the average, where the average is _____

High leverage point is three times the average, where the average is (p+1)/n. 

8
New cards

An observation is considered influential if it has both a high ________ and high _______. Measures that combine these two quantities, such as _________, are used to assess influence.

An observation is considered influential if it has both a high standardized residual and high leverage. Measures that combine these two quantities, such as Cook's distance, are used to assess influence.

9
New cards

State Cook’s Distance Formula and explain what the letters represent
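
A minimal sketch of the quantities in cards 7 to 9, assuming simple linear regression (p = 1 predictor) and the common form D_i = r_i^2 * h_ii / ((p + 1)(1 - h_ii)), where r_i is the standardized residual and h_ii the leverage. The function name and the data are hypothetical, chosen only for illustration.

```python
from statistics import mean

def cooks_distance(x, y):
    """Leverage, standardized residuals and Cook's distance for y = b0 + b1*x."""
    n, p = len(x), 1                                   # p = number of predictors
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - (p + 1))    # residual variance estimate
    lev = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]  # h_ii for simple regression
    std_res = [e / (s2 * (1 - h)) ** 0.5 for e, h in zip(resid, lev)]
    cook = [r ** 2 * h / ((p + 1) * (1 - h)) for r, h in zip(std_res, lev)]
    return lev, std_res, cook

# Hypothetical data: nine points near a line plus one far-out x value.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]
y = [1.2, 1.9, 3.1, 4.0, 5.2, 5.9, 7.1, 8.0, 9.1, 20.0]
lev, std_res, cook = cooks_distance(x, y)

# Rule of thumb from card 7: leverage above 3 * (p + 1) / n.
high_leverage = [h > 3 * (1 + 1) / len(x) for h in lev]
most_influential = cook.index(max(cook))
```

The leverages always sum to p + 1, so their average is (p + 1)/n, which is where the threshold in card 7 comes from.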

10
New cards

The validation set approach is a simple strategy for ______________. You randomly divide your available data into two parts: a ___________ (used to _____ the model) and a _______(used to ______ the model).

The validation set approach is a simple strategy for estimating a model's performance on unseen data. You randomly divide your available data into two parts: a training set (used to fit the model) and a validation set (used to test the model).

11
New cards

Validation Set Approach. The data is split so that _____% is used for training.

50%
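
The validation set approach in cards 10 and 11 can be sketched as below, assuming a simple linear model and a 50/50 random split. The helper names, the seed, and the simulated data are all hypothetical.

```python
import random
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1

def validation_mse(x, y, train_frac=0.5, seed=0):
    """Fit on a random training half, report MSE on the held-out half."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    rng.shuffle(idx)
    cut = int(len(idx) * train_frac)
    train, valid = idx[:cut], idx[cut:]
    b0, b1 = fit_line([x[i] for i in train], [y[i] for i in train])
    return mean([(y[i] - (b0 + b1 * x[i])) ** 2 for i in valid])

# Simulated data: y = 2 + 3x + noise with unit variance.
rng = random.Random(42)
x = [i / 10 for i in range(100)]
y = [2.0 + 3.0 * xi + rng.gauss(0, 1) for xi in x]
mse = validation_mse(x, y)
```

Changing `seed` changes the split and therefore the estimate, which illustrates why the validation estimate of test error can be quite variable.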

12
New cards

For any single principal component, it is required that ______________. This ensures the new component _______________

For any single principal component, it is required that the sum of its squared loadings is one. This ensures the new component doesn’t artificially inflate the variance.

13
New cards

Principal components are designed to be unrelated to each other. This means the ___________ of the loadings for Z1 and Z2 must equal ______.

Principal components are designed to be unrelated to each other. This means the dot product of the loadings for Z1 and Z2 must equal zero.
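
The two constraints in cards 12 and 13 can be written compactly. The loading symbol \phi_{jk} (loading of variable j on component k) is notation assumed here for illustration.

```latex
\begin{align*}
Z_k &= \phi_{1k} X_1 + \phi_{2k} X_2 + \dots + \phi_{pk} X_p, \\
\sum_{j=1}^{p} \phi_{j1}^2 &= 1 \quad \text{(normalization of the loadings of } Z_1\text{)}, \\
\sum_{j=1}^{p} \phi_{j1}\,\phi_{j2} &= 0 \quad \text{(loadings of } Z_1 \text{ and } Z_2 \text{ are orthogonal).}
\end{align*}
```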

14
New cards

The deviance for normal distributions is proportional to the _______________. Explain what this means.

The deviance for normal distributions is proportional to the residual sum of squares. Think of deviance as the GLM version of the residual sum of squares.

15
New cards

While residual sum of squares measures the physical distance between points and a line, Deviance measures the "distance" in terms of _____________. Specifically, Deviance is the _______ or "lack of ________" compared to a __________.

While residual sum of squares measures the physical distance between points and a line, Deviance measures the "distance" in terms of likelihood. Specifically, Deviance is the error or "lack of fit" compared to a saturated model.

16
New cards

Deviance communicates how much ________ we are losing by using our ________ instead of the perfect ____________.

Deviance communicates how much "likelihood" we are losing by using our simplified model instead of the perfect Saturated Model.

17
New cards

Formula for scaled deviance in a normal distribution:
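
One standard way to write this, assuming independent normal responses with common variance \sigma^2 and fitted means \hat{\mu}_i (notation assumed here):

```latex
\[
D^{*} \;=\; \frac{D}{\sigma^{2}} \;=\; \frac{1}{\sigma^{2}} \sum_{i=1}^{n} \left( y_i - \hat{\mu}_i \right)^{2},
\qquad D = \sum_{i=1}^{n} \left( y_i - \hat{\mu}_i \right)^{2} = \text{RSS}.
\]
```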

18
New cards

To map any real number to the (0,1) range, we use the ________ function. State this function as a function of Beta0 and Beta1

To map any real number to the (0,1) range, we use the logistic function:
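
With a single predictor X, the logistic function is usually written as:

```latex
\[
p(X) \;=\; \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}
\;=\; \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}.
\]
```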

19
New cards

State the formula for the identity link function, its domain, its range, and typical use case.

20
New cards

State the formula for the logit link function, its domain, its range, and typical use case.

Use: Binary classification - claim occurrence.

21
New cards

State the formula for the log link function, its domain, its range, and typical use case.

Use: Freq/Sev. Claim counts or costs.

22
New cards

State the formula for the probit link function, its domain, its range, and typical use case.

Formula: inverse CDF of the standard normal distribution. Use: binary responses, as an alternative to the logit.

23
New cards

State the formula for the complementary log-log link function, its domain, its range, and typical use case.

Asymmetric. Rare "Yes" events.

24
New cards

State the formula for the inverse link function, its domain, its range, and typical use case.

Use: Gamma. Claim severity/settlement.
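
For cards 19 to 24, the usual textbook forms of the six links and their inverses are collected below as a sketch. The dictionary name `LINKS` is hypothetical; each entry holds the link g(mu) and the inverse link that maps a linear predictor eta back to a mean.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal, used for the probit link

# name: (link g(mu), inverse link g^{-1}(eta))
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "logit": (lambda mu: math.log(mu / (1 - mu)),
              lambda eta: 1 / (1 + math.exp(-eta))),
    "log": (math.log, math.exp),
    "probit": (_N.inv_cdf, _N.cdf),
    "cloglog": (lambda mu: math.log(-math.log(1 - mu)),
                lambda eta: 1 - math.exp(-math.exp(eta))),
    "inverse": (lambda mu: 1 / mu, lambda eta: 1 / eta),
}

# Round trip: applying the link and then its inverse recovers the mean.
round_trip = {name: g_inv(g(0.3)) for name, (g, g_inv) in LINKS.items()}

# Asymmetry of cloglog: a linear predictor of 0 does not map to 0.5.
logit_at_zero = LINKS["logit"][1](0.0)
cloglog_at_zero = LINKS["cloglog"][1](0.0)
```

The logit and probit inverses are symmetric around 0.5, while the complementary log-log inverse is not, which is the asymmetry noted in card 23.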

25
New cards

Which link functions take a mean between 0 and 1 and map it onto all real numbers, so that the inverse link turns any linear predictor into a value between 0 and 1? What do we tend to use these link functions for?

Logit, Probit, or Cloglog Link. Use for binary classification where the output is a probability.

26
New cards

Which link functions are applied to strictly positive means? What do we tend to use these link functions for?

Log or Inverse Link. Use for strictly positive data like claim frequency or severity. Only the log link guarantees a positive fitted mean for every value of the linear predictor.

27
New cards

Which link function is best able to handle scenarios where “success” is very rare?

Complementary log-log link.

28
New cards

What is the canonical link function for a normal distribution?

Identity

29
New cards

What is the canonical link function for a Poisson distribution?

Log

30
New cards

What is the canonical link function for a Bernoulli/Binomial distribution?

Logit

31
New cards

What is the canonical link function for a Gamma distribution?

Inverse (1/μ) (Log is also practical since it guarantees positive means)

32
New cards

What is the Iterative Weighted Least Squares Formula?
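
One common way to write the update, assuming a link function g, variance function V, linear predictor \eta_i = x_i^{\top}\beta and mean \mu_i = g^{-1}(\eta_i), all evaluated at the current iterate \beta^{(m)} (notation assumed here):

```latex
\begin{align*}
z_i &= \eta_i + (y_i - \mu_i)\, g'(\mu_i) && \text{(working response)}, \\
w_i &= \frac{1}{V(\mu_i)\, g'(\mu_i)^{2}} && \text{(working weight)}, \\
\beta^{(m+1)} &= \left( X^{\top} W X \right)^{-1} X^{\top} W z,
  && W = \operatorname{diag}(w_1, \dots, w_n).
\end{align*}
```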

33
New cards

Why does the number of knots need to be considered if using a regression spline but not if using a local regression?

When you use a regression spline, you are defining a basis function that exists across the entire range of your data. The knots define the flexibility. You have to be intentional about how many knots you use because that number directly determines the degrees of freedom in your model. Local regression doesn't use knots because it doesn't divide the x-axis into fixed segments. Instead, it uses a bandwidth (span) that controls how many nearby points are used in each local fit.

34
New cards

Matrix formula for y hat:

35
New cards

Formula for the Hat Matrix

36
New cards

Formula involving y, y hat, and the Hat Matrix.
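
For cards 34 to 36, the standard matrix results are y_hat = X b with b = (X'X)^(-1) X'y, H = X (X'X)^(-1) X', and therefore y_hat = H y. The sketch below checks this for simple linear regression, where the entries of H reduce to h_ij = 1/n + (x_i - x_bar)(x_j - x_bar)/Sxx. The function name and data are hypothetical.

```python
from statistics import mean

def hat_matrix(x):
    """Hat matrix for y = b0 + b1*x, built entry by entry."""
    n, xbar = len(x), mean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [[1 / n + (xi - xbar) * (xj - xbar) / sxx for xj in x] for xi in x]

x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 7.0, 12.0]

H = hat_matrix(x)
y_hat = [sum(h * yj for h, yj in zip(row, y)) for row in H]   # y_hat = H y

# The same fitted values from the usual slope and intercept formulas.
xbar, ybar = mean(x), mean(y)
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
b0 = ybar - b1 * xbar
y_line = [b0 + b1 * xi for xi in x]

trace = sum(H[i][i] for i in range(len(x)))   # equals p + 1 = 2
```

The diagonal entries of H are the leverages, which is why the hat matrix connects directly to the leverage card earlier in the set.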

37
New cards

What are two unsupervised types of learning?

Principal Component Analysis and Clustering

38
New cards

In a QQ plot, ______ goes on the x-axis and _____ goes on the y-axis.

In a QQ plot, theoretical quantiles go on the x-axis and sample quantiles go on the y-axis.

39
New cards

If a QQ plot shows heavy tails, standard errors might be ______(over/under)-estimated and your "t-tests" for coefficient significance might be ________.

If a QQ plot shows heavy tails, standard errors might be underestimated and your "t-tests" for coefficient significance might be unreliable.

40
New cards

If your raw data looks curved on a Q-Q plot but looks like a straight line after a log transformation, you’ve likely found that your data is ________ —a very common occurrence in insurance claim amounts.

If your raw data looks curved on a Q-Q plot but looks like a straight line after a log transformation, you’ve likely found that your data is Log-Normal—a very common occurrence in insurance claim amounts.

41
New cards
42
New cards

In Ridge/Lasso regression, which method shrinks some coefficients exactly to zero, and which shrinks all coefficients toward zero but never reaches zero?

Lasso: Shrinks some coefficients exactly to zero

Ridge: Shrinks all toward zero, but never hits zero
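
A small sketch of the difference, assuming the textbook special case where the design matrix is the identity (n = p, no intercept). In that case ridge divides every coefficient by 1 + lambda, and lasso soft-thresholds at lambda / 2. The function names and data are hypothetical.

```python
def ridge_shrink(y, lam):
    """Ridge estimates when X is the identity: proportional shrinkage."""
    return [yj / (1 + lam) for yj in y]

def lasso_shrink(y, lam):
    """Lasso estimates when X is the identity: soft-thresholding at lam / 2."""
    t = lam / 2
    return [yj - t if yj > t else yj + t if yj < -t else 0.0 for yj in y]

y = [3.0, -0.4, 0.2, -2.5]      # least-squares coefficients in this special case
ridge = ridge_shrink(y, lam=1.0)
lasso = lasso_shrink(y, lam=1.0)

n_zero_ridge = sum(1 for b in ridge if b == 0.0)
n_zero_lasso = sum(1 for b in lasso if b == 0.0)
```

Here every ridge coefficient stays nonzero, while the two small lasso coefficients are set exactly to zero.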

43
New cards

Which is easier to interpret: ridge or lasso?

Lasso, because it produces a sparse model with a short list of predictors, whereas ridge keeps all variables in the model.

44
New cards

What are the best use cases for Ridge and Lasso regression?

Lasso is best when you suspect only a few variables are “real”

Ridge is best when you think many variables have small effects.

45
New cards

Problems with a quantitative response are called _____ while problems with a qualitative response are called ______ problems

Problems with a quantitative response are called regression while problems with a qualitative response are called classification problems

46
New cards

Bias refers to the _______ that is introduced by approximating a real-life problem (which may be extremely complicated) by a much ________ model.

Bias refers to the error that is introduced by approximating a real-life problem (which may be extremely complicated) by a much simpler model.

47
New cards

T or F: The irreducible error can be minimized by choosing a statistical learning method with lower variance and bias.

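False. The irreducible error, Var(ε), cannot be reduced by any choice of method; only the reducible error (bias and variance) can.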

48
New cards

Summarize the rules for Principal Component Analysis (PCA)

49
New cards

Relationship between residual standard error and MSE

50
New cards

Formula for the standard error of the mean response is:

[Answer image: https://assets.knowt.com/user-attachments/9e5cd3af-f029-488f-a751-dbf63edfd253.png]
51
New cards

The standard error of the slope (Beta1) is:

[Answer image: https://assets.knowt.com/user-attachments/d6e6eb48-d40b-4d95-b726-3268dc784e09.png]
52
New cards

The standard error of the intercept (Beta0) is:

[Answer image: https://assets.knowt.com/user-attachments/28755a9f-e287-48e6-bde6-e8a3f7c18ee6.png]
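
For cards 49 to 52, the usual simple linear regression formulas are sketched below. This assumes MSE means the error mean square with n - 2 degrees of freedom (some texts divide by n instead), S_{xx} = \sum (x_i - \bar{x})^2, and x^{*} is the point at which the mean response is estimated.

```latex
\begin{align*}
s &= \sqrt{\text{MSE}} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}
  && \text{(residual standard error)}, \\
\operatorname{se}\!\left(\hat{y}(x^{*})\right) &= s \sqrt{\frac{1}{n} + \frac{(x^{*} - \bar{x})^2}{S_{xx}}}
  && \text{(mean response)}, \\
\operatorname{se}(\hat{\beta}_1) &= \frac{s}{\sqrt{S_{xx}}}
  && \text{(slope)}, \\
\operatorname{se}(\hat{\beta}_0) &= s \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}
  && \text{(intercept)}.
\end{align*}
```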
53
New cards

Since y hat necessarily goes through __________, the equation for y hat is:

Since y hat necessarily goes through (x bar, y bar), the equation for y hat is: y hat - y bar = slope × (x - x bar)

54
New cards

Equation involving SSR, x bar, y bar, xi, yi

55
New cards

Equation involving B1, x bar, y bar, xi, yi

56
New cards

Equation involving SSR, B1, x bar, xi
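
For cards 54 to 56, the standard identities are given below, assuming SSR denotes the regression sum of squares \sum (\hat{y}_i - \bar{y})^2 in a simple linear regression.

```latex
\begin{align*}
\hat{\beta}_1 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \\
\text{SSR} &= \hat{\beta}_1^{\,2} \sum_{i=1}^{n} (x_i - \bar{x})^2, \\
\text{SSR} &= \frac{\left[ \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \right]^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.
\end{align*}
```

The third line follows by substituting the first into the second.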

57
New cards

In a standard Poisson GLM, we assume Var[Y] = E[Y]. When Var[Y] > E[Y], the data is _________, making the standard Poisson model's standard errors too _______ (and p-values too __________).

In a standard Poisson GLM, we assume Var[Y] = E[Y]. When Var[Y] > E[Y], the data is overdispersed, making the standard Poisson model's standard errors too small (and p-values too "significant").

58
New cards

The Quasi-Poisson model introduces a __________ to relax the variance constraint.

The Quasi-Poisson model introduces a dispersion parameter (phi) to relax the variance constraint.

59
New cards

What impacts do Quasi-Poisson processes have on statistical inference?

60
New cards

When dealing with a Quasi-Poisson process, a variable that looked "highly significant" (p < 0.01) under a standard Poisson might become _________ (p > 0.05) under Quasi-Poisson. It forces you to only ________ variables in your model that have __________ relative to the extra noise.

When dealing with a Quasi-Poisson process, a variable that looked "highly significant" (p < 0.01) under a standard Poisson might become not significant (p > 0.05) under Quasi-Poisson. It forces you to only keep variables in your model that have a very strong signal relative to the extra noise.
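
The effect on inference in cards 58 to 60 can be summarized as follows, assuming the dispersion is estimated from the Pearson statistic and p' is the number of estimated coefficients (notation assumed here). The coefficient estimates themselves are the same as under the standard Poisson fit; only the standard errors are rescaled.

```latex
\begin{align*}
\operatorname{Var}[Y_i] &= \phi\, \mu_i, \\
\hat{\phi} &= \frac{1}{n - p'} \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i}, \\
\operatorname{se}_{\text{quasi}}(\hat{\beta}_j) &= \sqrt{\hat{\phi}}\; \operatorname{se}_{\text{Poisson}}(\hat{\beta}_j).
\end{align*}
```

When \hat{\phi} > 1 the standard errors grow, test statistics shrink, and p-values increase.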

61
New cards

Equation relating Beta1, SSR, SXX.

62
New cards

What is SXX?

Sum of squares for X: the sum of the squared deviations of each xi from x bar.

63
New cards

Relationship between SXX and sample variance

64
New cards

SYY

Sum of Squares for Y. (SST)
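
For cards 61 to 64, the standard relationships are sketched below, assuming SSR is the regression sum of squares and s_x^2, s_y^2 are sample variances with divisor n - 1.

```latex
\begin{align*}
S_{xx} &= \sum_{i=1}^{n} (x_i - \bar{x})^2 = (n - 1)\, s_x^2, \\
S_{yy} &= \sum_{i=1}^{n} (y_i - \bar{y})^2 = (n - 1)\, s_y^2 = \text{SST}, \\
\text{SSR} &= \hat{\beta}_1^{\,2}\, S_{xx}.
\end{align*}
```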

65
New cards

Residual = ? - ?

Actual minus expected

66
New cards
67
New cards

Formula for the standard error of a new observation is:
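
The usual simple linear regression form, assuming s is the residual standard error, S_{xx} = \sum (x_i - \bar{x})^2, and x^{*} is the predictor value of the new observation:

```latex
\[
\operatorname{se}(\text{pred}) \;=\; s \sqrt{1 + \frac{1}{n} + \frac{(x^{*} - \bar{x})^2}{S_{xx}}}.
\]
```

The extra 1 under the square root, compared with the mean response, accounts for the noise in the new observation itself.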

68
New cards

What is the difference between Standard Deviation and Standard Error?

Standard Deviation: Describes the spread of individual data points in a single sample or population.

Standard Error: Describes the spread (uncertainty) of an estimate if you were to repeat the experiment many times.

69
New cards

70
New cards

71
New cards

How do you find the standard error of an estimate of the population variance?

72
New cards

How is Beta1 hat related to R2?

73
New cards

Are Ordinary Least Squares estimates biased or unbiased?

Unbiased

74
New cards

Formula for correlation coefficient, r:

75
New cards

Formula involving SSR, xi, x bar, yi, y bar

76
New cards

Relationship between r, beta1, sx, sy

77
New cards

Relationship between r, SSR, SST

78
New cards

Relationship between correlation, covariance, sx,sy

79
New cards

How are sample variances, covariances related to Beta_1, the slope of the regression line?

80
New cards

Formula for sample covariance

81
New cards

Shortcut for calculating sample covariance

82
New cards

Relationship between correlation coefficient, sample covariance, sample standard deviations
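
The relationships asked about in cards 74 to 82 can be checked numerically. The sketch below assumes the usual definitions with divisor n - 1: s_xy = sum((x_i - x_bar)(y_i - y_bar)) / (n - 1), r = s_xy / (s_x * s_y), slope = s_xy / s_x^2 = r * s_y / s_x, and r^2 = SSR / SST with SSR the regression sum of squares. The function names and data are hypothetical.

```python
from statistics import mean, stdev

def sample_cov(x, y):
    """Sample covariance from deviations about the means."""
    xbar, ybar = mean(x), mean(y)
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (len(x) - 1)

def sample_cov_shortcut(x, y):
    """Shortcut: (sum of x*y minus n * xbar * ybar) / (n - 1)."""
    n = len(x)
    return (sum(a * b for a, b in zip(x, y)) - n * mean(x) * mean(y)) / (n - 1)

x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 7.0, 12.0]

sxy, sx, sy = sample_cov(x, y), stdev(x), stdev(y)
r = sxy / (sx * sy)              # correlation from covariance
b1 = sxy / sx ** 2               # slope from covariance and variance of x
b1_from_r = r * sy / sx          # slope from correlation

ybar = mean(y)
b0 = ybar - b1 * mean(x)
ssr = sum((b0 + b1 * a - ybar) ** 2 for a in x)   # regression sum of squares
sst = sum((b - ybar) ** 2 for b in y)             # total sum of squares
r_squared = ssr / sst
```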

83
New cards

Shortcut for variance of Beta1 hat using R2 and n
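
One identity that fits cards 72 and 83, assuming simple linear regression with S_{xx} and S_{yy} the sums of squares for X and Y. It may not be the exact form the cards intend.

```latex
\begin{align*}
R^2 &= \frac{\text{SSR}}{\text{SST}} = \frac{\hat{\beta}_1^{\,2}\, S_{xx}}{S_{yy}}, \\
\widehat{\operatorname{Var}}(\hat{\beta}_1) &= \frac{s^2}{S_{xx}}
  = \frac{\hat{\beta}_1^{\,2}\,(1 - R^2)}{(n - 2)\, R^2}.
\end{align*}
```

The second equality uses s^2 = \text{SST}(1 - R^2)/(n - 2) together with \hat{\beta}_1^{\,2} = R^2\, \text{SST} / S_{xx}.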