MAS-1

MAS-1 Flashcards

Last updated 3:07 PM on 4/11/26

83 Terms

New cards

Local regression should not be used in a (pick one) low dimensional setting/high dimensional setting.

High-dimensional setting. Local regression can perform poorly when the number of predictors p is much larger than about 3 or 4, because very few training observations lie close to any given target point.

New cards

How many degrees of freedom does a chi square goodness of fit test have if we have k categories and r estimated parameters?

DOF = k - (r+1) or k - r - 1
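As a quick check of the rule above, here is a minimal sketch. The function name and the example counts are hypothetical, chosen only for illustration.

```python
def gof_degrees_of_freedom(k: int, r: int) -> int:
    """Degrees of freedom for a chi-square goodness-of-fit test
    with k categories and r parameters estimated from the data."""
    dof = k - r - 1
    if dof <= 0:
        raise ValueError("need k > r + 1 for a positive number of degrees of freedom")
    return dof

# Hypothetical example: 6 categories, one estimated parameter
# (for instance a Poisson mean fitted from the same data).
dof_example = gof_degrees_of_freedom(k=6, r=1)
```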

New cards

What is the formula for the variance of the severity of a Poisson Process?

New cards

At the MLE, the score function is _____ because _____

New cards

Fisher information is defined as…

New cards

Nice shortcut: The information for n observations is ____

n times the information for one observation
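The answers to the score-function and Fisher-information cards are not included in this export. As a sketch of the standard definitions, assuming a smooth log-likelihood whose maximum lies in the interior of the parameter space (the symbols \ell, U, and I are notation chosen here, not necessarily the deck's):

```latex
U(\theta) = \frac{\partial \ell(\theta)}{\partial \theta},
\qquad U(\hat{\theta}) = 0
\quad \text{(the MLE maximizes } \ell \text{, so its derivative vanishes there)}

I(\theta) = \mathrm{E}\!\left[ U(\theta)^{2} \right]
          = -\,\mathrm{E}\!\left[ \frac{\partial^{2} \ell(\theta)}{\partial \theta^{2}} \right]

I_{n}(\theta) = n \, I_{1}(\theta)
\quad \text{for } n \text{ independent, identically distributed observations}
```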

New cards

High leverage point is ____ times the average, where the average is _____

High leverage point is three times the average, where the average is (p+1)/n.

New cards

An observation is considered influential if it has both a high ________ and high _______. Measures that combine these two quantities, such as _________, are used to assess influence.

An observation is considered influential if it has both a high standardized residual and high leverage. Measures that combine these two quantities, such as Cook's distance, are used to assess influence.

New cards

State Cook’s Distance Formula and explain what the letters represent
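The answer to this card is missing from the export. One common form is D_i = (r_i^2 / (p + 1)) * h_ii / (1 - h_ii), where r_i is the standardized residual, h_ii is the leverage, and p is the number of predictors. The sketch below applies that form to a simple linear regression (p = 1), together with the "three times the average leverage" rule from the earlier card. The function name and the data are made up for illustration.

```python
import math

def slr_influence(x, y):
    """Leverage, standardized residuals and Cook's distance for a
    simple linear regression (one predictor plus an intercept)."""
    n = len(x)
    p = 1  # number of predictors
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - (p + 1))  # residual variance
    # Leverage in simple linear regression: h_ii = 1/n + (x_i - x_bar)^2 / Sxx
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    std_resid = [e / math.sqrt(s2 * (1 - h)) for e, h in zip(resid, leverage)]
    cooks = [r ** 2 / (p + 1) * h / (1 - h) for r, h in zip(std_resid, leverage)]
    return leverage, std_resid, cooks

# Hypothetical data: the last point sits far from the other x values.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 7.0, 7.9, 9.2, 9.8, 20.0]
leverage, std_resid, cooks = slr_influence(x, y)

# Rule of thumb from the deck: high leverage if h_ii > 3 * (p + 1) / n.
threshold = 3 * (1 + 1) / len(x)
high_leverage = [h > threshold for h in leverage]
```

The leverages always sum to p + 1, which is why their average is (p + 1)/n.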

New cards

The validation set approach is a simple strategy for ______________. You randomly divide your available data into two parts: a ___________ (used to _____ the model) and a _______(used to ______ the model).

The validation set approach is a simple strategy for estimating a model's performance on unseen data. You randomly divide your available data into two parts: a training set (used to fit the model) and a validation set (used to test the model).

New cards

Validation Set Approach. The data is split so that _____% is used for training.

50%
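A minimal sketch of the split described in these two cards, using only the standard library. The function name, the seed, and the toy data are assumptions for illustration.

```python
import random

def validation_split(rows, train_fraction=0.5, seed=0):
    """Randomly divide rows into a training set and a validation set."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical data set of 10 observation indices, split 50/50.
train, valid = validation_split(range(10))
```

The model is fit on `train` only, and its error on `valid` serves as the estimate of performance on unseen data.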

New cards

For any single principal component, it is required that ______________. This ensures the new component _______________

For any single principal component, it is required that the sum of its squared loadings is one. This ensures the new component doesn’t artificially inflate the variance.

New cards

Principal components are designed to be unrelated to each other. This means the ___________ of the loadings for Z₁ and Z₂ must equal ______.

Principal components are designed to be unrelated to each other. This means the dot product of the loadings for Z₁ and Z₂ must equal zero.
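The two loading constraints from the cards above can be checked numerically. The loading vectors below are a made-up two-variable example, not taken from the deck.

```python
import math

# Hypothetical loading vectors for Z1 and Z2 with two original variables.
phi1 = [1 / math.sqrt(2), 1 / math.sqrt(2)]
phi2 = [1 / math.sqrt(2), -1 / math.sqrt(2)]

def sum_of_squares(v):
    """Sum of squared loadings; must equal 1 for each component."""
    return sum(a * a for a in v)

def dot(u, v):
    """Dot product of two loading vectors; must equal 0 across components."""
    return sum(a * b for a, b in zip(u, v))

norm1 = sum_of_squares(phi1)
norm2 = sum_of_squares(phi2)
inner = dot(phi1, phi2)
```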

New cards

The deviance for normal distributions is proportional to the _______________. Explain what this means.

The deviance for normal distributions is proportional to the residual sum of squares. Think of deviance as the GLM version of the residual sum of squares.

New cards

While residual sum of squares measures the physical distance between points and a line, Deviance measures the "distance" in terms of _____________. Specifically, Deviance is the _______ or "lack of ________" compared to a __________.

While residual sum of squares measures the physical distance between points and a line, Deviance measures the "distance" in terms of likelihood. Specifically, Deviance is the error or "lack of fit" compared to a saturated model.

New cards

Deviance communicates how much ________ we are losing by using our ________ instead of the perfect ____________.

Deviance communicates how much "likelihood" we are losing by using our simplified model instead of the perfect Saturated Model.

New cards

Formula for scaled deviance in a normal distribution:
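The answer to this card is missing from the export. A sketch of the standard result, assuming independent normal responses with common variance \sigma^2 and fitted means \hat{\mu}_i (notation chosen here):

```latex
D^{*} = \frac{D}{\sigma^{2}}
      = \frac{1}{\sigma^{2}} \sum_{i=1}^{n} \left( y_i - \hat{\mu}_i \right)^{2}
```

Here D is the residual sum of squares, which is consistent with the earlier card stating that the normal deviance is proportional to it.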

New cards

To map any real number to the (0,1) range, we use the ________ function. State this function as a function of Beta₀ and Beta₁

To map any real number to the (0,1) range, we use the logistic function:
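The formula itself is missing from this card. The usual form with one predictor is p(x) = e^(Beta₀ + Beta₁x) / (1 + e^(Beta₀ + Beta₁x)). A minimal sketch, with a hypothetical function name and example coefficients:

```python
import math

def logistic(x, beta0, beta1):
    """Logistic function of the linear predictor beta0 + beta1 * x."""
    eta = beta0 + beta1 * x
    # Two algebraically equal branches, chosen to avoid overflow in exp().
    if eta >= 0:
        return 1 / (1 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1 + z)

# Hypothetical coefficients: beta0 = 0, beta1 = 1.
p_mid = logistic(0, 0.0, 1.0)    # linear predictor 0 gives probability 0.5
p_low = logistic(-5, 0.0, 1.0)
p_high = logistic(5, 0.0, 1.0)
```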

New cards

State the formula for the identity link function, its domain, its range, and typical use case.

New cards

State the formula for the logit link function, its domain, its range, and typical use case.

Use: Binary classification - claim occurrence.

New cards

State the formula for the log link function, its domain, its range, and typical use case.

Use: Freq/Sev. Claim counts or costs.

New cards

State the formula for the probit link function, its domain, its range, and typical use case.

Formula: g(μ) = Φ⁻¹(μ), the inverse CDF of the standard normal distribution. Use: binary responses, as an alternative to the logit link.

New cards

State the formula for the complementary log-log link function, its domain, its range, and typical use case.

Asymmetric. Rare "Yes" events.

New cards

State the formula for the inverse link function, its domain, its range, and typical use case.

Use: Gamma. Claim severity/settlement.
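The formula parts of the six link-function cards above are missing from this export. The sketch below writes out the standard definitions, with each link g mapping the mean μ to the linear predictor η, next to its inverse. The dictionary names are chosen here for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the probit link

# Link g(mu) = eta. Valid mu values are noted per link.
LINKS = {
    "identity": lambda mu: mu,                             # any real mu
    "logit":    lambda mu: math.log(mu / (1 - mu)),        # 0 < mu < 1
    "log":      lambda mu: math.log(mu),                   # mu > 0
    "probit":   lambda mu: STD_NORMAL.inv_cdf(mu),         # 0 < mu < 1
    "cloglog":  lambda mu: math.log(-math.log(1 - mu)),    # 0 < mu < 1
    "inverse":  lambda mu: 1 / mu,                         # mu != 0
}

# Inverse link (mean function) mu = g^{-1}(eta).
INVERSE_LINKS = {
    "identity": lambda eta: eta,
    "logit":    lambda eta: 1 / (1 + math.exp(-eta)),
    "log":      lambda eta: math.exp(eta),
    "probit":   lambda eta: STD_NORMAL.cdf(eta),
    "cloglog":  lambda eta: 1 - math.exp(-math.exp(eta)),
    "inverse":  lambda eta: 1 / eta,
}

# Round trip at mu = 0.3, which is valid for every link above.
round_trip = {name: INVERSE_LINKS[name](LINKS[name](0.3)) for name in LINKS}
```

The logit and probit links are symmetric around μ = 0.5, while the complementary log-log link is not, which is the asymmetry the card refers to.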

New cards

Which link functions have inverses that map all real numbers to the range (0, 1)? What do we tend to use these link functions for?

Logit, Probit, or Cloglog Link. Use for binary classification where the output is a probability.

New cards

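Which link functions are typically used when the mean must be strictly positive? What do we tend to use these link functions for?

Log or Inverse Link. Use for strictly positive data like claim frequency or severity. Only the inverse of the log link, exp(η), maps every real η to a positive mean; the inverse link gives a positive mean only when η > 0.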

New cards

Which link function is best able to handle scenarios where “success” is very rare?

Complementary log-log link.

New cards

What is the canonical link function for a normal distribution?

Identity

New cards

What is the canonical link function for a Poisson distribution?

Log

New cards

What is the canonical link function for a Bernoulli/Binomial distribution?

Logit

New cards

What is the canonical link function for a Gamma distribution?

Inverse (1/μ). (Log is also practical since it guarantees positive means.)

New cards

What is the Iterative Weighted Least Squares Formula?
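The answer to this card is missing from the export. A sketch of the usual update, assuming a link function g, variance function V, linear predictor \eta = X\beta, and no prior weights (notation chosen here):

```latex
\hat{\beta}^{(m+1)} = \left( X^{\top} W^{(m)} X \right)^{-1} X^{\top} W^{(m)} z^{(m)}

z_i = \eta_i + \left( y_i - \mu_i \right) g'(\mu_i),
\qquad
W = \operatorname{diag}(w_1, \dots, w_n),
\quad
w_i = \frac{1}{V(\mu_i) \, g'(\mu_i)^{2}}
```

Each iteration is a weighted least squares fit of the working response z on X, with \eta_i, \mu_i, and w_i recomputed from the current estimate of \beta.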

New cards

Why does the number of knots need to be considered if using a regression spline but not if using a local regression?

A regression spline is built from basis functions defined across the entire range of your data, and the knots control its flexibility. You have to be intentional about how many knots you use because that number directly determines the degrees of freedom in your model. Local regression doesn't use knots because it doesn't divide the x-axis into fixed segments. Instead, its flexibility is controlled by a bandwidth (the span).

New cards

Matrix formula for y hat:

New cards

Formula for the Hat Matrix

New cards

Formula involving y, y hat, and the Hat Matrix.
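The answers to the three matrix cards above are missing from this export. The standard least squares results, assuming a design matrix X with full column rank (notation chosen here):

```latex
\hat{y} = X \hat{\beta} = X \left( X^{\top} X \right)^{-1} X^{\top} y

H = X \left( X^{\top} X \right)^{-1} X^{\top}

\hat{y} = H y
```

The diagonal entries h_{ii} of H are the leverages used in the high-leverage and Cook's distance cards.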

New cards

What are two unsupervised types of learning?

Principal Component Analysis and Clustering

New cards

In a QQ plot, ______ goes on the x-axis and _____ goes on the y-axis.

In a QQ plot, theoretical quantiles go on the x-axis and sample quantiles go on the y-axis.

New cards

If a QQ plot shows heavy tails, standard errors might be ______(over/under)-estimated and your "t-tests" for coefficient significance might be ________.

If a QQ plot shows heavy tails, standard errors might be underestimated and your "t-tests" for coefficient significance might be unreliable.

New cards

If your raw data looks curved on a Q-Q plot but looks like a straight line after a log transformation, you’ve likely found that your data is ________ —a very common occurrence in insurance claim amounts.

If your raw data looks curved on a Q-Q plot but looks like a straight line after a log transformation, you’ve likely found that your data is Log-Normal—a very common occurrence in insurance claim amounts.

New cards

In Ridge/Lasso regression, which method shrinks some coefficients exactly to zero, and which shrinks all coefficients toward zero but never hits zero?

Lasso: Shrinks some coefficients exactly to zero

Ridge: Shrinks all coefficients toward zero, but never hits zero
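The difference is easiest to see in a simplified special case: n = p, an identity design matrix, and no intercept. There the ridge estimate is the least squares coefficient divided by (1 + λ), and the lasso estimate is the least squares coefficient soft-thresholded at λ/2. The sketch below uses that special case only; the coefficients and λ are made up.

```python
def ridge_shrink(b, lam):
    """Ridge estimate in the identity-design special case: proportional shrinkage."""
    return b / (1 + lam)

def lasso_shrink(b, lam):
    """Lasso estimate in the same special case: soft-thresholding at lam / 2."""
    t = lam / 2
    if b > t:
        return b - t
    if b < -t:
        return b + t
    return 0.0  # small coefficients are set exactly to zero

# Hypothetical least squares coefficients and penalty.
ols = [3.0, -0.4, 0.2, -2.0]
lam = 1.0
ridge = [ridge_shrink(b, lam) for b in ols]
lasso = [lasso_shrink(b, lam) for b in ols]
```

With these values the lasso zeroes the two small coefficients, while every ridge coefficient is halved but stays nonzero.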

New cards

Which offers more interpretability: ridge or lasso?

Lasso, as it produces a simple list of predictors, whereas ridge keeps all variables in the model.

New cards

What are the best use cases for Ridge and Lasso regression?

Lasso is best when you suspect only a few variables are “real”

Ridge is best when you think many variables have small effects.

New cards

Problems with a quantitative response are called _____ while problems with a qualitative response are called ______ problems

Problems with a quantitative response are called regression while problems with a qualitative response are called classification problems

New cards

Bias refers to the _______ that is introduced by approximating a real-life problem (which may be extremely complicated) by a much ________ model.

Bias refers to the error that is introduced by approximating a real-life problem (which may be extremely complicated) by a much simpler model.

New cards

T or F: The irreducible error can be minimized by choosing a statistical learning method with lower variance and bias.

False!

New cards

Summarize the rules for Principal Component Analysis (PCA)

New cards

Relationship between residual standard error and MSE
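The answer to this card is missing from the export, and the relationship depends on which definition of MSE is meant. A sketch of both common conventions, assuming n observations, p predictors, and residual sum of squares RSS:

```latex
\mathrm{RSE} = \sqrt{\frac{\mathrm{RSS}}{n - p - 1}}

\text{ANOVA convention: } \mathrm{MSE} = \frac{\mathrm{RSS}}{n - p - 1}
\;\Rightarrow\; \mathrm{RSE} = \sqrt{\mathrm{MSE}}

\text{Training-error convention: } \mathrm{MSE} = \frac{\mathrm{RSS}}{n}
\;\Rightarrow\; \mathrm{RSE} = \sqrt{\frac{n}{n - p - 1} \, \mathrm{MSE}}
```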

New cards

Formula for the standard error of the mean response is:

[Image: formula for the standard error of the mean response, https://assets.knowt.com/user-attachments/9e5cd3af-f029-488f-a751-dbf63edfd253.png]

New cards

The standard error of the slope (Beta₁) is:

[Image: formula for the standard error of the slope, https://assets.knowt.com/user-attachments/d6e6eb48-d40b-4d95-b726-3268dc784e09.png]

New cards

The standard error of the intercept (Beta₀) is:

[Image: formula for the standard error of the intercept, https://assets.knowt.com/user-attachments/28755a9f-e287-48e6-bde6-e8a3f7c18ee6.png]
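The three standard-error cards above give their answers only as images. For simple linear regression, the textbook forms are SE(Beta₁) = s / sqrt(Sxx), SE(Beta₀) = s * sqrt(1/n + x̄²/Sxx), and SE of the mean response at x₀ = s * sqrt(1/n + (x₀ - x̄)²/Sxx), with s² = RSS/(n - 2). The images may use different notation. A minimal sketch with made-up data and a hypothetical function name:

```python
import math

def slr_standard_errors(x, y, x0):
    """Standard errors for a simple linear regression fit by least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))  # residual standard error
    return {
        "s": s,
        "se_slope": s / math.sqrt(sxx),
        "se_intercept": s * math.sqrt(1 / n + x_bar ** 2 / sxx),
        "se_mean_response": s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx),
    }

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]
at_zero = slr_standard_errors(x, y, x0=0)
at_mean = slr_standard_errors(x, y, x0=3)  # x0 equal to x bar
```

The intercept is the mean response at x₀ = 0, so the two standard errors coincide there, and the mean response is estimated most precisely at x₀ = x̄.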

New cards

Since y hat necessarily goes through __________, the equation for y hat is:

Since y hat necessarily goes through (x bar, y bar), the equation for y hat is: y hat - y bar = slope (x - x bar)

New cards

Equation involving SSR, x bar, y bar, x_i, y_i

New cards

Equation involving B₁, x bar, y bar, x_i, y_i

New cards

Equation involving SSR, B₁, x bar, x_i

New cards

In a standard Poisson GLM, we assume Var[Y] = E[Y]. When Var[Y] > E[Y], the data is _________, making the standard Poisson model's standard errors too _______ (and p-values too __________).

In a standard Poisson GLM, we assume Var[Y] = E[Y]. When Var[Y] > E[Y], the data is overdispersed, making the standard Poisson model's standard errors too small (and p-values too "significant").

New cards

The Quasi-Poisson model introduces a __________ to relax the variance constraint.

The Quasi-Poisson model introduces a dispersion parameter (phi) to relax the variance constraint.

New cards

What impact does a Quasi-Poisson model have on statistical inference?

New cards

When dealing with a Quasi-Poisson model, a variable that looked "highly significant" (p < 0.01) under a standard Poisson might become _________ (p > 0.05) under Quasi-Poisson. It forces you to only ________ variables in your model that have __________ relative to the extra noise.

When dealing with a Quasi-Poisson model, a variable that looked "highly significant" (p < 0.01) under a standard Poisson might become not significant (p > 0.05) under Quasi-Poisson. It forces you to only keep variables in your model that have a very strong signal relative to the extra noise.
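The mechanics behind the last few cards can be sketched as follows. A common estimate of the dispersion parameter is the Pearson statistic divided by the residual degrees of freedom. The coefficient estimates stay the same as in the Poisson fit, while each standard error is multiplied by sqrt(phi). The counts, fitted means, and Poisson standard error below are made up for illustration.

```python
import math

def pearson_dispersion(y, mu, n_params):
    """Estimate phi as the Pearson chi-square over residual degrees of freedom,
    using the Poisson variance function V(mu) = mu."""
    pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return pearson / (len(y) - n_params)

# Hypothetical counts in two groups, with fitted means equal to the group means
# (what a Poisson GLM with an intercept and one binary predictor would give).
y = [0, 3, 1, 8, 2, 12]
mu = [4 / 3] * 3 + [22 / 3] * 3
phi_hat = pearson_dispersion(y, mu, n_params=2)

# Hypothetical Poisson standard error for one coefficient.
poisson_se = 0.25
quasi_se = poisson_se * math.sqrt(phi_hat)  # wider under overdispersion
```

Because phi_hat exceeds 1 here, the quasi-Poisson standard error is larger and the corresponding p-value moves toward non-significance.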

New cards

Equation relating Beta₁, SSR, SXX.
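The answers to the four equation cards above (those involving SSR, B₁, x bar, and y bar) are missing from this export. A sketch of the standard simple linear regression identities, assuming SSR denotes the regression sum of squares (notation chosen here):

```latex
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^{2},
\qquad
S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})

\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}
             = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}

\mathrm{SSR} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^{2}
             = \hat{\beta}_1^{2} \, S_{xx}
             = \frac{S_{xy}^{2}}{S_{xx}}
```

The middle form of SSR follows from the fact that y hat - y bar = slope (x - x bar) at every observation.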