MAS Section 3

Last updated 2:03 PM on 6/26/26

27 Terms

1

Determine which of the following statements about the handling of dispersion and its alternatives are NOT true in the context of Poisson models. (Select 2 options.)

A) The Poisson model assumes equidispersion.
B) In the model where Var[Yi] = φμi, overdispersion is indicated when φ > 1.
C) Underdispersion is when the mean of data is less than the variance.
D) The negative binomial distribution can be a solution for datasets where variance is greater than the mean.
E) The parameter φ can be used to scale the variance in relation to the mean, regardless of the dataset.


Answers: C and E

EXPLANATIONS:

A — True. The Poisson model is defined with equidispersion where the mean equals the variance.

B — True. One approach to handle the lack of equidispersion is to let Var[Yi] = φμi, where φ > 0 is a parameter that captures overdispersion (if φ > 1) or underdispersion (if φ < 1). Statement B describes overdispersion correctly, so it is not one of the answers.

C — False. Underdispersion is when the variance is less than the mean (not the other way around). When variance exceeds the mean, that is overdispersion.

D — True. A common alternative for overdispersed data is the negative binomial distribution, where the variance is inherently greater than the mean.

E — False. While φ can scale variance, assuming it is constant across different datasets is overly simplistic. The actual dispersion level can vary widely, so a more flexible model like the negative binomial may be necessary.
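
The dispersion parameter in Var[Yi] = φμi can be checked numerically. The sketch below assumes counts that share a single mean (no covariates), so φ is estimated as sample variance divided by sample mean. The function name and both data sets are made up for illustration.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Estimate phi in Var[Y] = phi * mu as sample variance / sample mean.

    Assumes all counts share one mean (no covariates).
    """
    return variance(counts) / mean(counts)

# Hypothetical count data, invented for this illustration.
overdispersed = [0, 0, 1, 0, 9, 0, 12, 1, 0, 7]
underdispersed = [3, 3, 4, 3, 3, 4, 3, 3, 4, 3]

phi_over = dispersion_ratio(overdispersed)
phi_under = dispersion_ratio(underdispersed)

# phi > 1 suggests overdispersion, phi < 1 suggests underdispersion.
print(phi_over > 1, phi_under < 1)
```

In a regression setting the same idea is usually applied to Pearson residuals rather than to raw counts.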



2

Consider the tuning parameter λ in a smoothing spline model. Which of the following are TRUE? (Select all that apply.)

  1. Larger values of λ result in smoother splines.

  2. Larger values of λ result in greater effective degrees of freedom for the model.

  3. Larger values of λ result in a more biased model.

A) 1 only
B) 1 and 2
C) 1 and 3
D) 2 and 3
E) All of the above

Answer: C) 1 and 3

  1. TRUE — λ controls the roughness penalty. λ = 0 imposes no penalty; larger λ imposes higher roughness penalties, producing smoother splines.

  2. FALSE — As λ increases from 0 to ∞, effective degrees of freedom decreases from n down to 2. More smoothness constraints = fewer degrees of freedom.

  3. TRUE — As λ increases, the model becomes smoother but more biased. This reflects the bias-variance tradeoff: larger λ → higher bias, lower variance.

3

The third principal component will be orthogonal to the first principal component.

TRUE. Each principal component is orthogonal to each of the others.

4

You are performing a principal components analysis on a data set with 50 observations from three independent continuous variables. The maximum number of principal components that can be extracted from this data is three.

TRUE. In general, the maximum number of principal components is min(n − 1, p). Here that is min(49, 3) = 3, the number of variables in the data set.

5

Determine which of the following statements about regression and classification problems are TRUE. (Select 2 options.)

A) Statistical learning methods are universally applicable to any type of response variable, making the distinction between quantitative and qualitative responses irrelevant.

B) Logistic regression can also be viewed as a regression method.

C) Variables can be categorized as either quantitative or qualitative, with regression problems typically involving quantitative responses and classification problems involving qualitative responses.

D) The distinction between regression and classification problems is based on the nature of the predictors rather than the response.

E) K-nearest neighbors and boosting are methods exclusively used for classification problems.

Answer: B and C

A — False. Statistical learning methods do vary in applicability depending on the type of response variable. The distinction between quantitative and qualitative responses is crucial.

B — True. Despite its name, logistic regression is primarily used for classification (particularly binary/qualitative responses), but since it estimates class probabilities, it can also be thought of as a regression method.

C — True. Variables are either quantitative or qualitative. Quantitative responses lead to regression problems; qualitative responses lead to classification problems.

D — False. The regression vs. classification distinction is based on the type of response variable (quantitative vs. qualitative), not the predictors.

E — False. K-nearest neighbors and boosting can both be applied to regression (quantitative response) and classification (qualitative response) problems.

6

An analyst is conducting 10-fold cross-validation on a dataset containing 150 observations and 25 variables.

Determine which of the following statements are TRUE with regard to the mechanics of k-fold cross-validation. (Select 2 options.)

A) 10-fold cross-validation involves dividing the set of observations into 25 groups of approximately equal size.

B) In each iteration of the cross-validation, 1 fold is used as the validation set.

C) In each iteration of the cross-validation, 1 fold is used as the training set.

D) The approximate number of observations in each validation set is 135.

E) The approximate number of observations in each training set is 135.

Answer: B and E

A — False. 10-fold CV divides the data into 10 groups (not 25), each containing approximately 150/10 = 15 observations.

B — True. In each iteration, 1 fold is held out as the validation set, and the remaining 9 folds are used for training.

C — False. 1 fold is used as the validation set, not the training set. The remaining 9 folds form the training set.

D — False. Each validation set contains approximately 15 x 1 = 15 observations (not 135).

E — True. Each training set contains approximately 15 x 9 = 135 observations.
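
The fold arithmetic above can be reproduced with a few lines of code. This is a minimal sketch that splits indices into contiguous folds; the helper name `kfold_indices` is hypothetical, and in practice the observations would normally be shuffled first.

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # spread any remainder
        folds.append(list(range(start, start + size)))
        start += size
    return folds

n_obs, k = 150, 10
folds = kfold_indices(n_obs, k)

sizes = []
for held_out in range(k):
    validation = folds[held_out]
    # Every fold except the held-out one goes into the training set.
    training = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
    sizes.append((len(validation), len(training)))

print(sizes[0])  # (validation size, training size) -> (15, 135)
```

The number of variables (25) never enters the calculation; only n and k determine the fold sizes.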

7

An actuary fits a Poisson distribution to a sample of data X, where f(x) = (θ^x * e^(-θ)) / x!

To assure convergence of the maximum likelihood fitting procedure, the actuary plots three quantities across different values of θ:

  • Plot I: Score Function, U

  • Plot II: Deviance

  • Plot III: Information, J

Which of the three plots can be used to visually approximate the maximum likelihood estimate (MLE) of θ?

A) Plot I only
B) Plots I and II
C) Plots II and III
D) All three plots
E) Plot III only

Answer: B (Plots I and II)

Plot I (Score Function) — YES. The MLE of θ occurs when the score function U = 0. So the MLE can be identified where the curve crosses zero.

Plot II (Deviance) — YES. Up to an additive constant that does not depend on θ (the saturated-model term), the deviance equals -2 · l(θ), where l(θ) is the log-likelihood. Since l(θ) is maximized at the MLE, the deviance is minimized there, so the minimum of the deviance curve gives the MLE.

Plot III (Information) — NO. The information plot tells us about the variance of the MLE of θ, but nothing about the MLE itself.
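
To see why Plots I and II locate the MLE, the sketch below evaluates the Poisson score function and -2 · l(θ) on a grid of θ values. The sample is hypothetical and the grid search is only for illustration; for a Poisson sample the MLE is known to be the sample mean.

```python
import math

x = [2, 4, 3, 5, 1, 3]  # hypothetical sample, invented for illustration
n, total = len(x), sum(x)

def loglik(theta):
    """Poisson log-likelihood l(theta) for the sample x."""
    return sum(xi * math.log(theta) - theta - math.lgamma(xi + 1) for xi in x)

def score(theta):
    """U(theta) = dl/dtheta = sum(x) / theta - n."""
    return total / theta - n

def neg2_loglik(theta):
    """-2 * l(theta); differs from the deviance by a constant in theta."""
    return -2.0 * loglik(theta)

grid = [0.5 + 0.01 * i for i in range(551)]  # theta from 0.50 to 6.00

theta_from_score = min(grid, key=lambda t: abs(score(t)))  # where U crosses 0
theta_from_deviance = min(grid, key=neg2_loglik)           # where the curve bottoms out

print(theta_from_score, theta_from_deviance, total / n)
```

The information for this model is J = n/θ, a smooth decreasing curve with no zero crossing or turning point at the MLE, which is why Plot III does not help.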

8

Determine which of the following statements is/are true for a simple linear relationship, y = β0 + β1x + ε.

I. If ε = 0, the 95% confidence interval is equal to the 95% prediction interval.
II. The prediction interval is always at least as wide as the confidence interval.
III. The prediction interval quantifies the possible range for E(y | x).

A) I only
B) II only
C) III only
D) I, II, and III
E) The correct answer is not given by (A), (B), (C), or (D).

Answer: E (I and II are true)

I — True. The prediction interval accounts for the irreducible error ε. If ε = 0, the irreducible error vanishes, so the prediction interval collapses to equal the confidence interval.

II — True. The prediction interval is almost always wider than the confidence interval because it must account for both the uncertainty in estimating E(y | x) and the irreducible error ε.

III — False. The confidence interval quantifies the possible range for E(y | x). The prediction interval quantifies the possible range for y | x (an individual response), not the mean.
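
The relationship between the two intervals can be illustrated with the usual simple linear regression standard errors. This sketch uses made-up data and compares only the standard errors, since both intervals multiply by the same t quantile. The extra "1 +" term under the square root is the contribution of the irreducible error.

```python
import math

# Hypothetical data, invented for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b0 = y_bar - b1 * x_bar
rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
s = math.sqrt(rss / (n - 2))  # residual standard error

def standard_errors(x0):
    """Return (SE for E(y | x0), SE for a new y at x0)."""
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    se_mean = s * math.sqrt(leverage)      # drives the confidence interval
    se_pred = s * math.sqrt(1 + leverage)  # drives the prediction interval
    return se_mean, se_pred

for x0 in (0.0, 3.5, 10.0):
    print(x0, standard_errors(x0))
```

If the residual standard error s were 0, both standard errors would be 0 and the two intervals would coincide, which matches statement I.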

9

You are given the following information:

  • f(x; θ) is the pdf of X, where θ represents one or more unknown parameters.

  • Ω is the set of all possible values of θ.

  • H0 : θ ∈ ω where ω is a subset of Ω

  • H1 : θ ∈ Ω

  • λ = L(ω-hat) / L(Ω-hat), where the numerator is the maximum likelihood function with respect to θ under the null hypothesis.

Determine which of the following are true.

I. λ ≤ 1
II. λ can be less than zero.
III. As λ increases, the likelihood of rejecting the null hypothesis increases.

A) I only
B) II only
C) III only
D) I and III
E) All of the above

Answer: A (I only)

I — True. Since the null hypothesis is a special case of the alternative hypothesis, the likelihood under H0 can never exceed the likelihood under H1. Therefore the likelihood ratio λ can never be greater than 1, so λ ≤ 1.

II — False. The likelihood ratio λ is a ratio of two likelihood values (which are probabilities/densities and always non-negative). Therefore λ can never be negative.

III — False. The test statistic used is -2 ln λ. As λ increases (gets closer to 1), -2 ln λ decreases, meaning there is less evidence against H0. So the likelihood of rejecting the null hypothesis actually decreases as λ increases.
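
A small numerical illustration of these three points, assuming a Poisson sample with H0: θ = θ0 (so ω is a single point) and Ω = (0, ∞). The data and the θ0 values are hypothetical.

```python
import math

x = [2, 4, 3, 5, 1, 3]  # hypothetical sample, invented for illustration

def loglik(theta):
    """Poisson log-likelihood for the sample x."""
    return sum(xi * math.log(theta) - theta - math.lgamma(xi + 1) for xi in x)

theta_hat = sum(x) / len(x)  # unrestricted MLE over Omega

def likelihood_ratio(theta0):
    """lambda = L(theta0) / L(theta_hat) for H0: theta = theta0."""
    return math.exp(loglik(theta0) - loglik(theta_hat))

results = []
for theta0 in (1.0, 2.0, 3.0, 4.0):
    lam = likelihood_ratio(theta0)
    results.append((lam, -2 * math.log(lam)))
    print(theta0, round(lam, 4), round(-2 * math.log(lam), 4))
```

Every λ lies in (0, 1], and the values of λ closest to 1 give the smallest -2 ln λ, that is, the weakest evidence against H0.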

10

Determine which of the following statements is/are true.

I. If x1, x2, ..., xn denote a random sample from a probability distribution with unknown parameter θ, then the statistic Y is said to be sufficient for θ if the conditional distribution of x1, x2, ..., xn given Y depends on θ.

II. A sufficient statistic is an unbiased estimator.

III. If Y is a complete sufficient statistic for θ, and if two different unbiased estimators for θ exist: θ-hat1 and θ-hat2 = g(Y), then θ-hat2 has no larger variance than θ-hat1.

A) I only
B) II only
C) III only
D) II and III only
E) None are true

Answer: C (III only)

I — False. Y is sufficient for θ if the conditional distribution of x1, x2, ..., xn given Y does NOT depend on θ. The whole point of sufficiency is that once you know Y, the rest of the sample carries no additional information about θ.

II — False. A sufficient statistic is not necessarily an unbiased estimator. Sufficiency and unbiasedness are separate properties — a statistic can be sufficient but still biased.

III — True. If Y is a complete sufficient statistic for θ, and g(Y) is an unbiased estimator of θ, then by the Lehmann-Scheffé theorem, g(Y) is the Minimum Variance Unbiased Estimator (MVUE) of θ. This means it has the lowest variance among all unbiased estimators, so θ-hat2 has no larger variance than θ-hat1.

11

Determine which of the following statements about k-fold cross-validation and leave-one-out cross-validation are NOT true. (Select 3 options.)

A. LOOCV may be preferred over k-fold CV as it is computationally less intensive.

B. LOOCV may be preferred over k-fold CV as it tends to have lower bias.

C. k-fold CV has higher variance than LOOCV when k < n.

D. LOOCV is computationally more efficient than k-fold CV when used to validate linear models fitted by ordinary least squares.

E. LOOCV overestimates the test error more than k-fold CV.

Answer: A, C, and E

(A) is NOT true because k-fold CV (with k < n) is generally computationally less intensive than LOOCV, which requires n model fits.

(B) is true as LOOCV, using almost all data for training, tends to have lower bias.

(C) is NOT true. The test error estimate of k-fold CV has lower variance than that of LOOCV due to less overlap in training sets.

(D) is true. For polynomial or linear regression, LOOCV is computationally efficient because of the shortcut formula available. The shortcut formula enables the calculation of the estimated test error from just a single round of fitting, which makes the cost of LOOCV the same as a single model fit. In contrast, k-fold CV involves fitting the model k times, each time using a different training set.

(E) is NOT true. It is actually k-fold CV that tends to overestimate the test error more than LOOCV due to its smaller training sets.
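
The shortcut mentioned in (D) can be verified directly for simple linear regression. The sketch below compares a brute-force LOOCV (n refits) with the single-fit formula that divides each residual by 1 − hᵢ, where hᵢ is the leverage. The data and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b1 * x_bar, b1

def loocv_brute_force(xs, ys):
    """Refit n times, each time leaving one observation out."""
    total = 0.0
    for i in range(len(xs)):
        b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - b0 - b1 * xs[i]) ** 2
    return total / len(xs)

def loocv_shortcut(xs, ys):
    """Single fit: average of (residual / (1 - leverage)) squared."""
    n = len(xs)
    b0, b1 = fit_line(xs, ys)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    total = 0.0
    for x, y in zip(xs, ys):
        h = 1 / n + (x - x_bar) ** 2 / sxx  # leverage of this observation
        total += ((y - b0 - b1 * x) / (1 - h)) ** 2
    return total / n

# Hypothetical data, invented for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.3, 5.9, 7.1, 8.4]

cv_brute = loocv_brute_force(xs, ys)
cv_short = loocv_shortcut(xs, ys)
print(cv_brute, cv_short)
```

The two numbers agree up to floating-point rounding, but the shortcut needs only one model fit.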

12

Determine which of the following statements about the canonical links for distributions in GLMs are false. (Select 2 options.)

A. The identity link is the canonical link for the normal distribution.

B. The logit link is the canonical link for the Bernoulli distribution.

C. The logarithmic link is the canonical link for the exponential distribution.

D. The inverse link is the canonical link for the gamma distribution.

E. The identity squared link is the canonical link for the inverse Gaussian distribution.

Answer: C and E

(A) is true as the identity link is indeed the canonical link for the normal distribution.

(B) is true since the logit link is the canonical link for the Bernoulli distribution.

(C) is NOT true; the canonical link for the exponential distribution is actually the reciprocal (inverse) link.

(D) is true as the inverse link is the canonical link for the gamma distribution.

(E) is NOT true; the canonical link for the inverse Gaussian distribution is the inverse squared link, 1/μ², not the identity squared link.

13

Determine which of the following statements about partial least squares (PLS) are true. (Select 2 options.)

A. PLS is a subset selection method.

B. PLS identifies new features in an unsupervised way by approximating the original predictors, similar to principal components analysis.

C. After standardizing the predictors, PLS computes the first direction by setting its loadings to the coefficients from the simple linear regression of the response onto each original predictor.

D. In computing the first direction, PLS places the highest weight on the variables that are least strongly related to the response.

E. The loadings for the first direction are proportional to the covariances between the response and each standardized predictor.

Answer: C and E

Option A is false. Like principal components analysis (PCA), PLS is a dimension reduction method, not a subset selection method.

Option B is false. Unlike principal components analysis (PCA), PLS identifies new features in a supervised way; it uses the target variable to create new features that not only approximate the old features well, but also that are related to the target variable.

Option C is true. Each φ_{j1} in Z_1 = ∑_{j=1}^{p} φ_{j1} X_j is equal to the slope coefficient from the simple linear regression of Y onto X_j.

Option D is false. Since the slope coefficients for each simple linear regression are used for the first direction, a larger value indicates a stronger relationship with the target variable.

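Option E is true. The loadings for the first direction are proportional to the covariances between the response and each standardized predictor. Because the predictors are standardized, these are in turn proportional to the correlations.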

14

You are given a dataset with an equal number of predictors and observations. You are considering a subset selection procedure to determine the best subset of predictors.

Determine which of the following statements is/are true.

I. Forward stepwise selection cannot be used.
II. Backward stepwise selection cannot be used.
III. Both forward and backward stepwise selection can result in the same best model.

A. I only
B. II only
C. III only
D. I, II, and III
E. The correct answer is not given by (A), (B), (C), or (D).

Answer: B. II only

Explanation:

  • I is FALSE. Forward stepwise selection can always be used. The equal number of predictors and observations (n = p) does not restrict forward stepwise selection because it starts with no predictors and adds one at a time. It does not require fitting the full model with all p predictors initially.

  • II is TRUE. Backward stepwise selection cannot be used. A multiple linear regression model with p predictors has p + 1 coefficients to estimate (including the intercept). Since n = p, we have n < p + 1 (because p < p + 1). Therefore, there is not enough information to obtain a unique solution to the score (normal) equations, so the full model with all p predictors is invalid (it has zero degrees of freedom for error). Since backward selection begins by fitting the full model with all predictors, it cannot be used.

  • III is FALSE. While it is generally possible for forward and backward stepwise selection to result in the same best model, in this specific scenario backward selection is impossible. Therefore, there is no way to compare the results of both approaches, and the statement as a general truth does not hold here.


15

You are given the following statements on supervised learning:

I. The variance and the squared bias are inversely related.
II. The squared bias of a statistical learning method increases as the method’s flexibility decreases.
III. As model flexibility increases, the test mean squared error (MSE) monotonically decreases.

Determine which of the following statements is/are true.

A. I only
B. I and II only
C. I and III only
D. I, II, and III
E. The correct answer is not given by (A), (B), (C), or (D).


Answer: B. I and II only

Explanation:

  • I is TRUE. This is the bias-variance trade-off: variance and squared bias are inversely related. As one increases, the other tends to decrease (all else being equal).

  • II is TRUE. As model flexibility decreases, the model becomes more rigid and less able to capture complex patterns, so the squared bias increases (while variance typically decreases).

  • III is FALSE. As model flexibility increases, the training MSE monotonically decreases. However, the test MSE does not monotonically decrease; it follows a U-shaped curve. It decreases initially due to reduced bias, then increases due to growing variance (the bias-variance trade-off).

16

Determine which of the following statements about linear models is/are true.

I. Residual plots are a useful graphical tool for identifying non-linearity.

II. If there is a correlation among the error terms, then the estimated standard errors will tend to underestimate the true standard errors.

III. The variance of error terms doesn't have to be constant.

A. None
B. I and II only
C. I and III only
D. II and III only
E. The correct answer is not given by (A), (B), (C), or (D)

Answer: B. I and II only

Explanation:

  • I is TRUE. Residual plots are useful for identifying non-linearity. For example, if a residual plot shows a quadratic (U-shaped or inverted U-shaped) pattern against fitted values, that suggests a nonlinear relationship, and adding a squared predictor term can help address it.

  • II is TRUE. If there is correlation among the error terms (autocorrelation), the estimated standard errors tend to underestimate the true standard errors. This can lead to narrower confidence intervals and smaller p-values than are actually appropriate, increasing the risk of false positives.

  • III is FALSE. Under the standard linear regression model assumptions, the error terms must have constant variance (homoscedasticity). This is a key assumption; if violated, the model may need adjustments such as weighted least squares or transformations.

17

A researcher has n = 500 samples and p = 5,000 biomarkers. Goal: select ≤ 50 biomarkers. Which 2 methods could feasibly achieve this?

  • A. Best subset selection

  • B. Forward stepwise selection

  • C. Backward stepwise selection

  • D. Ridge regression

  • E. Lasso regression

Answer: B and E (Forward stepwise selection & Lasso regression)

Why each option:

  • A (Best subset) – NO. Computationally infeasible. Evaluating all subsets of size ≤ 50 from 5,000 predictors is combinatorially explosive.

  • B (Forward stepwise) – YES. Starts with 0 predictors and adds one at a time. Can stop at 50 predictors. Feasible even when p > n.

  • C (Backward stepwise) – NO. Starts with all p predictors in the model. Requires fitting a full model first, which is impossible when p > n (5,000 > 500).

  • D (Ridge) – NO. Shrinks coefficients toward zero but never sets them exactly to zero. It does not perform feature selection, so all 5,000 biomarkers stay in the model and it cannot reduce the set to 50 or fewer.

  • E (Lasso) – YES. Shrinks some coefficients exactly to zero, performing built-in feature selection. By tuning the penalty parameter, it can produce a model with ≤ 50 selected predictors.
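
The feasibility gap between options A and B can be made concrete by counting model fits. This is a rough sketch: it counts every subset of size at most 50 for best subset selection, and for forward stepwise it counts the null model plus the p − k candidate fits at step k, stopping at 50 predictors.

```python
import math

p, max_size = 5000, 50

# Best subset: every subset of size 0..50 is a candidate model.
best_subset_models = sum(math.comb(p, k) for k in range(max_size + 1))

# Forward stepwise: the null model, then p - k candidate fits at step k.
forward_models = 1 + sum(p - k for k in range(max_size))

print("best subset: a number with", len(str(best_subset_models)), "digits")
print("forward stepwise:", forward_models)  # 248776
```

Forward stepwise needs roughly a quarter of a million small fits, which is routine; the best subset count is astronomically larger.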

18

Which of the following most likely suggests the presence of multicollinearity in a multiple linear regression model?

A. Small R² and small t-statistics
B. Small R² and large t-statistics
C. Large R² and small t-statistics
D. Large R² and large t-statistics
E. Small variance inflation factors (VIFs) for all predictors

Answer: C. Large R² and small t-statistics

Explanation:

  • Large R² means the model explains a lot of the variance in the response variable overall.

  • Small t-statistics for individual coefficients mean the predictors are not individually significant. This happens because multicollinearity inflates the standard errors of the coefficients, reducing the t-statistics.

  • The combination is the classic diagnostic sign of multicollinearity: the model fits well globally (large R²), but you cannot trust the individual predictor contributions (small t-stats).

Why others are wrong:

  • A & B: Small R² does not suggest multicollinearity; it suggests poor overall fit.

  • D: Large R² and large t-stats suggest a well-specified model without multicollinearity.

  • E: Small VIFs indicate absence of multicollinearity, not presence.
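
For the special case of exactly two predictors, the variance inflation factor reduces to 1 / (1 − r²), where r is the correlation between them. The sketch below uses that special case with made-up predictors to show how near-collinearity inflates the VIF; the function names are hypothetical.

```python
import math

def pearson(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    sab = sum((u - a_bar) * (v - b_bar) for u, v in zip(a, b))
    saa = sum((u - a_bar) ** 2 for u in a)
    sbb = sum((v - b_bar) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical predictors, invented for illustration.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2_collinear = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9, 7.0, 8.1]  # nearly a copy of x1
x2_unrelated = [5, 1, 7, 3, 8, 2, 6, 4]                  # weakly related to x1

vif_collinear = vif_two_predictors(x1, x2_collinear)
vif_unrelated = vif_two_predictors(x1, x2_unrelated)
print(vif_collinear, vif_unrelated)
```

Since the variance of a coefficient estimate scales with its VIF, a large VIF means a large standard error and therefore a small t-statistic, even when the overall R² is high.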

19

A simple linear regression of exam scores (y) on hours studied (x) gives R² = 0.55. Which 2 statements are true?

  • A. 55% of the variation in exam scores is explained by the linear relationship with hours studied.

  • B. 45% of the variation in hours studied is explained by the linear relationship with exam scores.

  • C. The correlation between hours studied and exam scores is approximately -0.74 or 0.74.

  • D. Adding more predictors to the model will likely decrease R².

  • E. The adjusted R² is equal to the R² because the model is a simple linear regression.

Answer: A and C

Explanation:

  • A is TRUE. R² = 0.55 means 55% of the variance in the response variable (exam scores) is explained by the predictor (hours studied).

  • B is FALSE. The 45% is the unexplained variation in exam scores, not in hours studied. R² is not symmetric in that way.

  • C is TRUE. In simple linear regression, the correlation r satisfies r² = R². So |r| = √0.55 ≈ 0.74. The sign could be positive or negative, so r ≈ ±0.74.

  • D is FALSE. Adding more predictors never decreases R²; it stays the same or increases (even if the new predictors are weak).

  • E is FALSE. Adjusted R² is strictly smaller than R² here (given R² < 1), although the gap shrinks as n grows. Formula:

    R²_adj = 1 − (1 − R²) × [(n − 1) / (n − p − 1)]

    With p = 1, the factor is (n − 1)/(n − 2) > 1, so R²_adj < R².
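
The arithmetic behind C and E can be checked quickly. The question does not give n, so the sample sizes below are assumptions chosen only to show how the adjustment behaves.

```python
import math

r_squared = 0.55
abs_r = math.sqrt(r_squared)  # |r| in simple linear regression
print(round(abs_r, 4))

def adjusted_r_squared(r2, n, p):
    """R^2_adj = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# n is not stated in the question; these values are assumed.
adjusted = {n: adjusted_r_squared(r_squared, n, p=1) for n in (10, 30, 1000)}
for n, value in adjusted.items():
    print(n, round(value, 4))
```

Each adjusted value is below 0.55, and the gap narrows as n increases.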

20

Q: For any statistical learning method, which of the following increases monotonically as flexibility increases?

I. Training MSE
II. Test MSE
III. Bias squared
IV. Variance

A. I and III
B. I and IV
C. II and III
D. II and IV

E. The correct answer is not given by (A), (B), (C), or (D).

Answer: E. The correct answer is not given by (A), (B), (C), or (D).

Explanation:

Quantity          | Behavior as flexibility ↑
I. Training MSE   | Monotonically decreases (not increases)
II. Test MSE      | U-shaped: decreases then increases (not monotonic)
III. Bias squared | Monotonically decreases (not increases)
IV. Variance      | Monotonically increases

  • Only IV (Variance) increases monotonically with flexibility.

  • Since none of the answer choices list only IV, the correct choice is E.

21

Q: Which of the following statements about the logit model is FALSE?

A. It gives numerical results that are quite similar to those given by the probit model.

B. The success probability is:

         e^(xᵢᵀβ)
πᵢ = ─────────────
       1 + e^(xᵢᵀβ)

C. logit(πᵢ) = ln(πᵢ / (1 − πᵢ)) = xᵢᵀβ, where πᵢ is the success probability.

D. The model applies when the response variable counts the number of events occurring.

E. The model applies when the explanatory variables are both continuous and categorical.

Answer: D

Explanation:

  • A is TRUE. Logit and probit models produce similar results in practice, especially when the data are not extreme.

  • B is TRUE. This is the standard logistic function for the success probability πᵢ.

  • C is TRUE. This defines the logit link function, which is the natural logarithm of the odds.

  • D is FALSE. The logit model applies when the response variable is binary (success/failure). For count data (number of events), a Poisson regression model would be more appropriate.

  • E is TRUE. Logit models can handle both continuous and categorical explanatory variables (using dummy variables for categorical predictors).
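
The formulas in B and C are inverses of each other, which the following sketch confirms. The function names are hypothetical, and the linear predictor values are arbitrary.

```python
import math

def success_probability(eta):
    """pi = e^eta / (1 + e^eta), where eta is the linear predictor x'beta."""
    return math.exp(eta) / (1 + math.exp(eta))

def logit(pi):
    """logit(pi) = ln(pi / (1 - pi)), the log-odds."""
    return math.log(pi / (1 - pi))

for eta in (-3.0, 0.0, 0.7, 4.0):  # arbitrary linear predictor values
    pi = success_probability(eta)
    print(eta, round(pi, 4), round(logit(pi), 4))
```

Whatever the value of the linear predictor, the fitted probability stays strictly between 0 and 1, which suits a binary response rather than a count.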

22

Q: Determine which of the following statements is/are true about resampling methods.

I. Unlike the validation set approach, each observation is used for training at some point during the k-fold cross-validation procedure.

II. Bootstrapping can be used to select the appropriate level of model flexibility.

III. Cross-validation is used to measure the accuracy of a parameter estimate.

A. I only
B. II only
C. III only
D. I, II, and III
E. The correct answer is not given by (A), (B), (C), or (D).

Answer: A. I only

Explanation:

  • I is TRUE. In the validation set approach, each observation is used either for training or for validation (never both). In k-fold CV, each observation is used for training (k − 1) times and for validation exactly once.

  • II is FALSE. Bootstrapping is used to measure the accuracy (standard error/confidence intervals) of a parameter estimate, not to select model flexibility. Cross-validation is used for selecting flexibility.

  • III is FALSE. Cross-validation is used to select the appropriate level of model flexibility (e.g., tuning parameters), not to measure the accuracy of a parameter estimate. Bootstrapping is used for measuring accuracy of estimates.

Method           | Primary use
Cross-validation | Model selection / tuning flexibility
Bootstrap        | Estimating accuracy (SE, CI) of estimates
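
As an illustration of the bootstrap row, the sketch below estimates the standard error of a sample mean by resampling with replacement and compares it with the textbook formula s/√n. The data, the seed, and the number of resamples are arbitrary choices.

```python
import random
from statistics import mean, stdev

# Hypothetical sample, invented for illustration.
sample = [4.1, 5.3, 2.8, 6.0, 4.7, 5.5, 3.9, 4.4, 5.1, 3.2, 4.9, 5.8]

def bootstrap_se(data, statistic, n_boot=2000, seed=0):
    """Standard deviation of the statistic across bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=len(data))  # sample with replacement
        estimates.append(statistic(resample))
    return stdev(estimates)

se_boot = bootstrap_se(sample, mean)
se_formula = stdev(sample) / len(sample) ** 0.5
print(se_boot, se_formula)
```

The two values should be close but not identical. The same function works for statistics that have no simple standard error formula, such as a median.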

23

Which of the following statements about Poisson regression models is FALSE?

A. They are used to model count or frequency data.

B. They assume exposure is constant when modeling the rate at which events occur.

C. Explanatory variables (besides exposure) can be either continuous or categorical.

D. They incorporate a logarithmic link function.

E. For a binary explanatory variable xⱼ, the rate ratio is:

E[Yᵢ | xⱼ = 1]     
──────────────── = e^(βⱼ)
E[Yᵢ | xⱼ = 0]


Answer: B

Explanation:

  • A is TRUE. Poisson regression is designed for count or frequency data (e.g., number of accidents, hospital visits).

  • B is FALSE. Poisson regression does not assume constant exposure. It can handle varying exposure times by including an offset term (log(exposure)) in the model.

  • C is TRUE. Explanatory variables can be continuous, categorical, or a mix.

  • D is TRUE. The link function is the natural logarithm: log(μᵢ) = xᵢᵀβ.

  • E is TRUE. For a binary predictor, the rate ratio is e^(βⱼ), which compares the expected count when xⱼ = 1 versus when xⱼ = 0.
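
Points B, D, and E can be tied together in a few lines. The sketch assumes a model with one binary predictor and an offset of log(exposure); the coefficient values are hypothetical.

```python
import math

def expected_count(exposure, alpha, beta, x):
    """mu from log(mu) = log(exposure) + alpha + beta * x (offset model)."""
    return math.exp(math.log(exposure) + alpha + beta * x)

alpha, beta = -2.0, 0.4  # hypothetical coefficients

ratios = []
for exposure in (1.0, 5.0, 12.5):  # exposure is allowed to vary
    ratio = expected_count(exposure, alpha, beta, 1) / expected_count(exposure, alpha, beta, 0)
    ratios.append(ratio)
    print(exposure, round(ratio, 6))

print(round(math.exp(beta), 6))
```

The exposure cancels in the ratio, so the rate ratio equals e^β at every exposure level. Because the mean is an exponential, it is positive for any sign of the coefficients.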

24

You are performing PCA on a data set with 50 observations from three independent continuous variables. Which statements are true?

I. The maximum number of principal components that can be extracted from this data is three.

II. The first principal component represents the direction along which the data vary the most.

III. The third principal component will be orthogonal to the first principal component.

A. I only
B. II only
C. III only
D. I, II, and III
E. The answer is not given by (A), (B), (C), or (D).

Answer: D. I, II, and III

Explanation:

  • I is TRUE. The maximum number of principal components equals the number of variables (p = 3), regardless of the number of observations (n = 50). (Technically, the maximum is min(n − 1, p), which is min(49, 3) = 3.)

  • II is TRUE. The first principal component is the direction in feature space that maximizes the variance of the projected data — i.e., the direction of greatest variation.

  • III is TRUE. All principal components are mutually orthogonal (uncorrelated) to each other. Therefore, PC3 is orthogonal to PC1.

25

In a linear model, coefficients are estimated by minimizing:

         n
   min   ∑ (yᵢ − β₀ − β₁xᵢ,₁ − ... − βₚxᵢ,ₚ)²
  β₀,…,βₚ i=1

subject to:

  p
  ∑ |βⱼ| ≤ s,   for s ≥ 0
 j=1

Which 2 statements are true?

A. This shrinkage method is known as ridge regression.

B. This shrinkage method performs variable selection on the predictors.

C. This shrinkage method is not useful when dealing with high dimensions.

D. As s increases, the squared bias will decrease.

E. As s increases, the test RSS will increase.


Answer: B and D

Explanation:

  • A is FALSE. This is lasso regression (L₁ penalty). Ridge regression uses the constraint ∑ⱼ₌₁ᵖ βⱼ² ≤ s instead.

  • B is TRUE. Lasso performs variable selection because it shrinks some coefficients exactly to 0, effectively removing those predictors from the model.

  • C is FALSE. Lasso is very useful in high dimensions due to:

    • Variable selection (removes irrelevant features)

    • Prevents overfitting

    • Improves interpretability

    • Computational efficiency

  • D is TRUE. As s increases, the constraint becomes less restrictive → model gains flexibility → bias² decreases (model fits the training data better).

  • E is FALSE. As s increases, test error (RSS) typically follows a U-shape: it decreases then increases. Test RSS does not monotonically increase.
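
Why the lasso selects variables and ridge does not is easiest to see in a textbook special case: n = p, an identity design matrix, and no intercept. The sketch below uses the penalized form with a penalty weight `lam` rather than the budget s (a larger `lam` corresponds to a smaller s). In this special case ridge rescales every coefficient, while the lasso soft-thresholds at lam/2. The response values are made up.

```python
def ridge_estimate(y, lam):
    """Ridge solution when X is the identity: y_j / (1 + lam)."""
    return [yj / (1 + lam) for yj in y]

def lasso_estimate(y, lam):
    """Lasso solution when X is the identity: soft-threshold at lam / 2."""
    half = lam / 2
    estimates = []
    for yj in y:
        if yj > half:
            estimates.append(yj - half)
        elif yj < -half:
            estimates.append(yj + half)
        else:
            estimates.append(0.0)  # small coefficients are set exactly to zero
    return estimates

y = [3.0, -0.4, 1.2, 0.1, -2.5]  # hypothetical responses
lam = 1.0

ridge = ridge_estimate(y, lam)
lasso = lasso_estimate(y, lam)
print(ridge)
print(lasso)
```

Ridge leaves all five coefficients nonzero, while the lasso zeroes out the two smallest ones.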

26

Given independent Poisson random variables Y₁, Y₂, …, Yₙ with means μ₁, μ₂, …, μₙ. The following Poisson regression with log link is performed:

ln(μ) = α + βX

where μ = E[Y] and X is the explanatory variable.

Which statements are true?

I. If β = 0, the model claims that there is no correlation between X and Y.

II. If β > 0, the model claims that there is a positive correlation between X and Y.

III. If β < 0, then μ = E[Y] can be less than 0.

A. I only
B. II only
C. III only
D. I, II, and III
E. The correct answer is not given by (A), (B), (C), or (D).

Answer: E. The correct answer is not given by (A), (B), (C), or (D).

Explanation:

  • I is TRUE. If β = 0, then ln(μ) = α, so μ is constant and does not depend on X. Thus, there is no relationship (correlation) between X and Y.

  • II is TRUE. If β > 0, then as X increases, ln(μ) increases, so μ = e^(α+βX) increases. This indicates a positive relationship between X and Y.

  • III is FALSE. The log link ensures that μ = e^(α+βX) > 0 for all values of α and β. The mean of a Poisson distribution cannot be negative, regardless of the sign of β.

27

Determine which of the following statements are true.

I. For a binary response variable with a continuous explanatory variable, logistic regression is an inappropriate method of statistical analysis.

II. Ordinal variables are a type of continuous explanatory variable.

III. ANOVA is a useful approach for analyzing the means of groups of continuous response variables, where the groups are categorical.

A. I only
B. II only
C. III only
D. I, II, and III
E. The answer is not given by (A), (B), (C), or (D).

Answer: C. III only

Explanation:

  • I is FALSE. For a binary response variable, logistic regression is appropriate and is actually the standard method of analysis. It models the probability of success using the logit link.

  • II is FALSE. Ordinal variables are a type of categorical explanatory variable (with ordered categories), not continuous. Continuous variables take numeric values on an interval/ratio scale (e.g., height, temperature).

  • III is TRUE. ANOVA (Analysis of Variance) is used to compare the means of a continuous response variable across groups defined by one or more categorical predictors.