Refined Exam PA Flashcards

Description and Tags

A honed-in set covering the topics that need review, based on my studies.

Last updated 6:02 AM on 3/28/26

225 Terms

New cards

What is descriptive modeling?

Focuses on what happened in the past and aims to “describe” or explain the observed patterns by identifying the relationships between different variables in the data.

New cards

What is predictive modeling?

Focuses on what will happen in the future and is concerned with making accurate predictions.

New cards

What is prescriptive modeling?

Uses a combination of optimization and simulation to quantify the impact of different prescribed actions in different scenarios.

New cards

Which one is an incorrect comparison between supervised and unsupervised learning?

Supervised learning has a target variable, while unsupervised does not have a target variable.
The goal of supervised learning is to understand the relationship between the target and the predictors in order to make accurate predictions, while the goal of unsupervised learning is to extract relationships and structures.
Supervised learning cannot be coupled with unsupervised learning.

Supervised learning cannot be coupled with unsupervised learning.

New cards

What are some characteristics of predictive modeling?

Issue: there is a clearly identified and defined business issue that needs to be addressed.
Questions: the issue can be addressed with a few well-defined questions
Data: good and useful data is available
Impact: the predictions will likely drive actions or increase understanding
Better solution: PA likely produces a solution better than any existing approach
Update: continue to monitor and update the models when new data become available

New cards

In terms of characteristics of predictive modeling, what is the issue?

There is a clearly identified and defined business issue that needs to be addressed.

New cards

In terms of characteristics of predictive modeling, what are the questions?

The issue can be addressed with a few well-defined questions

New cards

In terms of characteristics of predictive modeling, what is the data?

good and useful data is available

New cards

In terms of characteristics of predictive modeling, what is the impact?

the predictions will likely drive actions or increase understanding

New cards

In terms of characteristics of predictive modeling, what is the better solution?

PA likely produces a solution better than any existing approach

New cards

In terms of characteristics of predictive modeling, what is the update?

continue to monitor and update the models when new data become available

New cards

Explain two desirable characteristics of a key performance indicator.

Relevance and measurability

New cards

In terms of KPIs, what is the relevance piece?

Should align with the overall business objective and the interest of your client as closely as possible.

New cards

In terms of KPIs, what is the measurability piece?

Easily measurable and provides an objective, quantitative basis to measure the success of the project.

New cards

Explain three data quality issues one should examine in practice.

Reasonableness
Consistency
Sufficient documentation

New cards

In terms of data quality, what is reasonableness?

Check whether the data contain any nonsensical observations.

New cards

In terms of data quality, what is consistency?

Records in the data are entered consistently:
1. Numeric: same units and measures
2. Categorical: levels are defined and recorded consistently, such as using Iowa vs. IA

New cards

In terms of data quality, what is sufficient documentation?

a description of the overall data set and data source

New cards

What is one advantage of using MAE over MSE?

MAE places a much smaller weight on large losses than MSE and therefore makes the fitted model more robust against outliers.
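
A minimal sketch of the robustness point, using made-up numbers. The constant that minimizes MSE is the mean and the constant that minimizes MAE is the median, so a single outlier moves the MSE fit far more than the MAE fit. The data and the helper names `mse` and `mae` are illustrative assumptions, not part of the original cards.

```python
from statistics import mean, median

def mse(values, c):
    """Mean squared error of predicting the constant c for every value."""
    return sum((v - c) ** 2 for v in values) / len(values)

def mae(values, c):
    """Mean absolute error of predicting the constant c for every value."""
    return sum(abs(v - c) for v in values) / len(values)

clean = [10, 11, 12, 13, 14]          # hypothetical data
with_outlier = [10, 11, 12, 13, 100]  # same data with one large outlier

# The mean minimizes MSE; the median minimizes MAE.
shift_mse_fit = mean(with_outlier) - mean(clean)      # how far the MSE fit moves
shift_mae_fit = median(with_outlier) - median(clean)  # how far the MAE fit moves

print(shift_mse_fit, shift_mae_fit)
```

The MSE-based fit is dragged toward the outlier, while the MAE-based fit stays where it was.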

New cards

What is one disadvantage of using MAE over MSE?

MAE is not differentiable everywhere, which makes model fitting harder. MSE is differentiable, eases model fitting, and is therefore used more frequently in practice.

New cards

Describe the steps of cross-validation and how it can be used to tune hyperparameters.

Split the training data into k folds of roughly equal size (k = 10 is common).
Fit the model on k-1 folds and evaluate it on the fold left out. Repeat so that each fold is left out exactly once, then average the k errors to get the CV error.
To tune a hyperparameter, compute the CV error for each candidate value and choose the value with the smallest CV error.
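
A minimal sketch of k-fold cross-validation used to tune a hyperparameter. Everything here is an assumption made for illustration: the toy data, the deterministic fold assignment (in practice the observations are usually shuffled first), and the model, a one-predictor ridge regression without an intercept whose slope has the closed form sum(xy) / (sum(x^2) + lambda).

```python
def k_fold_indices(n, k):
    """Assign observation i to fold i % k (a simple deterministic split)."""
    return [[i for i in range(n) if i % k == fold] for fold in range(k)]

def fit_ridge_slope(x, y, lam):
    """Closed-form ridge slope for a no-intercept, one-predictor model."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)

def cv_error(x, y, lam, k):
    """Average held-out MSE over the k folds for a given lambda."""
    fold_errors = []
    for test_idx in k_fold_indices(len(x), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        slope = fit_ridge_slope([x[i] for i in train_idx],
                                [y[i] for i in train_idx], lam)
        fold_mse = sum((y[i] - slope * x[i]) ** 2 for i in test_idx) / len(test_idx)
        fold_errors.append(fold_mse)
    return sum(fold_errors) / k

# Hypothetical data: y is roughly 2 * x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

grid = [0.0, 1.0, 10.0, 100.0]  # candidate hyperparameter values
errors = {lam: cv_error(x, y, lam, k=5) for lam in grid}
best_lam = min(errors, key=errors.get)  # value with the smallest CV error
print(errors, best_lam)
```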

New cards

What is bias in a predictive analytic context?

The difference between the expected value of the prediction and the true value of the signal function.

New cards

What is variance in a predictive analytic context?

Quantifies the amount by which the estimated signal function would change if it were estimated using a different training set.

New cards

What happens to squared bias, variance, and flexibility in an underfitted model?

High squared bias and low variance. As the flexibility of the model increases, the bias initially tends to drop faster than the variance increases, so the test error decreases overall.

New cards

What is the difference between variables and features in a predictive analytic context?

Variables are the raw predictors taken directly from the data set. Features are derived (transformed) versions of the variables.

New cards

What are two uses of feature generation?

Predictive power and interpretability

New cards

In the context of feature generation, what is predictive power?

Transforms data and turns the information contained in the original variables into a more useful form.

New cards

In the context of feature generation, what is interpretability?

Makes the model easier to interpret because the original variables enter the model in a more meaningful and interpretable fashion.

New cards

Identify and briefly describe one situation where it is an advantage to split the data by time rather than by random assignment.

Doing out-of-time validation will be advantageous when we are interested in how well the model trained on more distant years extrapolates past trends to future, unseen years.

New cards

As the degree of polynomial function increases, what happens to the squared bias, variance, and test MSE?

The squared bias will tend to decrease.
The variance will increase.
The test MSE will tend to exhibit a U-shape.

New cards

What are two uses/applications of exploratory data analysis?

Data visualization and generating insights for modeling

New cards

In terms of exploratory data analysis applications, what does data visualization mean?

perform common sense checks on the data

New cards

In terms of exploratory data analysis applications, what does generating insights mean?

helps understand the characteristics of relationships between the variables in the data.

New cards

What are some pros to using summary statistics for exploring data?

Precise and objective
Easily comparable across variables, like comparing means.

New cards

What are some cons to using summary statistics for exploring data?

Can only capture a certain aspect of a variable’s distribution, not the full picture
Some statistics can be distorted by outliers

New cards

What are some pros to using graphical displays for exploring data?

Quick visual of the distribution
Can reveal information not easily captured by summary statistics

New cards

What are some cons to using graphical displays for exploring data?

Not as precise as summary statistics
Less comparable across variables
For more complex data, harder to read and interpret

New cards

Explain how a histogram can be used to visualize the distribution of a numeric variable.

Divides the observations of the variable into several equally spaced bins and provides a visual summary of observations in each bin.

New cards

Explain the problems with using a right-skewed variable in predictive modeling.

Model fitting: the large values (outliers) contribute substantially to the sum of squared errors because of the squaring and have a disproportionate effect on the whole model.
Predictive power: the skew makes it difficult to investigate the effect of the predictors on the target variable globally.

New cards

How do the log and root transformations address right-skewed variables?

Both shrink the values and symmetrize the distribution.

New cards

What is the difference between the log and root transformations?

The log is more aggressive than the square root.
The log cannot be applied to zero or negative values, whereas the square root can handle zero.
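
A short sketch comparing the two transformations on made-up right-skewed values. The data and the helper `safe_log` are illustrative assumptions. It shows that the log compresses large values more strongly than the square root and that the log is undefined at zero, while values between zero and one are fine (their log is simply negative).

```python
import math

def safe_log(v):
    """Return ln(v), or None where the log does not exist (v <= 0)."""
    return math.log(v) if v > 0 else None

values = [0.5, 1, 10, 100, 1000, 10000]  # hypothetical right-skewed data

sqrt_values = [math.sqrt(v) for v in values]
log_values = [safe_log(v) for v in values]

# Spread of the transformed values: the log shrinks the range far more.
sqrt_range = max(sqrt_values) - min(sqrt_values)
log_range = max(log_values) - min(log_values)
print(sqrt_range, log_range)

print(math.sqrt(0))  # the square root handles zero
print(safe_log(0))   # the log does not
```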

New cards

Explain how a boxplot can be used to visualize the distribution of a numeric variable.

Shows outliers and offers a useful graphical summary of the key numeric statistics.

New cards

Explain how a bar chart can be used to visualize the distribution of a categorical variable.

Turns the numeric counts in a frequency table into bars whose heights are proportional to the number of observations in each level of the variable. This makes it easy to tell which levels are the most common and which have few observations on a relative basis.

New cards

Explain how a scatterplot can be used to visualize the relationship between two numeric variables.

Visualizes the relationship between two numeric variables across a wide range of their values. It can reveal non-linear or more complex relationships and yield insights that correlations alone cannot provide.

New cards

Explain how a split boxplot can be used to visualize the relationship between a numeric variable and a categorical variable.

If the level and/or size of the boxes vary remarkably across the levels of the categorical variable, then that is a pointer to a strong association between the two variables.

New cards

Identify a bivariate visualization that is suitable for representing the variation of the level proportions of a categorical variable across the levels of another categorical variable.

Filled bar chart shows the proportion that varies across another variable.

New cards

Identify a bivariate visualization that is suitable for representing the level proportions of a categorical variable within each level of another categorical variable.

Dodged bar charts align the two proportions within each category and put them side by side on a common baseline.

New cards

Explain why dodged bar charts often do a better job of conveying relative counts than stacked bar charts and pie charts.

The bars are put on a common scale since the bars are aligned. Using areas and angles may be prone to misjudgement and unnecessarily complicated compared to lengths.

New cards

Explain the problem with RSS and R² as model selection measures.

RSS and R² are merely goodness-of-fit measures of a linear model (to the training data) with no explicit regard to its prediction performance.
As more predictors are added, RSS never increases and R² never decreases, so both always favor the most complex model.

New cards

Explain why AIC and BIC are considered indirect measures of prediction performance.

They adjust a goodness-of-fit measure on the training set, such as the training RSS or the training log-likelihood, to account for model complexity and prevent overfitting. They are not computed on held-out or test data.

New cards

What is the formula for AIC?

AIC = -2l + 2(p+1), where l is the maximized log-likelihood and p+1 is the number of model parameters. The term -2l measures goodness of fit (ideally l is high, so -2l is low). The term 2(p+1) measures complexity: the more parameters the model has, the more complex it is.

New cards

What is the formula for BIC?

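BIC = -2l + ln(n)*(p+1), where l is the maximized log-likelihood, n is the number of training observations, and p+1 is the number of model parameters. The term -2l measures goodness of fit (ideally l is high, so -2l is low). The term ln(n)*(p+1) measures complexity: the more parameters the model has, the more complex it is.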

New cards

Which measure (AIC or BIC) has a larger complexity penalty?

BIC, whenever ln(n) > 2 (that is, n ≥ 8), which holds for virtually any data set used in practice.
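
A small sketch of the two formulas, with a made-up log-likelihood, parameter count, and sample size. The function names `aic` and `bic` and the inputs are assumptions for illustration only.

```python
import math

def aic(loglik, p):
    """AIC = -2l + 2(p + 1)."""
    return -2 * loglik + 2 * (p + 1)

def bic(loglik, p, n):
    """BIC = -2l + ln(n) * (p + 1)."""
    return -2 * loglik + math.log(n) * (p + 1)

loglik, p, n = -120.0, 3, 100  # hypothetical fitted model

print(aic(loglik, p))     # goodness of fit plus a penalty of 2 per parameter
print(bic(loglik, p, n))  # same fit term, penalty of ln(n) per parameter
```

With n = 100 the per-parameter penalty is ln(100), about 4.6, for BIC versus 2 for AIC, so BIC tends to favor simpler models.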

New cards

What properties should residuals of a linear model have?

No distinctive pattern, especially cone-shaped which would indicate heteroskedasticity
Approximately the same variance
Residuals should be approximately normal

New cards

Explain the meaning of interaction.

The association between one predictor and the target variable depends on the value (or level) of another predictor.

New cards

Explain the problems with collinear variables in a linear model.

Coefficients may be inflated
Interpretation is difficult: the usual "holding everything else constant" reading of a coefficient breaks down because collinear variables tend to move together.
Coefficient may have high variance ⟹ uncertainty and instability in the coefficient values

New cards

How many models are evaluated with best subset selection?

2^p

New cards

How many models are evaluated in forward and backward selection?

1 + p(p+1)/2
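
To see how quickly the two counts diverge, here is a small sketch. The function names are made up for illustration. The stepwise count is the starting model plus p + (p-1) + ... + 1 candidate models.

```python
def best_subset_count(p):
    """Each of the p predictors is either in or out: 2^p models."""
    return 2 ** p

def stepwise_count(p):
    """Starting model plus p + (p - 1) + ... + 1 candidates: 1 + p(p + 1)/2."""
    return 1 + p * (p + 1) // 2

for p in (5, 10, 20):
    print(p, best_subset_count(p), stepwise_count(p))
```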

New cards

Why is it not a good idea to add or drop multiple features at a time when doing stepwise selection?

The significance of a feature can be significantly affected by the presence or absence of other features due to their correlations.

New cards

What model does forward selection start with?

The intercept-only model

New cards

What model does backward selection start with?

The full model containing every predictor

New cards

How does regularization work?

Consider a single model hosting all potentially useful features. Instead of OLS, fit it with a technique that regularizes (shrinks) the coefficients towards zero by minimizing a modified objective function that adds a penalty on the size of the coefficients.

New cards

Describe an important modification to the variables before fitting a regularized regression model.

Scale and standardize to prevent one predictor from showing a dominant effect

New cards

Explain how the regularization parameter lambda affects a regularized model.

As the value of lambda increases, the model becomes less flexible and the variance decreases, while the bias increases.
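
A minimal sketch of the shrinkage effect, under simplifying assumptions that are not in the original cards: a single centered predictor, no intercept, and a ridge penalty, so the objective is RSS + lambda * beta^2 and the minimizer has the closed form beta = sum(xy) / (sum(x^2) + lambda). The data are made up.

```python
def ridge_slope(x, y, lam):
    """Minimizer of sum((y - beta * x)^2) + lam * beta^2 for one predictor."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)

x = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical centered predictor
y = [-4.1, -1.9, 0.2, 2.1, 3.9]   # hypothetical target

lambdas = [0.0, 1.0, 10.0, 90.0]
slopes = [ridge_slope(x, y, lam) for lam in lambdas]
for lam, slope in zip(lambdas, slopes):
    print(lam, slope)
```

The slope starts at the OLS value when lambda is 0 and moves steadily towards zero as lambda grows, which is the loss of flexibility the card describes.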

New cards

Explain why lambda and alpha are hyperparameters of a regularized model and how they are typically selected.

They must be specified before the model is fitted and are not estimated as part of the fitting itself, which makes them hyperparameters. They are typically selected through cross-validation.

New cards

Explain the one-standard-error rule for selecting the value of the regularization parameter of an elastic net.

Choose the simplest regularized regression model whose CV error is within one standard error of the minimum CV error. Remember that the higher the value of lambda, the more regularization there is and the simpler the model becomes, so this means the largest such lambda.
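
The rule can be sketched as follows, assuming the CV error and its standard error have already been computed for each candidate lambda. The function name `one_se_lambda` and all numbers are hypothetical.

```python
def one_se_lambda(lambdas, cv_errors, cv_ses):
    """Largest lambda whose CV error is within one SE of the minimum CV error."""
    best = min(range(len(lambdas)), key=lambda i: cv_errors[i])
    threshold = cv_errors[best] + cv_ses[best]
    eligible = [lam for lam, err in zip(lambdas, cv_errors) if err <= threshold]
    return max(eligible)  # larger lambda means a simpler model

lambdas = [0.01, 0.1, 1.0, 10.0]    # hypothetical grid
cv_errors = [1.30, 1.20, 1.25, 1.60]
cv_ses = [0.08, 0.07, 0.07, 0.10]

chosen = one_se_lambda(lambdas, cv_errors, cv_ses)
print(chosen)
```

Here the minimum CV error occurs at lambda = 0.1, but lambda = 1.0 is still within one standard error of it, so the rule picks the more heavily regularized model.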

New cards

What does R² = 0 imply?

RSS = TSS, since R² = 1 - RSS/TSS, and this means that the fitted linear model is essentially the intercept-only model. The predictors collectively bring no useful information for understanding the target variable.

New cards

What does R² = 1 imply?

RSS = 0, which implies that the fitted model perfectly fits each training observation. This model is probably overfitted though.
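
Both extremes can be checked with a few lines, using R² = 1 - RSS/TSS. The data and the function name `r_squared` are made up for illustration.

```python
def r_squared(y, fitted):
    """R^2 = 1 - RSS/TSS, computed on the data the model was fitted to."""
    y_bar = sum(y) / len(y)
    rss = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))
    tss = sum((obs - y_bar) ** 2 for obs in y)
    return 1 - rss / tss

y = [3.0, 5.0, 4.0, 8.0]  # hypothetical target values

intercept_only = [sum(y) / len(y)] * len(y)  # every fitted value is the mean
perfect_fit = list(y)                        # every fitted value equals the observation

print(r_squared(y, intercept_only))  # RSS equals TSS
print(r_squared(y, perfect_fit))     # RSS equals 0
```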

New cards

Explain in terms of bias and variance how stepwise selection can improve the predictive power of a GLM.

By selecting a subset of the full list of predictors, we create a reduced model that has a lower degree of complexity and its predictions have a lower variance but a higher squared bias.

New cards

Stepwise selection vs regularization: model selection

Stepwise selection goes through an iterative process, while regularization fits one model with all the predictors.

New cards

Stepwise selection vs regularization: the parameter indexing model complexity

Stepwise selection uses the number of features as a direct measure of model complexity, while regularization uses the regularization parameter, lambda, as an indirect measure of model complexity.

New cards

Stepwise selection vs regularization: the treatment of categorical predictors

For stepwise, the entire categorical predictor with all levels is added or dropped as the algorithm iterates, unless we binarize, while regularization automatically binarizes and evaluates each factor level as separate.

New cards

Stepwise selection vs regularization: the treatment of numeric predictors

For stepwise, numeric predictors are left intact without standardization, while in regularization they must be standardized so that they are on a common scale when the model is fitted.

New cards

How do you choose the optimal value of alpha?

Set up a grid of values of alpha; these have to be pre-specified.
For each value of alpha, select the corresponding optimal value of lambda and evaluate the corresponding CV error.
Adopt the value of alpha that produces the smallest CV error.

New cards

In R, what does the summary() function do?

displays a detailed analysis of the fitted model

New cards

In R, what does the coefficients() [and/or coef()] function do?

returns a vector of coefficient estimates

New cards

In R, what does confint() do?

produces confidence intervals for the regression coefficients. The probability level is 95% by default but can be reset using the level option

New cards

In R, what does residuals() [or simply resid()] do?

returns a vector of raw residuals

New cards

In R, what does plot() do?

produces four diagnostic plots for evaluating the appropriateness of the fitted model

New cards

How do you explain the interaction of variables in words, given that the coefficient value of the interaction TV:radio is 0.001086 and the coefficient value of radio is 0.02886, and the coefficient for TV is 0.01910?

The effect of radio on expected sales (the target variable) is estimated to be 0.02886 + 0.001086 times TV, which increases by 0.001086 for every unit increase in TV.
Reversing the roles of TV and radio, the effect of TV on the expected sales increases by 0.001086 for every unit increase in radio.

New cards

Explain the two ways in which a GLM generalizes a linear model.

Target variable of a GLM is no longer confined to the class of normal random variables; it needs to be a member of the exponential family of distributions.
Instead of equating the mean of the target variable directly with the linear combination of predictors, a GLM sets a function of the target mean to be linearly related to the predictors.

New cards

Explain why a linear model can be regarded as a special case of a generalized linear model.

In the special case where the target variable is normally distributed and the link function is the identity function, the GLM is a linear model.

New cards

Describe the characteristics of the Tweedie distribution and how it can serve as an alternative for modeling aggregate payments.

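An "in-between" distribution of the Poisson and gamma distributions.
Has a discrete probability mass at zero and a probability density function on the positive real line, which suits aggregate payments that are often zero and otherwise positive and continuous.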

New cards

If a distribution is mixed in nature, with a large mass at zero and a continuous part skewed to the right, what is an appropriate distribution?

Tweedie

New cards

Can the log link be used when some of the observations of the target variable are zero?

Yes. Zero observations of the target variable do not invalidate the log link, because the link function is applied to the target mean, not to the target observations themselves.

New cards

Which distributions do not permit zero values?

Gamma and inverse Gaussian

New cards

For offsets, how is the target variable affected?

The observations of the target variable are values aggregated over the exposure units.

New cards

For weights, how is the target variable affected?

The observations of the target variable should be averaged by exposure.

New cards

In weights, the variance of each observation is…

inversely related to the size of the exposure, which serves as the weight for that observation.

New cards

The exposure, when serving as an offset, is in…

direct proportion to the mean of the target variable.
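
A minimal sketch of that proportionality, assuming a log link (the card does not name the link, so this is an assumption). With an offset of ln(exposure), the model equation is ln(mean) = ln(exposure) + eta, so the mean equals exposure times exp(eta). The function name and the numbers are made up.

```python
import math

def expected_target(exposure, eta):
    """Target mean under a log link with ln(exposure) entering as an offset."""
    return math.exp(math.log(exposure) + eta)

eta = -2.0  # hypothetical linear predictor (coefficients times predictors)

print(expected_target(100, eta))
print(expected_target(200, eta))  # doubling the exposure doubles the mean
```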

New cards

Weights vs offsets: which one appears directly in the model equation?

Offsets, NOT weights

New cards

Weights (do/do not) affect the mean of the target variable?

DO NOT

New cards

Offsets (do/do not) affect the variance?

DO NOT

New cards

State the statistical method typically used to estimate the parameters of a GLM.

The maximum likelihood estimation (MLE) method is employed to estimate unknown coefficients.

New cards

What is the problem with deviance as a model selection criterion?

The deviance of a GLM parallels the RSS of a linear model: it is merely a goodness-of-fit measure on the training set and never increases as predictors are added. As a result, the most complex model will likely be chosen, which is not always sensible.

New cards

Explain the limitations of the likelihood ratio test as a model selection method.

While the likelihood ratio test (LRT) is a simple way to compare GLMs, the applicability of this classical approach is somewhat restricted because it can only be used to compare one pair of GLMs at a time.
The simpler GLM must be a special case of, or nested within, the more complex GLM.

New cards

Explain how regularization for GLMs works.

A regularized model results from minimizing the penalized objective function given by deviance (goodness of fit) + regularization penalty (model complexity)

New cards

Explain the importance of setting a cutoff for a binary classifier.

Setting a cutoff is important because it helps translate the predicted probabilities into the predicted classes.

New cards

Explain the relationship between accuracy, sensitivity, and specificity

Accuracy is a weighted average of specificity and sensitivity, where the weights are the proportions of observations belonging to the two classes.
- Accuracy = [TN + TP]/n

New cards

What does it mean if the cutoff is set to 0?

All predicted probabilities, which must be non-negative, will exceed the cutoff, meaning that everyone is predicted to be positive.
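
A short sketch of how the cutoff turns predicted probabilities into predicted classes, including the two extremes. The probabilities and the function name `classify` are made up, and the rule "positive if the probability exceeds the cutoff" follows the wording of the card.

```python
def classify(probabilities, cutoff):
    """Predict positive (1) when the predicted probability exceeds the cutoff."""
    return [1 if p > cutoff else 0 for p in probabilities]

probs = [0.05, 0.30, 0.55, 0.90]  # hypothetical predicted probabilities

print(classify(probs, 0.5))  # a moderate cutoff splits the observations
print(classify(probs, 0.0))  # cutoff 0: everyone is predicted positive
print(classify(probs, 1.0))  # cutoff 1: everyone is predicted negative
```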