ML - Chapter 1-4 (ENGLISH)

59 Terms

1
New cards
Why is training error not a reliable indicator of a model's performance on new, unseen data?
Training error can be made arbitrarily small by increasing model complexity, leading to overfitting and failing to reflect the model's true generalization ability.
2
New cards

What does the 'irreducible error' term in the decomposition of test error represent?

It represents the noise inherent in the data that cannot be reduced by any model, setting a theoretical upper bound on prediction accuracy.

3
New cards

If a model has high variance but low bias, what is likely happening during the training process?

The model is likely overfitting the training data, capturing noise and small fluctuations, so it achieves excellent training performance but generalizes poorly to new data. This typically happens when the model is too complex relative to the amount of training data, making its predictions sensitive to the particular training examples used.

4
New cards

In the context of statistical modeling, what trade-off does the choice between a simple and a complex model often represent?

It represents a trade-off between model interpretability and predictive accuracy, where simpler models are more interpretable but complex models are often more accurate.

5
New cards

In statistical learning, what is the fundamental difference between a regression problem and a classification problem?

Regression problems predict a continuous or quantitative output value, whereas classification problems predict a discrete or categorical label.

6
New cards

What is the key distinction between supervised and unsupervised learning?

Supervised learning uses data with known outcomes or labels to build a predictive model, while unsupervised learning works with unlabeled data to discover patterns.

7
New cards
The _____ classifier provides a theoretical benchmark for the lowest possible test error rate by assigning an observation to the class with the highest conditional probability.
Bayes
8
New cards
How does the K-Nearest Neighbors (KNN) algorithm make a prediction for a new observation?
It identifies the K training observations closest to the new point and uses the majority class (for classification) or the average value (for regression) among them as the prediction.
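For example, a minimal sketch with scikit-learn's `KNeighborsClassifier` on a made-up two-class dataset (the data and the choice K = 3 are purely illustrative):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Tiny made-up training set: two features, binary class labels
X_train = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 0.5],
                    [5.0, 8.0], [6.0, 9.0], [5.5, 7.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# K = 3: each prediction is the majority class among the 3 nearest training points
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

print(knn.predict([[1.1, 1.0], [5.2, 8.1]]))  # expected: [0 1]
```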
9
New cards
What is the primary assumption made in simple linear regression about the relationship between the predictor $X$ and the response $Y$?
It assumes the relationship is approximately linear, which can be modeled by the equation $Y \approx \beta_0 + \beta_1X$.
10
New cards
In the context of least squares regression, what quantity is being minimized to find the optimal coefficient estimates?
The residual sum of squares (RSS), which is the sum of the squared differences between the observed and predicted response values.
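As a rough worked example on synthetic data (the true coefficients and noise level below are assumptions for illustration), the simple-regression estimates that minimize the RSS have a closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=100)  # true beta0 = 2, beta1 = 3

# Closed-form least squares estimates for simple linear regression
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()

# The quantity being minimized: the residual sum of squares
rss = np.sum((y - (beta0 + beta1 * x)) ** 2)
print(beta0, beta1, rss)
```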
11
New cards
What does the standard error of a regression coefficient, such as $SE(\hat{\beta}_1)$, measure?
It measures the average amount by which the estimate $\hat{\beta}_1$ would differ from the true value of $\beta_1$ across repeated samples, i.e., if the model were refit on many different datasets drawn from the same population.
12
New cards
If the 95% confidence interval for a regression coefficient $\beta_1$ contains zero, what does this imply about the relationship between the predictor and the response?
It implies that there is not a statistically significant association between the predictor and the response at the 5% significance level.
13
New cards
What does the $R^2$ statistic quantify in a linear regression model?
It quantifies the proportion of the variance in the response variable that can be explained by the predictor variables in the model.
14
New cards
When moving from simple to multiple linear regression, how does the interpretation of a coefficient $\hat{\beta}_j$ change?
It represents the average effect on $Y$ of a one-unit increase in $X_j$, holding all other predictor variables constant.
15
New cards
What is the purpose of the F-statistic in a multiple linear regression model?
It tests the overall significance of the model by assessing whether at least one of the predictor variables has a non-zero coefficient.
16
New cards
How are qualitative predictors with more than two levels typically incorporated into a linear regression model?
By creating one dummy variable for each level except one; the omitted level serves as the baseline against which the coefficients of the other levels are interpreted.
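A small `pandas` sketch of this encoding on a hypothetical three-level predictor (`drop_first=True` omits the baseline level):

```python
import pandas as pd

# Hypothetical qualitative predictor with three levels
df = pd.DataFrame({"region": ["East", "West", "South", "East", "South"]})

# Three levels -> two dummy variables; the dropped level ("East") is the baseline,
# encoded as zeros in both remaining columns.
dummies = pd.get_dummies(df["region"], drop_first=True, dtype=int)
print(dummies)
```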
17
New cards
What does a significant interaction term between two predictors, say $X_1$ and $X_2$, indicate in a regression model?
It indicates that the effect of one predictor on the response variable depends on the level of the other predictor.
18
New cards

How can a residual plot (residuals vs. fitted values) be used to detect non-linearity in a regression model?

If the residual plot exhibits a visible pattern or curve, it suggests that the linear model is not capturing the true non-linear relationship in the data.

19
New cards
What is collinearity in the context of multiple regression, and why is it a problem?
Collinearity is when two or more predictor variables are closely related, making it difficult to separate their individual effects on the response and increasing the uncertainty of their coefficient estimates.
20
New cards
The _____ is a measure used to quantify the severity of multicollinearity for a specific predictor in a multiple regression model.
Variance Inflation Factor (VIF)
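One way to compute it in Python is with `statsmodels`; a sketch on made-up data where `x2` is nearly a copy of `x1`:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # almost collinear with x1
x3 = rng.normal(size=200)
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

# VIF for each predictor (column 0 is the constant, so start at 1)
for i, name in enumerate(X.columns[1:], start=1):
    print(name, variance_inflation_factor(X.values, i))
```

The VIFs for `x1` and `x2` come out very large, while `x3` stays near 1.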
21
New cards
Why is linear regression generally unsuitable for predicting a binary (0/1) response?
Its predictions are not constrained to the [0, 1] interval, making them difficult to interpret as probabilities.
22
New cards
What does the logistic function transform a linear combination of predictors into?
It transforms the linear combination into a probability, which is always bounded between 0 and 1.
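Concretely, $p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$. A quick numerical check (the coefficients here are arbitrary):

```python
import numpy as np

def logistic(z):
    # Map a linear combination z = beta0 + beta1 * x to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Arbitrary illustrative coefficients
beta0, beta1 = -1.0, 0.8
x = np.linspace(-10, 10, 5)
print(logistic(beta0 + beta1 * x))  # every value lies strictly between 0 and 1
```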
23
New cards
In logistic regression, the coefficients are estimated by maximizing the _____ function.
likelihood
24
New cards
How is a coefficient $\hat{\beta}_1$ in a logistic regression model interpreted?
A one-unit increase in the predictor $X_1$ changes the log-odds of the outcome by $\hat{\beta}_1$; equivalently, it multiplies the odds by a factor of $e^{\hat{\beta}_1}$.
25
New cards
What is the primary assumption of Linear Discriminant Analysis (LDA) regarding the distribution of predictors within each class?
LDA assumes that the predictors within each class follow a multivariate Gaussian (normal) distribution with a common covariance matrix across all classes.
26
New cards
How does Quadratic Discriminant Analysis (QDA) differ from LDA in its assumptions?
QDA relaxes the assumption of a common covariance matrix, allowing each class to have its own distinct covariance matrix.
27
New cards
Under what circumstances might LDA be a better choice than QDA, despite QDA's greater flexibility?
LDA is often preferred when the number of training observations is small, as its lower flexibility reduces the risk of overfitting compared to QDA.
28
New cards
What information does a confusion matrix provide about the performance of a classifier?
It provides a breakdown of correct and incorrect predictions, categorizing them into true positives, true negatives, false positives, and false negatives.
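A sketch with scikit-learn's `confusion_matrix` on hypothetical labels and predictions (note scikit-learn's layout: rows are actual classes, columns are predicted classes):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and classifier predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels {0, 1} the result is laid out as:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
```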
29
New cards
An ROC curve plots the true positive rate (sensitivity) against what other metric?
It plots the true positive rate against the false positive rate (1 - specificity) for all possible classification thresholds.
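For instance, scikit-learn's `roc_curve` sweeps over the thresholds and returns both rates (the labels and probabilities below are made up):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical labels and predicted probabilities from some classifier
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
p_hat = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3])

# FPR is the x-axis, TPR (sensitivity) is the y-axis of the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, p_hat)
print(fpr, tpr)
print("AUC:", roc_auc_score(y_true, p_hat))
```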
30
New cards
What is the primary goal of resampling methods like cross-validation in statistical learning?
Their primary goal is to estimate the test error of a model and to help with model selection without requiring a separate, large test set.
31
New cards
What is a major drawback of the validation set approach for estimating test error?
The estimate of the test error can be highly variable depending on which observations are included in the training and validation sets.
32
New cards
Why does Leave-One-Out Cross-Validation (LOOCV) produce an approximately unbiased estimate of the test error?
Because each training set in LOOCV contains $n-1$ observations, which is almost the entire dataset, closely mimicking the model fit on all $n$ observations.
33
New cards
Despite its low bias, what is a key disadvantage of LOOCV compared to k-fold CV?
LOOCV has high variance in its test error estimate and is computationally much more expensive than k-fold CV.
34
New cards
How does k-fold cross-validation represent a compromise in the bias-variance trade-off compared to the validation set and LOOCV approaches?
It has less bias than the validation set approach (since it uses more data for training) and less variance than LOOCV (by averaging over k folds).
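A minimal 5-fold example with scikit-learn (the dataset and model choice are just for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# 5-fold CV: each observation is used for validation exactly once,
# and the five fold-level accuracy estimates are averaged.
model = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())
```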
35
New cards
What is the fundamental operation of the bootstrap method?
It involves repeatedly drawing random samples with replacement from the original dataset to create multiple bootstrap datasets.
36
New cards
What is a primary application of the bootstrap in statistical learning?
It is used to quantify the uncertainty of an estimator, such as calculating the standard error of a regression coefficient.
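A sketch of that idea on synthetic data (the data-generating slope of 3 is an assumption for illustration), estimating the standard error of a simple-regression slope:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)  # synthetic data, true slope = 3

# Bootstrap: resample (x, y) pairs with replacement, refit, record the slope
slopes = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)  # indices drawn with replacement
    xb, yb = x[idx], y[idx]
    b1 = np.sum((xb - xb.mean()) * (yb - yb.mean())) / np.sum((xb - xb.mean()) ** 2)
    slopes.append(b1)

# The standard deviation of the bootstrap estimates approximates SE(beta1_hat)
print(np.std(slopes))
```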
37
New cards
What is the main difference between modeling for prediction and modeling for inference?
Prediction aims to accurately predict an output for new inputs, treating the model as a black box, while inference aims to understand the relationship between inputs and the output.
38
New cards
A statistical method that makes an explicit assumption about the functional form of the relationship between predictors and the response is known as a _____ method.
parametric
39
New cards
Why might a less flexible model sometimes produce more accurate predictions than a highly flexible one?
A less flexible model is less prone to overfitting the training data, which can lead to better generalization and lower error on a new test set.
40
New cards

In a multiple regression model, how is the null hypothesis for the F-statistic stated?

The null hypothesis states that all slope coefficients are simultaneously zero, $H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$ (the intercept is not included), i.e., that none of the predictors is related to the response.

41
New cards
What does a high-leverage point in a regression analysis signify?
It signifies an observation with an unusual value for its predictor variable(s) compared to the other observations.
42
New cards
Why can an outlier have a significant impact on a least squares regression fit?
Because the least squares method minimizes the sum of squared residuals, a large residual from an outlier gets squared, giving it a disproportionately large influence on the fitted line.
43
New cards
The extension of logistic regression to handle a qualitative response with more than two classes is known as _____ logistic regression.
multinomial
44
New cards
If we use linear regression to predict a binary response, we might obtain probability estimates outside the $[0,1]$ range. However, what is one surprising result regarding the classifications made?
The classifications obtained will be the same as those from Linear Discriminant Analysis (LDA).
45
New cards
What does a VIF value exceeding 5 or 10 typically indicate in a regression model?
It indicates a problematic amount of collinearity, suggesting the variance of that coefficient's estimate is highly inflated.
46
New cards
In the context of classification models, what is the 'sensitivity' of a classifier?
Sensitivity, or the true positive rate, is the proportion of actual positive cases that are correctly identified as positive.
47
New cards
In the context of classification models, what is the 'specificity' of a classifier?
Specificity is the proportion of actual negative cases that are correctly identified as negative.
48
New cards
The Poisson regression model is a type of generalized linear model (GLM) often used for what kind of response variable?
It is used for count data, where the response variable represents the number of times an event has occurred.
49
New cards
What is the purpose of the validation set in the validation set approach?
It is used to estimate the test error for a model that has been fitted on the separate training set.
50
New cards
Why is the validation set error rate for a model fit on a training set of size $n/2$ likely to be an overestimate of the test error rate for a model fit on the entire dataset of size $n$?
Models fit on less data tend to perform worse, so the error on the validation set reflects a model trained on a smaller, potentially less representative dataset.
51
New cards
In Python's `numpy`, what does 0-based indexing mean?
It means the first element of a sequence is accessed with index 0, the second with index 1, and so on.
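A quick illustration:

```python
import numpy as np

a = np.array([10, 20, 30, 40])
print(a[0])    # 10 -> the first element has index 0
print(a[1])    # 20 -> the second element has index 1
print(a[1:3])  # [20 30] -> slices include the start index but exclude the stop index
```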
52
New cards
What is the primary purpose of the `pandas` library in Python for data science?
It is used to create and work with data frame objects, which are ideal for handling datasets with different data types and named columns.
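A minimal example of such a data frame (the column names and values are made up):

```python
import pandas as pd

# A data frame holds columns of different types together, with named columns
df = pd.DataFrame({
    "name": ["Ann", "Bo", "Cy"],
    "age": [23, 35, 41],
    "member": [True, False, True],
})
print(df.dtypes)            # each column keeps its own type
print(df[["name", "age"]])  # columns are selected by name
```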
53
New cards
When fitting a logistic regression model, what is the primary advantage of using maximum likelihood estimation over non-linear least squares?
Maximum likelihood is preferred because its coefficient estimates have better statistical properties than those obtained from non-linear least squares.
54
New cards
If the true relationship between a predictor and response is highly non-linear, would you expect a flexible or inflexible model to perform better?
A flexible model would be expected to perform better as it can capture the non-linear patterns, whereas an inflexible model like linear regression would have high bias.
55
New cards
What is the effect of increasing K in the K-Nearest Neighbors (KNN) algorithm on the model's flexibility?
Increasing K makes the decision boundary smoother and less flexible, increasing bias but decreasing variance.
56
New cards
In the `sklearn` library, what is the common three-step pattern for using a classifier or model?
The pattern is to first instantiate the model object, then fit it to the training data using the `.fit()` method, and finally make predictions using the `.predict()` method.
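A sketch of that pattern (the dataset and classifier are arbitrary choices for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000)  # 1. instantiate the model object
clf.fit(X_train, y_train)                # 2. fit it to the training data
print(clf.predict(X_test[:5]))           # 3. predict for new observations
```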
57
New cards
What problem does adding a dummy variable for every level of a categorical predictor cause in a linear model that also includes an intercept?
It creates perfect collinearity: the dummy variables sum to one for every observation, duplicating the intercept column, so the design matrix is rank-deficient and the coefficients cannot be estimated uniquely.
58
New cards
A model-matrix builder such as `ModelSpec` (from the `ISLP` package, used alongside `statsmodels`) that automatically generates dummy variables for a qualitative feature is performing a process known as _____ encoding.
one-hot
59
New cards
If a residual plot for a linear model shows a funnel shape (i.e., the spread of residuals increases with the fitted values), what assumption is being violated?
The assumption of constant variance of the error terms (homoscedasticity) is being violated.
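A sketch that produces such a funnel on purpose, by letting the error standard deviation grow with the predictor (all numbers here are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=200)
# Error standard deviation grows with x -> non-constant variance (heteroscedasticity)
y = 1.0 + 2.0 * x + rng.normal(scale=x)

# Fit a simple linear model and plot residuals against fitted values
beta1, beta0 = np.polyfit(x, y, deg=1)
fitted = beta0 + beta1 * x
residuals = y - fitted

plt.scatter(fitted, residuals, s=10)
plt.axhline(0, color="red")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.title("Residual spread increases with fitted values (funnel shape)")
plt.show()
```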