knn, svm, svc, mmc, classification, regression, supervised learning, unsupervised learning, etc.
What is the main difference between supervised and unsupervised learning?
Supervised learning uses labeled data, while unsupervised learning does not
In a classification problem, what kind of output is predicted?
A categorical value
In a classification task, what does the probability P(Y = j | X = x0) represent?
The probability that the target variable (Y) belongs to class “j” given that the feature vector (X) is equal to “x0”
What is the purpose of the K-Nearest Neighbor (KNN) algorithm?
To classify an observation based on the majority class of its nearest neighbors
In KNN, what does increasing the value of K generally do?
It makes the decision boundary smoother
How is the training error rate in classification calculated?
The proportion of incorrect classifications in the training set
What is the main reason we prioritize test error over training error?
Training error does not indicate how well a model generalizes to unseen data
What is a key sign of overfitting in a machine learning model?
The training error is very low, but the test error is high
What happens when a model is underfitting?
The model performs poorly on both training and test data
Why is the validation set approach used to estimate test error?
It provides a simple way to assess model performance on unseen data
What is a drawback of the validation set approach?
The test error estimate heavily depends on the random train/test split
How does Leave-One-Out Cross-Validation (LOOCV) differ from the validation set approach?
LOOCV provides a more stable estimate of test error
What is the biggest drawback of LOOCV?
It is computationally expensive and time-consuming
Which of the following is NOT an advantage of LOOCV?
It is computationally efficient
When performing KNN classification, what is the effect of choosing K = 1?
The model is more likely to overfit the training data
What happens when K in a KNN model is too large?
The model underfits the data and may generalize poorly
Why is K-fold Cross-Validation often preferred over LOOCV?
It requires less computation while still providing stable test error estimates
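A hedged base-R sketch of 5-fold cross-validation for choosing K in KNN; the data, the fold count, and the candidate K values below are all made up:
library(class)
set.seed(1)
X <- matrix(rnorm(200), ncol = 2)                 # 100 made-up observations
y <- factor(ifelse(X[, 1] + X[, 2] > 0, "Yes", "No"))
folds <- sample(rep(1:5, length.out = nrow(X)))   # random fold assignment
cv.error <- function(k) {
  errs <- sapply(1:5, function(f) {
    pred <- knn(X[folds != f, ], X[folds == f, ], y[folds != f], k = k)
    mean(pred != y[folds == f])                   # error on the held-out fold
  })
  mean(errs)                                      # average over the 5 folds
}
sapply(c(1, 3, 5, 7), cv.error)                   # compare candidate K values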
Why is scaling important for the KNN classifier?
It ensures that all variables contribute equally to the distance calculation
Suppose a dataset contains two predictors: salary (in dollars) and age (in years). Before scaling, which variable is likely to have a greater impact on KNN’s distance calculation?
Salary, due to its larger numerical range compared to age.
What transformation is applied to standardize a variable?
x_j^sc = (x_j − x̄_j) / sd(x_j), where x̄_j is the sample mean of x_j
What is the mean of a variable after standardization?
0
What is the standard deviation of a variable after standardization?
1
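A minimal R check of the standardization formula above, on a made-up vector:
x <- c(25, 40, 31, 58, 47)            # made-up ages
x.sc <- (x - mean(x)) / sd(x)         # subtract the mean, divide by the sd
mean(x.sc)                            # ~0 (up to floating-point noise)
sd(x.sc)                              # 1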
If a dataset is not scaled, what type of variables will dominate the distance calculation in KNN?
Variables with larger numerical ranges
What is a hyperplane in a 2D space?
A line
The equation of a hyperplane in 2D space is given by:
β0 + β1X1 + β2X2 = 0
If a point X = (X1, X2) satisfies the equation of a hyperplane then:
It lies on the hyperplane
How can we determine which side of the hyperplane a point lies?
By checking if β0 + β1X1 + β2X2 is positive or negative
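A small R sketch of this sign check; the coefficients are made up:
beta0 <- 1; beta1 <- 2; beta2 <- -3
side <- function(x1, x2) sign(beta0 + beta1 * x1 + beta2 * x2)
side(1, 1)    #  0 -> the point lies on the hyperplane
side(3, 1)    #  1 -> one side
side(0, 1)    # -1 -> the other side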
The maximal margin classifier is used when:
a dataset has an infinite number of separating hyperplanes
What is the goal of a maximal margin classifier?
To find the hyperplane that maximizes the distance from the closest training observations
What are support vectors?
The points closest to the separating hyperplane
In the optimization problem for the maximal margin classifier, the constraint ensures that:
all points are correctly classified and at least a margin M away from the hyperplane
What happens if no separating hyperplane exists in the dataset?
The maximal margin classifier fails to find a solution
What is one major issue with the maximal margin classifier?
It may be sensitive to individual observations and cause overfitting
What is the primary purpose of a support vector classifier (soft margin classifier)?
To allow some observations to violate the margin for better generalization
What happens when an observation has a slack variable ϵi = 0 in a support vector classifier?
It is on the correct side of the margin
What does increasing the tuning parameter C in a support vector classifier do?
Widens the margin and allows more violations
How can we select an optimal value for the tuning parameter C?
By using cross-validation to test different values of C
In a support vector classifier, what does the constraint Σ_{i=1}^{n} ϵ_i ≤ C mean?
The sum of margin violations must be less than or equal to C
What is the effect of enlarging the predictor space in support vector classifiers?
It allows the classifier to handle non-linear decision boundaries
Which of the following statements about slack variables ϵi is correct?
If ϵi > 1, the observation is misclassified
In the context of support vector classifiers, what is a “soft margin”?
A margin that allows some observations to be on the wrong side of the margin or hyperplane
What is a key limitation of support vector classifiers with a linear decision boundary?
They perform poorly when the class boundary is non-linear
How can support vector classifiers handle non-linear decision boundaries?
By using polynomial or other non-linear transformations of the predictor variables
What is the biggest advantage of using kernels instead of explicitly enlarging the feature space?
It reduces computational complexity
Why is explicitly enlarging the feature space computationally expensive?
It leads to an exponential increase in the number of features
In an SVM, kernel functions allow computations to be performed in:
The original feature space without explicit transformation
What does the gamma (γ) parameter control in the radial kernel SVM?
How far the influence of each training observation reaches when classifying new points (small γ = far-reaching and smooth, large γ = highly local)
Why is cross-validation important when tuning SVM hyperparameters?
To obtain a more reliable test error estimate
What function in R is recommended for performing hyperparameter tuning with SVMs?
tune()
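A hedged sketch of tune() from the e1071 package on made-up two-class data; the grid of cost and gamma values is arbitrary:
library(e1071)
set.seed(1)
x <- matrix(rnorm(200 * 2), ncol = 2)
y <- c(rep(-1, 100), rep(1, 100))
x[y == 1, ] <- x[y == 1, ] + 2                    # shift one class so it is separable
dat <- data.frame(x = x, y = as.factor(y))
# tune() runs 10-fold cross-validation over the grid by default
tune.out <- tune(svm, y ~ ., data = dat, kernel = "radial",
                 ranges = list(cost = c(0.1, 1, 10), gamma = c(0.5, 1, 2)))
summary(tune.out)       # CV error for each (cost, gamma) combination
tune.out$best.model     # the model refit at the best combination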
What is the purpose of the set.seed(1) command before running knn()?
To ensure reproducibility of results
What does the knn() function in R require as input arguments?
Training dataset, testing dataset, class labels, and number of neighbors
What does the expression mean(knn.pred != y.test) calculate?
The misclassification error
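A minimal sketch with knn() from the class package; the data and the choice of k = 3 are made up:
library(class)
set.seed(1)                             # knn() breaks distance ties at random
train.X <- matrix(rnorm(200), ncol = 2) # 100 made-up training observations
test.X  <- matrix(rnorm(100), ncol = 2) # 50 made-up test observations
y.train <- factor(ifelse(train.X[, 1] + train.X[, 2] > 0, "Yes", "No"))
y.test  <- factor(ifelse(test.X[, 1] + test.X[, 2] > 0, "Yes", "No"))
knn.pred <- knn(train.X, test.X, y.train, k = 3)  # train set, test set, labels, K
mean(knn.pred != y.test)                # misclassification (test error) rate
table(knn.pred, y.test)                 # confusion matrix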
How does increasing the value of K affect the misclassification error?
It may increase or decrease depending on the dataset
What function in R can be used to compute distances between observations in a dataset?
dist()
Given three customers with salaries and ages in a matrix, what issue arises when calculating distances without scaling?
The salary variable dominates the distance calculations.
How does the scale() function in R standardize data?
It transforms variables to have a mean of zero and a standard deviation of one.
After standardizing data, what happens to the distances between observations?
They become more comparable across different variables.
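A hedged sketch of dist() and scale() on three made-up customers (salary in dollars, age in years):
cust <- matrix(c(60000, 30,
                 61000, 55,
                 90000, 31),
               ncol = 2, byrow = TRUE,
               dimnames = list(c("A", "B", "C"), c("salary", "age")))
dist(cust)                    # unscaled: the salary gaps swamp the age gaps
cust.sc <- scale(cust)        # each column now has mean 0 and sd 1
dist(cust.sc)                 # scaled: age differences contribute comparably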
Why does the training error for K=1 remain unchanged before and after scaling?
With K = 1, each training observation's nearest neighbor is itself, so every point is classified by its own label and the training error is 0 regardless of scaling.
What is the formula for min-max normalization?
(X−min(X))/(max(X)−min(X))
What is the main difference between standardization and min-max normalization?
Standardization centers data with a mean of zero and a standard deviation of one, while min-max normalization rescales data to a fixed range (0 to 1).
After applying min-max normalization to two vectors where one is 10 times larger than the other, what happens?
The results are identical after normalization.
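A minimal sketch of this; minmax is a hypothetical helper (base R has no built-in min-max normalizer), and the vectors are made up:
minmax <- function(x) (x - min(x)) / (max(x) - min(x))
v1 <- c(2, 5, 9, 14)
v2 <- 10 * v1                       # same pattern, 10x the magnitude
minmax(v1)                          # values rescaled into [0, 1]
all.equal(minmax(v1), minmax(v2))   # TRUE: the constant scale factor cancels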
When applying KNN with K=1, how do you expect the test error to behave?
The test error rate should be high because K=1 is prone to overfitting.
What does a confusion matrix output for a KNN model provide?
The true positive, true negative, false positive, and false negative counts.
When using leave-one-out cross-validation (LOOCV) with K=1 on the Caravan dataset, which of the following is true?
The KNN model will be trained and tested on the entire dataset, with each observation being used as the test set once.
What is the advantage of using cross-validation (CV) over a validation set approach?
CV reduces the variability in the test error estimate by averaging over multiple splits.
What does the knn.cv() function perform in the context of KNN?
It applies leave-one-out cross-validation to the KNN model.
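A hedged LOOCV sketch with knn.cv() from the class package, on made-up data (the Caravan dataset from ISLR could be substituted):
library(class)
set.seed(1)
X <- matrix(rnorm(200), ncol = 2)                # 100 made-up observations
y <- factor(ifelse(X[, 1] > 0, "Yes", "No"))
cv.pred <- knn.cv(train = X, cl = y, k = 1)      # each row held out and predicted once
mean(cv.pred != y)                               # LOOCV error estimate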
When using the svm() function, what does the cost parameter control?
The cost parameter in the svm() function controls the cost of margin violations (how many violations we are willing to tolerate).
What happens when the cost parameter is set to a small value in the svm() function?
When the cost parameter is small, many support vectors will either be on the margin or violate it, and the margins will be wide.
What does the scale = FALSE argument in the svm() function do?
The scale = FALSE argument tells the svm() function not to scale the features to have a mean of zero and a standard deviation of one.
In the output of the summary(svmfit) function, what does the number of support vectors indicate?
The number of support vectors indicates the number of data points used to define the margin.
What does the svmfit$index command return?
The svmfit$index command returns the indices of the support vectors.
How can we visually check the performance of the support vector classifier?
The performance of the support vector classifier can be visually checked by plotting the result using plot(svmfit, dat).
What does the kernel = "linear" argument in the svm() function specify?
The kernel = "linear" argument specifies the use of a linear kernel for the support vector classifier.
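A hedged sketch tying the svm() pieces above together, in the style of the ISLR lab; the data are made up:
library(e1071)
set.seed(1)
x <- matrix(rnorm(40), ncol = 2)                 # 20 made-up observations
y <- c(rep(-1, 10), rep(1, 10))
x[y == 1, ] <- x[y == 1, ] + 1.5                 # shift one class
dat <- data.frame(x = x, y = as.factor(y))
svmfit <- svm(y ~ ., data = dat, kernel = "linear",
              cost = 10, scale = FALSE)          # raw features; violations penalized heavily
summary(svmfit)        # reports the number of support vectors
svmfit$index           # row indices of the support vectors
plot(svmfit, dat)      # decision boundary with support vectors marked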
What is the primary focus of prediction in statistical modeling?
Prediction focuses on estimating future outcomes or unknown values based on existing data.
In a linear regression model, what is the primary objective when using inference?
Inference in linear regression is focused on estimating the coefficients and understanding the effect of each predictor variable.
In the context of prediction vs. inference, which of the following statements is true?
Prediction focuses on making accurate forecasts on new data, while inference focuses on understanding the relationships between variables and interpreting their significance.
What does the confusion matrix provide in the context of a classification model?
The confusion matrix summarizes the performance of a classification model by displaying the counts of correct and incorrect predictions for each class.
In a confusion matrix, what does the True Positive (TP) represent?
True Positive (TP) represents the number of times the model correctly predicted the positive class.
What is a False Positive (FP) in the context of a confusion matrix?
A False Positive (FP) occurs when the model incorrectly predicts the positive class when the actual class is negative.
What does the False Negative (FN) indicate in a confusion matrix?
A False Negative (FN) indicates that the model incorrectly predicted the negative class when it should have predicted the positive class.
Which of the following is the correct formula for accuracy based on the confusion matrix?
Accuracy = (TP + TN) / (TP + FP + FN + TN): the sum of true positives and true negatives divided by the total number of predictions.
What is the formula for Precision?
Precision = TP / (TP + FP): the number of true positives divided by the sum of true positives and false positives.
What does a Test Error rate of 0.2 indicate?
A Test Error of 0.2 indicates that 20% of the predictions are incorrect, which is the misclassification rate.
Which of the following best describes Test Error?
Test Error is the proportion of misclassified observations (FP + FN) over the total observations (TP + FP + FN + TN).
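A minimal sketch computing these metrics from a confusion matrix in R; the predicted and true labels are made up:
pred  <- factor(c("Yes", "Yes", "No", "No", "Yes", "No", "Yes", "No", "No", "Yes"))
truth <- factor(c("Yes", "No",  "No", "Yes", "Yes", "No", "Yes", "No", "No", "No"))
cm <- table(Predicted = pred, Actual = truth)
TP <- cm["Yes", "Yes"]; TN <- cm["No", "No"]
FP <- cm["Yes", "No"];  FN <- cm["No", "Yes"]
(TP + TN) / sum(cm)    # accuracy
TP / (TP + FP)         # precision
(FP + FN) / sum(cm)    # test error (misclassification rate)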
Can SVM and KNN be used for both classification and regression models?
Yes
Small K indicates:
high curvature, complex boundary, overfitting
Large K indicates:
low curvature, smooth boundary, underfitting
True or false: KNN is sensitive to outliers
True
A large margin indicates:
the classifier is more robust to small changes in the data
What are slack variables?
Variables that allow some data points to be misclassified or lie within the margin
Small C indicates:
fewer violations, narrow margin (closer to maximal margin classifier)
Large C indicates:
more violations, wider margin
Large γ indicates:
high curvature; highly localized decision regions that may form islands around individual data points
Small γ indicates:
low curvature, broad decision region