MATH4323 EXAM 1


Description and Tags

KNN, SVM, SVC, MMC, classification, regression, supervised learning, unsupervised learning, etc.


95 Terms

1

What is the main difference between supervised and unsupervised learning?

Supervised learning uses labeled data, while unsupervised learning does not

2

In a classification problem, what kind of output is predicted?

A categorical value

3

In a classification task, what does the probability P(Y = j | X = x0) represent?

The probability that the target variable Y belongs to class j, given that the feature vector X equals x0

4

What is the purpose of the K-Nearest Neighbor (KNN) algorithm?

To classify an observation based on the majority class of its nearest neighbors

5

In KNN, what does increasing the value of K generally do?

It makes the decision boundary smoother

6

How is the training error rate in classification calculated?

The proportion of incorrect classifications in the training set

7

What is the main reason we prioritize test error over training error?

Training error does not indicate how well a model generalizes to unseen data

8

What is a key sign of overfitting in a machine learning model?

The training error is very low, but the test error is high

9

What happens when a model is underfitting?

The model performs poorly on both training and test data

10

Why is the validation set approach used to estimate test error?

It provides a simple way to assess model performance on unseen data

11

What is a drawback of the validation set approach?

The test error estimate heavily depends on the random train/test split
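
A minimal sketch of that variability, using simulated data and class::knn (the data, seeds, and K below are illustrative assumptions): two different random splits of the same data give noticeably different test-error estimates.

    library(class)

    set.seed(1)
    x <- matrix(rnorm(200 * 2), ncol = 2)            # 200 observations, 2 predictors
    y <- factor(ifelse(x[, 1] + x[, 2] > 0, "A", "B"))

    for (s in c(10, 20)) {                           # two different random splits
      set.seed(s)
      train <- sample(200, 100)                      # random half for training
      pred  <- knn(x[train, ], x[-train, ], y[train], k = 5)
      cat("split seed", s, "- estimated test error:", mean(pred != y[-train]), "\n")
    }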

12

How does Leave-One-Out Cross-Validation (LOOCV) differ from the validation set approach?

LOOCV provides a more stable estimate of test error

13

What is the biggest drawback of LOOCV?

It is computationally expensive and time-consuming

14

Which of the following is NOT an advantage of LOOCV?

It is computationally efficient

15

When performing KNN classification, what is the effect of choosing K = 1?

The model is more likely to overfit the training data

16

What happens when K in a KNN model is too large?

The model underfits the data and may generalize poorly

17

Why is K-fold Cross-Validation often preferred over LOOCV?

It requires less computation while still providing stable test error estimates
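
A manual 10-fold CV sketch for choosing K in KNN (the fold assignment and candidate K values are illustrative assumptions; the simulated data mirrors the earlier sketch):

    library(class)

    set.seed(1)
    x <- matrix(rnorm(200 * 2), ncol = 2)
    y <- factor(ifelse(x[, 1] + x[, 2] > 0, "A", "B"))
    folds <- sample(rep(1:10, length.out = nrow(x)))   # assign each row to one of 10 folds

    for (k in c(1, 5, 15)) {
      errs <- sapply(1:10, function(f) {
        pred <- knn(x[folds != f, ], x[folds == f, ], y[folds != f], k = k)
        mean(pred != y[folds == f])                    # error on the held-out fold
      })
      cat("K =", k, " CV error:", mean(errs), "\n")
    }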

18

Why is scaling important for the KNN classifier?

It ensures that all variables contribute equally to the distance calculation

19

Suppose a dataset contains two predictors: salary (in dollars) and age (in years). Before scaling, which variable is likely to have a greater impact on KNN’s distance calculation?

Salary, due to its larger numerical range compared to age.

20

What transformation is applied to standardize a variable?

x_j^sc = (x_j − x̄_j) / sd(x_j), where x̄_j is the sample mean of x_j
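
A quick check of this formula in R, and that it matches the built-in scale() (the values are illustrative):

    x <- c(10, 20, 30, 40, 50)              # illustrative values
    x.sc <- (x - mean(x)) / sd(x)           # the standardization formula above
    all.equal(x.sc, as.numeric(scale(x)))   # TRUE: same as the built-in
    c(mean(x.sc), sd(x.sc))                 # ~0 and 1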

21

What is the mean of a variable after standardization?

0

22

What is the standard deviation of a variable after standardization?

1

23

If a dataset is not scaled, what type of variables will dominate the distance calculation in KNN?

Variables with larger numerical ranges

24

What is a hyperplane in a 2D space?

A line

25

The equation of a hyperplane in 2D space is given by:

β₀ + β₁X₁ + β₂X₂ = 0

26

If a point X = (X₁, X₂) satisfies the equation of a hyperplane, then:

It lies on the hyperplane

27

How can we determine on which side of the hyperplane a point lies?

By checking whether β₀ + β₁X₁ + β₂X₂ is positive or negative
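
A small illustration with assumed coefficients β₀ = −1, β₁ = β₂ = 1:

    beta0 <- -1; beta1 <- 1; beta2 <- 1              # hyperplane: -1 + X1 + X2 = 0
    side <- function(x1, x2) sign(beta0 + beta1 * x1 + beta2 * x2)
    side(1, 1)      #  1: one side of the hyperplane
    side(0, 0)      # -1: the other side
    side(0.5, 0.5)  #  0: exactly on the hyperplane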

28

The maximal margin classifier is used when:

a dataset has an infinite number of separating hyperplanes

29

What is the goal of a maximal margin classifier?

To find the hyperplane that maximizes the distance from the closest training observations

30

What are support vectors?

The points closest to the separating hyperplane

31

In the optimization problem for the maximal margin classifier, the constraint ensures that:

all points are correctly classified and at least a margin M away from the hyperplane

32

What happens if no separating hyperplane exists in the dataset?

The maximal margin classifier fails to find a solution

33

What is one major issue with the maximal margin classifier?

It may be sensitive to individual observations and cause overfitting

34

What is the primary purpose of a support vector classifier (soft margin classifier)?

To allow some observations to violate the margin for better generalization

35

What happens when an observation has a slack variable ϵᵢ = 0 in a support vector classifier?

It is on the correct side of the margin

36

What does increasing the tuning parameter C in a support vector classifier do?

Widens the margin and allows more violations

37

How can we select an optimal value for the tuning parameter C?

By using cross-validation to test different values of C

38

In a support vector classifier, what does the constraint ∑ᵢ₌₁ⁿ ϵᵢ ≤ C mean?

The sum of margin violations must be less than or equal to C

39

What is the effect of enlarging the predictor space in support vector classifiers?

It allows the classifier to handle non-linear decision boundaries

40

Which of the following statements about slack variables ϵᵢ is correct?

If ϵᵢ > 1, the observation is misclassified

41

In the context of support vector classifiers, what is a “soft margin”?

A margin that allows some observations to be on the wrong side of the margin or hyperplane

42

What is a key limitation of support vector classifiers with a linear decision boundary?

They perform poorly when the class boundary is non-linear

43

How can support vector classifiers handle non-linear decision boundaries?

By using polynomial or other non-linear transformations of the predictor variables

44

What is the biggest advantage of using kernels instead of explicitly enlarging the feature space?

It reduces computational complexity

45

Why is explicitly enlarging the feature space computationally expensive?

It leads to an exponential increase in the number of features

46

In an SVM, kernel functions allow computations to be performed in:

The original feature space without explicit transformation

47

What does the gamma (γ) parameter control in the radial kernel SVM?

How far the influence of each training observation reaches: a larger γ makes the influence more local, giving a more flexible decision boundary

48

Why is cross-validation important when tuning SVM hyperparameters?

To obtain a more reliable test error estimate

49

What function in R is recommended for performing hyperparameter tuning with SVMs?

tune()
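
A minimal tune() sketch from the e1071 package for a radial-kernel SVM, searching over cost and gamma (the simulated data and the grids are illustrative assumptions; tune() performs 10-fold cross-validation by default):

    library(e1071)

    set.seed(1)
    x <- matrix(rnorm(200 * 2), ncol = 2)
    y <- factor(ifelse(x[, 1]^2 + x[, 2]^2 > 1.5, "A", "B"))   # non-linear boundary
    dat <- data.frame(x = x, y = y)

    tune.out <- tune(svm, y ~ ., data = dat, kernel = "radial",
                     ranges = list(cost = c(0.1, 1, 10), gamma = c(0.5, 1, 2)))
    summary(tune.out)          # CV error for each (cost, gamma) combination
    tune.out$best.parameters   # combination with the lowest CV error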

50

What is the purpose of the set.seed(1) command before running knn()?

To ensure reproducibility of results

51

What does the knn() function in R require as input arguments?

Training dataset, testing dataset, class labels, and number of neighbors

52

What does the expression mean(knn.pred != y.test) calculate?

The misclassification error

53

How does increasing the value of K affect the misclassification error?

It may increase or decrease depending on the dataset

54

What function in R can be used to compute distances between observations in a dataset?

dist()

55

Given three customers with salaries and ages in a matrix, what issue arises when calculating distances without scaling?

The salary variable dominates the distance calculations.
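
A sketch with three illustrative customers: on the raw data the salary column swamps dist(); after scale(), age contributes comparably:

    customers <- matrix(c(50000, 25,
                          52000, 60,
                          90000, 27),
                        ncol = 2, byrow = TRUE,
                        dimnames = list(paste0("c", 1:3), c("salary", "age")))
    dist(customers)         # distances dominated by salary differences
    dist(scale(customers))  # after standardizing, age matters comparably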

56

How does the scale() function in R standardize data?

It transforms variables to have a mean of zero and a standard deviation of one.

57

After standardizing data, what happens to the distances between observations?

They become more comparable across different variables.

58

Why does the training error for K=1 remain unchanged before and after scaling?

K=1 considers only the closest observation, which is itself.
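
A quick demonstration with simulated data: when the training set is also used as the test set and K = 1, each point's nearest neighbor is itself, so the error is 0 with or without scaling:

    library(class)

    set.seed(1)
    x <- matrix(rnorm(100 * 2), ncol = 2)
    y <- factor(sample(c("A", "B"), 100, replace = TRUE))
    mean(knn(x, x, y, k = 1) != y)                # 0: unscaled training error
    mean(knn(scale(x), scale(x), y, k = 1) != y)  # 0: scaled training error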

59

What is the formula for min-max normalization?

(X−min(X))/(max(X)−min(X))

60

What is the main difference between standardization and min-max normalization?

Standardization centers data with a mean of zero and a standard deviation of one, while min-max normalization rescales data to a fixed range (0 to 1).

61

After applying min-max normalization to two vectors where one is 10 times larger than the other, what happens?

The results are identical after normalization.
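
A quick check of the last three cards (the values are illustrative): min-max normalization maps any vector onto [0, 1], and multiplying a vector by 10 leaves the normalized result unchanged:

    minmax <- function(x) (x - min(x)) / (max(x) - min(x))
    v <- c(2, 4, 6, 8, 10)
    minmax(v)                             # 0.00 0.25 0.50 0.75 1.00
    all.equal(minmax(v), minmax(10 * v))  # TRUE: identical after normalization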

62

When applying KNN with K=1, how do you expect the test error to behave?

The test error rate should be high because K=1 is prone to overfitting.

63

What does a confusion matrix output for a KNN model provide?

The true positive, true negative, false positive, and false negative counts.

64

When using leave-one-out cross-validation (LOOCV) with K=1 on the Caravan dataset, which of the following is true?

The KNN model will be trained and tested on the entire dataset, with each observation being used as the test set once.

65

What is the advantage of using cross-validation (CV) over a validation set approach?

CV reduces the variability in the test error estimate by averaging over multiple splits.

66

What does the knn.cv() function perform in the context of KNN?

It applies leave-one-out cross-validation to the KNN model.
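
A minimal knn.cv() sketch on the Caravan data from the ISLR package (Purchase, the response, is column 86; K = 1 as in the earlier card):

    library(ISLR)
    library(class)

    set.seed(1)
    X <- scale(Caravan[, -86])          # standardize the 85 predictors
    cv.pred <- knn.cv(train = X, cl = Caravan$Purchase, k = 1)
    mean(cv.pred != Caravan$Purchase)   # LOOCV error estimate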

67

When using the svm() function, what does the cost parameter control?

The cost parameter in the svm() function controls the cost of margin violations (how many violations we are willing to tolerate).

68

What happens when the cost parameter is set to a small value in the svm() function?

When the cost parameter is small, many support vectors will either be on the margin or violate it, and the margins will be wide.

69

What does the scale = FALSE argument in the svm() function do?

The scale = FALSE argument tells the svm() function not to scale the features to have a mean of zero and a standard deviation of one.

70

In the output of the summary(svmfit) function, what does the number of support vectors indicate?

The number of support vectors indicates the number of data points used to define the margin.

71

What does the svmfit$index command return?

The svmfit$index command returns the indices of the support vectors

72

How can we visually check the performance of the support vector classifier?

The performance of the support vector classifier can be visually checked by plotting the result using the plot(svmfit, dat) function.

73

What does the kernel = "linear" argument in the svm() function specify?

The kernel = "linear" argument specifies the use of a linear kernel for the support vector classifier.
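
A sketch tying cards 67 through 73 together (the simulated data, the cost value, and the class shift are illustrative assumptions):

    library(e1071)

    set.seed(1)
    x <- matrix(rnorm(40), ncol = 2)           # 20 observations, 2 predictors
    y <- factor(c(rep(-1, 10), rep(1, 10)))
    x[y == 1, ] <- x[y == 1, ] + 1.5           # shift one class apart
    dat <- data.frame(x = x, y = y)

    svmfit <- svm(y ~ ., data = dat, kernel = "linear",
                  cost = 10, scale = FALSE)
    summary(svmfit)     # reports the number of support vectors
    svmfit$index        # row indices of the support vectors
    plot(svmfit, dat)   # decision boundary with support vectors marked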

74

What is the primary focus of prediction in statistical modeling?

Prediction focuses on estimating future outcomes or unknown values based on existing data.

75

In a linear regression model, what is the primary objective when using inference?

Inference in linear regression is focused on estimating the coefficients and understanding the effect of each predictor variable.

76

In the context of prediction vs. inference, which of the following statements is true?

Prediction focuses on making accurate forecasts on new data, while inference focuses on understanding the relationships between variables and interpreting their significance.

77

What does the confusion matrix provide in the context of a classification model?

The confusion matrix summarizes the performance of a classification model by displaying the counts of correct and incorrect predictions for each class.

78

In a confusion matrix, what does the True Positive (TP) represent?

True Positive (TP) represents the number of times the model correctly predicted the positive class.

79

What is a False Positive (FP) in the context of a confusion matrix?

A False Positive (FP) occurs when the model incorrectly predicts the positive class when the actual class is negative.

80

What does the False Negative (FN) indicate in a confusion matrix?

A False Negative (FN) indicates that the model incorrectly predicted the negative class when it should have predicted the positive class.

81

Which of the following is the correct formula for accuracy based on the confusion matrix?

Accuracy is calculated as the sum of true positives and true negatives divided by the total number of predictions (TP + FP + FN + TN).

82

What is the formula for Precision?

Precision is calculated as the number of true positives (TP) divided by the sum of true positives and false positives (TP + FP).

83

What does a Test Error rate of 0.2 indicate?

A Test Error of 0.2 indicates that 20% of the predictions are incorrect, which is the misclassification rate.

84

Which of the following best describes Test Error?

Test Error is the proportion of misclassified observations (FP + FN) over the total observations (TP + FP + FN + TN).
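
A small worked example (the labels are illustrative) building the confusion matrix with table() and recovering accuracy, precision, and a test error of exactly 0.2, as in card 83:

    truth <- factor(c(1, 1, 1, 0, 0, 0, 0, 1, 0, 1), levels = c(0, 1))
    pred  <- factor(c(1, 0, 1, 0, 0, 1, 0, 1, 0, 1), levels = c(0, 1))
    cm <- table(Predicted = pred, Truth = truth)
    cm
    TP <- cm["1", "1"]; TN <- cm["0", "0"]
    FP <- cm["1", "0"]; FN <- cm["0", "1"]
    (TP + TN) / sum(cm)   # accuracy  = 0.8
    TP / (TP + FP)        # precision = 0.8
    (FP + FN) / sum(cm)   # test error = 0.2, i.e., 1 - accuracy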

85

Can SVM and KNN be used for both classification and regression models?

Yes

86

Small K indicates:

high curvature, complex boundary, overfitting

87

Large K indicates:

low curvature, smooth boundary, underfitting

88

True or false: KNN is sensitive to outliers

True

89

A large margin indicates:

the classifier is more robust to small changes in the data

90

What are slack variables?

Variables that allow some data points to be misclassified or lie within the margin

91

Small C indicates:

fewer violations, narrow margin (closer to maximal margin classifier)

92

Large C indicates:

more violations, wider margin

93

Large γ indicates:

high curvature, highly localized decision regions; may lead to "islands" around individual data points

94

Small γ indicates:

low curvature, broad decision region
