CS 441: Applied Machine Learning - Final Review

1
New cards

Lecture 2-3: K-NN, Classification, Regression, and Data

True or false: With different sets of M test samples, we would probably get the same error measurement.

False. There would be some variance in the error measurement.

2
New cards

Lecture 2-3: K-NN, Classification, Regression, and Data

True or false: If we increase M, we should get a more accurate (lower variance) estimate of the error.

True. Increasing M decreases the variance of the estimate.
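
This is easy to simulate. The sketch below (illustrative, not from the lecture notes) treats each test prediction as a Bernoulli trial with a hypothetical true error rate of 0.2 and shows the spread of the error estimate shrinking as M grows.

```python
# A minimal sketch: larger test sets give lower-variance error estimates.
import numpy as np

rng = np.random.default_rng(0)
true_error = 0.2  # hypothetical true error rate of a fixed classifier

for M in (50, 500, 5000):
    # Each trial draws M test points; each point is misclassified w.p. true_error.
    estimates = rng.binomial(M, true_error, size=1000) / M
    print(f"M={M:5d}  mean={estimates.mean():.3f}  std={estimates.std():.4f}")
```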

3
New cards

Lecture 2-3: K-NN, Classification, Regression, and Data

True or false: If we increase N (training size) but do not change M, we'd expect the test error to be unchanged.

False. Test error should generally go down because more training samples help better fit the model.

4
New cards

Lecture 2-3: K-NN, Classification, Regression, and Data

True or false: The expected error does not depend on M, but it does depend on N.

True. M only affects the variance of the error estimate; the expected error is a property of the trained model, which depends on the training size N.

5
New cards

Lecture 2-3: K-NN, Classification, Regression, and Data

Which assumptions are implied by using Euclidean (L2) distance for K-NN?

(a, b) Each feature dimension is equally important, and feature dimensions have comparable scales.
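
A minimal sketch, assuming scikit-learn is available, of the usual fix when scales differ: standardize the features before applying L2-based K-NN. The toy numbers are made up for illustration.

```python
# Standardizing features so Euclidean distance is not dominated by large-scale dimensions.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1.0, 2000.0], [2.0, 2100.0], [1.5, 100.0], [2.5, 150.0]])
y = np.array([0, 0, 1, 1])

scaler = StandardScaler().fit(X)           # per-feature mean/std
knn = KNeighborsClassifier(n_neighbors=1)  # plain L2 distance by default
knn.fit(scaler.transform(X), y)
print(knn.predict(scaler.transform([[1.8, 120.0]])))  # -> [1], both features now matter
```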

6
New cards

Lecture 2-3: K-NN, Classification, Regression, and Data

Classify the '+' with 1-NN. ('o' or 'x'?)

'x'

7
New cards

Lecture 2-3: K-NN, Classification, Regression, and Data

Classify the '+' with 3-NN. ('o' or 'x'?)

'o'

8
New cards

Lecture 2-3: K-NN, Classification, Regression, and Data

Which of these are true of nearest neighbor? (choose all that apply)

Options: (a) Fast inference, (b) Fast training, (c) Can be applied if only one sample per class is available, (d) Not commonly used in practice, (e) Most powerful with feature learning

(b, c, e) Fast training, Can be applied if only one sample per class, Most powerful with feature learning.
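
As a reminder of why training is fast, here is a minimal 1-NN sketch in plain NumPy: "training" is just storing the data, and prediction works even with a single labeled example per class. The toy points are made up.

```python
import numpy as np

def nn_predict(X_train, y_train, X_query):
    # Squared L2 distance from every query point to every training point.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    return y_train[d2.argmin(axis=1)]

X_train = np.array([[0.0, 0.0], [5.0, 5.0]])   # one sample per class
y_train = np.array(['o', 'x'])
print(nn_predict(X_train, y_train, np.array([[4.0, 4.5]])))  # -> ['x']
```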

9
New cards

Lecture 4: Clustering and Retrieval

True or false: K-means assigns each point to the nearest of the established K centers.

True.

10
New cards

Lecture 4: Clustering and Retrieval

True or false: A very structured distribution of points can make K-means not converge.

False. K-means always converges.
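
A rough Lloyd's-algorithm sketch in NumPy (no empty-cluster handling): each assignment and update step can only decrease the within-cluster sum of squares, which is why K-means always converges to a local optimum.

```python
import numpy as np

def kmeans(X, k, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]    # initialize from data points
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)                       # assignment step: nearest center
        centers = np.array([X[labels == j].mean(axis=0)  # update step: cluster means
                            for j in range(k)])          # (assumes no cluster goes empty)
        print("within-cluster SSE:", round(d2.min(axis=1).sum(), 2))  # non-increasing
    return centers, labels

X = np.vstack([np.random.default_rng(1).normal(m, 0.3, size=(50, 2)) for m in (0, 3)])
centers, labels = kmeans(X, k=2)
```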

11
New cards

Lecture 4: Clustering and Retrieval

True or false: High-dimensional data points cause K-means to iterate more times before a good clustering.

False. Number of iterations depends on #clusters and #points, not directly on dimension.

12
New cards

Lecture 4: Clustering and Retrieval

True or false: High-dimensional data increases computational cost and people often stop K-means early.

True.

13
New cards

Lecture 4: Clustering and Retrieval

True or false: K-means is deterministic but sensitive to initialization.

True. Given a fixed initialization the updates are deterministic, but different initializations can produce different clusterings.

14
New cards

Lecture 4: Clustering and Retrieval

True or false: If we don't know much about the data, people often choose K based on memory or computational limits.

True.

15
New cards

Lecture 4: Clustering and Retrieval

True or false: Clustering methods like K-means and hierarchical K-means are sensitive to local connectivity.

False. K-means and hierarchical K-means assign points by distance to cluster centers and ignore local connectivity (unlike, e.g., agglomerative clustering with single linkage).

16
New cards

Lecture 4: Clustering and Retrieval

True or false: If some attributes are more important, standard K-means can still yield a good clustering without adjustments.

False. K-means treats all features equally unless we use weighting.

17
New cards

Lecture 4: Clustering and Retrieval

True or false: One big advantage of hierarchical K-means is computational efficiency.

True.

18
New cards

Lecture 4: Clustering and Retrieval

True or false: Agglomerative clustering can be sensitive to local connectivity with a good choice of linkage.

True.

19
New cards

Lecture 4: Clustering and Retrieval

True or false: LSH idea is used where an approximate nearest neighbor is acceptable.

True.

20
New cards

Lecture 4: Clustering and Retrieval

If you have continuous-valued feature vectors and want to group them, how do clustering methods help?

Clustering maps each continuous feature vector to a discrete cluster ID based on similarity, so vectors with similar attributes end up in the same group.

21
New cards

Lecture 4: Clustering and Retrieval

For a group of pictures, what attributes could produce a good clustering?

Mixed attributes (discrete like "contains humans?" or "landscape type" and continuous like brightness or texture density) can be used.

22
New cards

Lecture 4: Clustering and Retrieval

If you used two clustering algorithms on unlabeled data, how do you compare results?

Label a subset and compute purity. Compare which clustering yields higher purity.
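
A minimal purity sketch, assuming we have hand-labeled a small subset: for each cluster, count its most common true label, then divide the total by the subset size. The toy labels are made up.

```python
import numpy as np

def purity(cluster_ids, true_labels):
    total = 0
    for c in np.unique(cluster_ids):
        labels_in_c = true_labels[cluster_ids == c]
        total += np.bincount(labels_in_c).max()   # size of the majority label in this cluster
    return total / len(true_labels)

cluster_ids = np.array([0, 0, 0, 1, 1, 1])
true_labels = np.array([0, 0, 1, 1, 1, 1])
print(purity(cluster_ids, true_labels))  # 5/6 ~= 0.83
```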

23
New cards

Lecture 4: Clustering and Retrieval

Which distance measure for "Each dimension same scale, dominated by large differences"? (L2, L1, Mahalanobis)

L2

24
New cards

Lecture 4: Clustering and Retrieval

Which distance measure for "Each dimension same scale, sensitive to sum of absolute differences"?

L1
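
A small NumPy sketch contrasting the three distance measures on one made-up pair of points; the covariance matrix used for Mahalanobis is purely illustrative.

```python
import numpy as np

x, y = np.array([1.0, 10.0]), np.array([2.0, 13.0])
diff = x - y
print("L2:", np.sqrt((diff ** 2).sum()))          # dominated by the largest differences
print("L1:", np.abs(diff).sum())                  # sum of absolute differences
cov = np.array([[1.0, 0.0], [0.0, 9.0]])          # dimension 2 has 9x the variance
print("Mahalanobis:", np.sqrt(diff @ np.linalg.inv(cov) @ diff))  # rescales per dimension
```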

25
New cards

Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings

True or false: In dimensionality reduction, points in lower dimension should preserve some relationship from original dimension.

True.

26
New cards

Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings

True or false: PCA eigenvectors can be imaginary, making PCA useless.

False. Eigenvectors of a real symmetric covariance matrix are real.

27
New cards

Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings

True or false: PCA eigenvectors capture discriminative features.

False. PCA captures directions of maximum variance, not necessarily discriminative features.

28
New cards

Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings

True or false: PCA components may have qualitative significance (e.g., eigenfaces).

True.

29
New cards

Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings

True or false: The largest PCA components are always most important.

False. The largest components capture the most variance, but whether they are most "important" depends on the application; discriminative information may lie in smaller components.

30
New cards

Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings

True or false: Non-linear embedding methods focus on relationships even if reconstruction is impossible.

True.

31
New cards

Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings

True or false: MDS preserves pairwise distances with a user-defined metric.

True.

32
New cards

Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings

True or false: MDS always works even if no proper distance metric is defined.

False. If pairwise relationships don't satisfy metric properties, non-metric MDS is needed.

33
New cards

Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings

True or false: ISOMAP defines a unique graph structure.

False. The graph construction depends on user choices.

34
New cards

Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings

True or false: t-SNE minimizes KL divergence to preserve local structure.

True.

35
New cards

Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings

True or false: UMAP is computationally less expensive and widely used.

True.

36
New cards

Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings

How to choose the number of components in PCA?

Consider cumulative explained variance and choose K where adding more components adds little variance.
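
A minimal sketch, assuming scikit-learn: fit PCA, look at the cumulative explained-variance curve, and pick the smallest K that reaches a chosen threshold (95% here, an arbitrary choice).

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))  # correlated features

pca = PCA().fit(X)
cumvar = np.cumsum(pca.explained_variance_ratio_)
k = int(np.searchsorted(cumvar, 0.95)) + 1   # smallest K with >= 95% cumulative variance
print(cumvar.round(3), "->", k, "components")
```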

37
New cards

Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings

Why use PCA before MDS in high dimension? (2 reasons)

(1) In high dimensions pairwise distances concentrate and become nearly uniform, so reducing dimension with PCA first makes the distances more informative for MDS; (2) it reduces computational cost.

38
New cards

Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings

Why does MDS have an S-shape similar to the original data's shape?

MDS preserves global structure, thus retaining the original S-shape.

39
New cards

Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings

Why might t-SNE not preserve an S-shape?

t-SNE focuses on local structure, not global shape.

40
New cards

Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM

How do L1 and L2 regularization complement each other?

L1 induces sparsity (feature selection); L2 keeps weights small. Combined (elastic net) gives both benefits.
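
A minimal sketch, assuming scikit-learn: on made-up data with only two informative features, Lasso (L1) zeroes out the irrelevant weights, Ridge (L2) shrinks all weights, and ElasticNet mixes the two via l1_ratio.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)  # only 2 useful features

for model in (Lasso(alpha=0.1), Ridge(alpha=1.0), ElasticNet(alpha=0.1, l1_ratio=0.5)):
    print(type(model).__name__, model.fit(X, y).coef_.round(2))
```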

41
New cards

Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM

Predict likelihood a stock is overvalued (binary) → Logistic or Linear?

Logistic Regression.

42
New cards

Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM

Predict future earnings (continuous) → Logistic or Linear?

Linear Regression.

43
New cards

Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM

Predict category: drastic/mild/light decrease in price → Logistic or Linear?

Logistic Regression.

44
New cards

Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM

Someone says we can't use linear regression if data isn't linearly related. Are they right?

No. We can use transformations (e.g., polynomial features) to linearize relationships.
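
A minimal sketch, assuming scikit-learn: fitting y ≈ x² with "linear" regression by first mapping x to polynomial features.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = x.ravel() ** 2 + np.random.default_rng(0).normal(scale=0.1, size=50)

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
print(model.predict([[2.0]]))  # close to 4
```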

45
New cards

Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM

True or false: One hyperparameter tuning method is transforming variables so training data optimizes hyperparameters directly.

False. Hyperparameters are set outside of training (e.g., chosen with a validation set or cross-validation); they are not optimized directly by transforming the training data.

46
New cards

Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM

True or false: Cross-validation splits training data to measure hyperparameter performance.

True.
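
A minimal sketch, assuming scikit-learn: GridSearchCV splits the training data into folds and scores each candidate value of a hyperparameter (Ridge's alpha here, with an arbitrary grid).

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.5, size=120)

search = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```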

47
New cards

Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM

True or false: With sufficient data, no need for regularization.

False. Regularization can still help avoid large weights and overfitting.

48
New cards

Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM

Define outlier and how it affects linear regression.

An outlier is a point that deviates strongly from the pattern of the rest of the data. Because squared error penalizes large residuals heavily, an outlier can pull the regression line toward itself, increasing the error on most other points.

49
New cards

Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM

SVMs are more explainable than neural nets. True or false?

True.

50
New cards

Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM

The dual SVM representation shows optimal parameters as a non-linear combo of examples. True or false?

False. They are a linear combination of support vectors.

51
New cards

Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM

Training SVM involves minimizing margin for better generalization. True or false?

False. We maximize the margin.

52
New cards

Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM

Unlike SVM, logistic regression adds non-zero penalty for all points. True or false?

True.

53
New cards

Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM

Hinge loss increases quadratically for misclassified points. True or false?

False. Hinge loss increases linearly beyond the margin.

54
New cards

Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM

Representer theorem: it is impossible to represent the optimal parameters as a linear combination of the training data. True or false?

False. The representer theorem states that the optimal parameters can be written as a linear combination of the training examples.

55
New cards

Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM

Kernels in SVM enable feature mapping without explicit transformations. True or false?

True.

56
New cards

Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM

Soft margin SVM tolerates some misclassifications. True or false?

True.

57
New cards

Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM

Soft margin always hinders generalization. True or false?

False. Allowing a soft margin can improve generalization.

58
New cards

Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM

RBF SVM good when one class forms ellipsoid cluster and other outside it. True or false?

True.
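
A minimal sketch, assuming scikit-learn: on a made-up ring-shaped dataset (one class inside, one outside), a linear SVM fails while an RBF-kernel SVM separates the classes.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.4, noise=0.05, random_state=0)
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=1.0).fit(X, y)
    print(kernel, "training accuracy:", round(clf.score(X, y), 2))
# the linear SVM cannot separate the inner circle; the RBF SVM gets ~1.0
```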

59
New cards

Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM

Removing a support vector can affect margin and boundary. True or false?

True.

60
New cards

Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM

Why don't SVMs depend on the whole dataset? Advantages?

Only the support vectors determine the decision boundary, so the solution is compact and insensitive to points far from the margin, which helps stability and generalization.

61
New cards

Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM

Why is the logistic regression boundary often farther from dense clusters than SVM's?

Logistic regression uses all points in the loss, pushing the boundary to reduce errors even far away, unlike SVM focusing on support vectors.

62
New cards

Lecture 8: Probability and Naive Bayes

Naive Bayes assumption with two features x1, x2: which is true?

(a) P(y|x1,x2)=P(y|x1)*P(y|x2)

(b) P(x1,x2|y)=P(x1|y)*P(x2|y)

(b)

63
New cards

Lecture 8: Probability and Naive Bayes

Which are true for Naive Bayes?

(a) The likelihood must model many features jointly.

(b) A continuous feature can be modeled as a Gaussian or discretized.

(c) NB underperforms NN due to its strong independence assumptions.

(d) NB is relatively fast to train and predict.

(b, d) A continuous feature can be modeled as a Gaussian or discretized, and NB is fast to train and predict.
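
A minimal sketch, assuming scikit-learn: GaussianNB models each continuous feature with a per-class Gaussian, and both fitting and prediction are fast. The toy data are made up.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X0 = rng.normal(loc=0.0, scale=1.0, size=(100, 2))   # class 0
X1 = rng.normal(loc=3.0, scale=1.0, size=(100, 2))   # class 1
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

nb = GaussianNB().fit(X, y)
print(nb.predict([[0.2, -0.1], [2.8, 3.1]]))  # -> [0 1]
```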

64
New cards

Lecture 8: Probability and Naive Bayes

True or false: P(a|b)=P(b|a)

False.

65
New cards

Lecture 8: Probability and Naive Bayes

True or false: P(a,b)=P(b,a)

True.

66
New cards

Lecture 8: Probability and Naive Bayes

Check if a and b are independent given:

P(a=0,b=0)=0.12, P(a=0,b=1)=0.08, P(a=1,b=0)=0.48, P(a=1,b=1)=0.32

They are independent: the marginals are P(a=0)=0.2, P(a=1)=0.8, P(b=0)=0.6, P(b=1)=0.4, and every joint entry equals the product of its marginals (e.g., 0.2 × 0.6 = 0.12).
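
The factorization can also be checked numerically (a small NumPy sketch using the numbers from the card):

```python
import numpy as np

joint = np.array([[0.12, 0.08],    # rows: a=0, a=1; columns: b=0, b=1
                  [0.48, 0.32]])
p_a = joint.sum(axis=1)            # [0.2, 0.8]
p_b = joint.sum(axis=0)            # [0.6, 0.4]
print(np.allclose(joint, np.outer(p_a, p_b)))  # True -> a and b are independent
```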

67
New cards

Lecture 8: Probability and Naive Bayes

True or false: If x1 is independent of x2, then they are conditionally independent given y.

False. Marginal independence does not imply conditional independence given y (nor the reverse).

68
New cards

Lecture 8: Probability and Naive Bayes

According to Bayes rule, P(y|x)=?

P(y|x)=P(x|y)*P(y)/P(x)

69
New cards

Lecture 8: Probability and Naive Bayes

Which transformations preserve argmax of f(x)?

(a) Add a constant, (b) take the log, (c) take the exp, or (d) take 1/f(x)?

(a, b, c) Adding a constant, taking the log (of a positive f), and taking the exp all preserve the argmax, since each is an increasing transformation; 1/f(x) reverses the ordering for positive f.
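
A quick numeric check of which transforms keep the argmax, using a made-up positive f:

```python
import numpy as np

f = np.array([0.1, 0.5, 0.3, 0.1])                 # e.g., unnormalized posteriors
print(np.argmax(f), np.argmax(f + 7),              # adding a constant: same argmax
      np.argmax(np.log(f)), np.argmax(np.exp(f)),  # log/exp are increasing: same argmax
      np.argmax(1 / f))                            # 1/f flips the order: different
```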

70
New cards

Lecture 8: Probability and Naive Bayes

True or false: Without a prior, it's possible P(x|y)*P(y)=0 for all y.

True.

71
New cards

Lecture 9: EM and Latent Variables

True or false: LSH random projection leads to sparse keys in high-dim data.

False.

72
New cards

Lecture 9: EM and Latent Variables

True or false: Longer hash keys in LSH can increase accuracy but slow queries.

True.

73
New cards

Lecture 9: EM and Latent Variables

True or false: Latent variables may be unobserved factors affecting data.

True.

74
New cards

Lecture 9: EM and Latent Variables

True or false: EM algorithm provides a recipe to model latent variables.

False. EM is a recipe for parameter estimation; how the latent variables are modeled is up to the user.

75
New cards

Lecture 9: EM and Latent Variables

True or false: Bad annotators can be modeled as uniform noise.

True.

76
New cards

Lecture 9: EM and Latent Variables

True or false: E-step in EM estimates likelihood of observed data.

False. E-step computes expected latent variables given parameters and data.

77
New cards

Lecture 9: EM and Latent Variables

True or false: M-step in EM finds parameters that maximize likelihood given latent variable estimates.

True.

78
New cards

Lecture 9: EM and Latent Variables

True or false: M-step estimation is often weighted by latent variable likelihoods.

True.

79
New cards

Lecture 9: EM and Latent Variables

True or false: EM always converges to global maximum.

False. It converges to a local maximum.

80
New cards

Lecture 9: EM and Latent Variables

True or false: In the bad annotator problem, EM's robustness depends on how we model them.

True.

81
New cards

Lecture 9: EM and Latent Variables

True or false: K-means is an example of hard EM.

True.

82
New cards

Lecture 9: EM and Latent Variables

True or false: EM is a method for MLE with missing data.

True.

83
New cards

Lecture 9: EM and Latent Variables

True or false: EM guarantees global maximum likelihood.

False. Only local maxima are guaranteed.

84
New cards

Lecture 9: EM and Latent Variables

True or false: The E-step computes MLE of parameters given data.

False. M-step does that.

85
New cards

Lecture 9: EM and Latent Variables

True or false: The observed data likelihood increases after each EM iteration.

True.

86
New cards

Lecture 9: EM and Latent Variables

Given binary data x~Bernoulli(p), what's MLE of p with x1..xN?

p* = (Σ xi)/N
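
A quick numeric check, using made-up samples: the sample mean recovers p.

```python
import numpy as np

p_true = 0.3
x = np.random.default_rng(0).binomial(1, p_true, size=10_000)
p_mle = x.sum() / len(x)     # equivalently x.mean()
print(p_mle)                 # close to 0.3
```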

87
New cards

Lecture 10: Density Estimation (MoG, Hist, KDE)

True or false: A histogram (discretizing the data into bins) is a parametric density model.

False. Histograms are non-parametric.

88
New cards

Lecture 10: Density Estimation (MoG, Hist, KDE)

True or false: A continuous variable has probability zero at any single value.

True.

89
New cards

Lecture 10: Density Estimation (MoG, Hist, KDE)

True or false: A PDF is "smooth" if its values at points near x are well approximated by its value at x.

True.

90
New cards

Lecture 10: Density Estimation (MoG, Hist, KDE)

True or false: Histograms work better in higher dimensions.

False. They suffer in high dimensions (curse of dimensionality).

91
New cards

Lecture 10: Density Estimation (MoG, Hist, KDE)

True or false: Mixture of Gaussians is better when PDF is smooth.

True.

92
New cards

Lecture 10: Density Estimation (MoG, Hist, KDE)

True or false: Beta distribution can only model unimodal distributions.

False. Beta can model various shapes.

93
New cards

Lecture 10: Density Estimation (MoG, Hist, KDE)

True or false: Hyperparameters for PDF estimation (bandwidth, #components) can be chosen via cross-validation.

True.

94
New cards

Lecture 10: Density Estimation (MoG, Hist, KDE)

True or false: A common practical assumption is independence across features.

True.

95
New cards

Lecture 10: Density Estimation (MoG, Hist, KDE)

Which single-mode distribution is better approximated by a single Gaussian: left or right plot (one big mode vs two modes)?

The distribution with one main mode (the right plot).

96
New cards

Lecture 10: Density Estimation (MoG, Hist, KDE)

Which method better approximates the two-mode PDF on the left plot?

Mixture of Gaussians.
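
A minimal sketch, assuming scikit-learn: a 2-component GaussianMixture fit to made-up bimodal 1-D data recovers both modes, which a single Gaussian cannot.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(3, 1.0, 500)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(x)
print(gmm.means_.ravel().round(2), gmm.weights_.round(2))  # means near -2 and 3, weights near 0.5
```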

97
New cards

Lecture 10: Density Estimation (MoG, Hist, KDE)

Why might MoG be infeasible for very complex PDFs?

Complex PDFs may require many components, increasing computational cost.

98
New cards

Lecture 11: Outliers and Robust Estimation

True or false: Moving average can eliminate any additive noise.

False. Averaging reduces zero-mean noise but cannot eliminate arbitrary additive noise; the effect depends on the noise characteristics and the window size.

99
New cards

Lecture 11: Outliers and Robust Estimation

True or false: Moving average is not robust to outliers.

True.
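
A small NumPy sketch of the sensitivity: a single outlier in an otherwise flat signal pulls a 3-point moving average far off, while a 3-point moving median (a robust alternative) is barely affected.

```python
import numpy as np

x = np.ones(11)
x[5] = 100.0                                    # one outlier in a flat signal
window = 3
avg = np.convolve(x, np.ones(window) / window, mode="same")
med = np.array([np.median(x[max(0, i - 1):i + 2]) for i in range(len(x))])
print("moving average around the outlier:", avg[4:7].round(1))  # pulled to ~34
print("moving median around the outlier: ", med[4:7])           # stays at 1
```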

100
New cards

Lecture 11: Outliers and Robust Estimation

True or false: Outliers always represent incorrect values.

False. They might be correct but non-representative.