Lecture 2-3: K-NN, Classification, Regression, and Data
True or false: With different sets of M test samples, we would probably get the same error measurement.
False. There would be some variance in the error measurement.
Lecture 2-3: K-NN, Classification, Regression, and Data
True or false: If we increase M, we should get a more accurate (lower variance) estimate of the error.
True. Increasing M decreases the variance of the estimate.
Lecture 2-3: K-NN, Classification, Regression, and Data
True or false: If we increase N (training size) but do not change M, we'd expect the test error to be unchanged.
False. Test error should generally go down because more training samples help better fit the model.
Lecture 2-3: K-NN, Classification, Regression, and Data
True or false: The expected error does not depend on M, but it does depend on N.
True. M only affects the variance of the error estimate; the expected error is a property of the model learned from the N training samples.
Lecture 2-3: K-NN, Classification, Regression, and Data
Which assumptions are implied by using Euclidean (L2) distance for K-NN?
(a, b) Each feature dimension is equally important, and feature dimensions have comparable scales.
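A minimal NumPy sketch (illustrative numbers, not from the lecture) of why comparable scales matter for L2-based K-NN: without rescaling, the large-scale feature dominates the distance, and z-score standardization removes that effect.

```python
import numpy as np

# Two features on very different scales: the second one dominates the L2 distance.
X = np.array([[1.0, 100.0],
              [1.2, 900.0],
              [5.0, 110.0]])

def l2(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

print(l2(X[0], X[1]), l2(X[0], X[2]))   # first distance is huge only because of feature 2

# Z-score standardization puts both dimensions on a comparable scale.
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
print(l2(Xz[0], Xz[1]), l2(Xz[0], Xz[2]))
```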
Lecture 2-3: K-NN, Classification, Regression, and Data
Classify the '+' with 1-NN. ('o' or 'x'?)
'x'
Lecture 2-3: K-NN, Classification, Regression, and Data
Classify the '+' with 3-NN. ('o' or 'x'?)
'o'
Lecture 2-3: K-NN, Classification, Regression, and Data
Which of these are true of nearest neighbor? (choose all that apply)
Options: (a) Fast inference, (b) Fast training, (c) Can be applied if only one sample per class is available, (d) Not commonly used in practice, (e) Most powerful with feature learning
(b, c, e) Fast training, Can be applied if only one sample per class, Most powerful with feature learning.
Lecture 4: Clustering and Retrieval
True or false: K-means assigns each point to the nearest of the established K centers.
True.
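A minimal NumPy sketch of the K-means assignment step (the data points and centers below are illustrative):

```python
import numpy as np

def assign_to_nearest_center(X, centers):
    """One K-means assignment step: label each point with the index
    of its nearest center (squared Euclidean distance)."""
    # dists[i, k] = ||X[i] - centers[k]||^2
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
print(assign_to_nearest_center(X, centers))   # [0 0 1 1]
```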
Lecture 4: Clustering and Retrieval
True or false: A very structured distribution of points can make K-means not converge.
False. K-means always converges.
Lecture 4: Clustering and Retrieval
True or false: High-dimensional data points cause K-means to iterate more times before a good clustering.
False. Number of iterations depends on #clusters and #points, not directly on dimension.
Lecture 4: Clustering and Retrieval
True or false: High-dimensional data increases computational cost and people often stop K-means early.
True.
Lecture 4: Clustering and Retrieval
True or false: K-means is deterministic but sensitive to initialization.
True. Given a fixed initialization the updates are deterministic, but different initializations can produce different clusterings.
Lecture 4: Clustering and Retrieval
True or false: If we don't know much about the data, people often choose K based on memory or computational limits.
True.
Lecture 4: Clustering and Retrieval
True or false: Clustering methods like K-means and hierarchical K-means are sensitive to local connectivity.
False. K-means and hierarchical K-means assign points by distance to centroids, so they do not take local connectivity into account.
Lecture 4: Clustering and Retrieval
True or false: If some attributes are more important, standard K-means can still yield a good clustering without adjustments.
False. K-means treats all features equally unless we use weighting.
Lecture 4: Clustering and Retrieval
True or false: One big advantage of hierarchical K-means is computational efficiency.
True.
Lecture 4: Clustering and Retrieval
True or false: Agglomerative clustering can be sensitive to local connectivity with a good choice of linkage.
True.
Lecture 4: Clustering and Retrieval
True or false: LSH idea is used where an approximate nearest neighbor is acceptable.
True.
Lecture 4: Clustering and Retrieval
If you have continuous-valued feature vectors and want to group them, how do clustering methods help?
Clustering assigns cluster-IDs based on similarity, making it easier to group continuous vectors by similar attributes.
Lecture 4: Clustering and Retrieval
For a group of pictures, what attributes could produce a good clustering?
Mixed attributes (discrete like "contains humans?" or "landscape type" and continuous like brightness or texture density) can be used.
Lecture 4: Clustering and Retrieval
If you used two clustering algorithms on unlabeled data, how do you compare results?
Label a subset and compute purity. Compare which clustering yields higher purity.
Lecture 4: Clustering and Retrieval
Which distance measure for "Each dimension same scale, dominated by large differences"? (L2, L1, Mahalanobis)
L2
Lecture 4: Clustering and Retrieval
Which distance measure for "Each dimension same scale, sensitive to sum of absolute differences"?
L1
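A quick NumPy comparison of the two measures on an illustrative pair of vectors: the squared terms make L2 emphasize the largest differences, while L1 is the plain sum of absolute differences.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 0.0, 7.0])

l2 = np.sqrt(np.sum((a - b) ** 2))   # dominated by the largest differences
l1 = np.sum(np.abs(a - b))           # sum of absolute differences
print(l2, l1)
```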
Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings
True or false: In dimensionality reduction, points in lower dimension should preserve some relationship from original dimension.
True.
Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings
True or false: PCA eigenvectors can be imaginary, making PCA useless.
False. Eigenvectors of a real symmetric covariance matrix are real.
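A small NumPy check of this fact: the covariance matrix of real data is real and symmetric, and `numpy.linalg.eigh` (for symmetric matrices) returns real eigenvalues and eigenvectors. The random data below is only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))          # any real-valued data
Xc = X - X.mean(axis=0)                # center the data
C = (Xc.T @ Xc) / (len(X) - 1)         # covariance matrix: real and symmetric

# eigh handles symmetric/Hermitian matrices and returns real eigenvalues
# and real eigenvectors (sorted by ascending eigenvalue).
eigvals, eigvecs = np.linalg.eigh(C)
print(eigvals.dtype, eigvecs.dtype)    # float64 float64 -- no imaginary parts
```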
Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings
True or false: PCA eigenvectors capture discriminative features.
False. PCA captures directions of maximum variance, not necessarily discriminative features.
Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings
True or false: PCA components may have qualitative significance (e.g., eigenfaces).
True.
Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings
True or false: The largest PCA components are always most important.
False. Depends on what is considered "important" for the application.
Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings
True or false: Non-linear embedding methods focus on relationships even if reconstruction is impossible.
True.
Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings
True or false: MDS preserves pairwise distances with a user-defined metric.
True.
Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings
True or false: MDS always works even if no proper distance metric is defined.
False. If pairwise relationships don't satisfy metric properties, non-metric MDS is needed.
Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings
True or false: ISOMAP defines a unique graph structure.
False. The graph construction depends on user choices.
Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings
True or false: t-SNE minimizes KL divergence to preserve local structure.
True.
Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings
True or false: UMAP is computationally less expensive and widely used.
True.
Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings
How to choose the number of components in PCA?
Consider cumulative explained variance and choose K where adding more components adds little variance.
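A short scikit-learn sketch of this rule of thumb; the synthetic data, the 0.95 threshold, and the variable names are illustrative assumptions, not part of the lecture.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data with 3 underlying factors embedded in 20 dimensions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
X = latent @ rng.normal(size=(3, 20)) + 0.1 * rng.normal(size=(500, 20))

pca = PCA().fit(X)
cumvar = np.cumsum(pca.explained_variance_ratio_)

# Smallest K reaching (say) 95% cumulative explained variance;
# the 0.95 threshold is an illustrative choice, not a fixed rule.
K = int(np.searchsorted(cumvar, 0.95) + 1)
print(K, cumvar[K - 1])    # K should come out near 3 for this data
```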
Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings
Why use PCA before MDS in high dimension? (2 reasons)
(1) In high dimensions pairwise distances concentrate (become nearly equal); projecting with PCA first restores heterogeneity in the distances so MDS can find structure. (2) It reduces computational cost.
Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings
Why does MDS have an S-shape similar to the original data's shape?
MDS preserves global structure, thus retaining the original S-shape.
Lecture 5: Dimensionality Reduction: PCA and Low-D Embeddings
Why might t-SNE not preserve an S-shape?
t-SNE focuses on local structure, not global shape.
Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM
How do L1 and L2 regularization complement each other?
L1 induces sparsity (feature selection); L2 keeps weights small. Combined (elastic net) gives both benefits.
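A small scikit-learn illustration of the contrast on synthetic data (the data, the alphas, and the `l1_ratio` value are illustrative choices):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
w_true = np.array([3.0, -2.0] + [0.0] * 8)       # only two informative features
y = X @ w_true + 0.1 * rng.normal(size=100)

print(Lasso(alpha=0.1).fit(X, y).coef_)           # L1 tends to zero out the irrelevant weights
print(Ridge(alpha=1.0).fit(X, y).coef_)           # L2 shrinks all weights but rarely to exactly 0
# ElasticNet mixes the two penalties: l1_ratio=1.0 is pure L1, 0.0 is pure L2.
print(ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y).coef_)
```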
Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM
Predict likelihood a stock is overvalued (binary) → Logistic or Linear?
Logistic Regression.
Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM
Predict future earnings (continuous) → Logistic or Linear?
Linear Regression.
Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM
Predict category: drastic/mild/light decrease in price → Logistic or Linear?
Logistic Regression.
Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM
Someone says we can't use linear regression if data isn't linearly related. Are they right?
No. We can use transformations (e.g., polynomial features) to linearize relationships.
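A short sketch of that idea with scikit-learn: a quadratic relationship fit by a model that is still linear in its parameters, using polynomial feature expansion. The synthetic data and the degree are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 1.0 + 2.0 * x[:, 0] ** 2 + 0.3 * rng.normal(size=100)   # quadratic in x, not linear

# Still linear regression -- linear in the parameters -- on transformed features.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
print(model.score(x, y))    # R^2 close to 1
```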
Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM
True or false: One hyperparameter tuning method is transforming variables so training data optimizes hyperparameters directly.
False. Hyperparameters are external to the model and are not optimized directly on the training data; they are chosen with a validation set or cross-validation.
Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM
True or false: Cross-validation splits training data to measure hyperparameter performance.
True.
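A minimal example of tuning a hyperparameter by cross-validation with scikit-learn; the model (ridge regression), the alpha grid, and the synthetic data are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -1.0, 0.5, 0.0, 0.0]) + 0.1 * rng.normal(size=200)

# 5-fold CV: each candidate alpha is scored on held-out folds of the training data.
search = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```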
Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM
True or false: With sufficient data, no need for regularization.
False. Regularization can still help avoid large weights and overfitting.
Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM
Define outlier and how it affects linear regression.
An outlier is a data point far from the main cluster of points. It can pull the regression line away, increasing error for most points.
Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM
SVMs are more explainable than neural nets. True or false?
True.
Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM
The dual SVM representation shows optimal parameters as a non-linear combo of examples. True or false?
False. They are a linear combination of support vectors.
Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM
Training SVM involves minimizing margin for better generalization. True or false?
False. We maximize the margin.
Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM
Unlike SVM, logistic regression adds non-zero penalty for all points. True or false?
True.
Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM
Hinge loss increases quadratically for misclassified points. True or false?
False. Hinge loss increases linearly beyond the margin.
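A tiny NumPy illustration of the hinge loss growing linearly (not quadratically) with the margin violation, assuming labels in {-1, +1}:

```python
import numpy as np

def hinge_loss(y, score):
    """Hinge loss for labels y in {-1, +1}: zero when y * score >= 1,
    then growing linearly, not quadratically."""
    return np.maximum(0.0, 1.0 - y * score)

scores = np.array([2.0, 0.5, -1.0, -3.0])
y = np.ones(4)
print(hinge_loss(y, scores))   # [0.  0.5 2.  4. ] -- linear growth in the violation
```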
Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM
The representer theorem says it is impossible to represent the optimal parameters as a linear combination of the training data. True or false?
False. The representer theorem states that the optimal parameter vector can be written as a linear combination of the training examples.
Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM
Kernels in SVM enable feature mapping without explicit transformations. True or false?
True.
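A small sketch of the kernel trick for a degree-2 polynomial kernel in 2-D: the kernel value equals an inner product in an explicit 6-dimensional feature space, but the kernel never constructs that space. The vectors below are illustrative.

```python
import numpy as np

# k(x, z) = (x.z + 1)^2 equals phi(x).phi(z) for this explicit feature map.
def phi(v):
    x1, x2 = v
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print((x @ z + 1) ** 2)        # kernel value, computed without the feature map
print(phi(x) @ phi(z))         # identical, via the explicit 6-D feature map
```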
Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM
Soft margin SVM tolerates some misclassifications. True or false?
True.
Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM
Soft margin always hinders generalization. True or false?
False. Allowing a soft margin can improve generalization.
Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM
RBF SVM good when one class forms ellipsoid cluster and other outside it. True or false?
True.
Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM
Removing a support vector can affect margin and boundary. True or false?
True.
Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM
Why don't SVMs depend on the whole dataset? Advantages?
Only support vectors matter, providing robustness and improved generalization stability.
Lecture 6 & 7: Linear Regression, Regularization, Logistic Regression, SVM
Why is the logistic regression boundary often farther from dense clusters than SVM's?
Logistic regression uses all points in the loss, pushing the boundary to reduce errors even far away, unlike SVM focusing on support vectors.
Lecture 8: Probability and Naive Bayes
Naive Bayes assumption with two features x1, x2: which is true?
(a) P(y|x1,x2)=P(y|x1)*P(y|x2)
(b) P(x1,x2|y)=P(x1|y)*P(x2|y)
(b)
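A minimal sketch of how the factorization in (b) is used at prediction time; the class names and probabilities below are made up for illustration.

```python
# Naive Bayes with two binary features; all numbers are illustrative.
prior = {"spam": 0.4, "ham": 0.6}        # P(y)
p_x1 = {"spam": 0.8, "ham": 0.1}         # P(x1=1 | y)
p_x2 = {"spam": 0.6, "ham": 0.3}         # P(x2=1 | y)

def unnormalized_posterior(y, x1, x2):
    # P(y) * P(x1|y) * P(x2|y): the class-conditional joint factorizes
    # across features, which is exactly assumption (b).
    px1 = p_x1[y] if x1 else 1 - p_x1[y]
    px2 = p_x2[y] if x2 else 1 - p_x2[y]
    return prior[y] * px1 * px2

scores = {y: unnormalized_posterior(y, x1=1, x2=0) for y in prior}
print(max(scores, key=scores.get), scores)
```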
Lecture 8: Probability and Naive Bayes
Which are true for Naive Bayes?
- (a) The likelihood must model many features at once.
- (b) A continuous feature can be modeled as a Gaussian or discretized.
- (c) NB underperforms NN due to its strong assumptions.
- (d) NB is relatively fast to train and predict.
(b, d) Continuous features can be modeled as Gaussian or discretized, and NB is fast to train and predict.
Lecture 8: Probability and Naive Bayes
True or false: P(a|b)=P(b|a)
False.
Lecture 8: Probability and Naive Bayes
True or false: P(a,b)=P(b,a)
True.
Lecture 8: Probability and Naive Bayes
Check if a and b are independent given:
P(a=0,b=0)=0.12, P(a=0,b=1)=0.08, P(a=1,b=0)=0.48, P(a=1,b=1)=0.32
They are independent. The marginals are P(a=0)=0.2, P(a=1)=0.8, P(b=0)=0.6, P(b=1)=0.4, and every joint entry equals the product of its marginals (e.g., 0.2×0.6=0.12, 0.8×0.4=0.32).
Lecture 8: Probability and Naive Bayes
True or false: If x1 is independent of x2, then they are conditionally independent given y.
False.
Lecture 8: Probability and Naive Bayes
According to Bayes rule, P(y|x)=?
P(y|x)=P(x|y)*P(y)/P(x)
Lecture 8: Probability and Naive Bayes
Which transformations preserve argmax of f(x)?
(a) Add a constant, (b) take the log, (c) take the exp, (d) invert to 1/f(x)?
(a, b, c) Adding a constant, taking the log, and taking the exp are monotonically increasing transformations (log assumes f(x) > 0), so they preserve the argmax; 1/f(x) turns maxima into minima.
Lecture 8: Probability and Naive Bayes
True or false: Without a prior, it's possible P(x|y)*P(y)=0 for all y.
True. Without a smoothing prior (e.g., Laplace smoothing), an unseen feature value yields an estimated P(x|y)=0, and this can happen for every class y.
Lecture 9: EM and Latent Variables
True or false: LSH random projection leads to sparse keys in high-dim data.
False.
Lecture 9: EM and Latent Variables
True or false: Longer hash keys in LSH can increase accuracy but slow queries.
True.
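A minimal sketch of sign/random-projection LSH (parameters and data below are illustrative): each key bit is the sign of a dot product with a random direction, so more bits make buckets more selective (fewer candidates per query) at the cost of needing more tables or probes to keep recall.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_bits = 64, 16
projections = rng.normal(size=(n_bits, dim))   # one random direction per key bit

def hash_key(x):
    return tuple((projections @ x > 0).astype(int))

X = rng.normal(size=(1000, dim))
buckets = {}
for i, x in enumerate(X):
    buckets.setdefault(hash_key(x), []).append(i)

q = X[0] + 0.01 * rng.normal(size=dim)     # a near-duplicate query
candidates = buckets.get(hash_key(q), [])  # only points in the same bucket (likely includes index 0)
print(len(buckets), candidates[:5])
```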
Lecture 9: EM and Latent Variables
True or false: Latent variables may be unobserved factors affecting data.
True.
Lecture 9: EM and Latent Variables
True or false: EM algorithm provides a recipe to model latent variables.
False. EM is a recipe for parameter estimation in models with latent variables; how the latent variables are modeled is up to the user.
Lecture 9: EM and Latent Variables
True or false: Bad annotators can be modeled as uniform noise.
True.
Lecture 9: EM and Latent Variables
True or false: E-step in EM estimates likelihood of observed data.
False. The E-step computes the expected values (posterior) of the latent variables given the current parameters and the observed data.
Lecture 9: EM and Latent Variables
True or false: M-step in EM finds parameters that maximize likelihood given latent variable estimates.
True.
Lecture 9: EM and Latent Variables
True or false: M-step estimation is often weighted by latent variable likelihoods.
True.
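A compact NumPy sketch of these two steps for a 1-D mixture of two Gaussians (equal weights and unit variances are kept fixed for brevity; the data is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 300)])

mu = np.array([-1.0, 1.0])            # initial guesses for the two means
for _ in range(50):
    # E-step: responsibility of each component for each point,
    # i.e. the expected value of the latent assignment.
    logp = -0.5 * (x[:, None] - mu[None, :]) ** 2
    r = np.exp(logp)
    r /= r.sum(axis=1, keepdims=True)

    # M-step: re-estimate each mean as a weighted average of the data,
    # weighted by the responsibilities from the E-step.
    mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)

print(mu)   # close to the true means -2 and 3
```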
Lecture 9: EM and Latent Variables
True or false: EM always converges to global maximum.
False. It converges to a local maximum.
Lecture 9: EM and Latent Variables
True or false: In the bad annotator problem, EM's robustness depends on how we model them.
True.
Lecture 9: EM and Latent Variables
True or false: K-means is an example of hard EM.
True.
Lecture 9: EM and Latent Variables
True or false: EM is a method for MLE with missing data.
True.
Lecture 9: EM and Latent Variables
True or false: EM guarantees global maximum likelihood.
False. Only local maxima are guaranteed.
Lecture 9: EM and Latent Variables
True or false: The E-step computes MLE of parameters given data.
False. M-step does that.
Lecture 9: EM and Latent Variables
True or false: The observed data likelihood increases after each EM iteration.
True.
Lecture 9: EM and Latent Variables
Given binary data x~Bernoulli(p), what's MLE of p with x1..xN?
p* = (Σ xi)/N
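A short derivation sketch of this result:

```latex
\[
\log L(p) = \sum_{i=1}^{N}\bigl[x_i \log p + (1-x_i)\log(1-p)\bigr],
\qquad
\frac{d}{dp}\log L(p) = \frac{\sum_i x_i}{p} - \frac{N-\sum_i x_i}{1-p} = 0
\;\Longrightarrow\;
p^{*} = \frac{1}{N}\sum_{i=1}^{N} x_i .
\]
```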
Lecture 10: Density Estimation (MoG, Hist, KDE)
True or false: Discretizing a variable into a histogram gives a parametric density model.
False. Histograms are non-parametric.
Lecture 10: Density Estimation (MoG, Hist, KDE)
True or false: A continuous variable has probability zero at any single value.
True.
Lecture 10: Density Estimation (MoG, Hist, KDE)
True or false: A PDF is "smooth" if its value at points near x can be approximated by its value at x.
True.
Lecture 10: Density Estimation (MoG, Hist, KDE)
True or false: Histograms work better in higher dimensions.
False. They suffer in high dimensions (curse of dimensionality).
Lecture 10: Density Estimation (MoG, Hist, KDE)
True or false: Mixture of Gaussians is better when PDF is smooth.
True.
Lecture 10: Density Estimation (MoG, Hist, KDE)
True or false: Beta distribution can only model unimodal distributions.
False. Beta can model various shapes.
Lecture 10: Density Estimation (MoG, Hist, KDE)
True or false: Hyperparameters for PDF estimation (bandwidth, #components) can be chosen via cross-validation.
True.
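A minimal scikit-learn example of choosing a KDE bandwidth by cross-validation (the data and bandwidth grid are illustrative); `KernelDensity.score` returns the held-out log-likelihood, which is what the grid search maximizes.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 0.5, 200),
                    rng.normal(2, 1.0, 200)]).reshape(-1, 1)

# Pick the bandwidth that maximizes held-out log-likelihood under 5-fold CV.
search = GridSearchCV(KernelDensity(kernel="gaussian"),
                      {"bandwidth": np.logspace(-1, 0.5, 10)}, cv=5)
search.fit(x)
print(search.best_params_)
```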
Lecture 10: Density Estimation (MoG, Hist, KDE)
True or false: A common practical assumption is independence across features.
True.
Lecture 10: Density Estimation (MoG, Hist, KDE)
Of the two plotted distributions (one with a single main mode, one with two modes), which is better approximated by a single Gaussian?
The distribution with one main mode (the right plot).
Lecture 10: Density Estimation (MoG, Hist, KDE)
Which method better approximates the two-mode PDF on the left plot?
Mixture of Gaussians.
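A brief scikit-learn sketch of fitting a two-component mixture to bimodal synthetic data (data and seed are illustrative):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3, 0.7, 400),
                    rng.normal(2, 1.2, 400)]).reshape(-1, 1)

# A 2-component mixture recovers both modes; a single Gaussian cannot.
gmm = GaussianMixture(n_components=2, random_state=0).fit(x)
print(gmm.means_.ravel(), gmm.weights_)
```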
Lecture 10: Density Estimation (MoG, Hist, KDE)
Why might MoG be infeasible for very complex PDFs?
Complex PDFs may require many components, increasing computational cost.
Lecture 11: Outliers and Robust Estimation
True or false: Moving average can eliminate any additive noise.
False. It only attenuates noise (e.g., averaging out zero-mean, high-frequency noise); whether it helps depends on the noise characteristics.
Lecture 11: Outliers and Robust Estimation
True or false: Moving average is not robust to outliers.
True.
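A small NumPy illustration of this: a single outlier drags a moving average, while a moving median (a robust alternative) barely reacts. The signal is synthetic.

```python
import numpy as np

x = np.ones(11)
x[5] = 100.0                      # one outlier in an otherwise flat signal

window = 5
half = window // 2
mean_f = [np.mean(x[max(0, i - half): i + half + 1]) for i in range(len(x))]
med_f = [np.median(x[max(0, i - half): i + half + 1]) for i in range(len(x))]
print(np.round(mean_f, 1))        # windows containing the outlier jump to ~20.8
print(np.round(med_f, 1))         # the moving median stays at 1.0 everywhere
```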
Lecture 11: Outliers and Robust Estimation
True or false: Outliers always represent incorrect values.
False. They might be correct but non-representative.