These flashcards cover key concepts, definitions, and true/false statements from the lecture notes on machine learning techniques, including frequent itemset mining, dimensionality reduction, clustering, regularization, decision trees, support vector machines, evaluation metrics, reinforcement learning, and neural networks.
True or False: Increasing the minimum support threshold in the Apriori algorithm always results in a smaller number of frequent itemsets being discovered.
True
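A minimal brute-force sketch (toy transactions invented for illustration, skipping Apriori's candidate-pruning step) showing that raising the minimum support threshold can only shrink the set of frequent itemsets:

```python
from itertools import combinations

# Hypothetical transaction database.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def frequent_itemsets(db, min_support):
    """Return every itemset whose support meets min_support (brute force)."""
    items = sorted(set().union(*db))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            support = sum(set(candidate) <= t for t in db) / len(db)
            if support >= min_support:
                result[candidate] = support
    return result

# A higher threshold only prunes itemsets; it never admits new ones.
print(len(frequent_itemsets(transactions, 0.4)))
print(len(frequent_itemsets(transactions, 0.6)))
```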
True or False: The main objective of t-SNE is to reduce the dimensionality of the data while preserving global structures.
False
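A minimal sketch using scikit-learn's TSNE on the digits dataset (chosen here as a stand-in for any high-dimensional data); the embedding preserves local neighborhoods, so distances between far-apart clusters should not be over-interpreted:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X = load_digits().data  # 1797 samples, 64 dimensions

# t-SNE keeps nearby points nearby in 2-D; global distances are distorted.
X_2d = TSNE(n_components=2, random_state=0).fit_transform(X)
print(X_2d.shape)  # (1797, 2)
```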
True or False: DBSCAN can identify clusters of arbitrary shape and can also detect outliers.
True
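A short sketch on scikit-learn's make_moons data (eps and min_samples are illustrative values for this toy set), showing non-convex clusters and the noise label:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two crescent-shaped clusters that centroid-based methods handle poorly.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
print(set(labels))  # cluster ids; -1 marks points treated as noise/outliers
```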
True or False: Both Lasso and Ridge regression are techniques used to prevent overfitting by adding a penalty term to the cost function.
True
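A minimal sketch with scikit-learn on synthetic data where only the first feature matters, contrasting the two penalties; alpha, which controls the penalty strength, is an illustrative value:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=100)  # only feature 0 is real

lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty: can zero out coefficients
ridge = Ridge(alpha=0.1).fit(X, y)  # L2 penalty: shrinks coefficients

print(lasso.coef_.round(2))  # sparse: irrelevant features driven to 0
print(ridge.coef_.round(2))  # small but generally nonzero coefficients
```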
True or False: The linear kernel is suitable for handling non-linearly separable data in SVM.
False
True or False: A classifier with high precision and low recall is ideal for scenarios where false negatives are more harmful than false positives.
False
True or False: In reinforcement learning, an agent always receives immediate rewards after taking an action.
False
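A tiny worked example of why this is false: with a discounted return, a reward arriving several steps later is still credited back to earlier actions (the reward sequence and gamma are made up for illustration):

```python
# Hypothetical episode: the only reward arrives at the final step.
rewards = [0, 0, 0, 1]
gamma = 0.9  # discount factor

# Discounted return from the first step: sum of gamma**k * r_k.
g = sum(gamma ** k * r for k, r in enumerate(rewards))
print(g)  # 0.9 ** 3 = 0.729, credited despite no immediate reward
```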
True or False: A single-layer perceptron can accurately classify non-linearly separable data such as the XOR problem.
False
What is the difference between closed itemsets and maximal itemsets in frequent itemset mining?
Closed frequent itemsets have no proper superset with the same support, while maximal frequent itemsets have no proper superset that is itself frequent; every maximal itemset is closed, but not vice versa.
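A small sketch over hypothetical frequent itemsets and supports, computing both notions directly from the definitions above:

```python
# Hypothetical frequent itemsets with their supports.
frequent = {
    frozenset("A"): 0.6,
    frozenset("B"): 0.5,
    frozenset("AB"): 0.5,  # same support as {B}, so {B} is not closed
}

closed = {s for s in frequent
          if not any(s < t and frequent[t] == frequent[s] for t in frequent)}
maximal = {s for s in frequent if not any(s < t for t in frequent)}

print([sorted(s) for s in closed])   # {A} and {A, B}
print([sorted(s) for s in maximal])  # only {A, B}
```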
What does a confidence value of 0.7 in an association rule {Item A} → {Item C} indicate?
It indicates that, given that a transaction contains Item A, the probability that it also contains Item C is 0.7; in other words, 70% of the transactions containing Item A also contain Item C.
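A quick check of that definition on made-up transactions (counts chosen so the numbers come out to 0.7):

```python
# Toy data: Item A appears in 10 transactions, A and C together in 7.
transactions = [{"A", "C"}] * 7 + [{"A"}] * 3 + [{"C"}] * 2

support_a = sum("A" in t for t in transactions)
support_ac = sum({"A", "C"} <= t for t in transactions)

print(support_ac / support_a)  # confidence = P(C | A) = 7 / 10 = 0.7
```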
What is the curse of dimensionality?
It refers to the phenomenon where, as the number of dimensions grows, the feature space becomes increasingly sparse and distances between points become less informative, making it difficult for models to determine relationships between points.
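A short NumPy sketch of one symptom, distance concentration: as dimensionality grows, the nearest and farthest neighbors of a point end up at nearly the same distance (uniform random data, sizes chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)

for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(500, d))
    dists = np.linalg.norm(X - X[0], axis=1)[1:]  # drop the zero self-distance
    # The ratio of nearest to farthest distance approaches 1 in high dimensions.
    print(d, round(dists.min() / dists.max(), 3))
```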
What is the primary goal of PCA in dimensionality reduction?
The primary goal of PCA is to retain as much variance in the data as possible while reducing the number of dimensions.
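A minimal sketch with scikit-learn's PCA on the iris data (any numeric dataset would do), reading off how much variance each retained component explains:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # 4 features

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Fraction of total variance retained by each principal component.
print(pca.explained_variance_ratio_)  # roughly [0.92, 0.05] for iris
```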
What are the common metrics used to evaluate impurity in decision trees?
Common metrics include entropy, which underlies information gain (how much splitting on an attribute reduces uncertainty about the class), and Gini impurity, which measures the likelihood of misclassifying a randomly chosen element if it were labeled according to the node's class distribution.
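Both metrics can be computed directly from a node's class proportions; a small sketch with binary proportions chosen for illustration:

```python
import numpy as np

def entropy(p):
    """Entropy of class proportions p, in bits; 0 for a pure node."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def gini(p):
    """Gini impurity: chance of misclassifying a randomly drawn element."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

print(entropy([0.5, 0.5]), gini([0.5, 0.5]))  # maximal impurity: 1.0, 0.5
print(entropy([1.0, 0.0]), gini([1.0, 0.0]))  # pure node: 0.0, 0.0
```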
What distinguishes K-Means clustering from hierarchical clustering?
K-Means partitions the data into a pre-specified number of clusters by assigning each point to the nearest centroid, while hierarchical clustering builds a nested hierarchy of clusters (a dendrogram) by merging or splitting groups according to a linkage function.
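A side-by-side sketch with scikit-learn on synthetic blobs (three clusters assumed known for the comparison):

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# K-Means: assigns each point to the nearest of k centroids.
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Hierarchical: merges clusters bottom-up according to a linkage function.
hc_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

print(km_labels[:10])
print(hc_labels[:10])
```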
What are the implications of high precision in a spam detection system?
High precision means most flagged emails (predicted spam) are indeed spam, reducing false positives and improving user experience.
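A quick illustration with scikit-learn's metrics on made-up predictions (1 = spam, 0 = legitimate):

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # 4 actual spam emails
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]  # classifier flags 3 emails

# Precision: of the flagged emails, how many really are spam?
print(precision_score(y_true, y_pred))  # 2 of 3 flagged -> ~0.67
# Recall: of the actual spam, how much was caught?
print(recall_score(y_true, y_pred))     # 2 of 4 spam -> 0.5
```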
What is the purpose of using different kernels in SVMs?
Different kernels implicitly map the data into a higher-dimensional feature space where it may become linearly separable, allowing the SVM to find a separating hyperplane without computing the mapping explicitly (the kernel trick), which enables better generalization on non-linear data.
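A sketch of the effect on scikit-learn's concentric-circles data, which no straight line can separate in the original 2-D space:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)  # implicit non-linear feature mapping

print(linear.score(X, y))  # near chance: no separating line exists
print(rbf.score(X, y))     # near 1.0: separable in the kernel's feature space
```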
Why can a single-layer perceptron model AND and OR operations but not the XOR operation?
A single-layer perceptron can only create linear decision boundaries, which are sufficient for AND and OR but not for the non-linear boundary required for XOR.
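A small demonstration with scikit-learn (the hidden-layer size and solver are illustrative choices): the linear perceptron cannot fit XOR, while one hidden layer can:

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])  # XOR: not linearly separable

# A single linear boundary gets at most 3 of the 4 points right.
p = Perceptron(max_iter=1000).fit(X, y_xor)
print(p.score(X, y_xor))  # 0.75 at best, never 1.0

# One hidden layer supplies the non-linearity XOR requires.
mlp = MLPClassifier(hidden_layer_sizes=(8,), solver="lbfgs",
                    max_iter=1000, random_state=0).fit(X, y_xor)
print(mlp.score(X, y_xor))  # usually 1.0
```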