In a supervised learning task, when dealing with a data set exhibiting a normal distribution — which is a symmetrical distribution where most observations cluster around the central peak and probabilities for values further from the mean taper off equally in both directions — removing outliers specifically from the features always results in a significant reduction of noise and reliably improves the accuracy of the data analysis models.
False
In hierarchical clustering, the algorithm builds a hierarchy of clusters either by a divisive method, which starts with all observations in a single cluster and divides them into smaller clusters, or by an agglomerative method, which starts with each observation as its own cluster and merges them into larger clusters.
True
In feature engineering, scaling and centering are techniques that modify the range of dependent variables in the data, and these methods are generally not necessary for algorithms sensitive to feature scales, such as gradient descent-based algorithms, k-means clustering, and support vector machines.
False
Variance in a machine learning model refers to the error introduced by excessive simplicity in the learning algorithm, which often results in a model that underestimates the complexity of the underlying data distribution, leading to high error rates on both training and unseen data.
False
In data modeling, feature engineering means choosing a subset of available features, and feature selection means creating or transforming features, both primarily aimed at reducing the risk of overfitting and accelerating model training.
False
You have a very large dataset and need to make real-time predictions, where computational efficiency and speed are critical. Which algorithm below is most appropriate for this scenario?
Deep Neural Network
You have a data set with a clear margin of separation between classes, but it’s not linearly separable. For this dataset with a non-linear separation between classes, which algorithm below should you choose?
SVM with a non-linear kernel
You need to classify text documents into different categories based on their content. For classifying text documents, which algorithm below would be most suitable?
Naive Bayes
For clustering a dataset with varying cluster shapes, densities, unknown number of clusters, and the presence of outliers, which algorithm below would be most suitable?
DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
Your dataset includes a mix of numeric and categorical variables, and the relationships between features and the outcome are non-linear. Which algorithm below should you use for this dataset?
Random Forest
In the context of the K-Nearest Neighbors (K-NN) algorithm, consider the following scenario: You are working on a classification problem using a K-NN model. Your dataset is relatively large and consists of several numerical features. As part of fine-tuning the model's performance, you are considering various approaches, such as hyperparameter tuning, feature engineering, data preprocessing, and adjustments to the model's training process. Which of the following approaches is most likely to improve the model's performance?
Normalizing the feature scales to ensure that all features contribute equally to the distance calculations.
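A minimal sketch of the point above, assuming scikit-learn and its bundled wine dataset (both illustrative choices, not part of the card): the same K-NN model is fit with and without standardization so the effect of feature scaling on distance-based classification can be compared.

```python
# Sketch: compare K-NN with and without feature scaling (dataset is a stand-in).
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Without scaling, features with large ranges dominate the Euclidean distance.
unscaled = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# With scaling, every feature contributes on a comparable scale.
scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
scaled.fit(X_train, y_train)

print("unscaled accuracy:", unscaled.score(X_test, y_test))
print("scaled accuracy:  ", scaled.score(X_test, y_test))
```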
Consider a scenario where you are using a Decision Tree algorithm for a classification task. The dataset has a mix of categorical and numerical features, and the target variable is binary. You are in the process of optimizing the Decision Tree to improve its accuracy and prevent overfitting.
Which of the following techniques is an effective method for preventing overfitting in a Decision Tree classifier?
Pruning the tree by setting a maximum depth or minimum samples per leaf.
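A hedged sketch of this kind of pruning, assuming scikit-learn and its breast-cancer dataset as stand-ins; the max_depth and min_samples_leaf values are illustrative, not recommendations.

```python
# Sketch: limit tree growth (pre-pruning) to reduce overfitting.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree tends to memorize the training set.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# A pruned tree trades some training accuracy for better generalization.
pruned = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10,
                                random_state=0).fit(X_train, y_train)

print("full tree   train/test:", full.score(X_train, y_train), full.score(X_test, y_test))
print("pruned tree train/test:", pruned.score(X_train, y_train), pruned.score(X_test, y_test))
```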
You want to cluster data that has noise.
DBSCAN
You know how many clusters you want to end up with.
K-means
You know the clusters have odd shapes.
DBSCAN
You want a complete clustering of the data.
K-Means
You want to minimize the sum squared error (SSE) of the clusters.
K-Means
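A minimal sketch contrasting the two choices in the matching items above, assuming scikit-learn and a synthetic two-moons dataset; the eps and min_samples values are illustrative only.

```python
# Sketch: K-Means needs k up front and assigns every point to a cluster;
# DBSCAN infers the number of clusters, handles odd shapes, and marks noise as -1.
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, DBSCAN

X, _ = make_moons(n_samples=300, noise=0.07, random_state=0)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

print("K-Means labels:", set(kmeans_labels))   # always exactly k clusters, complete clustering
print("DBSCAN labels:", set(dbscan_labels))    # clusters plus -1 for noise points
```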
Soft margin SVM allows some misclassifications on the training data in order to achieve a larger margin and better generalization.
True
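A small sketch of the soft margin, assuming scikit-learn; the C values are arbitrary and chosen only to contrast a soft margin (small C) with a near-hard margin (large C).

```python
# Sketch: C controls the soft margin. Small C tolerates more margin violations
# (wider margin); large C penalizes violations more heavily.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, flip_y=0.1, random_state=0)

soft = SVC(kernel="linear", C=0.1).fit(X, y)   # more violations allowed
hard = SVC(kernel="linear", C=100).fit(X, y)   # fewer violations allowed

print("support vectors (C=0.1):", soft.n_support_.sum())
print("support vectors (C=100):", hard.n_support_.sum())
```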
Ensemble methods like boosting focus more on difficult-to-classify instances.
True
KNN performs the classification decision based solely on a majority vote of the nearest neighbors regardless of their distances.
False
K-means clustering algorithm guarantees convergence to the global optimum.
False
K-means++ is an algorithm that guarantees better clustering than K-means by optimizing the initial placement of centroids.
False
Density-based clustering methods like DBSCAN can identify clusters of any shape and are particularly good at separating high-density clusters from low-density areas.
True
DBSCAN requires the number of clusters to be specified in advance.
False
Hierarchical clustering can only use a single metric for measuring the distances between clusters throughout the entire clustering process.
False
In the Apriori algorithm, all subsets of a frequent itemset must also be frequent.
True
Association rules that have high confidence necessarily have high lift.
False
The lift value of an association rule that is less than 1 indicates that the items in the rule are negatively associated.
True
Support is a measure of how frequently the items in an association rule appear together in all transactions.
True
A Naïve Bayes classifier requires a larger amount of data to perform well compared to more complex models because it assumes that all features are independent given the class label.
False
The leverage of an association rule X → Y measures the difference between the observed frequency of X and Y appearing together and the frequency expected if X and Y were statistically independent.
True
In association rule analysis, if an itemset has a high support, any rule derived from this itemset will also have high support.
False
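A worked toy example for the association-rule measures on these cards; all transaction counts below are invented for illustration.

```python
# Sketch: support, confidence, lift, and leverage for the rule X -> Y.
n_transactions = 100
count_X = 40           # transactions containing X
count_Y = 50           # transactions containing Y
count_XY = 30          # transactions containing both X and Y

support_X = count_X / n_transactions            # 0.40
support_Y = count_Y / n_transactions            # 0.50
support_XY = count_XY / n_transactions          # 0.30  (support of the rule)

confidence = support_XY / support_X             # 0.75
lift = confidence / support_Y                   # 1.5   (> 1: positive association)
leverage = support_XY - support_X * support_Y   # 0.10  (observed minus expected)

print(support_XY, confidence, lift, leverage)
```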
You are working on a binary classification problem using a Support Vector Machine (SVM). The dataset involves features that are not linearly separable in the current feature space. Which of the following strategies would be most effective for improving the classification accuracy of the SVM?
Utilize a polynomial kernel to implicitly transform the features into a higher-dimensional space where the classes are more likely to be linearly separable.
When training a Support Vector Machine (SVM) for a classification task, the concept of a margin is crucial for understanding how the model discriminates between classes. Which of the following best describes the role of maximizing the margin and the use of slack variables in SVM?
Maximizing the margin involves creating the largest possible distance between the decision boundary and the nearest data points from each class, thereby enhancing the model’s robustness and generalization capabilities. Slack variables permit some degree of misclassification, particularly for data that is not linearly separable, by allowing flexibility in the margin constraints to achieve a broader margin.
When applying a Support Vector Machine (SVM) in a binary classification task, various factors influence the performance and applicability of the model. Suppose you are comparing SVM to other classifiers on a dataset with imbalanced classes and a moderate amount of noise. Which of the following statements most accurately reflects the considerations and adaptations you might need for effective SVM deployment?
Adjust the class weights in SVM to give more importance to the minority class, helping to offset the class imbalance during model training.
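A hedged sketch combining the two SVM adaptations above (a polynomial kernel for non-linear separation and class weighting for imbalance), assuming scikit-learn and a synthetic imbalanced dataset; all parameter values are illustrative.

```python
# Sketch: polynomial kernel plus balanced class weights on imbalanced data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],  # imbalanced classes
                           n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

clf = SVC(kernel="poly", degree=3, class_weight="balanced").fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```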
True Positive (TP)
The classifier correctly identifies a network intrusion; an actual intrusion attempt is detected by the system.
False Positive (FP)
The classifier incorrectly flags normal network traffic as an intrusion; benign activity is mistakenly identified as malicious.
True Negative (TN)
The classifier correctly recognizes normal network traffic; benign activity is correctly identified as non-malicious.
False Negative (FN)
The classifier fails to detect an actual intrusion; a malicious activity goes unnoticed by the system.
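A minimal sketch mapping the four outcomes above onto a confusion matrix, assuming scikit-learn and the convention that label 1 means "intrusion" and 0 means "normal traffic"; the label vectors are invented.

```python
# Sketch: recover TP/FP/TN/FN counts from predictions.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 0, 1]   # actual traffic: 1 = intrusion, 0 = normal
y_pred = [1, 0, 0, 1, 1, 0, 0, 1]   # classifier output

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP:", tp, "FP:", fp, "TN:", tn, "FN:", fn)
```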
A significant drawback of using the Train/Validate/Test split is that a considerable part of the dataset may need to be reserved for testing.
True
The K models created during the K-fold cross-validation process should be used for making predictions on new data.
False
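A short sketch of the distinction on this card, assuming scikit-learn: the K fold models only produce a performance estimate, and a final model is refit on all of the data for actual predictions.

```python
# Sketch: cross-validation estimates performance; the deployed model is refit on all data.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)   # 5 fold models, 5 held-out scores
print("estimated accuracy:", scores.mean())

final_model = model.fit(X, y)                 # the model actually used for new data
```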
Decision Trees are considered “Lazy Learners” because they do not generalize training data until it is needed to classify test examples.
False
Calculating class-conditional probability with Naive Bayes involves finding the product of the probabilities of observing each feature given the class label.
True
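A toy numeric sketch of the product rule on this card; every probability below is invented, and the spam/ham labels are hypothetical.

```python
# Sketch: class-conditional probability as a product of per-feature probabilities,
# multiplied by the class prior to get an (unnormalized) posterior score.
p_features_given_spam = [0.8, 0.3, 0.6]   # P(feature_i = observed | spam)
p_features_given_ham  = [0.1, 0.4, 0.2]   # P(feature_i = observed | ham)
p_spam, p_ham = 0.4, 0.6                  # class priors

def product(ps):
    result = 1.0
    for p in ps:
        result *= p
    return result

score_spam = p_spam * product(p_features_given_spam)   # 0.4 * 0.144 = 0.0576
score_ham  = p_ham  * product(p_features_given_ham)    # 0.6 * 0.008 = 0.0048
print("predict:", "spam" if score_spam > score_ham else "ham")
```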
A high confidence value for an association rule always implies a strong and meaningful relationship between the items.
False
Feature scaling is essential for algorithms that compute distances between data points, such as K-Nearest Neighbors.
True
In classification tasks, accuracy is always the best metric to evaluate model performance.
False
In association rule mining, the measure that indicates the proportion of transactions containing one itemset relative to another is called leverage.
False
When using K-means clustering, the final result is independent of the initial centroid positions.
False
Neural networks with no hidden layers can only learn linear decision boundaries.
True
Apriori principle states that all subsets of an infrequent itemset must also be infrequent.
False
In ensemble methods like bagging, the individual models are trained on the same dataset to ensure consistency in their predictions.
False
The Lift metric in association rule mining measures how much more often the antecedent and consequent occur together than expected if they were statistically independent.
True
The elbow method in K-Means clustering involves plotting the explained variance as a function of the number of clusters and looking for a point where the rate of variance reduction sharply decreases.
True
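A minimal sketch of the elbow method, assuming scikit-learn, matplotlib, and a synthetic blob dataset; it plots inertia (within-cluster SSE), which drops as the explained variance described above rises, so the elbow appears at the same k.

```python
# Sketch: plot inertia against k and look for the point where improvement flattens.
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

ks = range(1, 10)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(ks, inertias, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("inertia (within-cluster SSE)")
plt.show()
```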
Isolation Forest detects anomalies based on the assumption that anomalous points are easier to isolate than normal points.
True
Single-linkage hierarchical clustering tends to create more chain-like clusters (i.e., clusters that are stretched out and less compact in shape) than complete-linkage clustering.
True
Feature selection aims to reduce the number of features by creating new combinations of existing features, whereas feature extraction removes irrelevant or redundant features.
False
A frequent itemset is an itemset whose support is equal to or greater than some minimum support (minsup) threshold.
True
In hierarchical clustering, the dendrogram can be cut at different levels to obtain different numbers of clusters, allowing flexibility in choosing the number of clusters after the clustering process is complete.
True
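A small sketch of cutting one dendrogram at different levels, assuming SciPy's linkage/fcluster and a synthetic dataset; the linkage method and cluster counts are illustrative.

```python
# Sketch: build one dendrogram, then cut it at different levels to obtain
# different numbers of clusters without re-running the clustering.
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=3, random_state=0)
Z = linkage(X, method="complete")                    # single dendrogram

labels_2 = fcluster(Z, t=2, criterion="maxclust")    # cut into 2 clusters
labels_3 = fcluster(Z, t=3, criterion="maxclust")    # cut into 3 clusters
print(set(labels_2), set(labels_3))
```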
The K-Means algorithm always converges to the global optimum solution.
False
You are using DBSCAN for clustering data with varying densities. You notice that some clusters are not being identified correctly. Which parameter should you adjust to improve clustering in such cases?
The epsilon (ε) parameter (maximum neighborhood radius)
Which linkage method in hierarchical clustering considers the maximum distance between elements of two clusters when merging them?
Complete linkage
Which of the following is a key advantage of DBSCAN over K-Means clustering?
It can identify clusters of arbitrary shapes and detect noise.
Which assumption is fundamental to the Naïve Bayes classifier?
All features are independent given the class label.
A manufacturing company employs a network of sensors to monitor the operational status of its machinery in real time. The sensor data includes temperature, vibration, pressure, and other operational metrics collected at high frequency, resulting in thousands of features. Unexpected equipment failures can be costly, so the company aims to detect anomalies that may indicate impending malfunctions. Key requirements are:
1. Must process streaming data efficiently
2. Must scale well with high-dimensional data (thousands of features)
3. Must adapt to evolving normal patterns over time
4. Must handle unlabeled data
5. Must detect rare anomalies without prior knowledge of anomaly patterns
Which of the following algorithms is most appropriate for detecting anomalies in this scenario, and why?
Isolation Forest, because it efficiently isolates anomalies in high-dimensional data using random partitioning, and it is well-suited for streaming data.
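A hedged sketch of the chosen approach, assuming scikit-learn's IsolationForest on a synthetic stand-in for the sensor data; the feature count and contamination value are assumptions, and for true streaming use the model would be periodically refit on recent windows of data, which is not shown here.

```python
# Sketch: fit an Isolation Forest on unlabeled, high-dimensional readings and
# flag points that random partitioning isolates quickly (-1 = anomaly).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(1000, 50))     # stand-in for normal sensor features
anomalies = rng.normal(6, 1, size=(10, 50))    # rare, far-away readings
X = np.vstack([normal, anomalies])

model = IsolationForest(contamination=0.01, random_state=0).fit(X)
labels = model.predict(X)                      # -1 = anomaly, 1 = normal
print("flagged anomalies:", (labels == -1).sum())
```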