ML provides machines the ability to automatically learn from data while identifying patterns to make ______ with minimal human intervention.
predictions
Types of Machine Learning: Supervised, Unsupervised, Semi-Supervised, and __________.
reinforcement
Letters, Symbols, Words, Gender are samples of:
Ordinal data
Nominal data
Discrete data
Continuous data
Nominal data
The act of filling in missing values by estimation.
Imputation
Mean Imputation
Most-frequent Imputation
Column Transformation
Imputation
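A minimal sketch of the imputation idea above using scikit-learn's SimpleImputer; the array values are made up, and strategy="most_frequent" would give most-frequent imputation instead of mean imputation:

```python
# Mean imputation with scikit-learn's SimpleImputer (values are made up).
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan]])

imputer = SimpleImputer(strategy="mean")   # "most_frequent" is also available
X_filled = imputer.fit_transform(X)
print(X_filled)                            # missing entries replaced by column means
```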
The ______ is an observation that goes far outside the average value of a group of statistics.
outlier
Based on the ML application table scenario, when rule complexity is complex and problem scale is small, ML application is:
Rule-based Algorithms
Simple Problem
ML Algorithms
Manual Rules
Manual Rules
Based on the ML application table scenario, when rule complexity is simple and problem scale is large, ML application is:
Rule-based Algorithms
Manual Rules
ML Algorithms
Simple Problem
Rule-based Algorithms
Which data preprocessing task is the most time consuming?
Data modeling
Data analysis
Data collection
Data cleaning
Data cleaning
Continuous data is:
Quantitative
Machine Learning is a field of study concerned with giving computers the ability to ________ without being explicitly programmed.
learn
Which is not true about Machine Learning?
Enable computers to operate autonomously with explicit programming.
Machines driven by algorithms designed by humans are able to learn latent rules and inherent patterns and to fulfill tasks desired by humans.
Their maintenance is much lower than a human's and costs a lot less in the long run.
Automation by machine learning can mitigate risks caused by fatigue or inattention.
Enable computers to operate autonomously with explicit programming.
Which processes are involved in data preparation?
Data collection, Data Cleaning
Not in the options
Data Cleaning, Feature Engineering
All the given options
Splitting of dataset
All the given options
Rule-based algorithms: Condition
Machine Learning: _________.
model
Removing duplicates in the dataset is a feature engineering technique. (T or F)
False
Ordinal data is:
Qualitative
Data reduction is a feature engineering technique. (T or F)
True
Choose all the most popular Python Libraries that are used in data science.
PANDAS
NUMPY
JUPYTER
SCIPY
SQL
ANACONDA
PANDAS
NUMPY
SCIPY
Sorting out missing data is a data cleansing technique. (T or F)
True
The dataset is divided into a _______ set and a test set.
train
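A minimal sketch of the train/test split with scikit-learn; the iris dataset and the 80/20 ratio are just assumptions for illustration:

```python
# Split a dataset into a training set and a test set.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)   # e.g. (120, 4) and (30, 4)
```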
Based on the ML application table scenario, when rule complexity is complex and problem scale is large, ML application is:
Simple Problem
Rule-based Algorithms
ML Algorithms
Manual Rules
ML Algorithms
What are the two main phases of ML workflow?
Training, Modeling
Training, Testing
Learning, Prediction
Learning, Modeling
Training, Prediction
Learning, Prediction
In EDA, this process identifies unusual data points. _________
outlier detection
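One common way to detect outliers during EDA is the IQR rule; a minimal sketch with made-up values (other approaches, such as z-scores, are also widely used):

```python
# Flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] as outliers.
import numpy as np

data = np.array([10, 12, 11, 13, 12, 95, 11, 10])   # 95 is the unusual point
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print(data[(data < lower) | (data > upper)])         # -> [95]
```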
Reducing noise in data is a feature engineering technique. (T or F)
False
Temperature range is a sample of:
Continuous Data
Data reduction is a data cleansing technique. (T or F)
False
The ______ analysis examines relationship between variables.
correlation
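A minimal sketch of correlation analysis with pandas; the column names and values are made up:

```python
# Pearson correlation matrix between two variables.
import pandas as pd

df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score":    [52, 55, 61, 70, 74],
})
print(df.corr())   # values near +1 or -1 indicate a strong linear relationship
```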
Which is true about ML, AI, and DS?
DS and AI are subsets of ML
All the given options
DS and ML are subsets of AI
AI and ML are subsets of DS
AI and ML are subsets of DS
This dataset is used when the model obtained from learning is applied for prediction.
Test Set
Training and Test Set
Model Set
Training set
Test Set
ML is a research field at the intersection of _________, artificial intelligence, and computer science.
statistics
Nominal data is:
Qualitative
Movie ratings, Military rank are samples of:
Continuous data
Discrete data
Ordinal data
Nominal data
Ordinal data (marked wrong in Canvas)
The larger variety of data points your data set contains, the more complex a model you can use without overfitting. (T or F)
True
Binary Classification is a classification of dichotomous classes. (T or F)
True
The _____ allows the models to make informed predictions even when faced with previously unseen data.
Underfitting
Generalization
Overfitting
Generalization
Supervised algorithms address classification problems where the output variable is categorical. (T or F)
False
The ______ refers to the error resulting from sensitivity to the noise in the training data.
variance
The more complex we allow our model to be, the better we will be able to predict on the training data. (T or F)
True
SVM is an example of a regression algorithm. (T or F)
False
In k-NN, High Model Complexity is overfitting. (T or F)
True
In k-NN, when you choose a small value of k (e.g., k=1), the model becomes less complex. (T or F)
False
The ‘k’ in k-Nearest neighbors refers to an arbitrary number of neighbors. (T or F)
True
In k-NN, voting means that for each test point we count how many neighbors belong to each class, e.g., how many belong to class 0 and how many belong to class 1. (T or F)
True
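A minimal sketch of k-NN voting with scikit-learn; with n_neighbors=3, each test point gets the class that wins the vote among its 3 nearest training neighbors (the iris dataset is just an assumption):

```python
# k-NN classification: fit on the training set, vote on the test set.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3)   # 'k' = 3 neighbors
knn.fit(X_train, y_train)                   # learning phase
print(knn.score(X_test, y_test))            # prediction phase, mean accuracy
```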
In evaluating a regression model, predicting worse than the average can result in a negative R2 score. (T or F)
True
When comparing training set and test set scores, we find that we predict very accurately on the training set, but the R2 on the test set is much worse. This is a sign of overfitting. (T or F)
True
Lasso uses L2 Regularization. (T or F)
False
What is the full form of OLS?
Ordinary Least Squares
Regularization means explicitly restricting a model to avoid overfitting. (T or F)
True
Ridge is generally preferred over Lasso, but if you want a model that is easy to analyze and understand then use Ridge. (T or F)
False
In Ridge regression, if α (alpha) is larger, the penalty becomes larger. (T or F)
True
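A minimal sketch contrasting Ridge (L2) and Lasso (L1) at a few alpha values with scikit-learn; the synthetic dataset and the alpha grid are assumptions. A larger alpha means a stronger penalty, and a big gap between training and test R2 would signal overfitting:

```python
# Ridge (L2) vs Lasso (L1) regularization at increasing alpha.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=20, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for alpha in (0.1, 1.0, 10.0):
    ridge = Ridge(alpha=alpha).fit(X_train, y_train)
    lasso = Lasso(alpha=alpha).fit(X_train, y_train)
    # .score() returns R2 on the given data
    print(alpha, ridge.score(X_test, y_test), lasso.score(X_test, y_test))
```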
Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. (T or F)
True
Naïve Bayes classifier that deals with continuous data.
All the given options
GaussianNB
MultinomialNB
BernoulliNB
GaussianNB
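A minimal sketch of GaussianNB on continuous features with scikit-learn; MultinomialNB (integer counts) and BernoulliNB (binary features) follow the same fit/predict pattern. The iris dataset is just an assumption:

```python
# Gaussian Naive Bayes: a per-class mean and variance is learned per feature.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gnb = GaussianNB().fit(X_train, y_train)
print(gnb.score(X_test, y_test))   # mean accuracy on the test set
```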
Its target is a categorical variable.
Correlation
Supervised Learning
Regression
Classification
Classification
Regression predicts continuous numbers. (T or F)
True
In k-NN, Low Model Complexity is underfitting. (T or F)
True
In k-NN, Low Model Complexity is overfitting. (T or F)
False
The ‘offset’ parameter is also called intercept. (T or F)
True
Types of Linear Models: Linear Regression, ____________.
Logistic Regression
The ‘slope’ parameter is also called weights or coefficients. (T or F)
True
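A minimal sketch showing where the slope (weights) and offset (intercept) live in a fitted scikit-learn linear model; the data is made up so that y = 2x + 1 exactly:

```python
# The learned parameters should be close to slope 2 and intercept 1.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])   # y = 2*x + 1

lr = LinearRegression().fit(X, y)
print(lr.coef_)        # slope / weights / coefficients, ~[2.0]
print(lr.intercept_)   # offset / intercept, ~1.0
```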
Naïve Bayes classifier that deals with integer count data.
MultinomialNB
All the given options
BernoulliNB
GaussianNB
MultinomialNB
A model which does not capture the underlying relationship in the dataset on which it's trained.
Generalization
Overfitting
Underfitting
Underfitting
A model is able to make accurate predictions on new, unseen data.
Underfitting
Overfitting
Generalization
Generalization
When using multiple nearest neighbors, the prediction is the mean of the relevant neighbors. (T or F)
True
The ‘offset’ parameter is also called _______.
Intercept
Slope
Weights
Mean
Intercept
In Ridge regression, if α (alpha) is larger, the penalty becomes smaller. (T or F)
False
Naïve Bayes classifier that deals with binary data.
BernoulliNB
GaussianNB
All the given options
MultinomialNB
BernoulliNB
Naïve Bayes learns parameters by looking at each feature individually and collects simple per-class statistics from each feature. (T or F)
True
Dimensionality reduction techniques help improve model interpretability and performance.
True
False
True
Unsupervised learning can be used to discover hidden patterns in data.
True
False
True
In the Apriori algorithm, k data points are randomly selected from the dataset as the initial cluster centroids.
True
False
False
In this type of ML, training uses only input parameter values, and the model discovers the groups or patterns on its own.
Choices:
Unsupervised
Supervised
Unsupervised
In the Apriori algorithm, we first define the ______.
Choices:
Size of itemset
Confidence
Support
Frequent itemset
Size of itemset
The Apriori algorithm is a data mining technique for learning correlations and relations among variables in a database.
Choices:
True
False
True
Apriori algorithm is used in a ________ technique.
Choices:
Classification
Association
Regression
Clustering
Association
In K-means, the selected k data points will be the initial clusters.
Choices:
True
False
True
In K-means, each iteration calculates new ______ of datapoints.
Choices:
Median
Cluster
Centroid
Mean
Mean
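A minimal sketch of K-means with scikit-learn; each iteration assigns points to the nearest centroid and then recomputes each centroid as the mean of its assigned points. The toy points form two obvious blobs and are made up:

```python
# K-means with k=2 on two well-separated blobs.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 1], [1, 2], [2, 1],    # first blob
              [8, 8], [8, 9], [9, 8]])   # second blob

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)            # cluster assignment of each point
print(kmeans.cluster_centers_)   # final centroids = per-cluster means
```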
Support is the probability that if a person buys an item A, then he will also buy an item B.
Choices:
True
False
False
The formula for relative support is:
Choices:
Total number of transactions / Total number of transactions containing an itemset X
Total number of transactions containing an itemset X / Total number of transactions
Total number of itemsets / Total number of transactions
Total number of transactions / Total number of itemsets
Total number of transactions containing an itemset X / Total number of transactions
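A minimal worked example of the relative support formula over five made-up transactions; the itemset {bread, milk} appears in 2 of 5 transactions, so its relative support is 0.4:

```python
# relative support = transactions containing X / total transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
itemset = {"bread", "milk"}
support = sum(itemset <= t for t in transactions) / len(transactions)
print(support)   # 2 / 5 = 0.4
```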
In unsupervised learning, the algorithm divides the data objects into groups according to the similarities and differences between the objects.
Choices:
False
True
True
An unsupervised learning technique that extracts important features from the dataset, reducing the number of irrelevant or redundant features present.
Choices:
Apriori
All the options
K-Means
Dimensionality Reduction
Dimensionality Reduction
In K-means, each datapoint computes the distance between itself and each cluster centroid.
Choices:
False
True
True
K-means algorithm is used in a ________ technique.
Choices:
Classification
Clustering
Regression
Association
Clustering
In K-means, each cluster calculates the new median based on the datapoints in the cluster.
Choices:
True
False
False
An itemset that meets the support is called a frequent itemset.
Choices:
False
True
True
Which algorithm is commonly used for clustering in unsupervised learning?
Choices:
Linear Regression
Decision Tree
Naive Bayes
K-Means
K-Means
Which algorithm is commonly used for association rule mining?
Choices:
PCA
K-Means
DBSCAN
Apriori
Apriori
Principal Component Analysis (PCA) is primarily used for:
Choices:
Clustering
Classification
Regression
Dimensionality Reduction
Dimensionality Reduction
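A minimal sketch of dimensionality reduction with PCA in scikit-learn, projecting the 4-feature iris data down to 2 principal components (the dataset choice is just an assumption):

```python
# PCA: keep the 2 directions that explain the most variance.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                 # (150, 2)
print(pca.explained_variance_ratio_)   # variance explained by each component
```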
Which of the following is NOT an application of unsupervised learning?
Choices:
Image classification with labeled data
Market basket analysis
Customer segmentation
Fraud detection
Image classification with labeled data
Which of the following is a key characteristic of unsupervised learning?
Choices:
Supervised feedback
No labeled data
Predefined output classes
Labeled data
No labeled data
In K-Means clustering, the value of 'K' represents:
Choices:
The number of clusters
The number of iterations
The number of features
The number of data points
The number of clusters
Unsupervised learning algorithms can be evaluated using accuracy scores.
Choices:
True
False
False
K-Means clustering always produces the same result regardless of initial centroid selection.
Choices:
True
False
False
Lift measures how much more likely two items are to occur together than if they were independent.
Choices:
False
True
True
Association rule mining is used to find relationships between variables in large datasets.
Choices:
False
True
True
Which clustering algorithm is density-based and can find clusters of arbitrary shape?
DBSCAN
K-Means
Hierarchical
Apriori
DBSCAN
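A minimal sketch of DBSCAN on the two-moons toy dataset, whose clusters are not spherical; the eps and min_samples values are assumptions tuned to this example:

```python
# Density-based clustering; a label of -1 would mark noise points.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
db = DBSCAN(eps=0.2, min_samples=5).fit(X)
print(set(db.labels_))   # the two moon-shaped clusters, e.g. {0, 1}
```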
DBSCAN is a clustering algorithm that can find clusters of arbitrary shape.
True
False
True
Which metric is commonly used to evaluate association rules?
Support, Confidence, and Lift
Accuracy
Precision and Recall
Mean Squared Error
Support, Confidence, and Lift
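A minimal sketch computing support, confidence, and lift for the rule {bread} -> {milk} in plain Python over made-up transactions (a library such as mlxtend could do this at scale):

```python
# support = P(A and B); confidence = P(B | A); lift = confidence / P(B)
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"bread", "milk"},
    {"butter"},
    {"bread"},
    {"milk", "butter"},
]
n = len(transactions)
support_a  = sum("bread" in t for t in transactions) / n             # 4/6
support_b  = sum("milk" in t for t in transactions) / n              # 4/6
support_ab = sum({"bread", "milk"} <= t for t in transactions) / n   # 3/6

confidence = support_ab / support_a   # 0.75
lift = confidence / support_b         # 1.125 > 1: positive association
print(support_ab, confidence, lift)
```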
Hierarchical clustering requires the number of clusters to be specified in advance.
True
False
False
Rule-based algorithms: Condition, while Machine Learning: _________.
Learning
Prediction
Algorithms
Model
Model
ML provides machines the ability to automatically predict from data while identifying patterns to learn with minimal human intervention.
True
False
False