Machine Learning Compilation

161 Terms

New cards

ML provides machines the ability to automatically learn from data while identifying patterns to make ______ with minimal human intervention.

predictions

New cards

Types of Machine Learning: Supervised, Unsupervised, Semi-Supervised, and __________.

reinforcement

New cards

Letters, symbols, words, and gender are examples of:

Ordinal data

Nominal data

Discrete data

Continuous data

Nominal data

New cards

The act of filling in missing values by estimation.

Imputation

Mean Imputation

Most-frequent Imputation

Column Transformation

Imputation
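
A minimal sketch of mean imputation in plain Python, assuming missing values are represented as `None`. The function name `mean_impute` and the sample data are hypothetical, chosen only for illustration:

```python
from statistics import mean

def mean_impute(values):
    """Replace None entries with the mean of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)  # the estimate used for every missing entry
    return [fill if v is None else v for v in values]

ages = [20, None, 30, 40, None]
print(mean_impute(ages))
```

Most-frequent imputation follows the same pattern, with the mode of the observed values used as the fill value instead of the mean.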

New cards

The ______ is an observation that goes far outside the average value of a group of statistics.

outlier

New cards

Based on the ML application table scenario, when rule complexity is complex and problem scale is small, ML application is:

Rule-based Algorithms

Simple Problem

ML Algorithms

Manual Rules

New cards

Based on the ML application table scenario, when rule complexity is simple and problem scale is large, ML application is:

Rule-based Algorithms

Manual Rules

ML Algorithms

Simple Problem

Rule-based Algorithms

New cards

Which data preprocessing task is the most time consuming?

Data modeling

Data analysis

Data collection

Data cleaning

New cards

Continuous data is:

Quantitative

New cards

Machine Learning is a field of study concerned with giving computers the ability to ________ without being explicitly programmed.

learn

New cards

Which is not true about Machine Learning?

Enable computers to operate autonomously with explicit programming.

Machines driven by algorithms designed by humans are able to learn latent rules and inherent patterns and to fulfill tasks desired by humans.

Their maintenance is much lower than a human's and costs a lot less in the long run.

Automation by machine learning can mitigate risks caused by fatigue or inattention.

Enable computers to operate autonomously with explicit programming.

New cards

Which processes are involved in data preparation?

Data collection, Data Cleaning

Not in the options

Data Cleaning, Feature Engineering

All the given options

Splitting of dataset

All the given options

New cards

Rule-based algorithms: Condition

Machine Learning: _________.

model

New cards

Removing duplicates in the dataset is a feature engineering technique. (T or F)

False

New cards

Ordinal data is:

Qualitative

New cards

Data reduction is a feature engineering technique. (T or F)

True

New cards

Choose all the most popular Python Libraries that are used in data science.

PANDAS

NUMPY

JUPYTER

SCIPY

SQL

ANACONDA

PANDAS

NUMPY

SCIPY

New cards

Sorting out missing data is a data cleansing technique. (T or F)

True

New cards

Dataset is divided into _______ set and test set.

train
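
A small sketch of how a dataset might be divided into a train set and a test set. The 25% test ratio, the fixed seed, and the function name `train_test_split` are assumptions made for this illustration, not values taken from the cards:

```python
import random

def train_test_split(rows, test_ratio=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = rows[:]                      # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(list(range(8)))
print(len(train), len(test))  # 6 2
```

The model learns from the train set only; the test set is held back so that predictions are checked on data the model has not seen.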

New cards

Based on the ML application table scenario, when rule complexity is complex and problem scale is large, ML application is:

Simple Problem

Rule-based Algorithms

ML Algorithms

Manual Rules

ML Algorithms

New cards

What are the two main phases of ML workflow?

Training, Modeling

Training, Testing

Learning, Prediction

Learning, Modeling

Training, Prediction

Learning, Prediction

New cards

In EDA, this process identifies unusual data points. _________

outlier detection
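
The cards do not name a specific detection method. One common heuristic, assumed here, is the 1.5 × IQR rule: a value is flagged when it lies more than 1.5 interquartile ranges below the first quartile or above the third. The function name and sample readings are hypothetical:

```python
from statistics import quantiles

def iqr_outliers(values, factor=1.5):
    """Return the values lying outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]

readings = [10, 12, 11, 13, 12, 11, 100]
print(iqr_outliers(readings))
```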

New cards

Reducing noise in data is a feature engineering technique. (T or F)

False

New cards

Temperature range is an example of:

Continuous Data

New cards

Data reduction is a data cleansing technique. (T or F)

False

New cards

The ______ analysis examines relationship between variables.

correlation
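
As an illustration, the Pearson correlation coefficient is one widely used measure of the linear relationship between two variables (the cards do not say which coefficient is meant, so this choice is an assumption). A plain-Python sketch:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0
```

Values near 1 or -1 indicate a strong positive or negative linear relationship; values near 0 indicate little linear relationship.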

New cards

Which is true about ML, AI, and DS?

DS and AI are subsets of ML

All the given options

DS and ML are subsets of AI

AI and ML are subsets of DS

New cards

This dataset is used in the process of using the model obtained after learning for prediction.

Test Set

Training and Test Set

Model Set

Training set

Test Set

New cards

ML is a research field at the intersection of _________, artificial intelligence, and computer science.

statistics

New cards

Nominal data is:

Qualitative

New cards

Movie ratings and military rank are examples of:

Continuous data

Discrete data

Ordinal data

Nominal data

Ordinal Data (Wrong Canvas)

New cards

The larger variety of data points your data set contains, the more complex a model you can use without overfitting. (T or F)

True

New cards

Binary Classification is a classification of dichotomous classes. (T or F)

True

New cards

The _____ allows the models to make informed predictions even when faced with previously unseen data.

Underfitting

Generalization

Overfitting

Generalization

New cards

Supervised algorithms address classification problems where the output variable is categorical. (T or F)

False

New cards

The ______ refers to the error resulting from sensitivity to the noise in the training data.

variance

New cards

The more complex we allow our model to be, the better we will be able to predict on the training data. (T or F)

True

New cards

SVM is an example of regression algorithm. (T or F)

False

New cards

In k-NN, High Model Complexity is overfitting. (T or F)

True

New cards

In k-NN, when you choose a small value of k (e.g., k=1), the model becomes less complex. (T or F)

False

New cards

The ‘k’ in k-Nearest neighbors refers to an arbitrary number of neighbors. (T or F)

True

New cards

In k-NN, voting means that for each test point we count how many neighbors belong to each class, e.g. how many neighbors belong to class 0 and how many belong to class 1. (T or F)

True
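
The voting step can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance and a training set given as `(point, label)` pairs; the function name and data are hypothetical:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)  # neighbors counted per class
    return votes.most_common(1)[0][0]

train = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0),
         ((5, 5), 1), ((5, 6), 1), ((6, 5), 1)]
print(knn_predict(train, (0.5, 0.5)))  # 0
print(knn_predict(train, (5.5, 5.5)))  # 1
```

With k=1 the prediction follows a single training point, which gives the most complex (most overfitting-prone) model; larger k smooths the decision boundary.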

New cards

When evaluating a regression model with R², predicting worse than the average can result in negative values. (T or F)

True
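
A short sketch of why this happens, assuming the score in question is the coefficient of determination R² = 1 - SS_res / SS_tot. Always predicting the mean of the targets gives exactly 0, so any model whose squared error is larger than that baseline scores below 0:

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y = [1, 2, 3]
print(r2_score(y, [2, 2, 2]))  # 0.0  (always predicting the average)
print(r2_score(y, [3, 2, 1]))  # -3.0 (worse than the average)
```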

New cards

When comparing training set and test set scores, we find that we predict very accurately on the training set, but the R2 on the test set is much worse. This is a sign of overfitting. (T or F)

True

New cards

Lasso uses L2 Regularization. (T or F)

False
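
For comparison, Lasso penalizes the L1 norm of the coefficients (sum of absolute values), while Ridge penalizes the squared L2 norm (sum of squares). A minimal sketch of the two penalty terms, with hypothetical function names and weights:

```python
def l1_penalty(weights, alpha):
    """Lasso-style penalty: alpha times the sum of absolute coefficient values."""
    return alpha * sum(abs(w) for w in weights)

def l2_penalty(weights, alpha):
    """Ridge-style penalty: alpha times the sum of squared coefficient values."""
    return alpha * sum(w * w for w in weights)

weights = [0.5, -2.0, 0.0]
print(l1_penalty(weights, alpha=1.0))  # 2.5
print(l2_penalty(weights, alpha=1.0))  # 4.25
```

The L1 penalty tends to drive some coefficients exactly to zero, which is why Lasso models are often easier to interpret.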

New cards

What is the full form of OLS?

Ordinary Least Squares

New cards

Regularization means explicitly restricting a model to avoid overfitting. (T or F)

True

New cards

Ridge is generally preferred over Lasso, but if you want a model that is easy to analyze and understand then use Ridge. (T or F)

False

New cards

In Ridge regression, if α (alpha) is larger, the penalty becomes larger. (T or F)

True
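
The effect of α can be seen in a deliberately simplified case: one feature and no intercept (both simplifications are assumptions for this sketch). Minimizing Σ(y - w·x)² + α·w² gives the closed form w = Σxy / (Σx² + α), so a larger α shrinks the coefficient toward zero:

```python
def ridge_slope(xs, ys, alpha):
    """Closed-form ridge coefficient for a single feature with no intercept."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + alpha)

xs, ys = [1, 2, 3], [2, 4, 6]
print(ridge_slope(xs, ys, alpha=0))   # 2.0 (ordinary least squares)
print(ridge_slope(xs, ys, alpha=14))  # 1.0 (stronger penalty, smaller weight)
```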

New cards

Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. (T or F)

True

New cards

Naïve Bayes classifier that deals with continuous data.

All the given options

GaussianNB

MultinomialNB

BernoulliNB

GaussianNB

New cards

Its target is a categorical variable.

Correlation

Supervised Learning

Regression

Classification

New cards

Regression predicts continuous numbers. (T or F)

True

New cards

In k-NN, Low Model Complexity is underfitting. (T or F)

True

New cards

In k-NN, Low Model Complexity is overfitting. (T or F)

False

New cards

The ‘offset’ parameter is also called intercept. (T or F)

True

New cards

Types of Linear Models: Linear Regression, ____________.

Logistic Regression

New cards

The ‘slope’ parameter is also called weights or coefficients. (T or F)

True

New cards

Naïve Bayes classifier that deals with integer count data.

MultinomialNB

All the given options

BernoulliNB

GaussianNB

MultinomialNB

New cards

A model which does not capture the underlying relationship in the dataset on which it's trained.

Generalization

Overfitting

Underfitting

New cards

A model is able to make accurate predictions on new, unseen data.

Underfitting

Overfitting

Generalization

New cards

When using multiple nearest neighbors, the prediction is the mean of the relevant neighbors. (T or F)

True
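
A minimal sketch of k-NN regression on a single feature, assuming absolute difference as the distance and `(x, target)` pairs as training data (names and values are hypothetical):

```python
def knn_regress(train, query, k=3):
    """Predict a numeric target as the mean target of the k nearest training points."""
    nearest = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    return sum(target for _, target in nearest) / k

train = [(1, 10.0), (2, 20.0), (3, 30.0), (10, 100.0)]
print(knn_regress(train, 2, k=3))  # 20.0
```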

New cards

The ‘offset’ parameter is also called _______.

Intercept

Slope

Weights

Mean

Intercept

New cards

In Ridge regression, if α (alpha) is larger, the penalty becomes lesser. (T or F)

False

New cards

Naïve Bayes classifier that deals with integer binary data.

BernoulliNB

GaussianNB

All the given options

MultinomialNB

BernoulliNB

New cards

Naïve Bayes learns parameters by looking at each feature individually and collects simple per-class statistics from each feature. (T or F)

True
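
As a rough illustration of "simple per-class statistics", the sketch below collects the mean and variance of each feature within each class, which is the kind of summary a Gaussian Naïve Bayes model stores. The function name and data are hypothetical, and this covers only the statistics-gathering step, not prediction:

```python
from collections import defaultdict
from statistics import mean, pvariance

def per_class_stats(X, y):
    """Return {label: [(mean, variance) for each feature]} computed per class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    stats = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))  # each feature is looked at individually
        stats[label] = [(mean(col), pvariance(col)) for col in columns]
    return stats

X = [[1.0, 10.0], [3.0, 14.0], [10.0, 0.0], [12.0, 2.0]]
y = [0, 0, 1, 1]
print(per_class_stats(X, y))
```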

New cards

Dimensionality reduction techniques help improve model interpretability and performance.

True

False

True

New cards

Unsupervised learning can be used to discover hidden patterns in data.

True

False

True

New cards

In Apriori algorithm, the k data points are randomly selected from the data set as the initial cluster centroid.

True

False

New cards

The training model of this ML consists only of input parameter values and discovers the groups or patterns on its own.

Choices:

Unsupervised

Supervised

Unsupervised

New cards

In Apriori algorithm, we define first the ______.

Choices:

Size of itemset

Confidence

Support

Frequent itemset

Size of itemset

New cards

An Apriori algorithm is a data mining technique for learning correlations and relations among variables in a database.

Choices:

True

False

True

New cards

Apriori algorithm is used in a ________ technique.

Choices:

Classification

Association

Regression

Clustering

Association

New cards

In K-means, the selected k number of clusters will be the initial clusters.

Choices:

True

False

True

New cards

In K-means, each iteration calculates new ______ of datapoints.

Choices:

Median

Cluster

Centroid

Mean
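
One iteration of K-means can be sketched as two steps: assign every datapoint to its nearest centroid, then recompute each centroid as the mean of the datapoints assigned to it. This is a minimal illustration assuming Euclidean distance; the function name and data are hypothetical:

```python
import math

def kmeans_step(points, centroids):
    """Run one K-means iteration and return the updated centroids."""
    clusters = [[] for _ in centroids]
    for p in points:
        # Assignment step: distance from the datapoint to every centroid.
        idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[idx].append(p)
    new_centroids = []
    for cluster, old in zip(clusters, centroids):
        if cluster:
            # Update step: the new centroid is the mean of the cluster's points.
            new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
        else:
            new_centroids.append(old)  # keep an empty cluster's centroid unchanged
    return new_centroids

points = [(0, 0), (0, 2), (10, 10), (10, 12)]
print(kmeans_step(points, [(0, 0), (10, 10)]))  # [(0.0, 1.0), (10.0, 11.0)]
```

Repeating this step until the centroids stop moving gives the final clusters. Because the starting centroids are chosen at random, different runs can end in different clusterings.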

New cards

Support is the probability that if a person buys an item A, then he will also buy an item B.

Choices:

True

False

New cards

The formula for relative support is:

A. Total number of transactions / Total number of transactions containing an itemset X

B. Total number of transactions containing an itemset X / Total number of transactions

C. Total number of itemset / Total number of transactions

D. Total number of transactions / Total number of itemset
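
Relative support is usually defined as the fraction of all transactions that contain the itemset. A small sketch with hypothetical shopping-basket data:

```python
def relative_support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    containing = sum(1 for t in transactions if itemset <= t)  # subset test
    return containing / len(transactions)

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]
print(relative_support(transactions, {"bread", "milk"}))  # 0.5
```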

New cards

In unsupervised learning, the algorithm divides the data objects into groups according to the similarities and differences between the objects.

Choices:

False

True

New cards

An unsupervised learning that extracts important features from the dataset, reducing the number of irrelevant or random features present.

Choices:

Apriori

All the options

K-Means

Dimensionality Reduction

New cards

In K-means, the distance between each datapoint and each cluster centroid is computed.

Choices:

False

True

New cards

K-means algorithm is used in a ________ technique.

Choices:

Classification

Clustering

Regression

Association

Clustering

New cards

In K-means, each cluster calculates the new median based on the datapoints in the cluster.

Choices:

True

False

New cards

An itemset that meets the support is called a frequent itemset.

Choices:

False

True

New cards

Which algorithm is commonly used for clustering in unsupervised learning?

Choices:

Linear Regression

Decision Tree

Naive Bayes

K-Means

New cards

Which algorithm is commonly used for association rule mining?

Choices:

PCA

K-Means

DBSCAN

Apriori

New cards

Principal Component Analysis (PCA) is primarily used for:

Choices:

Clustering

Classification

Regression

Dimensionality Reduction

New cards

Which of the following is NOT an application of unsupervised learning?

Choices:

Image classification with labeled data

Market basket analysis

Customer segmentation

Fraud detection

Image classification with labeled data

New cards

Which of the following is a key characteristic of unsupervised learning?

Choices:

Supervised feedback

No labeled data

Predefined output classes

Labeled data

No labeled data

New cards

In K-Means clustering, the value of 'K' represents:

Choices:

The number of clusters

The number of iterations

The number of features

The number of data points

The number of clusters

New cards

Unsupervised learning algorithms can be evaluated using accuracy scores.

Choices:

True

False

New cards

K-Means clustering always produces the same result regardless of initial centroid selection.

Choices:

True

False

New cards

Lift measures how much more likely two items are to occur together than if they were independent.

Choices:

False

True
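
A sketch of how support, confidence, and lift relate, using the usual definitions: confidence(A → B) = support(A ∪ B) / support(A), and lift(A → B) = confidence(A → B) / support(B). The basket data and function names are hypothetical:

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, a, b):
    """Estimated probability that a transaction containing `a` also contains `b`."""
    return support(transactions, a | b) / support(transactions, a)

def lift(transactions, a, b):
    """How much more often `a` and `b` co-occur than if they were independent."""
    return confidence(transactions, a, b) / support(transactions, b)

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]
print(confidence(transactions, {"bread"}, {"milk"}))
print(lift(transactions, {"bread"}, {"milk"}))
```

A lift above 1 suggests the items appear together more often than independence would predict; a lift below 1, as in this toy data, suggests they appear together less often.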

New cards

Association rule mining is used to find relationships between variables in large datasets.

Choices:

False

True

New cards

Which clustering algorithm is density-based and can find clusters of arbitrary shape?

DBSCAN

K-Means

Hierarchical

Apriori

DBSCAN

New cards

DBSCAN is a clustering algorithm that can find clusters of arbitrary shape.

True

False

True

New cards

Which metric is commonly used to evaluate association rules?

Support, Confidence, and Lift

Accuracy

Precision and Recall

Mean Squared Error

Support, Confidence, and Lift

New cards

Hierarchical clustering requires the number of clusters to be specified in advance.

True

False

False

New cards

Rule-based algorithms: Condition while Machine Learning: _________.

Learning

Prediction

Algorithms

Model

Model

New cards

ML provides machines the ability to automatically predict from data while identifying patterns to learn with minimal human intervention.

True

False