Machine Learning Compilation


161 Terms

1

ML provides machines the ability to automatically learn from data while identifying patterns to make ______ with minimal human intervention. 

predictions

2

Types of Machine Learning: Supervised, Unsupervised, Semi-Supervised, and __________. 

reinforcement

3

Letters, symbols, words, and gender are examples of:

Ordinal data

Nominal data

Discrete data

Continuous data

Nominal data

4

The act of filling in missing values by estimation.

Imputation

Mean Imputation

Most-frequent Imputation

Column Transformation

Imputation
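A minimal sketch of the two imputation strategies named in the options above, assuming missing values are stored as `None` in a plain Python list. The helper names `mean_impute` and `most_frequent_impute` are hypothetical, written only for illustration.

```python
from collections import Counter
from statistics import mean


def mean_impute(values):
    """Replace each None with the mean of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def most_frequent_impute(values):
    """Replace each None with the most common observed value (also works for categories)."""
    observed = [v for v in values if v is not None]
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


ages = [25, None, 31, 40, None]
colors = ["red", "blue", None, "red"]
print(mean_impute(ages))             # [25, 32, 31, 40, 32]
print(most_frequent_impute(colors))  # ['red', 'blue', 'red', 'red']
```

Mean imputation only makes sense for numeric columns; most-frequent imputation also handles categorical (nominal) data.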

5

The ______ is an observation that lies far away from the average value of a group of data points.

outlier

6

Based on the ML application table scenario, when rule complexity is complex and problem scale is small, ML application is:

Rule-based Algorithms

Simple Problem

ML Algorithms

Manual Rules

Manual Rules

7

Based on the ML application table scenario, when rule complexity is simple and problem scale is large, ML application is:

Rule-based Algorithms

Manual Rules

ML Algorithms

Simple Problem

Rule-based Algorithms

8

Which data preprocessing task is the most time-consuming?

Data modeling

Data analysis

Data collection

Data cleaning

Data cleaning

9

Continuous data is:

Quantitative

10

Machine Learning is a field of study concerned with giving computers the ability to ________ without being explicitly programmed. 

learn

11

Which is not true about Machine Learning?

Enable computers to operate autonomously with explicit programming.

Machines driven by algorithms designed by humans are able to learn latent rules and inherent patterns and to fulfill tasks desired by humans.

Their maintenance is much lower than a human's and costs a lot less in the long run.

Automation by machine learning can mitigate risks caused by fatigue or inattention.

Enable computers to operate autonomously with explicit programming.

12

Which processes are involved in data preparation?

Data collection, Data Cleaning

Not in the options

Data Cleaning, Feature Engineering

All the given options

Splitting of dataset

All the given options

13

Rule-based algorithms: Condition 

Machine Learning: _________. 

model

14

Removing duplicates in the dataset is a feature engineering technique. (T or F)

False

15

Ordinal data is:

Qualitative

16

Data reduction is a feature engineering technique. (T or F)

True

17

Choose all of the most popular Python libraries used in data science.

pandas

NumPy

Jupyter

SciPy

SQL

Anaconda

pandas

NumPy

SciPy

18

Sorting out missing data is a data cleansing technique. (T or F)

True

19

A dataset is divided into a _______ set and a test set. 

train
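A minimal sketch of a random train/test split, assuming the data is a plain Python list. `split_train_test` is a hypothetical helper (not scikit-learn's `train_test_split`); the fixed seed only makes the shuffle reproducible.

```python
import random


def split_train_test(data, test_ratio=0.2, seed=0):
    """Shuffle a copy of the data and split it into (train, test) lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    # The first n_test shuffled items become the test set; the rest are for training.
    return shuffled[n_test:], shuffled[:n_test]


samples = list(range(10))
train, test = split_train_test(samples, test_ratio=0.3)
print(len(train), len(test))  # 7 3
```

Shuffling before splitting matters when the data is sorted (for example, by class or by date); otherwise the test set may not resemble the training set.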

20

Based on the ML application table scenario, when rule complexity is complex and problem scale is large, ML application is:

Simple Problem

Rule-based Algorithms

ML Algorithms

Manual Rules

ML Algorithms

21

What are the two main phases of the ML workflow?

Training, Modeling

Training, Testing

Learning, Prediction

Learning, Modeling

Training, Prediction

Learning, Prediction

22

In EDA, this process identifies unusual data points. _________

outlier detection
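One common way to flag outliers is Tukey's IQR rule: values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR are treated as unusual. The card does not name a specific method, so this rule, the `iqr_outliers` helper, and the sample readings are assumptions chosen for illustration.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # quartiles Q1, Q2, Q3
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


readings = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(readings))  # [95]
```

The IQR rule is based on quartiles rather than the mean, so a single extreme value does not shift the fences much.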

23

Reducing noise in data is a feature engineering technique. (T or F)

False

24

Temperature range is an example of:

Continuous Data

25

Data reduction is a data cleansing technique. (T or F)

False

26

The ______ analysis examines the relationship between variables.

correlation
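A sketch of Pearson's correlation coefficient, one common form of correlation analysis (the card does not say which coefficient is meant). Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little linear relationship. The helper name `pearson_r` and the study-hours data are illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance term and the two spread terms (the 1/n factors cancel out).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
print(round(pearson_r(hours, score), 3))  # close to +1: strong positive relationship
```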

27

Which is true about ML, AI, and DS?

DS and AI are subsets of ML

All the given options

DS and ML are subsets of AI

AI and ML are subsets of DS

AI and ML are subsets of DS

28

Machine Learning is a field of study concerned with giving computers the ability to ________ without being explicitly programmed. 

learn

29

This dataset is used when the model obtained after learning is applied to make predictions.

Test Set

Training and Test Set

Model Set

Training set

Test Set

30

ML is a research field at the intersection of _________, artificial intelligence, and computer science. 

statistics

31

Nominal data is:

Qualitative

32

Movie ratings and military ranks are examples of:

Continuous data

Discrete data

Ordinal data

Nominal data

Ordinal Data (Wrong Canvas)

33

The larger the variety of data points your dataset contains, the more complex a model you can use without overfitting. (T or F)

True

34

Binary Classification is a classification of dichotomous classes. (T or F)

True

35

The _____ allows the models to make informed predictions even when faced with previously unseen data. 

Underfitting

Generalization

Overfitting

Generalization

36

Supervised algorithms address classification problems where the output variable is categorical. (T or F)

False

37

The ______ refers to the error resulting from sensitivity to the noise in the training data.

variance

38

The more complex we allow our model to be, the better we will be able to predict on the training data. (T or F)

True

39

SVM is an example of a regression algorithm. (T or F)

False

40

In k-NN, High Model Complexity is overfitting. (T or F)

True

41

In k-NN, when you choose a small value of k (e.g., k=1), the model becomes less complex. (T or F)

False

42

The ‘k’ in k-Nearest neighbors refers to an arbitrary number of neighbors. (T or F)

True

43

In k-NN, voting means for each test point, we count how many neighbors belong to a class e.g. how many belong to class 0 and how many neighbors belong to class 1.  (T or F)

True
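A minimal sketch of k-NN classification by voting, assuming Euclidean distance and small in-memory lists of points. `knn_classify` and the toy points are hypothetical; the vote count mirrors the card's description of counting neighbors per class.

```python
from collections import Counter
from math import dist


def knn_classify(train_X, train_y, query, k=3):
    """Predict a class for `query` by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    # Count how many of the k nearest neighbors belong to each class.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
y = [0, 0, 0, 1, 1, 1]
print(knn_classify(X, y, (2, 2), k=3))  # 0
print(knn_classify(X, y, (6, 5), k=3))  # 1
```

With two classes, choosing an odd k avoids tied votes.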

44

When a regression model is evaluated with R², predicting worse than simply predicting the average can result in negative numbers. (T or F)

True
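A small sketch of why R² can go negative, using the usual definition R² = 1 - SS_res / SS_tot. The helper `r_squared` is hand-written for illustration (not a library function): always predicting the mean gives exactly 0, and anything worse gives a negative score.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model's squared error
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # squared error of predicting the mean
    return 1 - ss_res / ss_tot


y = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y, [2.5] * 4))             # 0.0  (always predicting the mean)
print(r_squared(y, [4.0, 3.0, 2.0, 1.0]))  # -3.0 (worse than the mean)
```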

45

When comparing training set and test set scores, we find that we predict very accurately on the training set, but the R² on the test set is much worse. This is a sign of overfitting. (T or F)

True

46

Lasso uses L2 Regularization. (T or F)

False

47

What is the full form of OLS?

Ordinary Least Squares
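A minimal sketch of ordinary least squares for a single feature, fitting y = w*x + b. In the terms used on later cards, `w` is the slope (weight/coefficient) and `b` is the offset (intercept). The helper `ols_fit` is hypothetical and handles only one input feature.

```python
def ols_fit(xs, ys):
    """Fit y = w*x + b by ordinary least squares; returns (slope w, intercept b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope: covariance of x and y divided by variance of x (minimizes squared residuals).
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    # Intercept (offset): makes the fitted line pass through the point of means.
    b = my - w * mx
    return w, b


w, b = ols_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(w, b)  # 2.0 1.0
```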

48

Regularization means explicitly restricting a model to avoid overfitting. (T or F)

True

49

Ridge is generally preferred over Lasso, but if you want a model that is easy to analyze and understand then use Ridge. (T or F)

False

50

In Ridge regression, if α (alpha) is larger, the penalty becomes larger. (T or F)

True
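To see the effect of α, here is a simplified sketch that assumes a single centered feature and no intercept. Ridge then minimizes sum((y - w*x)²) + α*w², whose closed-form solution is w = Σxy / (Σx² + α). The helper `ridge_slope` is hypothetical; real Ridge implementations handle many features and an intercept.

```python
def ridge_slope(xs, ys, alpha):
    """Closed-form ridge weight for one centered feature and no intercept:
    minimizes sum((y - w*x)**2) + alpha * w**2 (an L2 penalty on w)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


xs = [-2, -1, 0, 1, 2]
ys = [-4, -2, 0, 2, 4]  # exactly y = 2x
for alpha in (0, 1, 10, 100):
    # The slope shrinks toward 0 as alpha grows: 2.0, 1.818, 1.0, 0.182
    print(alpha, round(ridge_slope(xs, ys, alpha), 3))
```

With α = 0 this reduces to ordinary least squares; a larger α means a larger penalty and a more strongly restricted (regularized) coefficient.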

51

Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. (T or F)

True

52

Naïve Bayes classifier that deals with continuous data.

All the given options

GaussianNB

MultinomialNB

BernoulliNB

GaussianNB

53

Its target is a categorical variable.

Correlation

Supervised Learning

Regression

Classification

Classification

54

Regression predicts continuous numbers. (T or F)

True

55

In k-NN, Low Model Complexity is underfitting. (T or F)

True

56

In k-NN, Low Model Complexity is overfitting. (T or F)

False

57

The ‘offset’ parameter is also called intercept. (T or F)

True

58

Types of Linear Models: Linear Regression, ____________. 

Logistic Regression

59

The ‘slope’ parameter is also called weights or coefficients. (T or F)

True

60

Naïve Bayes classifier that deals with integer count data.

MultinomialNB

All the given options

BernoulliNB

GaussianNB

MultinomialNB

61

A model which does not capture the underlying relationship in the dataset on which it's trained.

Generalization

Overfitting

Underfitting

Underfitting

62

A model is able to make accurate predictions on new, unseen data.

Underfitting

Overfitting

Generalization

Generalization

63

When using multiple nearest neighbors, the prediction is the mean of the relevant neighbors. (T or F)

True
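A minimal sketch of k-NN regression, assuming this card refers to the regression variant of k-NN: the prediction is the mean target value of the k nearest training points. `knn_regress` and the one-feature data are hypothetical.

```python
from math import dist


def knn_regress(train_X, train_y, query, k=3):
    """Predict a number as the mean target value of the k nearest training points."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))[:k]
    return sum(train_y[i] for i in nearest) / k


X = [(1,), (2,), (3,), (10,)]
y = [1.0, 2.0, 3.0, 10.0]
print(knn_regress(X, y, (2.1,), k=3))  # 2.0  (mean of 2.0, 3.0 and 1.0)
```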

64

The ‘offset’ parameter is also called _______.

Intercept

Slope

Weights

Mean

Intercept

65

In Ridge regression, if α (alpha) is larger, the penalty becomes smaller. (T or F)

False

66

Naïve Bayes classifier that deals with binary data.

BernoulliNB

GaussianNB

All the given options

MultinomialNB

BernoulliNB

67

Naïve Bayes learns parameters by looking at each feature individually and collects simple per-class statistics from each feature. (T or F)

True
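A from-scratch sketch of the Gaussian variant (GaussianNB-style, for continuous features), written to show the per-class, per-feature statistics this card describes. The functions `fit_gaussian_nb` and `predict_gaussian_nb` are hypothetical and are not scikit-learn's API; log-probabilities are summed instead of multiplying probabilities to avoid underflow.

```python
from math import log, pi
from statistics import fmean, pvariance


def fit_gaussian_nb(X, y):
    """Collect simple per-class statistics: the class prior plus each feature's mean and variance."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        columns = list(zip(*rows))  # each feature is examined individually
        model[c] = {
            "log_prior": log(len(rows) / len(X)),
            "means": [fmean(col) for col in columns],
            # Small epsilon keeps the variance positive if a feature is constant within a class.
            "vars": [pvariance(col) + 1e-9 for col in columns],
        }
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log prior + sum of per-feature Gaussian log-likelihoods."""
    def log_posterior(c):
        stats = model[c]
        total = stats["log_prior"]
        for value, mu, var in zip(x, stats["means"], stats["vars"]):
            # Naive independence assumption: per-feature log-likelihoods simply add up.
            total += -0.5 * log(2 * pi * var) - (value - mu) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)


X = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (5.0, 6.0), (5.2, 5.8), (4.8, 6.2)]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (1.1, 2.1)))  # a
print(predict_gaussian_nb(model, (5.1, 5.9)))  # b
```

MultinomialNB and BernoulliNB follow the same per-class pattern but model counts or binary values instead of Gaussian distributions.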

68

Dimensionality reduction techniques help improve model interpretability and performance.

True

False

True

69

Unsupervised learning can be used to discover hidden patterns in data.

True

False

True

70

In the Apriori algorithm, k data points are randomly selected from the dataset as the initial cluster centroids.

True

False

False

71

In this type of ML, the training data consists only of input values, and the model discovers groups or patterns on its own.

Choices:

Unsupervised

Supervised

Unsupervised

72

In the Apriori algorithm, we first define the ______.

Choices:

Size of itemset

Confidence

Support

Frequent itemset

Size of itemset

73

The Apriori algorithm is a data mining technique for learning correlations and relations among variables in a database.

Choices:

True

False

True

74

The Apriori algorithm is used in a ________ technique.

Choices:

Classification

Association

Regression

Clustering

Association

75

In K-means, the selected k number of clusters will be the initial clusters. 

Choices:

True

False

True

76

In K-means, each iteration calculates the new ______ of the datapoints in each cluster.

Choices:

Median

Cluster

Centroid

Mean

Mean

77

Support is the probability that if a person buys an item A, then he will also buy an item B.

Choices:

True

False

False

78

The formula for relative support is:

A. Total number of transactions / Total number of transactions containing an itemset X

B. Total number of transactions containing an itemset X / Total number of transactions

C. Total number of itemset / Total number of transactions

D. Total number of transactions / Total number of itemset

B
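Option B as code: a minimal sketch of relative support, assuming each transaction is stored as a Python set of item names. The `relative_support` helper and the basket data are hypothetical.

```python
def relative_support(transactions, itemset):
    """Transactions containing every item in `itemset`, divided by the total number of transactions."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
print(relative_support(baskets, {"bread", "milk"}))  # 0.6  (3 of 5 transactions)
```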

79

In unsupervised learning, the algorithm divides the data objects into groups according to the similarities and differences between the objects.

Choices:

False

True

True

80

An unsupervised learning technique that extracts important features from the dataset, reducing the number of irrelevant or random features present.

Choices:

Apriori

All the options

K-Means

Dimensionality Reduction

Dimensionality Reduction

81

In K-means, for each datapoint, the distance between the datapoint and each cluster centroid is computed.

Choices:

False

True

True

82

The K-means algorithm is used in a ________ technique.

Choices:

Classification

Clustering

Regression

Association

Clustering

83

In K-means, each cluster calculates the new median based on the datapoints in the cluster. 

Choices:

True

False

False
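A minimal sketch of the K-means loop described on the cards above: choose k initial centroids, compute each point's distance to every centroid, assign it to the nearest one, then recompute each centroid as the mean (not the median) of its points. `kmeans` and the sample points are hypothetical, and the `seed` parameter is included because the result can depend on the initial centroids.

```python
import random
from math import dist


def kmeans(points, k, max_iters=100, seed=0):
    """Plain K-means on a list of coordinate tuples; returns (centroids, clusters)."""
    rng = random.Random(seed)
    # Initialization: pick k distinct data points at random as the starting centroids.
    centroids = rng.sample(points, k)
    for _ in range(max_iters):
        # Assignment step: compute each point's distance to every centroid; join the nearest.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters


data = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(data, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

On well-separated data like this the two groups are recovered; on less clear-cut data, different seeds can converge to different clusterings.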

84

An itemset that meets the minimum support threshold is called a frequent itemset.

Choices:

False

True

True
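A compact sketch of Apriori-style frequent-itemset mining, assuming transactions are Python sets and the minimum support is given as a fraction. The level-wise join and prune steps follow the usual Apriori idea (every subset of a frequent itemset must itself be frequent). The `apriori` function here is a hypothetical teaching version, not an optimized library implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support} for every itemset whose relative support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    size = 2
    while current:
        # Join step: union pairs of frequent (size-1)-itemsets into size-itemset candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: every (size-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, size - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        size += 1
    return frequent


baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = apriori(baskets, min_support=0.6)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)  # 4 single items and 4 pairs reach 60% support
```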

85

Which algorithm is commonly used for clustering in unsupervised learning?

Choices:

Linear Regression

Decision Tree

Naive Bayes

K-Means

K-Means

86

Which algorithm is commonly used for association rule mining?

Choices:

PCA

K-Means

DBSCAN

Apriori

Apriori

87

Principal Component Analysis (PCA) is primarily used for:

Choices:

Clustering

Classification

Regression

Dimensionality Reduction

Dimensionality Reduction
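A minimal sketch of PCA restricted to 2-D data, so it can be written without a linear-algebra library: build the 2x2 covariance matrix, take the eigenvector of its largest eigenvalue (the direction of maximum variance), and project each centered point onto it, reducing two features to one. `pca_2d_first_component` is a hypothetical helper; general PCA handles any number of dimensions.

```python
from math import sqrt


def pca_2d_first_component(points):
    """Project 2-D points onto their first principal component; returns (axis, 1-D coordinates)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # 2x2 sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix and a matching eigenvector.
    lam = (a + c) / 2 + sqrt(((a - c) / 2) ** 2 + b * b)
    if b != 0:
        vx, vy = b, lam - a
    else:  # already axis-aligned: pick the axis with the larger variance
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = sqrt(vx * vx + vy * vy)
    vx, vy = vx / norm, vy / norm
    # 1-D coordinate of each point along the principal axis.
    return (vx, vy), [x * vx + y * vy for x, y in centered]


axis, coords = pca_2d_first_component([(1, 1), (2, 2), (3, 3)])
print([round(v, 4) for v in axis])  # [0.7071, 0.7071]
```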

88

Which of the following is NOT an application of unsupervised learning?

Choices:

Image classification with labeled data

Market basket analysis

Customer segmentation

Fraud detection

Image classification with labeled data

89

Which of the following is a key characteristic of unsupervised learning?

Choices:

Supervised feedback

No labeled data

Predefined output classes

Labeled data

No labeled data

90

In K-Means clustering, the value of 'K' represents:

Choices:

The number of clusters

The number of iterations

The number of features

The number of data points

The number of clusters

91

Unsupervised learning algorithms can be evaluated using accuracy scores.

Choices:

True

False

False

92

K-Means clustering always produces the same result regardless of initial centroid selection.

Choices:

True

False

False

93

Lift measures how much more likely two items are to occur together than if they were independent.

Choices:

False

True

True
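A minimal sketch of the three association-rule metrics for a rule A -> B, assuming transactions are Python sets. Confidence estimates the probability that a basket containing A also contains B, and lift compares that confidence with B's overall support: a lift above 1 means A and B occur together more often than they would if they were independent. The helpers and basket data are hypothetical.

```python
def support(transactions, items):
    """Relative support: fraction of transactions containing every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(transactions, a, b):
    """Confidence of A -> B: support(A and B together) / support(A), an estimate of P(B | A)."""
    return support(transactions, set(a) | set(b)) / support(transactions, a)


def lift(transactions, a, b):
    """Lift of A -> B: confidence(A -> B) / support(B); a value of 1 suggests independence."""
    return confidence(transactions, a, b) / support(transactions, b)


baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
print(round(confidence(baskets, {"diapers"}, {"beer"}), 2))  # 0.75
print(round(lift(baskets, {"diapers"}, {"beer"}), 2))        # 1.25
```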

94

Association rule mining is used to find relationships between variables in large datasets.

Choices:

False

True

True

95

Which clustering algorithm is density-based and can find clusters of arbitrary shape?

DBSCAN

K-Means

Hierarchical

Apriori

DBSCAN

96

DBSCAN is a clustering algorithm that can find clusters of arbitrary shape.

True

False

True

97

Which metrics are commonly used to evaluate association rules?

Support, Confidence, and Lift

Accuracy

Precision and Recall

Mean Squared Error

Support, Confidence, and Lift

98

Hierarchical clustering requires the number of clusters to be specified in advance.

True

False

False

99

Rule-based algorithms: Condition while Machine Learning: _________.

Learning

Prediction

Algorithms

Model

Model

100

ML provides machines the ability to automatically predict from data while identifying patterns to learn with minimal human intervention.

True

False

False