CAP4612 - Exam 1 Question Bank

1
New cards
univariate univariable regression
* 1 outcome and 1 explanatory variable
* often used as the introductory example
2
New cards
multivariate multivariable regression
multiple outcomes and multiple explanatory variables
3
New cards
multivariate univariable regression
multiple outcomes and a single explanatory variable
4
New cards
multiple regression/multi-variable regression
Suppose you want to perform regression on:

* independent variables X1, …, Xm
* dependent variables Y1, …, Yn

When m>1, it’s called __________________________.
5
New cards
multi-variate regression
Suppose you want to perform regression on:

* independent variables X1, …, Xm
* dependent variables Y1, …, Yn

When n>1, it’s called __________________________.
6
New cards
b.) 1 feature to predict the outcome
A univariable regression uses:

a.) 2 features to predict the outcome

b.) 1 feature to predict the outcome

c.) 1 feature in its dataset

d.) At least 2 features in its dataset
7
New cards
c.) root mean square error (RMSE)
A performance measure for regression is:

a.) recall

b.) precision

c.) root mean square error (RMSE)

d.) F1-score
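
A minimal sketch of computing RMSE with NumPy (the sample values are made up for illustration):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared prediction error."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

print(rmse([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))  # ~1.19
```
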
8
New cards
b
(One-variable regression) Consider the plot below corresponding to hθ(x) = θ0 + θ1x. What are θ0 and θ1?

a.) θ0 = 0, θ1 = 1

b.) θ0 = 0.5, θ1 = 1

c.) θ0 = 1, θ1 = 0.5

d.) θ0 = 1, θ1 = 1
9
New cards
b.) error
Root Mean Square Error (RMSE) is a measure of how much ________ the system typically makes in its predictions.

a.) confidence

b.) error

c.) bias

d.) variance
10
New cards
c.) outliers
Mean Absolute Error is a preferred performance measure for data with many:

a.) instances

b.) features

c.) outliers

d.) classes
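
A small sketch of why MAE is preferred with many outliers: squaring in RMSE amplifies the one large error, while MAE weighs all errors linearly (the values are hypothetical):

```python
import numpy as np

y_true = np.array([10.0, 12.0, 11.0, 100.0])    # 100.0 acts as an outlier
y_pred = np.array([10.0, 12.0, 11.0, 12.0])

mae = np.mean(np.abs(y_true - y_pred))           # 22.0
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))  # 44.0 -- the outlier dominates
print(mae, rmse)
```
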
11
New cards
performance measure
two types of _________ _________:

1.) utility/fitness function

2.) cost function
12
New cards
utility/fitness function
measures how good your model is
13
New cards
cost function
measures how bad your model is
14
New cards
normalization/min-max
During the ________________ feature scaling technique, values are shifted and rescaled so that they end up ranging from 0 to 1.
15
New cards
standardization
During ____________ feature scaling, we subtract the mean value and then divide by the standard deviation so that the resulting distribution has unit variance.
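
A minimal sketch of both feature scaling techniques using scikit-learn (the house-age values are made up, echoing the next card):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

ages = np.array([[10.0], [17.0], [25.0], [35.0]])  # hypothetical house ages

# normalization/min-max: (x - min) / (max - min), values end up in [0, 1]
print(MinMaxScaler().fit_transform(ages).ravel())   # [0.   0.28 0.6  1.  ]

# standardization: (x - mean) / std, resulting distribution has unit variance
print(StandardScaler().fit_transform(ages).ravel())
```
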
16
New cards
c.) x_i = (age of house - 10)/25
If you are using a learning algorithm to estimate the price of houses in a city, you may want one of your features x_i to capture the age of the houses. In your training set, all the houses have an age between 10 and 35, with an average of 17. Which of the following would you use as the feature if you use normalization for feature scaling:

a.) x_i = age of house

b.) x_i = (age of house)/35

c.) x_i = (age of house - 10)/25

d.) x_i = (age of house - 17)/25
17
New cards
main challenges
____ ___________ of ML:

1.) insufficient quantity of training data

2.) nonrepresentative training data

3.) poor-quality data

4.) irrelevant features

5.) overfitting the training data

6.) underfitting the training data
18
New cards
insufficient quantity of training data
* amount of training data is very important
* different ML algorithms perform almost identically well on a complex problem when given enough data (Banko and Brill)
19
New cards
nonrepresentative training data
* training data should be representative of new cases you want to generalize to
* adding missing data makes the model more representative of reality → the model parameters change
* sampling noise and sampling bias can occur
20
New cards
sampling noise
the sample is too small
21
New cards
sampling bias
very large samples can be nonrepresentative if the sampling method is flawed
22
New cards
poor-quality data
* if training data is full of errors, outliers, and noise → it’s harder for the system to detect underlying patterns
* need to clean up training data if instances are clearly outliers or missing a few features
23
New cards
instances are clearly outliers
solution: discard them or try to fix errors manually
24
New cards
instances are missing a few features
solutions:

* ignore attribute altogether
* ignore these instances
* fill in the missing values (e.g., with the median age)
* train 1 model with the feature and 1 without
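
One of the listed solutions (filling in missing values with the median) sketched with scikit-learn's SimpleImputer; the feature matrix is made up:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[10.0], [np.nan], [25.0], [35.0]])  # one instance is missing the feature

imputer = SimpleImputer(strategy="median")        # fill NaNs with the column median
print(imputer.fit_transform(X).ravel())           # [10. 25. 25. 35.]
```
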
25
New cards
irrelevant features
* system is only capable of learning if the training data contains enough relevant features and not too many irrelevant ones
* feature engineering
26
New cards
feature engineering
* process of coming up with a good set of features to train on
* includes:
* feature selection
* feature extraction
* creating new features by gathering new data
27
New cards
feature selection
selecting the most useful features to train on among existing features
28
New cards
feature extraction
combining existing features to produce a more useful one
29
New cards
overfitting the training data
* overfitting: the model performs well on training data but doesn’t generalize well
* solutions:
* simplify the model by selecting one with fewer parameters → reduce the number of attributes in the training data, or constrain the model with regularization
* gather more training data
* reduce noise in the training data → fix data errors and remove outliers
30
New cards
regularization
* constraining a model to make it simpler and reduce the risk of overfitting
* amount of regularization to apply during learning can be controlled by hyperparameters
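
A minimal sketch of regularization with scikit-learn's Ridge, where the hyperparameter alpha controls the amount of regularization (the data is synthetic):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
X = 2 * rng.random((20, 1))
y = 1 + 2 * X.ravel() + rng.normal(scale=0.5, size=20)  # noisy line

# larger alpha -> more constrained (simpler) model -> coefficients shrink
for alpha in (0.01, 1.0, 100.0):
    model = Ridge(alpha=alpha).fit(X, y)
    print(alpha, model.coef_, model.intercept_)
```
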
31
New cards
hyperparameter
* parameter of the learning algorithm (not the model) that can't be figured out during the training process
* configuration that's external to the model and whose value can't be estimated from data
32
New cards
hyperparameter properties
* often specified by the practitioner based on experience
* often set using heuristics and rules of thumb
* the best value is often found by trial and error
* often tuned for a given predictive modeling problem
33
New cards
underfitting the training data
* underfitting: model is too simple to learn the underlying structure of the data
* solutions:
* select a more powerful model with more parameters (increase complexity)
* feature engineering → feed better features to the learning algorithm
* reduce constraints on the model → reduce regularization hyperparameter
34
New cards
main steps
_____ _____ in an ML project:

1.) frame the problem and look at the big picture

2.) get the data

3.) discover and visualize the data to gain insights

4.) prepare the data to better expose the underlying data patterns to ML algorithms

5.) explore many different models and shortlist the best ones (select a model and train it)

6.) fine-tune your models and combine them into a great solution

7.) present your solution

8.) launch, monitor, and maintain your system
35
New cards
exploring data
different subtasks while _________ ______:

1.) Create a copy of the data for exploration (sampling it down to a manageable size if necessary)

2.) Create a Jupyter notebook to keep a record of data exploration

3.) Study each attribute and its characteristics

4.) For supervised learning tasks, identify the target attribute

5.) Visualize the data

6.) Study the correlations between attributes

7.) Study how you would solve the problem manually

8.) Identify the promising transformations you may want to apply

9.) Identify extra data that would be useful

10.) Document what you have learned
36
New cards
attribute characteristics
* name
* type (categorical, int/float, bounded/unbounded, text, structured, etc.)
* percent of missing values
* noisiness and type of noise (stochastic, outliers, rounding errors, etc.)
* usefulness for the task
* type of distribution (Gaussian, uniform, logarithmic, etc)
37
New cards
data preparation
subtasks of the _____ __________ step:

1.) data cleaning

2.) feature selection (optional)

3.) feature engineering (when appropriate)

4.) feature scaling
38
New cards
data cleaning
* fix or remove outliers (optional)
* fill in the missing values (with zero, mean, median) or drop their rows/columns
39
New cards
feature selection
drop the attributes that provide no useful info for the task
40
New cards
feature engineering
* discretize continuous features
* decompose features (e.g., categorical, date/time, etc.)
* add promising transformations of features (e.g., log(x), sqrt(x), x², etc.)
* aggregate features into promising new features
41
New cards
feature scaling
standardize or normalize features
42
New cards
promising models
subtasks of shortlisting __________ _______:

1.) Train many quick-and-dirty models from different categories (linear, naive Bayes, SVM, random forest, neural net) using standard parameters

2.) Measure and compare their performance → for each model, use N-fold cross-validation and compute the mean and standard deviation of the performance measure on the N folds

3.) Analyze the most significant variables for each algorithm

4.) Analyze the types of errors the models make

5.) Perform a quick round of feature selection and engineering

6.) Perform 1 or 2 more quick iterations of the last 5 steps

7.) Shortlist the top 3-5 most promising models, preferring models that make different types of errors
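
A minimal sketch of step 2 (comparing quick-and-dirty models with N-fold cross-validation) on a toy dataset; the model choices here are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# quick-and-dirty models from different categories, standard parameters
for model in (LogisticRegression(max_iter=1000), GaussianNB(),
              RandomForestClassifier(random_state=42)):
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
    print(type(model).__name__, scores.mean(), scores.std())
```
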
43
New cards
data snooping bias
To avoid ______ _____________ _____, we should not look at the test set. If we look, we may see an interesting pattern in the test data that leads us to select a particular kind of ML model. Since our model will perform well on the test set because of this selection, we might get an unexpectedly high generalization error in production.
44
New cards
generalization
To avoid data snooping bias, we should not look at the test set. If we look, we may see an interesting pattern in the test data that leads us to select a particular kind of ML model. Since our model will perform well on the test set because of this selection, we might get an unexpectedly high ____________ error in production.
45
New cards
stratified sampling
We perform ________________ to guarantee that the test set is representative of the overall population. During this sampling, the population is divided into homogeneous subgroups called strata, and the right number of instances are sampled from each stratum.
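
A minimal sketch of stratified sampling via scikit-learn's train_test_split (the imbalanced labels are made up for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

y = np.array([0] * 90 + [1] * 10)   # strata: 90% class 0, 10% class 1
X = np.arange(100).reshape(-1, 1)   # dummy feature

# stratify=y samples the right number of instances from each stratum,
# so the test set keeps the 90/10 class ratio of the population
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
print(np.bincount(y_test))          # [18  2]
```
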
46
New cards
strata
We perform stratified sampling to guarantee that the test set is representative of the overall population. During this sampling, the population is divided into homogeneous subgroups called ______, and the right number of instances are sampled from each stratum.
47
New cards
stratum
We perform stratified sampling to guarantee that the test set is representative of the overall population. During this sampling, the population is divided into homogeneous subgroups called strata, and the right number of instances are sampled from each ______.
48
New cards
d.) All of the above
When we draw a scatter plot between house value and income, we notice the correlation between the two attributes as seen in the figure. Which of the following can be learned about the data from the figure below?

a.) The correlation is indeed strong

b.) The price cap that we noticed earlier is clearly visible as a horizontal line at $500,000.

c.) The plot reveals other less obvious straight lines: a horizontal line around $450,000, another around $350,000

d.) All of the above
49
New cards
none
Which of these is not one of the feature engineering steps?

a.) Discretize continuous features

b.) Decompose features (ex: categorical, date/time, etc.)

c.) Add promising transformations of features (ex: log(x), sqrt(x), x², etc.)

d.) Aggregate features into promising new features
50
New cards
toy data set
* provided in scikit-learn library
* small data set used to quickly illustrate the behavior of the various algorithms implemented in scikit-learn
51
New cards
load_<dataset_name>
a function used to load a toy data set in scikit-learn
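
For example, loading the iris toy dataset with scikit-learn's load_iris:

```python
from sklearn.datasets import load_iris  # one of the load_<dataset_name> functions

iris = load_iris()
print(iris.data.shape)    # (150, 4): 150 instances, 4 features
print(iris.target_names)  # ['setosa' 'versicolor' 'virginica']
```
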
52
New cards
false positives
If you are creating a classifier to filter bad videos for kids (4-6 years) and your classifier predicts the bad videos, would you be willing to tolerate a high number of false positives or a high number of false negatives?
53
New cards
multiclass/multinomial classifier
classifier that distinguishes between more than 2 classes
54
New cards
multiclass classification
Strategies to perform __________________ using binary classifiers:

1.) One-versus-the-Rest (OvR)

2.) One-versus-One (OvO)
55
New cards
one-versus-the-rest (OVR)
* AKA one-versus-all
* get the decision score from each classifier for that image and select the class whose classifier outputs the highest score
56
New cards
one-versus-one (OVO)
* train a binary classifier for every pair of classes → one to distinguish 0s and 1s, another to distinguish 0s and 2s, another for 1s and 2s, etc.
* if there are N classes, you need to train N * (N-1)/2 classifiers
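
A minimal sketch of both strategies using scikit-learn's meta-estimators on the 3-class iris data (the base classifier choice is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # N = 3 classes

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(len(ovr.estimators_))  # N = 3 binary classifiers (one per class)
print(len(ovo.estimators_))  # N*(N-1)/2 = 3 binary classifiers (one per pair)
```
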
57
New cards
26
We would like to use binary classifiers to detect a letter from the alphabet. If we use the OvR strategy, how many binary classifiers do we need to train?
58
New cards
(26*25)/2 = 325
We would like to use binary classifiers to detect a letter from the alphabet. If we use the OvO strategy, how many binary classifiers do we need to train?
59
New cards
k
You have a multi-class classification problem with k classes. Using the one-vs-rest method, how many different logistic regression classifiers will you end up training?
60
New cards
a, b, c
Select the multiclass classification tasks:

a.) Assigning a tag to an email from one of the following: Promotion, Social, Primary

b.) Assigning a patient one of these: not ill, cold, flu

c.) Assigning the weather as one of these: sunny, rain, snow, cloudy

d.) Analyzing a picture and assigning both young/old and male/female options
61
New cards
multiple binary tags
A classification system that outputs __________ __________ _____ is called a multilabel classification system.
62
New cards
multioutput-multiclass classification
* AKA multioutput classification
* a generalization of multilabel classification where each label can be multiclass (can have more than 2 possible values)
63
New cards
classifications
types of _____________:

1.) binary

2.) multiclass

3.) multilabel

4.) multioutput
64
New cards
binary classification
ex: digit is 5 or not
65
New cards
multiclass classification
* AKA multinomial
* ex: digit is 0, 1, …, 9
66
New cards
multilabel classification
* AKA multiple binary
* ex: odd or not, greater than 5 or not
67
New cards
multioutput classification
* AKA multiple multiclass
* ex: 28x28 labels for each image and each label value is 1 to 256
68
New cards
False
For the below error matrix, columns show the predictions and rows show the actual values.

(T/F) The column for class 8 is quite bright, which tells you that many images get correctly classified as 8s.
69
New cards
true
For the below error matrix, columns show the predictions and rows show the actual values.

(T/F) The row for class 8 is not that bad, telling you that actual 8s in general get properly classified as 8s.
70
New cards
d.) all of the above
What can we do to fix a large column of errors in an error matrix?

a.) more training data for digits that look like 8s

b.) engineer more features to help the classifier

c.) process images to make some patterns stand out more

d.) all of the above
71
New cards
a.) Find the parameters that minimize the cost function
Gradient descent is used for the following purpose:

a.) Find the parameters that minimize the cost function

b.) Evaluate how good the predictions are.

c.) Split the dataset in training and test sets.

d.) Compute the recall
72
New cards
gradient descent
types of ________ _______ techniques:

1.) batch

2.) stochastic

3.) mini-batch
73
New cards
data used
batch vs stochastic vs mini-batch in terms of ______ ______:

* batch uses the whole data set
* mini-batch uses only a subset of the whole data set
* stochastic uses a single random instance/example
74
New cards
speed
batch vs stochastic vs mini-batch in terms of ______:

* stochastic (fastest) > mini-batch > batch (slowest)
75
New cards
global minimum
batch vs stochastic vs mini-batch in terms of ______ _________:

* batch reaches the global minimum and then stops
* stochastic and mini-batch walk around the minimum
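
A minimal sketch of the three variants on synthetic linear data; changing batch_size switches the behavior (m for batch GD, 1 for stochastic GD, something in between for mini-batch GD):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X + rng.normal(size=(m, 1))   # true model: y = 4 + 3x + noise
X_b = np.c_[np.ones((m, 1)), X]           # add a bias column of 1s

theta = rng.normal(size=(2, 1))           # random initialization
eta, n_epochs, batch_size = 0.1, 50, 20   # batch_size: m=batch, 1=stochastic

for epoch in range(n_epochs):
    idx = rng.permutation(m)
    for start in range(0, m, batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = X_b[batch], y[batch]
        gradients = 2 / len(batch) * xb.T @ (xb @ theta - yb)  # MSE gradient
        theta -= eta * gradients

print(theta.ravel())  # roughly [4, 3]
```
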
76
New cards
convex
To be able to find the global minimum of a cost function, the cost function should be a _______ function when we use it with gradient descent technique.
77
New cards
a.) Learning rate is too high
Which of the following is true for the following figure?

a.) Learning rate is too high

b.) Learning rate is too small

c.) Cost function is not convex

d.) Gradient descent cannot find global minimum
78
New cards
learning schedule
* a function that determines the learning rate at each iteration
* if it’s reduced too quickly, you may get stuck in a local minimum
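
A sketch of one common decaying learning schedule; the constants t0 and t1 are illustrative hyperparameters, not prescribed values:

```python
t0, t1 = 5, 50  # illustrative schedule hyperparameters

def learning_schedule(t):
    """Return a learning rate that decays as the iteration count t grows."""
    return t0 / (t + t1)

# the rate starts at 0.1 and shrinks gradually; if it shrank too quickly,
# SGD could get stuck in a local minimum before reaching the global one
print(learning_schedule(0), learning_schedule(1000))  # 0.1  ~0.0048
```
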
79
New cards
mini-batches
At each step, instead of computing the gradients based on the full training set (as in Batch GD) or based on just one instance (as in Stochastic GD), Mini-batch GD computes the gradients on small random sets of instances called ____________.
80
New cards
d.) All of the above
Which of the following is true for Normal Equation, Batch Gradient Descent (GD), Stochastic GD and Mini-Batch GD?

a.) After training, all these algorithms end up with very similar models and make predictions in exactly the same way.

b.) Batch GD’s path actually stops at the minimum, while both Stochastic GD and Mini-batch GD continue to walk around the global minimum.

c.) Mini-batch GD will end up walking around a bit closer to the minimum than Stochastic GD - but it may be harder for it to escape from local minima

d.) All of the above
81
New cards
b.) Stochastic Gradient Descent
Given a training set with millions of features, the fastest algorithm to use to perform a search for a global minimum is:

a.) The Normal Equation

b.) Stochastic Gradient Descent

c.) Mini-batch Gradient Descent

d.) Batch Gradient Descent
82
New cards
a.) Convex
Gradient Descent will converge when training a Logistic Regression model because the cost function is:

a.) Convex

b.) Complex

c.) Collocated

d.) Core optimized
83
New cards
a.) local minimum
Gradient Descent cannot get stuck in a _______________ when training a Logistic Regression model

a.) local minimum

b.) global minimum

c.) plateau

d.) summit
84
New cards
c.) a-left is underfitting, b-center ideal separation, c-right is overfitting
Given the following set of graphs (a-left, b-center, c-right), which statement best describes the underfitting/overfitting situation?

a.) a-left overfits the training set, b-center is an ideal separation, c-right underfits the training set.

b.) a-left is an ideal separation, b-center is underfitting, c-right is overfitting

c.) a-left is underfitting, b-center ideal separation, c-right is overfitting

d.) a-left is an ideal separation, b-center is overfitting, and c-right is underfitting.
85
New cards
a, b, c, d
Which of the following are indications of underfitting?

a.) The model performs poorly on the training data and also generalizes poorly.

b.) The training and validation learning curves reach a plateau and they are close and fairly high.

c.) Adding more training data does not help improve the performance on the training data

d.) We need a more complex model or need to come up with better features
86
New cards
a, b, c
Which of the following are indications of overfitting?

a.) The model performs well on the training data but generalizes poorly according to the cross-validation metrics.

b.) The error on the training data is low but considerably high on the validation data

c.) There is a gap between the learning curves for training and validation data
87
New cards
normal equation
* used to find the value of θ that minimizes the cost function
* closed-form solution (mathematical equation)
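
A minimal sketch of the closed-form solution θ = (XᵀX)⁻¹Xᵀy on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X + rng.normal(size=(m, 1))         # true model: y = 4 + 3x + noise

X_b = np.c_[np.ones((m, 1)), X]                 # add a bias column of 1s
theta = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y  # normal equation
print(theta.ravel())                            # roughly [4, 3]
```
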
88
New cards
variance
A high-degree polynomial model is likely to have high ________, and thus to overfit the training data, while a high-bias model is most likely to underfit the training data.
89
New cards
overfit
A high-degree polynomial model is likely to have high variance, and thus to ______ the training data, while a high-bias model is most likely to underfit the training data.
90
New cards
bias
A high-degree polynomial model is likely to have high variance, and thus to overfit the training data, while a high-____ model is most likely to underfit the training data.
91
New cards
underfit
A high-degree polynomial model is likely to have high variance, and thus to overfit the training data, while a high-bias model is most likely to ______ the training data.
92
New cards
true
(T/F) Machine learning systems improve their performance in a given task with more and more experience or data.
93
New cards
b.) A dataset that contains the desired solution
A labeled training set is:

a.) A dataset that contains specific names

b.) A dataset that contains the desired solution

c.) A dataset that contains Boolean instances

d.) A dataset that contains sufficient instances
94
New cards
a.) Unsupervised learning task
Clustering is a type of:

a.) Unsupervised learning task

b.) Supervised learning task

c.) Regression learning task

d.) Batch learning task
95
New cards
a.) Reinforcement learning
The best learning type to teach a robot to learn to walk in various unknown terrains is:

a.) Reinforcement learning

b.) Supervised learning

c.) Semi-supervised learning

d.) Other types of learning
96
New cards
b.) Clustering task
Customer segmentation into groups is a type of:

a.) Classification task

b.) Clustering task

c.) Regression task

d.) Reinforcement task
97
New cards
a.) Online learning
This type of learning method is capable of adapting rapidly to both changing data and autonomous systems, and of training on very large quantities of data.

a.) Online learning

b.) Offline learning

c.) Reinforcement learning

d.) Semi-supervised learning
98
New cards
true
(T/F) An online learning system can learn incrementally
99
New cards
false
(T/F) Batch learning systems learn dynamically
100
New cards
true
(T/F) A hyperparameter is a parameter of the learning algorithm itself, not of the model