Flashcards: AIL303m | Quizlet

217 Terms

New cards

Assume you have a data set that summarizes a marketing campaign with information related to prospective customers. The data set contains 100 observations with several columns that summarize information about the prospective customer. It also has a column that flags whether the prospect responded or not.

A machine learning model that predicts response is using the column Responded as:

A. sample

B. features

C. target

D. example

New cards

What is the goal of machine learning?

A. The goal of machine learning is to achieve high accuracy in prediction for future examples, not for the training ones

B. The goal of machine learning is to achieve high accuracy in prediction for future examples, and for the training ones

C. The goal of machine learning is to achieve low accuracy in prediction for future examples, high accuracy for the training phase

D. The goal of machine learning is to achieve low accuracy in prediction for future examples, and low accuracy for the training phase

New cards

What are the first two steps of a typical machine learning workflow?

A. Problem statement and data cleaning.

B. Problem statement and data collection.

C. Data collection and data transformation.

D. None of the others

New cards

The data below appears in 'data.txt', and Pandas has been imported. Which Python command will read it correctly into a Pandas DataFrame?

63.03 22.55 39.61 40.48 98.67 -0.25 AB

39.06 10.06 25.02 29 114.41 4.56 AB

68.83 22.22 50.09 46.61 105.99 -3.53 AB

A. pandas.read('data.txt')

B. pandas.read_csv('data.txt', header=None, sep=' ')

C. pandas.read_csv('data.txt', delim_whitespace=True)

D. pandas.read_csv('data.txt', header=0, delim_whitespace=True)
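A minimal sketch of reading whitespace-delimited rows that have no header line, assuming rows laid out like the sample above. An in-memory buffer stands in for 'data.txt', and `sep=r"\s+"` is used because newer pandas releases deprecate `delim_whitespace=True` in favor of it.

```python
import io
import pandas as pd

# Stand-in for 'data.txt': whitespace-separated values, no header row.
raw = io.StringIO(
    "63.03 22.55 39.61 40.48 98.67 -0.25 AB\n"
    "39.06 10.06 25.02 29 114.41 4.56 AB\n"
    "68.83 22.22 50.09 46.61 105.99 -3.53 AB\n"
)

# header=None keeps the first row as data; sep=r"\s+" splits on any run of whitespace.
df = pd.read_csv(raw, header=None, sep=r"\s+")
print(df.shape)
```

Without `header=None`, the first data row would be consumed as column names and the frame would have one row fewer.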

New cards

(Choose 2 answers)

There are several SQL databases available such as

A. MySQL

B. PostgreSQL

C. MongoDB

D. Cassandra

A,B

New cards

Which of the following is an example of a file type that uses JavaScript Object Notation (JSON) formatting?

A. Python (py files)

B. Javascript (js files)

C. SQL Database (sql files)

D. Jupyter/iPython (ipynb files)

New cards

(Choose 2 answers)

Why is EDA Useful?

A. EDA allows us to get an initial feel for the data.

B. It lets us determine if the data makes sense, or if further cleaning or more data is needed.

C. EDA does not help to identify patterns and trends in the data

D. None of these

A,B

New cards

What are polynomial features?

A. They are higher order relationships in the data.

B. They are logistic regression coefficients.

C. They are lower order relationships in the data.

D. They are represented by linear relationships in the data.

New cards

Which variable transformation should you use for ordinal data?

A. Standard scaling

B. One-hot encoding

C. Ordinal encoding

D. Min-max scaling
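A short sketch of encoding an ordered categorical feature with scikit-learn's `OrdinalEncoder`. The feature values and their order ("small" < "medium" < "large") are hypothetical and supplied explicitly so the integer codes follow that order.

```python
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical ordinal feature; the category order is passed in explicitly.
sizes = [["small"], ["large"], ["medium"], ["small"]]
encoder = OrdinalEncoder(categories=[["small", "medium", "large"]])

# Each category is replaced by its position in the supplied order.
codes = encoder.fit_transform(sizes)
print(codes.ravel().tolist())
```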

New cards

What is p-value in statistical hypothesis testing?

A. The probability of making a Type I error

B. The probability of rejecting a true null hypothesis

C. The probability of accepting a false null hypothesis

D. The probability of observing the data given that the null hypothesis is true

New cards

The most common way of estimating parameters in a parametric model is:

A. using the maximum likelihood estimation

B. using the central limit theorem

C. extrapolating a non-parametric model

D. extrapolating Bayesian statistics

New cards

What is the purpose of point estimation in statistics?

A. To estimate a range of values for a parameter

B. To summarize the central tendency of a dataset

C. To provide a single value as the best guess for a parameter

D. To assess the variability of a dataset

New cards

What role does feature engineering play in the machine learning process?

A. It involves optimizing model hyperparameters

B. It refers to the creation of complex algorithms for model training.

C. It focuses on selecting and transforming input variables to enhance model performance

D. It is irrelevant in the context of machine learning.

New cards

The autocorrect on your phone is an example of:

A. Unsupervised learning

B. Supervised learning

C. Semi-supervised learning

D. Reinforcement learning

New cards

What is the purpose of splitting a dataset into training and test sets in machine learning?

A. To increase the size of the training set for better model performance

B. To reduce overfitting by evaluating the model on unseen data

C. To improve the computational efficiency of model training

D. To introduce randomness and diversity in the training process

New cards

If a low-complexity model is underfitting during estimation, which of the following is MOST LIKELY true (holding the model constant)?

A. K-fold cross-validation will still lead to underfitting, for any k.

B. K-fold cross-validation with a small k will reduce or eliminate underfitting.

C. K-fold cross-validation with a large k will reduce or eliminate underfitting.

D. None of the above.

New cards

In the context of linear regression, what is the purpose of regularization techniques such as Lasso and Ridge?

A. To increase the complexity of the model

B. To add noise to the dataset

C. To reduce overfitting and stabilize coefficient estimates

D. To introduce non-linearity in the regression relationship

New cards

What is the main disadvantage of using a high-degree polynomial in Polynomial Regression?

A. It is computationally expensive

B. It may lead to overfitting

C. It always results in underfitting

D. It converges slowly

New cards

What are the most common sklearn methods to add polynomial features to your data?

A. polyFeat.add and polyFeat.transform

B. polyFeat.add and polyFeat.fit

C. polyFeat.fit and polyFeat.transform

D. polyFeat.transform
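As an illustration, here is a small sketch using scikit-learn's `PolynomialFeatures`, with the object named `polyFeat` to mirror the card. The single input row is made up for the example.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0]])  # one row with features x1=2, x2=3

polyFeat = PolynomialFeatures(degree=2)
polyFeat.fit(X)                  # learns which feature combinations to build
X_poly = polyFeat.transform(X)   # columns: 1, x1, x2, x1^2, x1*x2, x2^2
print(X_poly)
```

`polyFeat.fit_transform(X)` performs both steps in one call.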

New cards

Which problem in machine learning does regularization primarily address?

A. Overfitting

B. Underfitting

C. Both overfitting and underfitting

D. Feature engineering

New cards

What is the role of the regularization parameter in the regularization process?

A. It controls the overall strength of regularization

B. It sets the learning rate for the optimization algorithm

C. It defines the number of iterations in model training

D. It has no impact on regularization

New cards

Which of the following statements is FALSE?

A. In Analytic View, increasing L2/L1 penalties force coefficients to be smaller, restricting their plausible range.

B. Under the Geometric formulation, the cost function minimum is found at the intersection of the penalty boundary and a contour of the traditional OLS cost function surface.

C. Under the Probabilistic formulation, L2 (Ridge) regularization imposes a Gaussian prior on the coefficients, while L1 (Lasso) regularization imposes a Laplacian prior.

D. None of the others

New cards

Compared with Lasso regression (assuming similar implementation), Ridge regression is:

A. less likely to overfit to training data.

B. more likely to overfit to training data.

C. less likely to set feature coefficients to zero.

D. more likely to set feature coefficients to zero.

New cards

How does increasing the regularization strength (lambda) in Lasso regularization affect the sparsity of the solution?

A. Higher lambda encourages sparser solutions

B. Lower lambda encourages sparser solutions

C. Lambda has no effect on sparsity

D. Lambda encourages equalization of coefficients
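The sparsity behavior of the two penalties can be sketched on synthetic data. The data below is invented for illustration: only the first of ten features drives the target. Note that scikit-learn calls the regularization strength `alpha` rather than lambda.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)  # only feature 0 matters

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

# Count coefficients that were driven exactly to zero by each penalty.
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0))
print(n_zero_lasso, n_zero_ridge)
```

On data like this, the L1 penalty typically zeroes out the irrelevant coefficients, while the L2 penalty only shrinks them.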

New cards

Which statement about Logistic Regression is TRUE?

A. Logistic Regression is a generalized linear model.

B. Logistic Regression models can only predict variables with 2 classes.

C. Logistic Regression models can be used for classification but not for regression.

D. Logistic Regression models cannot be used for regression and classification.

New cards

What does the term "precision" measure in the context of classification metrics?

A. The ability to correctly identify positive instances

B. The ability to correctly identify negative instances

C. The proportion of actual positive instances correctly identified

D. The trade-off between precision and recall

New cards

Usually the first step to fit a k nearest neighbor classifier using scikit-learn is to:

A. import KNN from the sklearn.knearestneighbors module

e.g. from sklearn.knearestneighbors import KNN

B. import KNeighborsClassifier from the sklearn.neighbors module

e.g. from sklearn.neighbors import KNeighborsClassifier

C. import Classifier from the sklearn.nearestneighbors module

e.g. from sklearn.nearestneighbors import Classifier

D. import KNNClassifier from the sklearn.knearestneighbors module

e.g. from sklearn.knearestneighbors import KNNClassifier

New cards

In SVM, what is the purpose of support vectors?

A. They are used to define the margin between classes.

B. They are outliers that are ignored during training.

C. They are the data points that lie closest to the decision boundary.

D. They are the weights assigned to features in the input data.

New cards

Which statement about Support Vector Machines is TRUE?

A. Support Vector Machine models are non-linear.

B. Support Vector Machine models rarely overfit on training data if using regularization.

C. Support Vector Machine models can be used for classification but not for regression.

D. Support Vector Machine models can be used for regression but not for classification.

New cards

The Decision Tree algorithm can be used for

A. Data Cleaning

B. Feature Engineering

C. Classification and Regression

D. Hyperparameter Tuning

New cards

This is the best way to choose the number of trees to build on a Bagging ensemble.

A. Choose a large number of trees, typically above 100

B. Prioritize training error metrics over out-of-bag samples

C. Tune number of trees as a hyperparameter that needs to be optimized

D. Choose a number of trees past the point of diminishing returns

New cards

What is the impact of increasing the number of trees in a Random Forest on its performance?

A. Improved model interpretability

B. Increased risk of overfitting

C. Decreased computational efficiency

D. Enhanced model generalization

New cards

What does synthetic oversampling technique in handling imbalanced datasets do?

A. It involves creating entirely new features for the minority class

B. It generates synthetic samples for the minority class to balance the dataset

C. It randomly removes samples from both classes

D. It focuses on increasing the size of the majority class

New cards

Which of the following statements is true about downsampling in imbalanced datasets?

A. Downsampling always leads to better model performance

B. Downsampling involves removing samples from the majority class

C. Downsampling is primarily used to increase the size of the dataset

D. Downsampling has no impact on the distribution of classes

New cards

Which of the following is a common metric used to evaluate models in the presence of imbalanced classes?

A. Accuracy

B. Mean Squared Error

C. R-squared

D. F1 Score

New cards

Which of the following statements is true regarding Random and Synthetic Over Sampling in the context of imbalanced datasets?

A. Random Over Sampling duplicates random instances of the minority class to balance the dataset, while Synthetic Over Sampling generates new synthetic instances.

B. Random Over Sampling generates new synthetic instances of the minority class, while Synthetic Over Sampling duplicates random instances.

C. Random Over Sampling and Synthetic Over Sampling both involve duplicating random instances of the minority class.

D. Random Over Sampling and Synthetic Over Sampling are identical techniques with different names.
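The difference between the two approaches can be sketched in plain Python. This is a simplified illustration on made-up minority points: the synthetic step interpolates between two randomly chosen minority points, whereas SMOTE-style methods usually restrict the partner to a nearest neighbor.

```python
import random

random.seed(0)
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.5)]  # hypothetical minority-class points

def random_oversample(points, n_new):
    """Duplicate existing minority points, chosen at random."""
    return [random.choice(points) for _ in range(n_new)]

def synthetic_oversample(points, n_new):
    """Create new points on the segment between two distinct minority points."""
    new_points = []
    for _ in range(n_new):
        a, b = random.sample(points, 2)
        t = random.random()  # interpolation weight in [0, 1)
        new_points.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return new_points

duplicates = random_oversample(minority, 4)
synthetic = synthetic_oversample(minority, 4)
print(duplicates)
print(synthetic)
```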

New cards

What is the stratified sampling technique in machine learning used for?

A. To create a balanced dataset

B. To reduce the number of features

C. To ensure that each class is represented proportionally in training and testing sets

D. To increase the size of the dataset
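A brief sketch of a stratified split with scikit-learn's `train_test_split`, using an invented label vector with 20% positives. The `stratify=y` argument asks for the class proportions to be kept in both splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 80 + [1] * 20)  # 20% positives

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Both splits keep the same share of positives as the full data.
print(y_train.mean(), y_test.mean())
```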

New cards

What is a potential challenge associated with the K-means algorithm?

A. It is sensitive to the initial choice of centroids and can converge to local optima

B. It always produces the same clusters regardless of the dataset

C. It performs poorly on high-dimensional data

D. It is only suitable for small datasets

New cards

The purpose of the K-means algorithm is

A. To find the optimal number of clusters

B. To minimize the within-cluster variance

C. To maximize the between-cluster variance

D. To assign cluster labels based on proximity

New cards

Which statement is a common use of Dimension Reduction in the real world?

A. Image tracking

B. Explaining the relation between the amount of alcohol consumption and diabetes.

C. Deep Learning

D. Predicting whether a customer will return to a store to make a major purchase.

New cards

Which of these options is NOT an example of Unsupervised Learning?

A. Segmenting customers into different groups.

B. Reducing the size of a data set without losing too much information from our original data set.

C. Explaining the relationship between an individual's income and the price they pay for a car.

D. Grouping observations together to find similar patterns across them.

New cards

What happens to our second cluster centroid when we use the probability formula?

A. When we use the probability formula, we put less weight on the points that are far away. So, our second cluster centroid is likely going to be closer.

B. When we use the probability formula, we put more weight on the points that are far away. So, our second cluster centroid is likely going to be more distant.

C. When we use the probability formula, we put more weight on the lighter centroids, because it will take more computational power to draw our clusters. So, the second cluster centroid is likely going to be less distant.

D. When we use the probability formula, we put less weight on the points that are far away. So, our second cluster centroid is likely going to be more distant.
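A small sketch, assuming "the probability formula" refers to k-means++-style initialization, where each point is picked as the next centroid with probability proportional to its squared distance from the nearest centroid already chosen. The one-dimensional points are made up for illustration.

```python
points = [0.0, 1.0, 2.0, 10.0]
first_centroid = 0.0  # assume the first centroid was already chosen

# Squared distance of every point to the existing centroid.
sq_dists = [(p - first_centroid) ** 2 for p in points]
total = sum(sq_dists)

# Selection probability for the second centroid: d^2 / sum(d^2).
probs = [d / total for d in sq_dists]
for p, pr in zip(points, probs):
    print(p, round(pr, 3))
```

Under this assumption, the most distant point carries most of the probability mass.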

New cards

When applying K-means clustering, what is the role of the hyperparameter K, and how does it impact the results?

A. K controls the learning rate and convergence speed

B. K determines the number of clusters, influencing the final grouping of data points

C. K represents the dimensionality of the feature space

D. K is unrelated to clustering

New cards

it refers to the number of iterations

New cards

What is one of the real-world solutions to fix the problems of the curse of dimensionality?

A. Increase the size of the data set

B. Use more computational power

C. Reduce the dimension of the data set.

D. Balance the classes of a data set

New cards

What is the role of bandwidth in mean-shift clustering?

A. Controls convergence speed

B. Determines cluster size

C. Defines neighborhood size

D. Sets the number of clusters

New cards

The difference between Euclidean distance and Manhattan distance is

A. Euclidean distance considers only horizontal or vertical movements, while Manhattan distance allows diagonal movements.

B. Euclidean distance is the sum of absolute differences, while Manhattan distance is the sum of squared differences.

C. Euclidean distance measures the straight-line distance, while Manhattan distance measures the sum of absolute differences along each dimension

D. Euclidean distance is only applicable to numerical data, while Manhattan distance is used for categorical data
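The two metrics can be compared on a single pair of points. This is a plain-Python sketch with hypothetical helper names.

```python
import math

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute differences along each dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))

p, q = (0, 0), (3, 4)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7
```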

New cards

What is the major purpose of distance metrics in clustering?

A. To measure the physical distance between data points

B. To quantify the similarity or dissimilarity between data points

C. To determine the direction of data points

D. To classify data points into predefined clusters

New cards

In PCA, how does the explained variance ratio of each principal component help in determining the optimal number of components?

A. By selecting components with the highest explained variance ratio

B. By selecting components with the lowest explained variance ratio

C. By ignoring the explained variance ratio and always using all components

D. By using the cumulative explained variance ratio to find an elbow point in the scree plot

New cards

What are the eigenvalues in PCA?

A. Eigenvalues represent the mean values of each feature

B. Eigenvalues indicate the importance of each principal component in capturing variance

C. Eigenvalues are the coefficients of the linear regression model in PCA

D. Eigenvalues are not relevant in PCA

New cards

In PCA, what are principal components?

A. Data points in the dataset

B. The features with the highest importance

C. The eigenvectors of the covariance matrix

D. The mean values of each feature
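The link between PCA, the covariance matrix, and its eigenvalues can be sketched with NumPy alone. The two-feature data set is synthetic, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
X = np.column_stack([x, 2.0 * x + rng.normal(scale=0.5, size=500)])

Xc = X - X.mean(axis=0)          # center each feature
cov = np.cov(Xc, rowvar=False)   # 2x2 covariance matrix

# Eigenvectors give the component directions; eigenvalues give their variance.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]            # sort from largest to smallest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained_ratio = eigvals / eigvals.sum()    # share of variance per component
scores = Xc @ eigvecs                        # data projected onto the components
print(explained_ratio)
```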

New cards

You are evaluating a binary classifier. There are 80 positive outcomes in the test data, and 150 observations. Using a 50% threshold, the classifier predicts 60 positive outcomes, of which 20 are incorrect.

What is the classifier's F1 score on the test sample?

A. 75%

B. 57.2%

C. 67.5%

D. 42.8%
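The arithmetic behind this card can be worked through in a few lines, using only the counts stated in the question.

```python
positives = 80            # actual positives in the test data
predicted_positive = 60   # predictions flagged positive at the 50% threshold
false_positive = 20       # positive predictions that were incorrect

true_positive = predicted_positive - false_positive   # 40
false_negative = positives - true_positive            # 40

precision = true_positive / predicted_positive
recall = true_positive / positives
f1 = 2 * precision * recall / (precision + recall)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

The F1 score comes out at roughly 57.1%, which is closest to the 57.2% option listed.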

New cards

Which of the following statements is an example of a business case where we can use the Cosine Distance?

A. Cosine is useful for coordinate based measurements.

B. Cosine is better for data such as text where location of occurrence is less important.

C. Cosine distance is more sensitive to the curse of dimensionality

D. Cosine distance is less sensitive to the curse of dimensionality

New cards

Which of the following examples is NOT a common use case of clustering in the real world?

A. Anomaly detection.

B. Customer segmentation

C. Determine risk factor and prevention factors for diseases such as osteoporosis

D. Improve supervised learning.

New cards

Which of the following statements about Random Upsampling is TRUE?

A. Random Upsampling preserves all original observations.

B. Random Upsampling will generally lead to a lower F1 score.

C. Random Upsampling generates observations that were not part of the original data.

D. Random Upsampling results in excessive focus on the more frequently-occurring class.

New cards

What is artificial intelligence?

A. A subset of deep learning.

B. Any program that can sense, reason, act, and adapt.

C. A subset of machine learning

D. None of the others

New cards

Which of the following statements correctly defines the strengths of the DBSCAN algorithm?

A. No need to specify the number of clusters (cf. K-means), allows for noise, and can handle arbitrary-shaped clusters.

B. Does well with clusters of different density, works with just one parameter, and n_clu defines itself.

C. The algorithm will find the outliers first, draw regular shapes, works faster than other algorithms.

D. The algorithm is computationally intensive, it is sensitive to outliers, and it requires few hyperparameters to be tuned.

New cards

What does NoSQL stand for and what does it represent?

A. NoSQL stands for Not-only SQL, and it represents a set of databases that are relational, therefore, they have fixed structure.

B. NoSQL stands for Non-Structured Query Language, and it represents a set of relational databases with fixed schemas.

C. NoSQL stands for Non-Structured Query Language, and it represents a set of non-relational databases with varied schemas.

D. NoSQL stands for Not-only SQL, and it represents a set of databases that are not relational; therefore, they vary in structure.

New cards

Which type of Ensemble modeling approach is NOT a special case of model averaging?

A. Random Forest methods

B. Boosting methods

C. The Pasting method of Bootstrap aggregation

D. The Bagging method of Bootstrap aggregation

New cards

Which statement about the Pandas read_csv function is TRUE?

A. It can only read comma-delimited data

B. It can read both tab-delimited and space-delimited data

C. It reads data into a 2-dimensional NumPy array

D. It allows only one argument: the name of the file

New cards

A p-value is:

A. the smallest significance level at which the null hypothesis would be rejected

B. the probability of the null hypothesis being true

C. the probability of the null hypothesis being false

D. the smallest significance level at which the null hypothesis is accepted

New cards

Which of the following statements about regularization is TRUE?

A. Regularization always reduces the number of selected features.

B. Regularization increases the likelihood of overfitting relative to training data.

C. Regularization decreases the likelihood of overfitting relative to training data.

D. Regularization performs feature selection without a negative impact in the likelihood of overfitting relative to the training data.

New cards

How do we define the core points when we use the DBSCAN algorithm?

A. A point that has more than n_clu neighbors in its ε-neighborhood.

B. An ε-neighbor point that has fewer than n_clu neighbors itself.

C. A point that has no points in its ε-neighborhood.

D. A point that has the same number of n_clu neighbors within and outside the ε-neighborhood

New cards

Usually the first step to fit a gradient boosting classifier model using scikit-learn is to:

A. import classifier ensemble from the sklearn.gradientboosting module

e.g. from sklearn.gradientboosting import ClassifierEnsemble

B. import gradient boosting from the sklearn.classifierensemble module

e.g. from sklearn.classifierensemble import GradientBoosting

C. import gradient boosting classifier from the sklearn.ensemble module

e.g. from sklearn.ensemble import GradientBoostingClassifier

D. import classifier from the sklearn.gradientboosting module

e.g. from sklearn.gradientboosting import Classifier

New cards

Select the option that best completes the following sentence:

For data with many features, principal components analysis

A. identifies which features can be safely discarded

B. reduces the number of features without losing any information.

C. establishes a minimum number of viable features for use in the analysis.

D. generates new features that are linear combinations of the original features.

New cards

Usually the first step to fit a logistic regression model using scikit-learn is to:

A. import logistic regression from the sklearn.linear_model module

e.g. from sklearn.linear_model import LogisticRegression

B. import Logistic from the sklearn.regression module

e.g. from sklearn.regression import Logistic

C. import Logistic from the sklearn.linear_regression module

e.g. from sklearn.linear_regression import Logistic

D. import logistic regression from the sklearn.linearclassifer module

e.g. from sklearn.linearclassifer import LogisticRegression

New cards

Decision trees used as classifiers compute the value assigned to a leaf by calculating the ratio: number of observations of one class divided by the number of observations in that leaf.

E.g. number of customers that are younger than 50 years old divided by the total number of customers.

How are leaf values calculated for regression decision trees?

A. average value of the predicted variable

B. median value of the predicted variable

C. mode value of the predicted variable

D. weighted average value of the predicted variable
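A minimal sketch of how a regression tree's leaf values can be inspected with scikit-learn's `DecisionTreeRegressor`. The four training points are invented, and the tree is limited to a single split so that each leaf holds two observations.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[1.0], [2.0], [10.0], [11.0]])
y = np.array([1.0, 3.0, 10.0, 12.0])

# One split only, so the tree has exactly two leaves.
tree = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, y)

# Each prediction is the value stored in the leaf the query falls into.
preds = tree.predict(np.array([[1.5], [10.5]]))
print(preds)
```

With the default squared-error criterion, each prediction equals the mean target of the training observations in that leaf (2.0 and 11.0 here).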

New cards

This is an ensemble model that does not use bootstrapped samples to fit the base trees, takes residuals into account, and fits the base trees iteratively:

A. Random Trees

B. Boosting

C. Bagging

D. Random Forest

New cards

In which case below is it most plausible to conclude that an observation includes an outlier for one of the features?

A. One feature has a deleted residual value above 3.

B. The observation includes the maximum target value.

C. The observation is missing values for several of the features.

D. One feature has an internally-studentized residual value above 3.

New cards

You can use supervised machine learning for all of the following examples, EXCEPT:

A. Segment customers by their demographics.

B. Predict the number of customers that will visit a store on a given week.

C. Predict the probability of a customer returning to a store.

D. Interpret the main drivers that determine if a customer will return to a store.

New cards

Select the option that has the syntax to obtain the data splits you will need to train a model having a test split that is a third the size of your available data.

A. X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.5)

B. X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.33)

C. X_train, y_test = train_test_split(x, y, test_size=0.33)

D. X_train, y_test = train_test_split(x, y, test_size=0.5)
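A runnable sketch of the four-way split, using scikit-learn's `train_test_split` on placeholder arrays. The data and `random_state` are assumptions added for reproducibility.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(60).reshape(30, 2)  # 30 placeholder observations, 2 features
y = np.arange(30)

# The function returns four arrays: train/test features, then train/test targets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)
print(len(X_train), len(X_test))
```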

New cards

The main purpose of scaling features before fitting a k nearest neighbor model is to:

A. Help find the appropriate value of k

B. Ensure decision boundaries have roughly the same size for all classes

C. Ensure that features have similar influence on the distance calculation

D. Break ties in case there is the same number of neighbors of different classes next to a given observation
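The effect of scaling on the distance calculation can be sketched with a tiny made-up data set in which one feature is measured in much larger units than the other. The pipeline below is one common way to combine `StandardScaler` with `KNeighborsClassifier`, not the only one.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: column 0 in small units, column 1 in large units.
X = np.array([[1.0, 1000.0], [1.2, 5000.0], [3.0, 1200.0], [3.1, 5200.0]])
y = np.array([0, 0, 1, 1])  # the class depends on column 0 only

query = np.array([[1.1, 5150.0]])

# Without scaling, column 1 dominates the distance.
raw_knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# With scaling, both columns contribute on a comparable scale.
scaled_knn = make_pipeline(
    StandardScaler(), KNeighborsClassifier(n_neighbors=1)
).fit(X, y)

print(raw_knn.predict(query), scaled_knn.predict(query))
```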

New cards

This is the type of Machine Learning that uses both data with labeled outcomes and data without labeled outcomes:

A. Supervised Machine Learning

B. Unsupervised Machine Learning

C. Mixed Machine Learning

D. Semi-Supervised Machine Learning

New cards

These are all methods of dealing with unbalanced classes EXCEPT:

A. Downsampling.

B. Mix of in-sample and out-of-sample.

C. Mix of downsampling and upsampling.

D. Upsampling.

New cards

What is an ensemble model that needs you to look at out of bag error?

A. Logistic Regression.

B. Out of Bag Regression

C. Random Forest

D. Stacking

New cards

You are evaluating a binary classifier. There are 50 positive outcomes in the test data, and 100 observations. Using a 50% threshold, the classifier predicts 40 positive outcomes, of which 10 are incorrect.

Increasing the threshold to 60% results in 5 additional positive predictions, all of which are correct. Which of the following statements about this new model (compared with the original model that had a 50% threshold) is TRUE?

A. The F1 score of the classifier would decrease.

B. The area under the ROC curve would decrease.

C. The F1 score of the classifier would remain the same.

D. The area under the ROC curve would remain the same.

New cards

Which of the following statements is a characteristic of the DBSCAN algorithm?

A. Can handle tons of data and weird shapes.

B. Finds uneven cluster sizes (one is big, some are tiny).

C. It performs well when finding many clusters.

D. It performs well when finding few clusters.

New cards

Which of the following statements correctly defines the weaknesses of the DBSCAN algorithm?

A. The clusters it finds might not be trustworthy, it needs noisy data to work, and it can't handle subgroups.

B. It needs two parameters as input, finding appropriate values of ε and n_clu can be difficult, and it does not do well with clusters of different density.

C. The algorithm will find the outliers first, it draws regular shapes, and it works faster than other algorithms.

D. The algorithm is computationally intensive, it is sensitive to outliers, and it requires too many hyperparameters to be tuned.

New cards

What is the classifier's recall on the test sample?

A. 66.7%

B. 37.5%

C. 50%

D. 33.3%

New cards

Complete the following sentence: The training data is used to fit the model, while the test data is used to:

A. measure the parameters and hyperparameters of the model

B. tweak the model hyperparameters

C. tweak the model parameters

D. measure error and performance of the model

New cards

Which of the following statements is a characteristic of the Mean Shift algorithm?

A. It does not require us to set the number of clusters; the number of clusters will be determined for us.

B. Bad with non-spherical cluster shapes.

C. You need to decide the number of clusters on your own, choosing the numbers directly or the minimum distance threshold.

D. Good with non-spherical cluster shapes.

New cards

Which statement about unsupervised algorithms is TRUE?

A. Unsupervised algorithms are relevant when we have outcomes we are trying to predict.

B. Unsupervised algorithms are relevant when we don't have the outcomes we are trying to predict and when we want to break down our data set into smaller groups.

C. Unsupervised algorithms are typically used to forecast time related patterns like stock market trends or sales forecasts.

D. Unsupervised algorithms are relevant in cases that require explainability, for example comparing parameters from one model to another.

New cards

Which of the following statements is a characteristic of the K-means algorithm?

A. It is limited to using the Euclidean distance within its formulation.

B. To determine the number of clusters we use the elbow method.

C. Can be slow to calculate as the number of observations increases.

D. It is limited to using the Ward distance within its formulation.

New cards

What are the two main data problems companies face when getting started with artificial intelligence/machine learning?

A. Data sampling and categorization

B. Outliers and duplicated data

C. Lack of relevant data and bad data

D. Lack of training and expertise

New cards

What is the main goal of adding polynomial features to a linear regression?

A. Remove the linearity of the regression and turn it into a polynomial model.

B. Capture the relation of the outcome with features of higher order.

C. Increase the interpretability of a black box model.

D. Ensure similar results across all folds when using K-fold cross validation.

New cards

These are all characteristics of decision trees, EXCEPT:

A. They segment data based on features to predict results

B. They split nodes into leaves

C. They can be used for either classification or regression

D. They have well rounded decision boundaries

New cards

Consider the following statements about Estimation and Inference:

a. In general, the population parameters are unknown.

b. Parametric models have finite number of parameters.

A. a and b are true

B. a and b are false

C. a is true and b is false

D. a is false and b is true

New cards

These are all characteristics of the k nearest neighbors algorithm EXCEPT:

A. It is sensitive to scaling

B. It determines decision boundaries to make predictions

C. It determines the value for k

D. It is well suited to predict variables with multiple classes

New cards

Which distance metric is useful when we have text documents and we want to group similar topics together?

A. Manhattan Distance

B. Euclidean

C. Jaccard

D. Mahalanobis Distance
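As an illustration of a set-based metric on text, here is a plain-Python sketch of the Jaccard distance between the word sets of two short, made-up documents. The helper name is hypothetical.

```python
def jaccard_distance(a, b):
    """1 minus the share of unique words that the two documents have in common."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty documents are treated as identical
    return 1.0 - len(a & b) / len(union)

doc1 = "the cat sat on the mat".split()
doc2 = "the cat lay on the rug".split()
print(jaccard_distance(doc1, doc2))
```

The two documents share 3 of 7 unique words, so the distance is 1 - 3/7.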

New cards

This is the syntax you need to predict new data after you have trained a linear regression called LR:

A. LR-predict(X_test)

B. LR.predict(X_test)

C. LR.predict(LR, X_test)

D. predict(LR, X_test)
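A short end-to-end sketch of fitting a scikit-learn `LinearRegression` named `LR` and predicting on new data. The training points are invented and follow y = 2x + 1 exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.array([1.0, 3.0, 5.0])  # y = 2x + 1

LR = LinearRegression().fit(X_train, y_train)

# predict is a method of the fitted estimator and takes the feature matrix.
X_test = np.array([[3.0]])
y_pred = LR.predict(X_test)
print(y_pred)
```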

New cards

This option describes a way of turning a regression problem into a classification problem:

A. Create a new variable that flags 1 for above a certain value and 0 otherwise

B. Use outlier treatment

C. Use missing value handling

D. Create a new variable that uses autoencoding to transform a continuous outcome into categorical
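Thresholding a continuous outcome into a binary flag can be sketched in a few lines. The prices and the cutoff of 200 are hypothetical values chosen for illustration.

```python
prices = [120.0, 310.5, 95.0, 250.0]  # hypothetical continuous target
threshold = 200.0                      # hypothetical cutoff

# 1 when the value is above the cutoff, 0 otherwise.
is_expensive = [1 if p > threshold else 0 for p in prices]
print(is_expensive)
```

The resulting 0/1 column can then serve as the target of a classifier.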