(True/False) Machine Learning is a subset of Artificial Intelligence
A: False
B: True
True
(True/False) Deep Learning is a subset of Machine Learning
A: False
B: True
True
(True/False) Machine Learning consists of programming computers to learn from real-time human interactions
A: False
B: True
False
(True/False) AI Winters happened mostly due to the lack of understanding behind the theory of neural networks
A: True
B: False
True
Most modern applications that use computer vision, use models that were trained using this discipline:
A: Machine Learning
B: Artificial Intelligence
C: Deep Learning
Deep Learning
In the Machine Learning Workflow, the main goal of the Data Exploration and Preprocessing step is to:
A: Identify what data is best suited to find a solution to your business problem
B: Determine how to clean your data such that you can use it to train a model
Determine how to clean your data such that you can use it to train a model
What is the goal of supervised learning?
A: Predict the labels.
B: Find the target.
C: Find an underlying structure of the dataset without any labels.
D: Predict the features.
Predict the labels.
What is deep learning?
A: Deep learning is machine learning that involves deep neural networks.
B: Deep learning is another name for artificial intelligence.
C: Deep learning includes artificial intelligence and machine learning.
D: None of the above are correct.
Deep learning is machine learning that involves deep neural networks.
When is a standard machine learning algorithm usually a better choice than using deep learning to get the job done?
A: When working with small data sets.
B: When the data is steady over time.
C: When working with large data sets.
D: None of the above are correct.
When working with small data sets.
What is a Turing test?
A: It tests images.
B: It tests and cleans the dataset.
C: It tests the dataset.
D: It tests a machine's ability to exhibit intelligent behavior.
It tests a machine's ability to exhibit intelligent behavior.
What are some of the different milestones in deep learning history?
A: Geoffrey Hinton's work, AlexNet, and TensorFlow
B: Deep Blue defeats a world champion chess player and TensorFlow is released
C: Deep Blue defeats a world champion chess player, and AlexNet is created.
D: Deep Blue defeats a world champion chess player, and Keras is released.
Geoffrey Hinton's work, AlexNet, and TensorFlow
What is artificial intelligence?
A: A subset of deep learning.
B: Any program that can sense, reason, act, and adapt.
C: A subset of machine learning
D: None of the above.
Any program that can sense, reason, act, and adapt.
What are two spaces within AI that are going through drastic growth and innovation?
A: Language processing and deep learning.
B: Deep learning and machine learning.
C: Computer vision and natural language processing.
D: Computer vision and deep learning.
Computer vision and natural language processing.
Why has AI flourished so much in recent years?
A: Faster and inexpensive computers and data storage
B: Access to hardware for cleaning data
C: Stylish designed computers
D: Data storage in the cloud is much more expensive
Faster and inexpensive computers and data storage
How does Alexa use artificial intelligence?
A: Recognizes faces and pictures.
B: Recognizes our voice and answers questions.
C: Suggests who a person on a photo is.
D: None of the above answers are correct.
Recognizes our voice and answers questions.
What are the first two steps of a typical machine learning workflow?
A: Problem statement and data cleaning.
B: Problem statement and data collection.
C: Data collection and data transformation.
D: None of the above answers is correct.
Problem statement and data collection.
Which statement about the Pandas read_csv function is TRUE?
A: It reads data into a 2-dimensional NumPy array.
B: It can read both tab-delimited and space-delimited data.
C: It can only read comma-delimited data.
D: It allows only one argument: the name of the file.
It can read both tab-delimited and space-delimited data.
Which of the following is a reason to use JavaScript Object Notation (JSON) files for storing data?
A: Because the data is stored in a matrix format.
B: Because they can store NA values.
C: Because they can store NULL values.
D: Because they are cross-platform compatible.
Because they are cross-platform compatible.
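The portability claim is easy to see with Python's standard json module; the record fields below are made up for illustration:

```python
import json

# A JSON document is plain, language-agnostic text, which is what makes it
# cross-platform: any language with a JSON parser can read it back.
record = {'customer_id': 42, 'churned': False, 'score': 0.87}

text = json.dumps(record)    # serialize to a JSON string
restored = json.loads(text)  # parse it back into a Python dict

print(restored == record)  # True
```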
The data below appears in 'data.txt', and Pandas has been imported. Which Python command will read it correctly into a Pandas DataFrame?
63.03 22.55 39.61 40.48 98.67 -0.25 AB
39.06 10.06 25.02 29 114.41 4.56 AB
68.83 22.22 50.09 46.61 105.99 -3.53 AB
A: pandas.read_csv('data.txt')
B: pandas.read_csv('data.txt', header=None, sep=' ')
C: pandas.read_csv('data.txt', delim_whitespace=True)
D: pandas.read_csv('data.txt', header=0, delim_whitespace=True)
pandas.read_csv('data.txt', header=None, sep=' ')
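To see why that call parses the sample correctly, here is a minimal sketch that substitutes an in-memory string for data.txt:

```python
import io
import pandas as pd

# The three sample rows from the question, with no header row
raw = ("63.03 22.55 39.61 40.48 98.67 -0.25 AB\n"
       "39.06 10.06 25.02 29 114.41 4.56 AB\n"
       "68.83 22.22 50.09 46.61 105.99 -3.53 AB\n")

# header=None stops pandas from treating the first data row as column names;
# sep=' ' splits each row on the single-space delimiter
df = pd.read_csv(io.StringIO(raw), header=None, sep=' ')

print(df.shape)  # (3, 7): three rows, seven columns
```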
(True/False) Outliers must be very extreme to noticeably impact the fit of a statistical model.
A: True
B: False
False
(True/False) Outliers should always be replaced, since they never contain useful information about the data.
A: True
B: False
False
Which residual-based approach to identifying outliers compares running a model with all data to running the same model, but dropping a single observation?
A: Standardized residuals
B: Unstandardized residuals
C: Externally-studentized residuals
D: Abnormally-studentized residuals
Externally-studentized residuals
What is a CSV file?
A: CSV is a method of JavaScript Object Notation.
B: CSV files are rows of data or values separated by commas.
C: CSV makes data readily available for analytics, dashboards, and reports.
D: CSV files are a standard way to store data across platforms.
CSV files are rows of data or values separated by commas.
What are residuals?
A: Residuals are a method for handling identified outliers.
B: Residuals are the difference between the actual values and the values predicted by a given model.
C: Residuals are data removed from the dataframe.
D: Residuals are a method to standardize data.
Residuals are the difference between the actual values and the values predicted by a given model.
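The definition can be checked in a couple of lines; the actual and predicted values below are made up:

```python
import numpy as np

# Residuals are actual values minus the model's predicted values
actual = np.array([3.0, 5.0, 7.0])
predicted = np.array([2.5, 5.5, 7.0])

residuals = actual - predicted
print(residuals)  # [ 0.5 -0.5  0. ]
```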
If removal of rows or columns of data is not an option, why must we ensure that information is assigned for missing data?
A: Information must be assigned to prevent outliers.
B: Most models will not accept blank values in our data.
C: Missing data may bias the dataset.
D: Assigning information for missing data improves the accuracy of the dataset.
Most models will not accept blank values in our data.
What are the two main data problems companies face when getting started with artificial intelligence/machine learning?
A: Outliers and duplicated data
B: Lack of relevant data and bad data
C: Lack of training and expertise
D: Data sampling and categorization
Lack of relevant data and bad data
What does SQL stand for and what does it represent?
A: SQL stands for Structured Query Language, and it represents databases that are not relational, they vary in structure.
B: SQL stands for Sequential Query Language, and it represents a set of relational databases with fixed schemas.
C: SQL stands for Structured Query Language, and it represents a set of relational databases with fixed schemas.
D: SQL stands for Sequential Query Language, and it represents a set of sequential databases with fixed schemas.
SQL stands for Structured Query Language, and it represents a set of relational databases with fixed schemas.
What does NoSQL stand for and what does it represent?
A: NoSQL stands for Non-Structured Query Language, and it represents a set of non-relational databases with varied schemas.
B: NoSQL stands for Not-only SQL, and it represents a set of databases that are not relational, therefore, they vary in structure.
C: NoSQL stands for Non-Structured Query Language, and it represents a set of relational databases with fixed schemas.
D: NoSQL stands for Not-only SQL, and it represents a set of databases that are relational, therefore, they have fixed structure.
NoSQL stands for Not-only SQL, and it represents a set of databases that are not relational, therefore, they vary in structure.
What is a JSON file?
A: JSON stands for JavaString Object Notation, and they have very similar structure to Python Dictionaries.
B: JSON stands for JavaScript Object Notation, and it is a standard way to store the data across platforms.
C: JSON stands for JavaScript Object Notation, and it is a non-standard way to store the data across platforms.
D: JSON stands for JavaString Object Notation, and it is a standard way to store the data across platforms.
JSON stands for JavaScript Object Notation, and it is a standard way to store the data across platforms.
What is meant by messy data?
A: Duplicated or unnecessary data.
B: Inconsistent text and typos.
C: Missing data.
D: All of the above.
All of the above.
What is an outlier?
A: An outlier is a data point that has the highest or lowest value in the dataset.
B: An outlier is a data point that does not belong in our dataset.
C: An outlier is a data point that is very close to the mean value of all observations.
D: An outlier is an observation in a dataset that is distant from most other observations.
An outlier is an observation in a dataset that is distant from most other observations.
How do we identify outliers in our dataset?
A: We can only identify outliers visually through building plots.
B: We can identify outliers only by calculating the minimum and maximum values in the dataset.
C: We can identify outliers both visually and with statistical calculations.
D: We can only identify outliers by using some statistical calculations.
We can identify outliers both visually and with statistical calculations.
From the options listed below, select the option that is NOT a valid exploratory data approach to visually confirm whether your data is ready for modeling or if it needs further cleaning or data processing:
A: Create a panel plot that shows distributions for the dependent variable and scatter plots for all independent variables
B: Train a model and identify the observations with the largest residuals
C: Create visualizations for scatter plots, histograms, box plots, and hexbin plots
D: Create a correlation heatmap to confirm the sign and magnitude of correlation across your features.
Create a correlation heatmap to confirm the sign and magnitude of correlation across your features.
These are two of the most common libraries for data visualization:
A: matplotlib and seaborn
B: scipy and seaborn
C: numpy and matplotlib
D: scipy and numpy
matplotlib and seaborn
(True/False) You can use the pandas library to create plots.
A: True
B: False
True
(True/False) Classification models require that input features be scaled.
A: True
B: False
False
(True/False) Feature scaling allows better interpretation of distance-based approaches.
A: True
B: False
True
(True/False) Feature scaling reduces distortions caused by variables with different scales.
A: True
B: False
True
Which scaling approach converts features to standard normal variables?
A: MinMax scaling
B: Standard scaling
C: Robust scaling
D: Nearest neighbor scaling
Standard scaling
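A minimal NumPy sketch of the formula behind standard scaling (the values are made up); sklearn's StandardScaler applies the same transformation per feature:

```python
import numpy as np

# Standard scaling: subtract the mean and divide by the standard deviation,
# so the scaled feature has mean 0 and standard deviation 1
x = np.array([10.0, 20.0, 30.0, 40.0])
scaled = (x - x.mean()) / x.std()

print(scaled.mean(), scaled.std())  # mean ~0, std ~1
```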
Which variable transformation should you use for ordinal data?
A: Min-max scaling
B: Standard scaling
C: One-hot encoding
D: Ordinal encoding
Ordinal encoding
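Ordinal encoding can be sketched in plain Python; the size categories here are a made-up example:

```python
# Ordinal encoding maps ordered categories to integers that preserve the
# order of the categories (small < medium < large).
order = ['small', 'medium', 'large']
mapping = {category: rank for rank, category in enumerate(order)}

sizes = ['medium', 'small', 'large', 'small']
encoded = [mapping[s] for s in sizes]

print(encoded)  # [1, 0, 2, 0]
```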
What are polynomial features?
A: They are higher order relationships in the data.
B: They are represented by linear relationships in the data.
C: They are logistic regression coefficients.
D: They are lower order relationships in the data.
They are higher order relationships in the data.
What does the Box-Cox transformation do?
A: It transforms categorical variables into numerical variables.
B: It makes the data more left-skewed.
C: It transforms the data distribution into a more symmetrical bell curve.
D: It makes the data more right-skewed.
It transforms the data distribution into a more symmetrical bell curve.
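A short sketch of the effect, assuming SciPy is available; the right-skewed sample is synthetic:

```python
import numpy as np
from scipy import stats

# Log-normal samples are strongly right-skewed and strictly positive,
# which the Box-Cox transformation requires
rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

# boxcox returns the transformed data and the fitted lambda parameter
transformed, lam = stats.boxcox(skewed)

# The transformed distribution is far more symmetrical than the original
print(stats.skew(skewed), stats.skew(transformed))
```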
Select three important reasons why EDA is useful.
A: To determine if the data makes sense, to determine whether further data cleaning is needed, and to help identify patterns and trends in the data
B: To analyze data sets, to determine the main characteristics of data sets, and to use sampling to examine data
C: To examine correlations, to sample from dataframes, and to train models on random samples of data
D: To utilize summary statistics, to create visualizations, and to identify outliers
To determine if the data makes sense, to determine whether further data cleaning is needed, and to help identify patterns and trends in the data
What assumption does the linear regression model make about data?
A: This model assumes an addition of each one of the model parameters multiplied by a coefficient.
B: This model assumes that raw data in data sets is on the same scale.
C: This model assumes a transformation of each parameter to a linear relationship.
D: This model assumes a linear relationship between predictor variables and outcome variables.
This model assumes a linear relationship between predictor variables and outcome variables.
What is skewed data?
A: Data that has a normal distribution.
B: Raw data that may not have a linear relationship.
C: Raw data that has undergone log transformation.
D: Data that is distorted away from normal distribution; may be positively or negatively skewed.
Data that is distorted away from normal distribution; may be positively or negatively skewed.
Select the two primary types of categorical feature encoding.
A: Log and polynomial transformation
B: Nominal encoding and ordinal encoding
C: Encoding and scaling
D: One-hot encoding and ordinal encoding
One-hot encoding and ordinal encoding
Which scaling approach puts values between zero and one?
A: Min-max scaling
B: Robust scaling
C: Standard scaling
D: Nearest neighbor scaling
Min-max scaling
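The formula behind min-max scaling, sketched in NumPy with made-up values; sklearn's MinMaxScaler does the same per feature:

```python
import numpy as np

# Min-max scaling maps the smallest value to 0 and the largest to 1
x = np.array([5.0, 10.0, 15.0, 25.0])
scaled = (x - x.min()) / (x.max() - x.min())

print(scaled)  # [0.   0.25 0.5  1.  ]
```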
Which variable transformation should you use for nominal data with multiple different values within the feature?
A: Ordinal encoding
B: Standard scaling
C: One-hot encoding
D: Min-max scaling
One-hot encoding
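A quick sketch with pandas.get_dummies (the colour values are made up); each nominal value becomes its own 0/1 indicator column:

```python
import pandas as pd

# One-hot encoding a nominal feature with several distinct values
df = pd.DataFrame({'colour': ['red', 'green', 'blue', 'green']})
encoded = pd.get_dummies(df, columns=['colour'])

# One indicator column per category, in alphabetical order
print(list(encoded.columns))  # ['colour_blue', 'colour_green', 'colour_red']
```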
(True/False) In general, the population parameters are unknown.
A: True.
B: False.
True.
(True/False) Parametric models have a finite number of parameters.
A: True.
B: False.
True.
The most common way of estimating parameters in a parametric model is:
A: using the maximum likelihood estimation
B: using the central limit theorem
C: extrapolating a non-parametric model
D: extrapolating Bayesian statistics
using the maximum likelihood estimation
A p-value is:
A: the smallest significance level at which the null hypothesis would be rejected
B: the probability of the null hypothesis being true
C: the probability of the null hypothesis being false
D: the smallest significance level at which the null hypothesis is accepted
the smallest significance level at which the null hypothesis would be rejected
A Type 1 error is defined as:
A: Saying the null hypothesis is false, when it is actually true
B: Saying the null hypothesis is true, when it is actually false
Saying the null hypothesis is false, when it is actually true
You find through a graph that there is a strong correlation between Net Promoter Score and the time that customers spend on a website. Select the TRUE assertion:
A: There is an underlying factor that explains this correlation, but manipulating the time that customers spend on a website may not affect the Net Promoter Score they will give to the company
B: To boost the Net Promoter Score of a business, you need to increase the time that customers spend on a website.
There is an underlying factor that explains this correlation, but manipulating the time that customers spend on a website may not affect the Net Promoter Score they will give to the company
Which one of the following is common to both machine learning and statistical inference?
A: Using sample data to make inferences about a hypothesis.
B: Using population data to make inferences about a null sample.
C: Using population data to model a null hypothesis.
D: Using sample data to infer qualities of the underlying population distribution.
Using sample data to infer qualities of the underlying population distribution.
Which one of the following describes an approach to customer churn prediction stated in terms of probability?
A: Data related to churn may include the target variable for whether a certain customer has left.
B: Churn prediction is a data-generating process representing the actual joint distribution between our x and the y variable.
C: Predicting a score for individuals that estimates the probability the customer will stay.
D: Predicting a score for individuals that estimates the probability the customer will leave.
Predicting a score for individuals that estimates the probability the customer will leave.
What is customer lifetime value?
A: The total purchases over the time during which the person is a customer.
B: The total churn a customer generates in the population.
C: The total churn generated by a customer over their lifetime.
D: The total value that the customer receives during their life.
The total purchases over the time during which the person is a customer.
Which one of the following statements about the normalized histogram of a variable is true?
A: It is a non-parametric representation of the population variance.
B: It provides an estimate of the variable's probability distribution.
C: It serves as a bar chart for the null hypothesis.
D: It is a parametric representation of the population distribution.
It provides an estimate of the variable's probability distribution.
The outcome of rolling a fair die can be modelled as a _______ distribution.
A: Poisson
B: log-normal
C: uniform
D: normal
uniform
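A quick simulation of the claim: with many rolls, each face of a fair die comes up with probability close to 1/6:

```python
import numpy as np

# Simulate a fair six-sided die: a discrete uniform distribution over 1..6
rng = np.random.default_rng(42)
rolls = rng.integers(1, 7, size=60_000)

counts = np.bincount(rolls)[1:]    # occurrences of faces 1..6
proportions = counts / len(rolls)  # each should be close to 1/6

print(proportions)
```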
Which one of the following features best distinguishes the Bayesian approach to statistics from the Frequentist approach?
A: Frequentist statistics incorporates the probability of the hypothesis being true.
B: Bayesian statistics incorporate the probability of the hypothesis being true.
C: Frequentist statistics requires construction of a prior distribution.
D: Bayesian statistics is better than Frequentist.
Bayesian statistics incorporate the probability of the hypothesis being true.
Which of the following best describes what a hypothesis is?
A: A hypothesis is a statement about a posterior distribution.
B: A hypothesis is a statement about a prior distribution.
C: A hypothesis is a statement about a population.
D: A hypothesis is a statement about a sample of the population.
A hypothesis is a statement about a population.
A Type 2 error in hypothesis testing is _____________________:
A: correctly rejecting the alternative hypothesis.
B: incorrectly accepting the null hypothesis.
C: correctly rejecting the null hypothesis.
D: incorrectly accepting the alternative hypothesis.
incorrectly accepting the null hypothesis.
Which statement best describes a consequence of a type II error in the context of a churn prediction example? Assume that the null hypothesis is that customer churn is due to chance, and that the alternative hypothesis is that customers enrolled for greater than two years will not churn over the next year.
A: You correctly conclude that a customer will eventually churn
B: You correctly conclude that customer churn is by chance
C: You incorrectly conclude that there is no effect
D: You incorrectly conclude that customer churn is by chance
You incorrectly conclude that customer churn is by chance
Which of the following is a statistic used for hypothesis testing?
A: The acceptance region.
B: The standard deviation.
C: The likelihood ratio.
D: The rejection region.
The likelihood ratio.
Predicting payment default, whether a transaction is fraudulent, and whether a customer will be part of the top 5% spenders on a given year, are examples of:
A: classification
B: regression
classification
(True/False) It is less concerning to treat a Machine Learning model as a black box for prediction purposes, compared to interpretation purposes:
A: True
B: False
True
Predicting total revenue, number of customers, and percentage of returning customers are examples of:
A: classification
B: regression
regression
(True/False) The Sum of Squared Errors (SSE) can be used to select the best-fitting regression model.
A: True
B: False
True
(True/False) The R-squared value from estimating a linear regression model will almost always increase if more features are added.
A: True
B: False
True
(True/False) The Total Sum of Squares (TSS) can be used to select the best-fitting regression model.
A: True
B: False
False
You can use supervised machine learning for all of the following examples, EXCEPT:
A: Segment customers by their demographics.
B: Predict the number of customers that will visit a store on a given week.
C: Predict the probability of a customer returning to a store.
D: Interpret the main drivers that determine if a customer will return to a store.
Segment customers by their demographics.
The autocorrect on your phone is an example of:
A: Unsupervised learning
B: Supervised learning
C: Semi-supervised learning
D: Reinforcement learning
Supervised learning
This is the type of Machine Learning that uses both data with labeled outcomes and data without labeled outcomes:
A: Supervised Machine Learning
B: Unsupervised Machine Learning
C: Mixed Machine Learning
D: Semi-Supervised Machine Learning
Semi-Supervised Machine Learning
Which option describes a way of turning a regression problem into a classification problem?
A: Create a new variable that flags 1 for above a certain value and 0 otherwise
B: Use outlier treatment
C: Use missing value handling
D: Create a new variable that uses autoencoding to transform a continuous outcome into categorical
Create a new variable that flags 1 for above a certain value and 0 otherwise
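That flagging step can be sketched in a few lines; the revenue values and the threshold are made up:

```python
import numpy as np

# Turning a regression target into a classification target:
# flag 1 for values above a chosen threshold, 0 otherwise
revenue = np.array([120.0, 80.0, 300.0, 45.0])
threshold = 100.0

labels = (revenue > threshold).astype(int)
print(labels)  # [1 0 1 0]
```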
This is the syntax you need to predict new data after you have trained a linear regression model called LR:
A: LR=predict(X_test)
B: LR.predict(X_test)
C: LR.predict(LR, X_test)
D: predict(LR, X_test)
LR.predict(X_test)
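In context, a minimal sketch with a tiny made-up dataset (generated from y = 2x + 1):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny synthetic training set following y = 2x + 1
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([1.0, 3.0, 5.0, 7.0])
X_test = np.array([[4.0]])

LR = LinearRegression()
LR.fit(X_train, y_train)          # learn coefficients from the training data
predictions = LR.predict(X_test)  # predict for new, unseen data

print(predictions)  # ~[9.]
```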
All of these options are useful error measures to compare regressions except:
A: SSE
B: R squared
C: TSS
D: ROC index
ROC index
All of the listed below are part of the Machine Learning Framework, except:
A: Observations
B: Features
C: Parameters
D: None of the above
None of the above
Select the option that is the most INACCURATE regarding the definition of Machine Learning:
A: Machine Learning allows computers to learn from data
B: Machine Learning allows computers to infer predictions for new data
C: Machine Learning is a subset of Artificial Intelligence
D: Machine Learning is automated and requires no programming
Machine Learning is automated and requires no programming
In Linear Regression, which statement about model evaluation is the most accurate?
A: Model selection involves choosing a model that minimizes the cost function.
B: Model estimation involves choosing parameters that minimize the cost function.
C: Model estimation involves choosing a cost function that can be compared across models.
D: Model selection involves choosing modeling parameters that minimize in-sample validation error.
Model estimation involves choosing parameters that minimize the cost function.
When learning about regression, we saw the outcome as a continuous number. Given the options below, which is an example of regression?
A: A fraudulent charge
B: Under certain circumstances determine if a person is a Republican or Democrat
C: Customer churn
D: Housing prices
Housing prices
What is another term for the testing data?
A: Training data
B: Unseen data
C: Corroboration data
D: Cross validation data
Unseen data
(True/False) The ShuffleSplit will ensure that there is no bias in your outcome variable.
A: True
B: False
True
Select the option with the syntax to obtain the data splits you will need to train a model, with a test split that is a third of the size of your available data.
A: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)
B: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
C: X_train, y_test = train_test_split(X, y, test_size=0.33)
D: X_train, y_test = train_test_split(X, y, test_size=0.5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
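A runnable sketch of that split on a small made-up dataset (random_state is added here only to make the split reproducible):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 15 made-up observations with 2 features each
X = np.arange(30).reshape(15, 2)
y = np.arange(15)

# test_size=0.33 reserves roughly a third of the rows for the test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)

print(len(X_train), len(X_test))  # 10 5
```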
What is the main goal of adding polynomial features to a linear regression?
A: Remove the linearity of the regression and turn it into a polynomial model.
B: Capture the relation of the outcome with features of higher order.
C: Increase the interpretability of a black box model.
D: Ensure similar results across all folds when using K-fold cross validation.
Capture the relation of the outcome with features of higher order.
What are the most common sklearn methods to add polynomial features to your data?
Note: polyFeat = PolynomialFeatures(degree)
A: polyFeat.add and polyFeat.transform
B: polyFeat.add and polyFeat.fit
C: polyFeat.fit and polyFeat.transform
D: polyFeat.transform
polyFeat.fit and polyFeat.transform
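The fit/transform pair in action on a tiny made-up feature matrix:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0], [3.0]])

polyFeat = PolynomialFeatures(degree=2)
polyFeat.fit(X)                 # learn the output feature layout
X_poly = polyFeat.transform(X)  # expand each row into [1, x, x^2]

print(X_poly)  # [[1. 2. 4.]
               #  [1. 3. 9.]]
```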
How can you adjust the standard linear approach to regression when dealing with fundamental problems such as prediction or interpretation?
A: Create a class instance
B: Add some non-linear patterns, i.e., polynomial features
C: Import the transformation method
D: By transforming the data
Add some non-linear patterns, i.e., polynomial features
The main purpose of splitting your data into training and test sets is:
A: To improve accuracy
B: To avoid overfitting
C: To improve regularization
D: To improve crossvalidation and overfitting
To avoid overfitting
Complete the following sentence: The training data is used to fit the model, while the test data is used to:
A: measure the parameters and hyperparameters of the model
B: tweak the model hyperparameters
C: tweak the model parameters
D: measure error and performance of the model
measure error and performance of the model
What term is used if your test data leaks into the training data?
A: Test leakage
B: Training leakage
C: Data leakage
D: Historical data leakage
Data leakage
Which one of the terms below uses a linear combination of features?
A: Binomial Regression
B: Linear Regression
C: Multiple Regression
D: Polynomial Regression
Linear Regression
When splitting your data, what is the purpose of the training data?
A: Compare with the actual value
B: Fit the actual model and learn the parameters
C: Predict the label with the model
D: Measure errors
Fit the actual model and learn the parameters
Polynomial features capture what effects?
A: Non-linear effects.
B: Linear effects.
C: Multiple effects.
D: Regression effects.
Non-linear effects.
Which fundamental problems are being solved by adding non-linear patterns, such as polynomial features, to a standard linear approach?
A: Prediction.
B: Interpretation.
C: Prediction and Interpretation.
D: None of the above.
Prediction and Interpretation.
Testing data can also be referred to as:
A: Training data
B: Unseen data
C: Corroboration data
D: None of the above
Unseen data
Select the correct syntax to obtain the data split that will result in a train set that is 60% of the size of your available data.
A: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.6)
B: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)
C: X_train, y_test = train_test_split(X, y, test_size=0.40)
D: X_train, y_test = train_test_split(X, y, test_size=0.6)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)
What is the correct sklearn syntax to add a third degree polynomial to your model?
A: polyFeat = polyFeat.add(degree=3)
B: polyFeat = polyFeat.fit(degree=3)
C: polyFeat = PolynomialFeatures(degree=3)
D: polyFeat = polyFeat.transform(degree=3)
polyFeat = PolynomialFeatures(degree=3)
(True/False) In the model complexity versus error diagram, the model complexity increases as the training error decreases.
A: True
B: False
False
(True/False) In the model complexity versus error diagram, there is an inflection point after which, as the cross-validation error increases, so does the complexity of the model.
A: True
B: False
True
(True/False) In the model complexity versus error diagram, the right side of the curve is where the model is underfitted and the left side of the curve is where the model is overfitted.
A: True
B: False
False
In K-fold cross-validation, how will increasing k affect the variance (across subsamples) of estimated model parameters?
A: Increasing k will not affect the variance of estimated parameters.
B: Increasing k will usually reduce the variance of estimated parameters.
C: Increasing k will usually increase the variance of estimated parameters.
D: Increasing k will increase the variance of estimated parameters if models are underfit, but reduce it if models are overfit.
Increasing k will usually increase the variance of estimated parameters.