1. (True/False) AI Winters happened mostly due to a lack of understanding of the theory behind neural networks.
A True
B False
B False
- AI Winters were mostly due to scalability issues related to data size and computing power.
2. Most modern applications that use computer vision use models that were trained using this discipline:
A Machine Learning
B Artificial Intelligence
C Deep Learning
C Deep Learning
3. In the Machine Learning Workflow, the main goal of the Data Exploration and Preprocessing step is to:
A Identify the data that is best suited to finding a solution to your business problem
B Determine how to clean your data such that you can use it to train a model
B Determine how to clean your data such that you can use it to train a model
1. What is the goal of supervised learning?
A Find an underlying structure of the dataset without any labels.
B Find the target.
C Predict the features.
D Predict the labels.
D Predict the labels.
- The goal for supervised learning is to be able to predict the label.
2. What is deep learning?
A Deep learning is machine learning that involves deep neural networks
B Deep learning is another name for artificial intelligence.
C Deep learning includes artificial intelligence and machine learning.
D None of the above are correct.
A Deep learning is machine learning that involves deep neural networks
- Deep learning is machine learning that involves using very complicated models called deep neural networks. Deep learning is a subset of machine learning.
3. When is a standard machine learning algorithm usually a better choice than using deep learning to get the job done?
A When working with small data sets.
B When the data is steady over time.
C When working with large data sets.
D None of the above are correct.
A When working with small data sets.
- A standard machine learning algorithm is a better choice when you are working with smaller datasets, and if the data is changing a lot over time and you don't have a steady dataset.
4. What is a Turing test?
A It tests a machine's ability to exhibit intelligent behavior.
B It tests the dataset.
C It tests and cleans the dataset.
D It tests images.
A It tests a machine's ability to exhibit intelligent behavior.
- In 1950, Alan Turing developed the Turing test to test a machine's ability to exhibit intelligent behavior. Alan Turing's test has served as a foundational threshold for artificial intelligence.
5. What are some of the different milestones in deep learning history?
A Deep Blue defeats a world champion chess player, and AlexNet is created.
B Geoffrey Hinton's work, AlexNet, and TensorFlow
C Deep Blue defeats a world champion chess player and TensorFlow is released
D Deep Blue defeats a world champion chess player, and Keras is released.
B Geoffrey Hinton's work, AlexNet, and TensorFlow
- In 2006, the previous limitations of deep learning, namely exploding and vanishing gradients, were overcome with algorithmic advancements such as Geoffrey Hinton's work on unsupervised pre-training. Neural networks were rebranded as deep learning, as it became possible to train much deeper networks with more layers. In 2012, a deep learning model using convolutional neural nets called AlexNet achieved a top-5 error rate of 15.3 percent. In 2015, TensorFlow, one of the most popular deep learning libraries, was released, making deep learning more powerful and accessible.
6. What is artificial intelligence?
A A subset of deep learning.
B Any program that can sense, reason, act, and adapt.
C A subset of machine learning
D None of the above.
B Any program that can sense, reason, act, and adapt.
- Artificial intelligence is any program that can sense, reason, act, and adapt. It is essentially a machine taking any form of intelligent behavior.
7. What are two spaces within AI that are going through drastic growth and innovation?
A Computer vision and natural language processing.
B Deep learning and machine learning.
C Language processing and deep learning.
D Computer vision and deep learning.
A Computer vision and natural language processing.
- We are seeing drastic growth and innovation in two spaces: computer vision and natural language processing. The sharp advancements in computer vision are impacting multiple areas; for example, cars are getting to the point where they can drive themselves. Similarly, natural language processing is booming, with vast improvements in the ability to translate, determine sentiment, cluster news articles, write papers, and more.
9. Why has AI flourished so much in recent years?
A Data storage in the cloud is much more expensive
B Access to hardware for cleaning data
C Stylish designed computers
D Faster and inexpensive computers and data storage
D Faster and inexpensive computers and data storage
- AI has flourished in recent years because cloud infrastructure is now in place to store large amounts of data much more cheaply, and a plethora of new ways to capture data lets us build larger, more nuanced datasets to learn underlying patterns across a multitude of fields. We also have faster computers and access to powerful hardware for processing and storing data.
9. How does Alexa use artificial intelligence?
A Recognizes faces and pictures.
B Recognizes our voice and answers questions.
C Suggests who a person on a photo is.
D None of the above answers are correct.
B Recognizes our voice and answers questions.
- Alexa, in our homes, recognizes our voice and answers questions or does tasks for us through natural language processing.
10. What are the first two steps of a typical machine learning workflow?
A Problem statement and data cleaning.
B Problem statement and data collection.
C Data collection and data transformation.
D None of the above answers is correct.
B Problem statement and data collection.
- The first step of a typical machine learning workflow is the problem statement. What problem are you trying to solve? The second step is data collection. What data do you need to solve the problem?
1. Which statement about the Pandas read_csv function is TRUE?
A It can read both tab-delimited and space-delimited data.
B It can only read comma-delimited data.
C It reads data into a 2-dimensional NumPy array.
D It allows only one argument: the name of the file.
A It can read both tab-delimited and space-delimited data.
2. Which of the following is a reason to use JavaScript Object Notation (JSON) files for storing data?
A Because the data is stored in a matrix format.
B Because they can store NA values.
C Because they can store NULL values.
D Because they are cross-platform compatible.
D Because they are cross-platform compatible.
3. The data below appears in 'data.txt', and Pandas has been imported. Which Python command will read it correctly into a Pandas DataFrame?
63.03 22.55 39.61 40.48 98.67 -0.25 AB
39.06 10.06 25.02 29 114.41 4.56 AB
68.83 22.22 50.09 46.61 105.99 -3.53 AB
A pandas.read_csv('data.txt')
B pandas.read_csv('data.txt', header=None, sep=' ')
C pandas.read_csv('data.txt', delim_whitespace=True)
D pandas.read_csv('data.txt', header=0, delim_whitespace=True)
B pandas.read_csv('data.txt', header=None, sep=' ')
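A minimal sketch of how the whitespace-delimited file above could be read with pandas; the file name and column count follow the question, and everything else is illustrative rather than the course's prescribed solution.

```python
import pandas as pd

# 'data.txt' holds space-delimited rows with no header line,
# so header=None keeps the first row as data.
df = pd.read_csv('data.txt', header=None, sep=' ')

# An equivalent option treats any run of whitespace (spaces or tabs)
# as the delimiter; newer pandas versions prefer sep=r'\s+' for this.
df_ws = pd.read_csv('data.txt', header=None, delim_whitespace=True)

print(df.shape)   # (3, 7) for the three rows shown above
print(df.head())
```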
1. What is a CSV file?
A CSV is a method of JavaScript Object Notation.
B CSV files are a standard way to store data across platforms.
C CSV makes data readily available for analytics, dashboards, and reports.
D CSV files are rows of data or values separated by commas.
D CSV files are rows of data or values separated by commas.
- CSV, or Comma Separated Value, files are rows of data or values separated by commas.
2. What are residuals?
A Residuals are data removed from the dataframe.
B Residuals are a method for handling identified outliers.
C Residuals are a method to standardize data.
D Residuals are the difference between the actual values and the values predicted by a given model.
D Residuals are the difference between the actual values and the values predicted by a given model.
- Residuals are model prediction errors.
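As a hedged illustration of residuals as prediction errors, here is a small sketch with made-up data and a simple fitted line (the model and numbers are assumptions for the example):

```python
import numpy as np

# Toy data: a noisy linear relationship (purely illustrative).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y_actual = 3.0 * x + rng.normal(scale=2.0, size=50)

# Fit a degree-1 polynomial (a line) and predict.
slope, intercept = np.polyfit(x, y_actual, deg=1)
y_pred = slope * x + intercept

# Residuals: actual values minus the values the model predicts.
residuals = y_actual - y_pred
print(residuals[:5])
```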
3. If removal of rows or columns of data is not an option, why must we ensure that information is assigned for missing data?
A Most models will not accept blank values in our data.
B Missing data may bias the dataset.
C Assigning information for missing data improves the accuracy of the dataset.
D Information must be assigned to prevent outliers.
A Most models will not accept blank values in our data.
- Most models will not accept blank values, so a value must be assigned for every feature and label in the dataset.
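A small sketch of assigning values for missing data so that a model will accept the frame; the fill strategies (median for numeric, an explicit 'Unknown' level for categorical) are illustrative assumptions, not the course's prescribed approach.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'age': [34, np.nan, 29, 41],
    'city': ['NYC', 'LA', None, 'NYC'],
})

# Numeric columns: fill blanks with the column median.
df['age'] = df['age'].fillna(df['age'].median())

# Categorical columns: fill blanks with an explicit 'Unknown' level.
df['city'] = df['city'].fillna('Unknown')

print(df)
```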
4. What are the two main data problems companies face when getting started with artificial intelligence/machine learning?
A Lack of relevant data and bad data
B Data sampling and categorization
C Outliers and duplicated data
D Lack of training and expertise
A Lack of relevant data and bad data
- Companies need to collect and organize their data to make it ready before leveraging it for machine learning.
5. What does SQL stand for and what does it represent?
A SQL stands for Structured Query Language, and it represents a set of relational databases with fixed schemas.
B SQL stands for Structured Query Language, and it represents databases that are not relational, they vary in structure.
C SQL stands for Sequential Query Language, and it represents a set of sequential databases with fixed schemas.
D SQL stands for Sequential Query Language, and it represents a set of relational databases with fixed schemas.
A SQL stands for Structured Query Language, and it represents a set of relational databases with fixed schemas.
- SQL represents the set of highly structured relational databases with fixed schemas.
6. What does NoSQL stand for and what does it represent?
A NoSQL stands for Non-Structured Query Language, and it represents a set of relational databases with fixed schemas.
B NoSQL stands for Non-Structured Query Language, and it represents a set of non-relational databases with varied schemas.
C NoSQL stands for Not-only SQL, and it represents a set of databases that are relational, therefore, they have fixed structure.
D NoSQL stands for Not-only SQL, and it represents a set of databases that are not relational, therefore, they vary in structure.
D NoSQL stands for Not-only SQL, and it represents a set of databases that are not relational, therefore, they vary in structure.
7. What is a JSON file?
A JSON stands for JavaScript Object Notation, and it is a non-standard way to store the data across platforms.
B JSON stands for JavaScript Object Notation, and it is a standard way to store the data across platforms.
C JSON stands for JavaString Object Notation, and it is a standard way to store the data across platforms.
D JSON stands for JavaString Object Notation, and they have very similar structure to Python Dictionaries.
B JSON stands for JavaScript Object Notation, and it is a standard way to store the data across platforms.
- JSON stands for JavaScript Object Notation, and JSON files are a standard way to store data across platforms.
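A minimal sketch of writing and reading a JSON file with pandas; the file name, fields, and records are assumptions for illustration.

```python
import json
import pandas as pd

# JSON records map naturally onto Python lists of dictionaries.
records = [
    {'id': 1, 'name': 'Ada', 'score': 91.5},
    {'id': 2, 'name': 'Lin', 'score': None},   # JSON null becomes NaN in pandas
]

with open('data.json', 'w') as f:
    json.dump(records, f)

# orient='records' matches a JSON array of objects.
df = pd.read_json('data.json', orient='records')
print(df)
```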
8. What is meant by messy data?
A Duplicated or unnecessary data.
B Inconsistent text and typos.
C Missing data.
D All of the above.
D All of the above.
- Duplicated or unnecessary data, inconsistent text and typos, and missing data are all examples of messy data.
9. What is an outlier?
A An outlier is a data point that has the highest or lowest value in the dataset.
B An outlier is an observation in the dataset that is distant from most other observations.
C An outlier is a data point that does not belong in our dataset.
D An outlier is a data point that is very close to the mean value of all observations.
B An outlier is an observation in the dataset that is distant from most other observations.
- An outlier is an observation in the data that is distant from most other observations.
10. How do we identify outliers in our dataset?
A We can identify outliers only by calculating the minimum and maximum values in the dataset.
B We can identify outliers both visually and with statistical calculations.
C We can only identify outliers by using some statistical calculations.
D We can only identify outliers visually through building plots.
B We can identify outliers both visually and with statistical calculations.
- We can use plots, such as histograms, density, and box plots, as well as making some statistical calculations, such as calculating the interquartile ranges.
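A sketch of the statistical route mentioned above, flagging outliers with the interquartile range; the 1.5 × IQR convention and the toy values are illustrative assumptions.

```python
import pandas as pd

values = pd.Series([10, 12, 11, 13, 12, 95, 11, 10, 12, 14])

q1, q3 = values.quantile([0.25, 0.75])
iqr = q3 - q1

# Common convention: points beyond 1.5 * IQR from the quartiles are outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]

print(outliers)   # flags the value 95
```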
1. From the options listed below, select the option that is NOT a valid exploratory data approach to visually confirm whether your data is ready for modeling or if it needs further cleaning or data processing:
A Create a panel plot that shows distributions for the dependent variable and scatter plots for all independent variables
B Train a model and identify the observations with the largest residuals
C Create visualizations for scatter plots, histograms, box plots, and hexbin plots
D Create a correlation heatmap to confirm the sign and magnitude of correlation across your features.
D Create a correlation heatmap to confirm the sign and magnitude of correlation across your features.
2. These are two of the most common libraries for data visualization:
A matplotlib and seaborn
B scipy and seaborn
C numpy and matplotlib
D scipy and numpy
A matplotlib and seaborn
3. (True/False) You can use the pandas library to create plots.
A True
B False
A True
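A brief sketch of plotting through pandas, matplotlib, and seaborn, the libraries named above; the toy DataFrame is an assumption for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({'x': range(10), 'y': [v ** 2 for v in range(10)]})

# pandas wraps matplotlib, so DataFrames can plot themselves.
df.plot(x='x', y='y', kind='scatter', title='pandas scatter')

# seaborn builds on matplotlib with higher-level statistical plots.
sns.histplot(df['y'])

plt.show()
```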
1. (True/False) Classification models require that input features be scaled.
A True
B False
B False
2. (True/False) Feature scaling allows better interpretation of distance-based approaches.
A True
B False
A True
3. (True/False) Feature scaling reduces distortions caused by variables with different scales.
A True
B False
A True
1. Which scaling approach converts features to standard normal variables?
A MinMax scaling
B Robust scaling
C Standard scaling
D Nearest neighbor scaling
C Standard scaling
- Standard scaling converts variables to standard normal variables.
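A minimal sketch of standard scaling with scikit-learn; the library choice and the tiny array are assumptions, since the quiz only names the technique.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Standard scaling: subtract each column's mean and divide by its standard
# deviation, producing features with mean 0 and standard deviation 1.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0))  # approximately [0, 0]
print(X_scaled.std(axis=0))   # approximately [1, 1]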
2. Which variable transformation should you use for ordinal data?
A Ordinal encoding
B Standard scaling
C One-hot encoding
D Min-max scaling
A Ordinal encoding
- Use ordinal encoding if there is some order to the categorical features.
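A short sketch of ordinal encoding for an ordered categorical feature; the category ordering, column name, and use of scikit-learn are illustrative assumptions.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

sizes = pd.DataFrame({'size': ['small', 'large', 'medium', 'small']})

# Pass the explicit order so 'small' < 'medium' < 'large' maps to 0 < 1 < 2.
encoder = OrdinalEncoder(categories=[['small', 'medium', 'large']])
sizes['size_encoded'] = encoder.fit_transform(sizes[['size']]).ravel()

print(sizes)
```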
3. What are polynomial features?
A They are higher order relationships in the data.
B They are lower order relationships in the data.
C They are logistic regression coefficients.
D They are represented by linear relationships in the data.
A They are higher order relationships in the data.
- Polynomial features are estimated by higher order polynomials in a linear model, like squared, cubed, etc.
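A sketch of generating polynomial features with scikit-learn; degree 2 and the tiny input array are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0]])

# degree=2 adds squared terms and the interaction term, so a linear model
# fit on the expanded features can capture higher-order relationships.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

print(poly.get_feature_names_out())  # ['x0', 'x1', 'x0^2', 'x0 x1', 'x1^2']
print(X_poly)                        # [[2. 3. 4. 6. 9.]]
```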
4. What does the Box-Cox transformation do?
A It makes the data more right skewed.
B It transforms the data distribution into a more symmetrical bell curve.
C It transforms categorical variables into numerical variables.
D It makes the data more left skewed.
B It transforms the data distribution into a more symmetrical bell curve.
- Box-Cox is one of the ways we can transform a skewed dataset to be more normally distributed.
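A small sketch of a Box-Cox transformation using SciPy; the generated log-normal sample is an assumption used to make the skew visible.

```python
import numpy as np
from scipy import stats

# Log-normal samples are strongly right-skewed; Box-Cox requires
# strictly positive values, which these are.
rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

transformed, lam = stats.boxcox(skewed)

print(f"skew before: {stats.skew(skewed):.2f}")       # clearly positive
print(f"skew after:  {stats.skew(transformed):.2f}")  # close to 0
print(f"fitted lambda: {lam:.2f}")
```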
5. Select three important reasons why EDA is useful.
A To determine if the data makes sense, to determine whether further data cleaning is needed, and to help identify patterns and trends in the data
B To analyze data sets, to determine the main characteristics of data sets, and to use sampling to examine data
C To utilize summary statistics, to create visualizations, and to identify outliers
D To examine correlations, to sample from dataframes, and to train models on random samples of data
A To determine if the data makes sense, to determine whether further data cleaning is needed, and to help identify patterns and trends in the data
- EDA helps us analyze data to summarize its main characteristics.
6. What assumption does the linear regression model make about data?
A This model assumes a transformation of each parameter to a linear relationship.
B This model assumes that raw data in data sets is on the same scale.
C This model assumes an addition of each one of the model parameters multiplied by a coefficient.
D This model assumes a linear relationship between predictor variables and outcome variables.
D This model assumes a linear relationship between predictor variables and outcome variables.
- The linear regression model assumes a linear relationship between predictor and outcome variables.
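A minimal sketch of fitting a linear regression on data that actually satisfies the linearity assumption; the synthetic predictor and outcome are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data where the outcome really is linear in the predictor.
rng = np.random.default_rng(1)
X = rng.uniform(0, 5, size=(100, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=0.5, size=100)

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # roughly [2.0] and 1.0
```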
7. What is skewed data?
A Raw data that may not have a linear relationship.
B Data that is distorted away from normal distribution; may be positively or negatively skewed.
C Data that has a normal distribution.
D Raw data that has undergone log transformation.
B Data that is distorted away from normal distribution; may be positively or negatively skewed.
- Often raw data, both the features and the outcome variable, can be negatively or positively skewed.
8. Select the two primary types of categorical feature encoding.
A Log and polynomial transformation
B One-hot encoding and ordinal encoding
C Encoding and scaling
D Nominal encoding and ordinal encoding
D Nominal encoding and ordinal encoding
- Encoding that transforms non-numeric values to numeric values is often applied to categorical features.
9. Which scaling approach puts values between zero and one?
A Standard scaling
B Nearest neighbor scaling
C Robust scaling
D Min-max scaling
D Min-max scaling
- Min-max scaling converts variables to continuous variables in the (0, 1) interval by mapping minimum values to 0 and maximum values to 1.
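A short sketch of min-max scaling; as with standard scaling above, scikit-learn and the toy column are assumed choices for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0], [5.0], [10.0]])

# Min-max scaling maps the column minimum to 0 and the maximum to 1.
scaler = MinMaxScaler()
print(scaler.fit_transform(X).ravel())  # [0.  0.444...  1.]
```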
10. Which variable transformation should you use for nominal data with multiple different values within the feature?
A Ordinal encoding
B One-hot encoding
C Min-max scaling
D Standard scaling
B One-hot encoding
- Use one-hot encoding if there are multiple different values within a feature.
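A sketch of one-hot encoding a nominal feature with pandas; the column name and values are assumptions.

```python
import pandas as pd

df = pd.DataFrame({'color': ['red', 'green', 'blue', 'green']})

# One-hot encoding creates one 0/1 indicator column per distinct value,
# which suits nominal categories that have no inherent order.
encoded = pd.get_dummies(df, columns=['color'])
print(encoded)
```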
1. (True/False) In general, the population parameters are unknown.
A True.
B False.
A True.
2. (True/False) Parametric models have a finite number of parameters.
A True.
B False.
A True.
3. The most common way of estimating parameters in a parametric model is:
A using the maximum likelihood estimation
B using the central limit theorem
C extrapolating a non-parametric model
D extrapolating Bayesian statistics
A using the maximum likelihood estimation
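A small sketch of maximum likelihood estimation for a parametric (normal) model using SciPy; the true parameters and sample size are made up for the example.

```python
import numpy as np
from scipy import stats

# Draw a sample from a normal distribution with known parameters.
rng = np.random.default_rng(42)
sample = rng.normal(loc=5.0, scale=2.0, size=10_000)

# norm.fit returns the maximum likelihood estimates of the mean and std.
mu_hat, sigma_hat = stats.norm.fit(sample)
print(mu_hat, sigma_hat)  # close to 5.0 and 2.0
```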
1. A p-value is:
A the smallest significance level at which the null hypothesis would be rejected
B the probability of the null hypothesis being true
C the probability of the null hypothesis being false
D the smallest significance level at which the null hypothesis is accepted
A the smallest significance level at which the null hypothesis would be rejected
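A tiny sketch of obtaining a p-value with a one-sample t-test in SciPy; the null hypothesis (population mean equals 0) and the simulated data are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
sample = rng.normal(loc=0.5, scale=1.0, size=30)

# Null hypothesis: the population mean is 0.
t_stat, p_value = stats.ttest_1samp(sample, popmean=0.0)

# Reject the null at any significance level alpha >= p_value.
print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")
```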
2. A Type 1 error is defined as:
A Saying the null hypothesis is false, when it is actually true
B Saying the null hypothesis is true, when it is actually false
A Saying the null hypothesis is false, when it is actually true
3. You find through a graph that there is a strong correlation between Net Promoter Score and the time that customers spend on a website. Select the TRUE assertion:
A There is an underlying factor that explains this correlation, but manipulating the time that customers spend on a website may not affect the Net Promoter Score they will give to the company
B To boost the Net Promoter Score of a business, you need to increase the time that customers spend on a website.
A There is an underlying factor that explains this correlation, but manipulating the time that customers spend on a website may not affect the Net Promoter Score they will give to the company
1. Which one of the following is common to both machine learning and statistical inference?
A Using population data to make inferences about a null sample.
B Using population data to model a null hypothesis.
C Using sample data to infer qualities of the underlying population distribution.
D Using sample data to make inferences about a hypothesis.
C Using sample data to infer qualities of the underlying population distribution.
- In both machine learning and statistical inference, we're using sample data to infer qualities of the underlying population distribution.
2. Which one of the following describes an approach to customer churn prediction stated in terms of probability?
A Churn prediction is a data-generating process representing the actual joint distribution between our x and the y variable.
B Predicting a score for individuals that estimates the probability the customer will stay.
C Predicting a score for individuals that estimates the probability the customer will leave.
D Data related to churn may include the target variable for whether a certain customer has left.
C Predicting a score for individuals that estimates the probability the customer will leave.
- Churn prediction is often approached by predicting a score for individuals that estimates the probability the customer will leave.
3. What is customer lifetime value?
A The total purchases over the time during which the person is a customer.
B The total churn generated by a customer over their lifetime.
C The total churn a customer generates in the population.
D The total value that the customer receives during their life.
A The total purchases over the time during which the person is a customer.
- Customer lifetime value consists of the purchase amounts over the entire time that a person has been a customer.
4. Which one of the following statements about the normalized histogram of a variable is true?
A It provides an estimate of the variable's probability distribution.
B It serves as a bar chart for the null hypothesis.
C It is a parametric representation of the population distribution.
D It is a non-parametric representation of the population variance.
A It provides an estimate of the variable's probability distribution.
- The normalized histogram of a variable estimates the variable's probability distribution, and the estimate improves with the amount of data used.
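A short sketch of a normalized histogram serving as a density estimate; the simulated sample is an assumption.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=5000)

# density=True normalizes the histogram so the bar areas sum to 1,
# giving an estimate of the variable's probability density.
plt.hist(sample, bins=50, density=True)
plt.title('Normalized histogram as a density estimate')
plt.show()
```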
5. The outcome of rolling a fair die can be modelled as a _____________ distribution.
A normal
B Poisson
C log-normal
D uniform
D uniform
- The chance of rolling any particular value for a fair die is equally likely, so the outcome is uniformly distributed.
6. Which one of the following features best distinguishes the Bayesian approach to statistics from the Frequentist approach?
A Frequentist statistics requires construction of a prior distribution.
B Frequentist statistics incorporates the probability of the hypothesis being true.
C Bayesian statistics incorporate the probability of the hypothesis being true.
D Bayesian statistics is better than Frequentist.
C Bayesian statistics incorporate the probability of the hypothesis being true.
- Bayesian statistics allows experimenters to incorporate their prior beliefs about the population distribution of a given variable. For frequentists, inference is based solely on the data available; there is no formal mechanism in frequentist statistics for incorporating prior knowledge, so one 'lets the data do the talking'.
7. Which of the following best describes what a hypothesis is?
A A hypothesis is a statement about a sample of the population.
B A hypothesis is a statement about a prior distribution.
C A hypothesis is a statement about a population.
D A hypothesis is a statement about a posterior distribution.
C A hypothesis is a statement about a population.
- A hypothesis could be suggested by a sample of the population, but it is a statement about the entire population.
8. A Type 2 error in hypothesis testing is ____________________:
A correctly rejecting the alternative hypothesis.
B incorrectly accepting the alternative hypothesis.
C correctly rejecting the null hypothesis.
D incorrectly accepting the null hypothesis.
D incorrectly accepting the null hypothesis.
- A type 2 error is incorrectly accepting the null hypothesis.
9. Which statement best describes a consequence of a type II error in the context of a churn prediction example? Assume that the null hypothesis is that customer churn is due to chance, and that the alternative hypothesis is that customers enrolled for greater than two years will not churn over the next year.
A You incorrectly conclude that there is no effect
B You correctly conclude that customer churn is by chance
C You incorrectly conclude that customer churn is by chance
D You correctly conclude that a customer will eventually churn
C You incorrectly conclude that customer churn is by chance
- A type II error means that you incorrectly accept the null hypothesis, so you incorrectly conclude that customer churn is by chance.
10. Which of the following is a statistic used for hypothesis testing?
A The standard deviation.
B The rejection region.
C The acceptance region.
D The likelihood ratio.
D The likelihood ratio.
- The likelihood ratio can be used as a test statistic, to decide whether to accept or reject the null hypothesis.
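A hedged sketch of using a likelihood ratio as a test statistic, comparing a null model (mean fixed at 0) against an alternative (mean set to its estimate) for simulated normal data; the common plug-in standard deviation is a simplification, and the chi-squared calibration follows the standard asymptotic result (Wilks' theorem).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
sample = rng.normal(loc=0.3, scale=1.0, size=200)

# Log-likelihood under the null (mean = 0) and the alternative (mean = sample mean),
# using a shared plug-in standard deviation for simplicity.
sigma = sample.std()
ll_null = stats.norm.logpdf(sample, loc=0.0, scale=sigma).sum()
ll_alt = stats.norm.logpdf(sample, loc=sample.mean(), scale=sigma).sum()

# The likelihood ratio statistic; under the null it is approximately
# chi-squared with 1 degree of freedom.
lr_stat = 2 * (ll_alt - ll_null)
p_value = stats.chi2.sf(lr_stat, df=1)
print(f"LR statistic = {lr_stat:.2f}, p-value = {p_value:.4f}")
```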