ADY201m

387 Terms

1
New cards
How is Walmart reported to have addressed its analytical needs?
A: Social media
B: Code sharing
C: Outsourcing
D: Crowdsourcing
E: None of the options is correct
Crowdsourcing
2
New cards
What is the average base salary of a data scientist reported by the New York Times?
A: $100,000
B: $150,000
C: $112,000
D: $16 per hour
E: $85,000 + Bonus
$112,000
3
New cards
According to professor Haider, the three important qualities to possess in order to succeed as a data scientist are:
A: Judgemental.
B: Curious.
C: Good Story Teller (Argumentative).
D: Proficient in Programming.
E: Good at Math and Statistics.
Judgemental.
Curious.
Good Story Teller (Argumentative).
4
New cards
According to the reading, how does the author define data science?
A: Data science is the art of uncovering the hidden secrets in data.
B: Data science is a way of understanding things and understanding the world.
C: Data science is a physical science like physics or chemistry
D: Data science is what data scientists do.
E: Data science is some data and more science.
Data science is what data scientists do.
5
New cards
What is admirable about Dr. Patil's definition of a data scientist is that it limits data science to activities involving machine learning.
A: False.
B: True.
False.
6
New cards
According to the reading, what characteristics are said to be exhibited by the best data scientists?
A: Really curious people who ask good questions and have at least 10 years of experience.
B: Curious individuals who ask good questions and are O.K. dealing with unstructured situations.
C: Really curious engineers and statisticians.
D: Really curious people who ask good questions.
E: Thinkers who are really curious and hold a Ph.D.
Curious individuals who ask good questions and are O.K. dealing with unstructured situations.
7
New cards
What is an example of a data reduction algorithm?
A: Prior Variable Analysis.
B: Conjoint Analysis.
C: A/B Testing.
D: Principal Component Analysis.
Principal Component Analysis.
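Note: as a rough illustration of why Principal Component Analysis counts as data reduction, the sketch below reduces 2-D points to a single coordinate. It is a minimal sketch, not a library implementation: the function names and the sample points are hypothetical, and it assumes the points are not all identical (otherwise the total variance is zero).

```python
import math


def first_principal_component(points):
    """Return (unit direction, explained variance ratio) for 2-D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(dx * dx for dx, _ in centered) / (n - 1)
    c = sum(dy * dy for _, dy in centered) / (n - 1)
    b = sum(dx * dy for dx, dy in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    largest = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)

    # Eigenvector that belongs to the largest eigenvalue.
    if b != 0:
        vx, vy = b, largest - a
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)

    # Share of the total variance captured by this single direction.
    return direction, largest / (a + c)


def reduce_to_one_dimension(points, direction):
    """Project centered 2-D points onto the direction (2 columns -> 1)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return [(x - mean_x) * direction[0] + (y - mean_y) * direction[1]
            for x, y in points]


# Hypothetical data lying exactly on the line y = 2x.
points = [(0, 0), (1, 2), (2, 4), (3, 6), (4, 8)]
direction, explained = first_principal_component(points)
scores = reduce_to_one_dimension(points, direction)
```

Because the sample points lie on one line, a single component retains essentially all of the variance, so the second column can be dropped without losing information.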
8
New cards
After the data are appropriately processed, transformed, and stored, what is a good starting point for data mining?
A: Machine learning.
B: Data Visualization.
C: Non-parametric methods.
D: Creating a relational database.
Data Visualization.
9
New cards
"Formal evaluation could include testing the predictive capabilities of the models on observed data to see how effective and efficient the algorithms have been in reproducing data." This is known as:
A: In-sample forecast.
B: Prototyping.
C: Reverse engineering.
D: Overfitting.
In-sample forecast.
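Note: an in-sample forecast can be sketched as fitting a model and then checking how well it reproduces the same observations it was fitted on. The example below uses an ordinary least-squares line as a stand-in model; the data values and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_absolute_error(slope, intercept, xs, ys):
    """Average absolute gap between predictions and observed values."""
    return sum(abs(y - (slope * x + intercept))
               for x, y in zip(xs, ys)) / len(xs)


# Hypothetical observed data used for fitting.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)

# In-sample evaluation: predict the very observations used to fit the model.
in_sample_error = mean_absolute_error(slope, intercept, xs, ys)
```

A low in-sample error only shows that the model reproduces data it has already seen; it says nothing by itself about performance on new observations.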
10
New cards
The United States Economic Forecast is a publication by McKinsey University Press.
A: True.
B: False.
False.
11
New cards
The report discussed in the reading successfully did the job of using data and analytics to generate the likely economic scenarios.
A: True.
B: False.
True.
12
New cards
According to the reading, in order to produce a compelling narrative, initial planning and conceptualizing of the final deliverable is of extreme importance.
A: False.
B: True.
True.
13
New cards
The results section is where you present:
A: The empirical findings.
B: R Squared.
C: The conclusion.
D: The methods used.
The empirical findings.
14
New cards
The discussion section is where you:
A: Highlight how your findings provide the ultimate missing piece to the puzzle.
B: Introduce the research methods and data sources used for the analysis.
C: Rely on the power of narrative to enable numbers to communicate your important findings to the readers.
D: Refer the reader to the research question and the knowledge gaps you identified earlier.
Highlight how your findings provide the ultimate missing piece to the puzzle.
Rely on the power of narrative to enable numbers to communicate your important findings to the readers.
Refer the reader to the research question and the knowledge gaps you identified earlier.
15
New cards
According to the reading, what is an example of housekeeping?
A: Adding a list of references.
B: Adding slide numbers.
C: Adding headings to charts.
D: Saving the report as a PDF file.
Adding a list of references.
16
New cards
According to the Module 1 reading, "The Sexiest Job in the 21st Century", a report by the McKinsey Global Institute, by 2018, it is projected that there will be a shortage of people with deep analytical skills in the United States. What is the size of this shortage?
A: 140,000 - 190,000 people
B: 800,000 - 900,000 people
C: 120,000 people
D: 3 - 6 million people
E: 20,000 - 50,000 people
140,000 - 190,000 people
17
New cards
According to the Module 1 reading, "The Sexiest Job in the 21st Century", data science was called the sexiest job of what century by Harvard Business Review?
A: 19th century
B: 20th century
C: 21st century
D: 22nd century
21st century
18
New cards
According to the Module 1 reading "What Makes Someone a Data Scientist", Hal Varian, the chief economist at Google, declared that "the ____ job in the next ten years will be statisticians"?
A: Richest
B: Easiest
C: Sexy
D: Worst
Sexy
19
New cards
According to the Module 1 reading "What Makes Someone a Data Scientist", the author defines a _______ as someone who finds solutions to problems by analyzing data using appropriate tools and then tells stories to communicate their findings to the relevant stakeholders.
A: Data scientist
B: Statistician
C: Data analyst
D: Data Engineer
Data scientist
20
New cards
According to the Module 2 reading "Data Mining", the output of a data mining exercise largely depends on the quality of what?
A: The material
B: The data
C: The data scientist
D: The project
The data
21
New cards
According to the Module 2 reading, "Data Mining", when data are missing in a systematic way, you can simply extrapolate the data or impute the missing data by filling in the average of the values around the missing data.
A: False.
B: True.
False.
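Note: the sketch below illustrates one way systematic missingness can make average-based imputation misleading. The income figures are made up, and the rule that high values go unreported is an assumption chosen for the illustration.

```python
from statistics import mean

# Hypothetical true values (in thousands).
true_incomes = [20, 30, 40, 50, 60, 150, 200]

# Systematic missingness (assumed rule): high incomes are never reported.
observed = [value if value < 100 else None for value in true_incomes]

# Naive fix: fill every gap with the average of the values that are present.
known = [value for value in observed if value is not None]
fill_value = mean(known)
imputed = [fill_value if value is None else value for value in observed]

true_mean = mean(true_incomes)
imputed_mean = mean(imputed)
```

Here the imputed average stays at the level of the reported values and sits well below the true average, because the gaps are concentrated among the largest observations.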
22
New cards
Based on the Module 2 reading, "Regression", the real added value of the author's research on what type of properties is quantifying the magnitude of relationships between housing prices and different determinants?
A: Residential real estate
B: Foreclosed
C: Commercial real estate
D: Vacant
Residential real estate
23
New cards
Based on the Module 2 reading, "Regression", the author's research revealed that adding what additional room had a bigger impact than adding a bedroom?
A: Theater room
B: Study
C: Playroom
D: Washroom
Washroom
24
New cards
According to the Module 3 reading, "The Final Deliverable", the ultimate purpose of analytics is to communicate findings to what people to formulate policy or strategy?
A: Marketing
B: Stakeholders
C: CEOs
D: Salespeople
Stakeholders
25
New cards
Based on the Module 3 reading, "The Final Deliverable", what is the role of a data scientist?
A: Managing a team of analysts to create a predictive model.
B: Using the data to put together a story that boosts financial outlooks.
C: Using insights to build a narrative to communicate findings.
D: Developing a strategy to fix the problems in the findings.
Using insights to build a narrative to communicate findings.
26
New cards
Based on the Module 3 reading, "The Report Structure", regardless of the length of the final deliverable, the author recommends that it includes a cover page, table of contents, executive summary, a methodology section, and a what?
A: Project scope statement
B: Discussion section
C: List of people who worked on the project
D: Copy of your data
Discussion section
27
New cards
Based on the Module 3 reading, "The Report Structure", an introductory section is always helpful in setting up the problem for the reader who might be what?
A: In sales
B: Looking for the statistical calculations
C: Wanting to know the research methods
D: New to the topic
New to the topic
28
New cards
Which of the following statements is true?
A: Python is the most popular language in data science.
B: 80% of data scientists worldwide use Python.
C: Python is useful for AI, machine learning, web development, and IoT.
D: Keras, Scikit-learn, Matplotlib, Pandas, and TensorFlow are all Python libraries
E: All of the above
All of the above
29
New cards
Which of the following are SQL databases? (Select all that apply.)
A: PostgreSQL
B: CouchDB
C: MongoDB
D: MySQL
E: MariaDB
F: Oracle
PostgreSQL
MySQL
MariaDB
Oracle
30
New cards
Which statement is not true about Open Source and Free Software?
A: Free Software and Open Source can be used interchangeably.
B: Free Software can always be run, studied, modified and redistributed with or without changes.
C: Most of Free Software licenses also qualify for Open Source.
D: Open Source Software can be modified without sharing the modified source code depending on the Open Source license.
Free Software and Open Source can be used interchangeably.
31
New cards
Is the following statement true or false: R integrates well with other computer languages like C++, Java, C, .Net and Python.
A: True
B: False
True
32
New cards
Which of the following languages can be used for data science?
A: R
B: Julia
C: Java
D: Javascript
E: Scala
F: SQL
G: All of the above
All of the above
33
New cards
Which of the following is not used to make Artificial Intelligence and Machine Learning possible?
A: Oracle
B: PyTorch
C: TensorFlow.js
D: Apache Spark
E: Caffe
Oracle
34
New cards
Which of the following are common tasks in Data Science? (Select all that apply)
A: Data Management
B: Data Integration and Transformation
C: Data Visualization
D: Model Building
E: Model Deployment
F: Model Monitoring and Assessment
Data Management
Data Integration and Transformation
Data Visualization
Model Building
Model Deployment
Model Monitoring and Assessment
35
New cards
Which of the following are data management tools? (Select all that apply.)
A: GitHub
B: MySQL
C: PostgreSQL
D: KubeFlow
E: PixieDust
MySQL
PostgreSQL
36
New cards
Which of the following are Data Integration and Transformation tools? (Select all that apply.)
A: Cassandra
B: Apache Kafka
C: Apache Nifi
D: Apache AirFlow
E: Ceph
Apache Kafka
Apache Nifi
Apache AirFlow
37
New cards
Which statement about JupyterLab is correct?
A: JupyterLab can run R code only.
B: JupyterLab can run R and Python code only.
C: JupyterLab can run R and Python code in addition to other programming languages.
D: JupyterLab can run Python code only.
JupyterLab can run R and Python code in addition to other programming languages.
38
New cards
Which statement about RStudio is correct?
A: RStudio is the primary choice for development in the R programming language.
B: RStudio is the primary choice for web development.
C: RStudio is the primary choice for development in the Python programming language.
RStudio is the primary choice for development in the R programming language.
39
New cards
Which statements about IBM Watson Studio and OpenScale are correct? (Select all that apply.)
A: Watson Studio together with Watson OpenScale is a database management system.
B: Watson Studio together with Watson OpenScale covers the complete development life cycle for all data science, machine learning and AI tasks.
C: Watson Studio together with Watson OpenScale is available as a Cloud offering as well as a package running on top of Kubernetes/RedHat OpenShift in a local data center called IBM Cloud Pak for Data.
Watson Studio together with Watson OpenScale covers the complete development life cycle for all data science, machine learning and AI tasks.
Watson Studio together with Watson OpenScale is available as a Cloud offering as well as a package running on top of Kubernetes/RedHat OpenShift in a local data center called IBM Cloud Pak for Data.
40
New cards
Which scientific computing library provides data structures and data analysis tools for Python?
A: TensorFlow
B: Pandas
C: YumPies
D: Seahorse
Pandas
41
New cards
What does the acronym API stand for?
A: Abstract Python Interface
B: Application Programming Interface
C: Algorithmic Programming Interface
D: Abstract Programming Interface
Application Programming Interface
42
New cards
True or False: Open data is always distributed under a Community Data License Agreement.
A: True
B: False
False
43
New cards
Which of the following is not a type of Machine Learning?
A: Unsupervised learning
B: Supervised learning
C: Reinforcement learning
D: Supervised teaching
Supervised teaching
44
New cards
Which of the following is NOT a deep learning framework?
A: Keras
B: TensorFlow
C: PyTorch
D: Tommy
Tommy
45
New cards
Fill in the blank: The MAX model-serving microservices expose a _________________ that applications use to consume a model.
A: Java API
B: REST API
C: Python API
D: Scala API
REST API
46
New cards
Which are the three most used languages for data science? (Select all that apply.)
A: Python
B: Java
C: Scala
D: SQL
E: R
Python
SQL
R
47
New cards
Is it possible to use machine learning within a web browser with Javascript?
A: Yes
B: No
Yes
48
New cards
Comma Separated Values (CSV) is a commonly used format to store:
A: Hierarchical or network data
B: Tabular data
C: All of the above
Tabular data
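Note: a small sketch of why CSV suits tabular data, where each line is one row and each comma-separated field is one column. It uses Python's standard `csv` module; the column names and values are hypothetical.

```python
import csv
import io

# Hypothetical CSV text: a header row followed by two data rows.
raw = "id,city,price\n1,Toronto,450000\n2,Ottawa,380000\n"

# DictReader maps every data row onto the column names from the header.
reader = csv.DictReader(io.StringIO(raw))
rows = list(reader)

# Every row has the same flat set of columns; there is no nesting.
columns = reader.fieldnames
prices = [int(row["price"]) for row in rows]
```

Hierarchical or network data does not fit this flat row-and-column shape as naturally, which is consistent with the card's answer.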
49
New cards
Classification models can be used to determine whether:
A: An email is likely spam.
B: A video contains a specific sound.
C: An image contains a dog.
D: All of the above.
All of the above.
50
New cards
Fill in the blank: ________________ is the heart of every organization.
A: Integration
B: The cloud
C: Data
D: Open source
Data
51
New cards
What does the "BI" in BI Tools stand for?
A: Business Integration
B: Build Integration
C: Build Information
D: Business Intelligence
Business Intelligence
52
New cards
Which of the following functions does Jupyter Notebook unify?
A: Editing and execution of source code.
B: Editing and display of documentation.
C: Visualization of charts.
D: All of the above.
All of the above.
53
New cards
Which statement is true about Jupyter Notebook?
A: Jupyter Notebook is a commercial product of IBM.
B: Jupyter Notebook is free and open source.
Jupyter Notebook is free and open source.
54
New cards
What is a Jupyter Notebook kernel?
A: It is a wrapper running on the Jupyter server encapsulating the programming language interpreter.
B: It is part of the operating system the Jupyter server runs on.
It is a wrapper running on the Jupyter server encapsulating the programming language interpreter.
55
New cards
Which of the following functions does RStudio unify? (Select all that apply.)
A: Storing of data.
B: Editing and execution of source code.
C: Display of the R Console.
D: Visualization of plots.
E: Visualization of data in table form.
Editing and execution of source code.
Display of the R Console.
Visualization of plots.
Visualization of data in table form.
56
New cards
Which statement is true about the RStudio IDE?
A: RStudio is free and open source.
B: RStudio is a commercial product of IBM.
RStudio is free and open source.
57
New cards
Which statement about R packages is correct?
A: R currently supports more than 15,000 packages which can be installed to extend R's functionality.
B: R doesn't require any packages to be installed since it contains all the functionality that a data scientist ever requires.
R currently supports more than 15,000 packages which can be installed to extend R's functionality.
58
New cards
What tool do most Python developers use?
A: RStudio
B: Jupyter Notebooks / JupyterLab
Jupyter Notebooks / JupyterLab
59
New cards
True or false? Jupyter Notebooks / JupyterLab support development in R.
A: True
B: False
True
60
New cards
Which tool unifies documentation, source code and data visualizations into a single document?
A: Jupyter Notebooks / JupyterLab
B: Notepad
C: VSCode
Jupyter Notebooks / JupyterLab
61
New cards
Which command is used to install packages in R?
A: install("package name")
B: package("package name")
C: install.packages("package name")
D: install.package("package name")
install.packages("package name")
62
New cards
True or False: The Jupyter Notebook kernel must be installed on a local server.
A: True
B: False
False
63
New cards
True or false? RStudio supports development in Python.
A: True
B: False
True
64
New cards
Fill in the blank: In Watson Studio, a ____________ is how you organize your resources to achieve a particular goal. Resources can include data, collaborators, and analytic assets like notebooks and models.
A: Project
B: Job
C: Asset
D: Notebook
Project
65
New cards
Fill in the blank: It's a best practice to remove or replace _____________ before publishing to GitHub.
A: Markdown text
B: Charts
C: Code cells
D: Credentials
Credentials
66
New cards
Which of the following do you need to create in order to publish a notebook to your GitHub repository?
A: Apps
B: Profile
C: Access token
D: Login credential
Access token
67
New cards
Fill in the blank: If you'd like to schedule a notebook in Watson Studio to run at a different time you can create a(n) _____________.
A: Asset
B: Job
C: API
D: Markdown cell
Job
68
New cards
Fill in the blank: On the environments tab you can define the _________________.
A: Hardware size.
B: Software configuration.
C: Runtime configuration for notebook editor.
D: Runtime configuration for flow editor.
E: All of the above.
All of the above.
69
New cards
Fill in the blank: When sharing a read only version of a notebook, you can choose to share __________________.
A: Only text and output.
B: All content, excluding sensitive code cells.
C: All content including code.
D: A permalink.
E: All of the above.
All of the above.
70
New cards
Fill in the blank: When working in a Jupyter Notebook, before returning to a project, it's important to ________________________.
A: Insert to code.
B: Insert cells.
C: Run cells.
D: Save your notebook.
Save your notebook.
71
New cards
Fill in the blank: Before running a notebook, it's a best practice to ____________ to describe what the notebook does.
A: Refresh your page.
B: Insert a cell at the top of the notebook.
C: Delete notebook cells.
D: Insert a cell at the bottom of the notebook.
Insert a cell at the top of the notebook.
72
New cards
Fill in the blank: In the _____________ tab you can define the hardware size and software configuration for the runtime associated with Watson Studio tools such as notebooks.
A: Assets
B: Environments
C: Overview
D: Settings
Environments
73
New cards
Fill in the blank: IBM Cloud uses ______________ as a way for you to organize your account resources in customizable groupings so that you can quickly assign users access to more than one resource at a time.
A: Resource groups
B: Services
C: Projects
D: Catalogs
Resource groups
74
New cards
Which products (of those we covered) allow you to build data pipelines using graphical user interface and no coding?
A: Only IBM SPSS Statistics.
B: Only IBM SPSS Modeler.
C: OpenScale
D: IBM SPSS Modeler and Modeler Flows in Watson Studio.
E: All of the above.
IBM SPSS Modeler and Modeler Flows in Watson Studio.
75
New cards
Which features of Data Refinery help save hours and days of data preparation?
A: Flexibility of using an intuitive user interface and coding templates, with powerful operations to shape and clean data.
B: Data visualization and profiles to spot the difference and guide data preparation steps.
C: Incremental snapshots of the results allowing the user to gauge success with each iterative change.
D: Saving, editing and fixing the steps provide ability to iteratively fix the steps in the flow.
E: All of the above.
All of the above.
76
New cards
Watson Knowledge Catalog provides what functionality?
A: Catalog data and ML assets, help to find relevant assets, keep track of asset lineage, enforce data governance.
B: Build data and water pipelines.
C: Catalog all books mentioning Doctor Watson and Sherlock Holmes.
D: Process data, build and deploy models.
E: Create data and deploy models into production.
Catalog data and ML assets, help to find relevant assets, keep track of asset lineage, enforce data governance.
77
New cards
Fill in the blank: PMML, PFA, and ONNX are __________________.
A: Open standards for predictive model serialization, exchange, and deployment.
B: Abbreviations for machine learning algorithm names.
C: Passwords for some super-secret system.
D: Robots that are plotting to take over the planet.
E: Codes for getting rid of undesired data or models.
Open standards for predictive model serialization, exchange, and deployment.
78
New cards
Which node must be used in Modeler flows before any modeling node?
A: Output node
B: Type node
C: Derive node
D: Auto Numeric node
E: All of the above
Type node
79
New cards
Fill in the blank: Auto Classification node can be used for data with ______________.
A: No target variables.
B: A categorical target variable.
C: A continuous target variable.
D: Any target variable.
E: All of the above
A categorical target variable.
80
New cards
Fill in the blank: Auto Numeric node can be used for data with __________________.
A: No target variables.
B: A categorical target variable.
C: A continuous target variable.
D: Any target variable.
E: All of the above.
A continuous target variable.
81
New cards
IBM SPSS Modeler evolved from which product?
A: SPSS
B: IBM DB2
C: Netezza
D: Oracle
E: Clementine
Clementine
82
New cards
Fill in the blank: IBM SPSS Statistics syntax can be created using ___________.
A: IBM SPSS Modeler streams.
B: Watson Studio Modeler flows.
C: Graphical user interface of IBM SPSS Statistics product or syntax editor.
D: OpenScale
E: AutoAI
Graphical user interface of IBM SPSS Statistics product or syntax editor.
83
New cards
AutoAI provides which of the following services?
A: Monitoring for fairness, bias, and model drift.
B: Automatic finding of optimal data preparation steps, model selection, and hyperparameter optimization.
C: Cataloging data and model assets.
D: Creating SPSS syntax.
E: All of the above.
Automatic finding of optimal data preparation steps, model selection, and hyperparameter optimization.
84
New cards
OpenScale provides which of the following services?
A: Creating SPSS syntax.
B: Automatic finding of optimal data preparation steps, model selection, and hyperparameter optimization.
C: Cataloging data and model assets.
D: Monitoring for fairness, bias, and model drift.
E: All of the above.
Monitoring for fairness, bias, and model drift.
85
New cards
Predictive Model Markup Language (PMML) was created by which entity?
A: Microsoft
B: The Data Mining Group
C: Oracle
D: IBM
E: SPSS
The Data Mining Group
86
New cards
Data Refinery provides which of the following services?
A: Catalog the data assets.
B: Monitor for bias and model drift.
C: Visualize and prepare data.
D: Automatically build models.
E: All of the above.
Visualize and prepare data.
87
New cards
IBM SPSS Modeler includes what kind of models?
A: Classification models (for data with a categorical target).
B: Regression models (for data with a continuous target).
C: Clustering models (for data with no target variables).
D: Other kinds of models.
E: All of the above.
All of the above.
88
New cards
Open Neural Network eXchange (ONNX) was originally created for what models?
A: Clustering models
B: Decision trees
C: Support Vector Machines (SVM).
D: Deep learning models.
E: Regression models
Deep learning models.
89
New cards
Fill in the blank: If you'd like to schedule a notebook in Watson Studio to run at a different time, you can create a(n) ________.
A: API
B: asset
C: markdown cell
D: job
job
90
New cards
Fill in the blank: In the __________ tab you can define the hardware size and software configuration for the runtime associated with Watson Studio tools such as Notebook.
A: assets
B: environments
C: overview
D: settings
environments
91
New cards
Fill in the blank: It's a best practice to remove or replace _____________ before publishing to GitHub.
A: charts
B: code cells
C: credentials
D: markdown text
credentials
92
New cards
What does SQL stand for?
A: Strong Query Language
B: Structured Quick Language
C: Structured Query Language
D: Structured Quadrant Language
Structured Query Language
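Note: to give a feel for what a structured query looks like, the sketch below runs a few SQL statements against an in-memory SQLite database. SQLite is used only as a stand-in because it ships with Python's standard library; the table, columns, and values are hypothetical.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Define a table with typed columns (hypothetical schema).
conn.execute(
    "CREATE TABLE houses ("
    "id INTEGER PRIMARY KEY, bedrooms INTEGER, price REAL)"
)

# Insert rows using parameter placeholders rather than string formatting.
conn.executemany(
    "INSERT INTO houses (bedrooms, price) VALUES (?, ?)",
    [(2, 250000.0), (3, 320000.0), (3, 340000.0)],
)

# Declarative query: state what is wanted (average price per bedroom count).
rows = conn.execute(
    "SELECT bedrooms, AVG(price) FROM houses "
    "GROUP BY bedrooms ORDER BY bedrooms"
).fetchall()

conn.close()
```

The same `SELECT ... GROUP BY` statement should carry over, largely unchanged, to other SQL databases such as PostgreSQL or MySQL.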
93
New cards
Which of these is a machine learning or deep learning library for Python?
A: Requests
B: Scikit-learn
C: Pandas
D: NumPy
Scikit-learn
94
New cards
What problem(s) are targeted by supervised machine learning?
A: Clustering problem
B: Regression problem
C: Classification problem
D: Regression and Classification problems
Regression and Classification problems
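Note: the difference between the two supervised problem types can be sketched with one deliberately simple learner. The example uses a 1-nearest-neighbor rule as a stand-in technique; the training pairs and the function name are hypothetical. What makes both tasks supervised is that every training example comes with a known target.

```python
def nearest_neighbor_predict(training_pairs, query):
    """Return the target of the training example whose feature is closest."""
    closest = min(training_pairs, key=lambda pair: abs(pair[0] - query))
    return closest[1]


# Classification: the target is a category.
# Feature (assumed): number of suspicious words in an email.
spam_training = [(0, "not spam"), (1, "not spam"), (7, "spam"), (9, "spam")]
predicted_label = nearest_neighbor_predict(spam_training, 6)

# Regression: the target is a continuous number.
# Feature (assumed): floor area; target: price in thousands.
price_training = [(50, 150.0), (80, 240.0), (120, 360.0)]
predicted_price = nearest_neighbor_predict(price_training, 85)
```

Clustering, by contrast, works on examples that carry no target at all, which is why it falls under unsupervised learning.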
95
New cards
What is data governance?
A: Deleting data stored within a central repository
B: Creating processes and controls around the access of data
C: Storing data in a central repository
D: An implementation of data processes and controls
Creating processes and controls around the access of data
96
New cards
Which are the two most used open source tools for data science?
A: Notepad
B: RStudio
C: Jupyter Notebooks / JupyterLab
D: Spyder
E: VSCode
RStudio
Jupyter Notebooks / JupyterLab
97
New cards
How is the R programming language different than Python?
A: It was built by statisticians and their specific language
B: Its primary objective is deployment and production
C: It is a general purpose language
It was built by statisticians and their specific language
98
New cards
What type of environment is RStudio?
A: Software Development Kit (SDK)
B: API
C: Integrated Development Environment (IDE)
D: Framework
Integrated Development Environment (IDE)
99
New cards
What does the acronym "Jupyter" mean in Jupyter Notebooks?
A: Julia, Python and Ruby
B: Javascript, Python and R
C: Julia, PHP and R
D: Julia, Python and R
Julia, Python and R
100
New cards
Which feature in Watson Studio helps to keep track of and discover relevant Machine Learning assets?
A: Watson Knowledge Catalog
B: All of the above
C: AutoAI
D: Modeler Flows
E: OpenScale
Watson Knowledge Catalog