ADY201m

387 Terms

New cards

How is Walmart reported to have addressed its analytical needs?
A: Social media
B: Code sharing
C: Outsourcing
D: Crowdsourcing
E: None of the options is correct

Crowdsourcing

New cards

What is the average base salary of a data scientist reported by the New York Times?
A: $100,000
B: $150,000
C: $112,000
D: $16 per hour
E: $85,000 + Bonus

$112,000

New cards

According to professor Haider, the three important qualities to possess in order to succeed as a data scientist are:
A: Judgemental.
B: Curious.
C: Good Story Teller (Argumentative).
D: Proficient in Programming.
E: Good at Math and Statistics.

Judgemental.
Curious.
Good Story Teller (Argumentative).

New cards

According to the reading, how does the author define data science?
A: Data science is the art of uncovering the hidden secrets in data.
B: Data science is a way of understanding things and understanding the world.
C: Data science is a physical science like physics or chemistry
D: Data science is what data scientists do.
E: Data science is some data and more science.

Data science is what data scientists do.

New cards

What is admirable about Dr. Patil's definition of a data scientist is that it limits data science to activities involving machine learning.
A: False.
B: True.

False.

New cards

According to the reading, what characteristics are said to be exhibited by the best data scientists?
A: Really curious people who ask good questions and have at least 10 years of experience.
B: Curious individuals who ask good questions and are O.K. dealing with unstructured situations.
C: Really curious engineers and statisticians.
D: Really curious people who ask good questions.
E: Thinkers who are really curious and hold a Ph.D.

Curious individuals who ask good questions and are O.K. dealing with unstructured situations.

New cards

What is an example of a data reduction algorithm?
A: Prior Variable Analysis.
B: Conjoint Analysis.
C: A/B Testing.
D: Principal Component Analysis.

Principal Component Analysis.
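
As a rough illustration of why Principal Component Analysis counts as data reduction, the sketch below collapses 2-D points onto their first principal component, leaving one number per point. It is a minimal pure-Python version that assumes exactly two features and at least two distinct points; the function name `pca_first_component` is made up for this example and does not come from the reading.

```python
import math

def pca_first_component(points):
    """Return (unit direction, 1-D scores, explained variance ratio) for 2-D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(cx * cx for cx, _ in centered) / (n - 1)
    syy = sum(cy * cy for _, cy in centered) / (n - 1)
    sxy = sum(cx * cy for cx, cy in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    trace = sxx + syy
    det = sxx * syy - sxy * sxy
    largest = trace / 2 + math.sqrt(max(trace * trace / 4 - det, 0.0))

    # Eigenvector that belongs to the largest eigenvalue.
    if abs(sxy) > 1e-12:
        vx, vy = largest - syy, sxy
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)

    # Project every centered point onto that direction: two columns become one.
    scores = [cx * direction[0] + cy * direction[1] for cx, cy in centered]
    return direction, scores, largest / trace
```

For points that lie exactly on a line, the explained variance ratio is 1, so the single score per point loses nothing; real data usually gives a ratio below 1.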

New cards

After the data are appropriately processed, transformed, and stored, what is a good starting point for data mining?
A: Machine learning.
B: Data Visualization.
C: Non-parametric methods.
D: Creating a relational database.

Data Visualization.

New cards

"Formal evaluation could include testing the predictive capabilities of the models on observed data to see how effective and efficient the algorithms have been in reproducing data." This is known as:
A: In-sample forecast.
B: Prototyping.
C: Reverse engineering.
D: Overfitting.

In-sample forecast.
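
One way to picture an in-sample forecast is to fit a model and then score it on the very same observations it was fitted to. The sketch below does that with an ordinary least-squares line; the helper names `fit_line` and `in_sample_rmse` are illustrative assumptions, not taken from the reading.

```python
import math

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return slope, mean_y - slope * mean_x

def in_sample_rmse(xs, ys):
    """Root mean squared error of the fitted line on the data used to fit it."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))
```

A low in-sample error only says the model reproduces the observed data well; it says nothing by itself about data the model has not seen.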

New cards

The United States Economic Forecast is a publication by McKinsey University Press.
A: True.
B: False.

False.

New cards

The report discussed in the reading successfully did the job of using data and analytics to generate the likely economic scenarios.
A: True.
B: False.

True.

New cards

According to the reading, in order to produce a compelling narrative, initial planning and conceptualizing of the final deliverable is of extreme importance.
A: False.
B: True.

True.

New cards

The results section is where you present:
A: The empirical findings.
B: R Squared.
C: The conclusion.
D: The methods used.

The empirical findings.

New cards

The discussion section is where you:
A: Highlight how your findings provide the ultimate missing piece to the puzzle.
B: Introduce the research methods and data sources used for the analysis.
C: Rely on the power of narrative to enable numbers to communicate your important findings to the readers.
D: Refer the reader to the research question and the knowledge gaps you identified earlier.

Highlight how your findings provide the ultimate missing piece to the puzzle.
Rely on the power of narrative to enable numbers to communicate your important findings to the readers.
Refer the reader to the research question and the knowledge gaps you identified earlier.

New cards

According to the reading, what is an example of housekeeping?
A: Adding a list of references.
B: Adding slide numbers.
C: Adding headings to charts.
D: Saving the report as a PDF file.

Adding a list of references.

New cards

According to the Module 1 reading, "The Sexiest Job in the 21st Century", a report by the McKinsey Global Institute projects that by 2018 there will be a shortage of people with deep analytical skills in the United States. What is the size of this shortage?
A: 140,000 - 190,000 people
B: 800,000 - 900,000 people
C: 120,000 people
D: 3 - 6 million people
E: 20,000 - 50,000 people

140,000 - 190,000 people

New cards

According to the Module 1 reading, "The Sexiest Job in the 21st Century", data science was called the sexiest job of what century by Harvard Business Review?
A: 19th century
B: 20th century
C: 21st century
D: 22nd century

21st century

New cards

According to the Module 1 reading "What Makes Someone a Data Scientist", Hal Varian, the chief economist at Google, declared that "the ____ job in the next ten years will be statisticians"?
A: Richest
B: Easiest
C: Sexy
D: Worst

Sexy

New cards

According to the Module 1 reading "What Makes Someone a Data Scientist", the author defines a _______ as someone who finds solutions to problems by analyzing data using appropriate tools and then tells stories to communicate their findings to the relevant stakeholders.
A: Data scientist
B: Statistician
C: Data analyst
D: Data Engineer

Data scientist

New cards

According to the Module 2 reading "Data Mining", the output of a data mining exercise largely depends on the quality of what?
A: The material
B: The data
C: The data scientist
D: The project

The data

New cards

According to the Module 2 reading, "Data Mining", when data are missing in a systematic way, you can simply extrapolate the data or impute the missing data by filling in the average of the values around the missing data.
A: False.
B: True.

False.
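
For contrast, the neighbour-average fill that the statement describes can be sketched as below. It is only a reasonable shortcut when values are missing at random; when the gaps follow a pattern, this kind of fill can bias the result, which is why the statement is false. The function name `impute_with_neighbors` is an assumption made for this example.

```python
def impute_with_neighbors(values):
    """Replace each None with the mean of its known immediate neighbours.

    Assumes gaps are random; gaps with no known neighbour are left as None.
    """
    filled = list(values)
    for i, value in enumerate(values):
        if value is None:
            neighbors = [
                values[j]
                for j in (i - 1, i + 1)
                if 0 <= j < len(values) and values[j] is not None
            ]
            if neighbors:
                filled[i] = sum(neighbors) / len(neighbors)
    return filled
```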

New cards

Based on the Module 2 reading, "Regression", the real added value of the author's research on what type of properties is quantifying the magnitude of relationships between housing prices and different determinants?
A: Residential real estate
B: Foreclosed
C: Commercial real estate
D: Vacant

Residential real estate

New cards

Based on the Module 2 reading, "Regression", the author's research revealed that adding what additional room had a bigger impact than adding a bedroom?
A: Theater room
B: Study
C: Playroom
D: Washroom

Washroom

New cards

According to the Module 3 reading, "The Final Deliverable", the ultimate purpose of analytics is to communicate findings to what people to formulate policy or strategy?
A: Marketing
B: Stakeholders
C: CEOs
D: Salespeople

Stakeholders

New cards

Based on the Module 3 reading, "The Final Deliverable", what is the role of a data scientist?
A: Managing a team of analysts to create a predictive model.
B: Using the data to put together a story that boosts financial outlooks.
C: Using insights to build a narrative to communicate findings.
D: Developing a strategy to fix the problems in the findings.

Using insights to build a narrative to communicate findings.

New cards

Based on the Module 3 reading, "The Report Structure", regardless of the length of the final deliverable, the author recommends that it includes a cover page, table of contents, executive summary, a methodology section, and a what?
A: Project scope statement
B: Discussion section
C: List of people who worked on the project
D: Copy of your data

Discussion section

New cards

Based on the Module 3 reading, "The Report Structure", an introductory section is always helpful in setting up the problem for the reader who might be what?
A: In sales
B: Looking for the statistical calculations
C: Wanting to know the research methods
D: New to the topic

New to the topic

New cards

Which of the following statements is true?
A: Python is the most popular language in data science.
B: 80% of data scientists worldwide use Python.
C: Python is useful for AI, machine learning, web development, and IoT.
D: Keras, Scikit-learn, Matplotlib, Pandas, and TensorFlow are all Python libraries
E: All of the above

All of the above

New cards

Which of the following are SQL databases? (Select all that apply.)
A: PostgreSQL
B: CouchDB
C: MongoDB
D: MySQL
E: MariaDB
F: Oracle

PostgreSQL
MySQL
MariaDB
Oracle

New cards

Which statement is not true about Open Source and Free Software?
A: Free Software and Open Source can be used interchangeably.
B: Free Software can always be run, studied, modified and redistributed with or without changes.
C: Most of Free Software licenses also qualify for Open Source.
D: Open Source Software can be modified without sharing the modified source code depending on the Open Source license.

Free Software and Open Source can be used interchangeably.

New cards

Is the following statement true or false: R integrates well with other computer languages like C++, Java, C, .Net and Python.
A: True
B: False

True

New cards

Which of the following languages can be used for data science?
A: R
B: Julia
C: Java
D: JavaScript
E: Scala
F: SQL
G: All of the above

All of the above

New cards

Which of the following is not used to make Artificial Intelligence and Machine Learning possible?
A: Oracle
B: PyTorch
C: TensorFlow.js
D: Apache Spark
E: Caffe

Oracle

New cards

Which of the following are common tasks in Data Science? (Select all that apply)
A: Data Management
B: Data Integration and Transformation
C: Data Visualization
D: Model Building
E: Model Deployment
F: Model Monitoring and Assessment

Data Management
Data Integration and Transformation
Data Visualization
Model Building
Model Deployment
Model Monitoring and Assessment

New cards

Which of the following are data management tools? (Select all that apply.)
A: GitHub
B: MySQL
C: PostgreSQL
D: Kubeflow
E: PixieDust

MySQL
PostgreSQL

New cards

Which of the following are Data Integration and Transformation tools? (Select all that apply.)
A: Cassandra
B: Apache Kafka
C: Apache NiFi
D: Apache Airflow
E: Ceph

Apache Kafka
Apache NiFi
Apache Airflow

New cards

Which statement about JupyterLab is correct?
A: JupyterLab can run R code only.
B: JupyterLab can run R and Python code only.
C: JupyterLab can run R and Python code in addition to other programming languages.
D: JupyterLab can run Python code only.

JupyterLab can run R and Python code in addition to other programming languages.

New cards

Which statement about RStudio is correct?
A: RStudio is the primary choice for development in the R programming language.
B: RStudio is the primary choice for web development.
C: RStudio is the primary choice for development in the Python programming language.

RStudio is the primary choice for development in the R programming language.

New cards

Which statements about IBM Watson Studio and OpenScale are correct? (Select all that apply.)
A: Watson Studio together with Watson OpenScale is a database management system.
B: Watson Studio together with Watson OpenScale covers the complete development life cycle for all data science, machine learning and AI tasks.
C: Watson Studio together with Watson OpenScale is available as a Cloud offering as well as a package running on top of Kubernetes/RedHat OpenShift in a local data center called IBM Cloud Pak for Data.

Watson Studio together with Watson OpenScale covers the complete development life cycle for all data science, machine learning and AI tasks.
Watson Studio together with Watson OpenScale is available as a Cloud offering as well as a package running on top of Kubernetes/RedHat OpenShift in a local data center called IBM Cloud Pak for Data.

New cards

Which scientific computing library provides data structures and data analysis tools for Python?
A: TensorFlow
B: Pandas
C: YumPies
D: Seahorse

Pandas

New cards

What does the acronym API stand for?
A: Abstract Python Interface
B: Application Programming Interface
C: Algorithmic Programming Interface
D: Abstract Programming Interface

Application Programming Interface

New cards

True or False: Open data is always distributed under a Community Data License Agreement.
A: True
B: False

False

New cards

Which of the following is not a type of Machine Learning?
A: Unsupervised learning
B: Supervised learning
C: Reinforcement learning
D: Supervised teaching

Supervised teaching

New cards

Which of the following is NOT a deep learning framework?
A: Keras
B: TensorFlow
C: PyTorch
D: Tommy

Tommy

New cards

Fill in the blank: The MAX model-serving microservices expose a _________________ that applications use to consume a model.
A: Java API
B: REST API
C: Python API
D: Scala API

REST API
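
To make "consuming a model through a REST API" concrete, the sketch below builds (but does not send) an HTTP POST request carrying a JSON payload. The base URL, the `/model/predict` path, and the payload shape are assumptions chosen for illustration; a real service documents its own endpoints and input format.

```python
import json
import urllib.request

def build_prediction_request(base_url, payload):
    """Build a JSON POST request for a hypothetical prediction endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + "/model/predict",  # hypothetical path, check the service docs
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out here so the example runs without a server.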

New cards

Which are the three most used languages for data science? (Select all that apply.)
A: Python
B: Java
C: Scala
D: SQL
E: R

Python
SQL
R

New cards

Is it possible to use machine learning within a web browser with JavaScript?
A: Yes
B: No

Yes

New cards

Comma Separated Values (CSV) is a commonly used format to store:
A: Hierarchical or network data
B: Tabular data
C: All of the above

Tabular data
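
A small sketch of why CSV suits tabular data: each line is a row and each comma-separated field is a column, so the text maps directly onto a list of records. The helper name `parse_csv` and the sample columns used with it are made up for this example.

```python
import csv
import io

def parse_csv(text):
    """Parse CSV text into a list of dicts, one per row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))
```

For example, `parse_csv("name,rooms\nA,3\nB,4\n")` yields two records, each with a `name` and a `rooms` field. Values come back as strings, so numeric columns still need converting.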

New cards

Classification models can be used to determine whether:
A: An email is likely spam.
B: A video contains a specific sound.
C: An image contains a dog.
D: All of the above.

All of the above.

New cards

Fill in the blank: ________________ is the heart of every organization.
A: Integration
B: The cloud
C: Data
D: Open source

Data

New cards

What does the "BI" in BI Tools stand for?
A: Business Integration
B: Build Integration
C: Build Information
D: Business Intelligence

Business Intelligence

New cards

Which of the following functions does Jupyter Notebook unify?
A: Editing and execution of source code.
B: Editing and display of documentation.
C: Visualization of charts.
D: All of the above.

All of the above.

New cards

Which statement is true about Jupyter Notebook?
A: Jupyter Notebook is a commercial product of IBM.
B: Jupyter Notebook is free and open source.

Jupyter Notebook is free and open source.

New cards

What is a Jupyter Notebook kernel?
A: It is a wrapper running on the Jupyter server encapsulating the programming language interpreter.
B: It is part of the operating system the Jupyter server runs on.

It is a wrapper running on the Jupyter server encapsulating the programming language interpreter.

New cards

Which of the following functions does RStudio unify? (Select all that apply.)
A: Storing of data.
B: Editing and execution of source code.
C: Display of the R Console.
D: Visualization of plots.
E: Visualization of data in table form.

Editing and execution of source code.
Display of the R Console.
Visualization of plots.
Visualization of data in table form.

New cards

Which statement is true about the RStudio IDE?
A: RStudio is free and open source.
B: RStudio is a commercial product of IBM.

RStudio is free and open source.

New cards

Which statement about R packages is correct?
A: R currently supports more than 15,000 packages which can be installed to extend R's functionality.
B: R doesn't require any packages to be installed since it contains all the functionality that a data scientist ever requires.

R currently supports more than 15,000 packages which can be installed to extend R's functionality.

New cards

What tool do most Python developers use?
A: RStudio
B: Jupyter Notebooks / JupyterLab

Jupyter Notebooks / JupyterLab

New cards

True or false? Jupyter Notebooks / JupyterLab support development in R.
A: True
B: False

True

New cards

Which tool unifies documentation, source code and data visualizations into a single document?
A: Jupyter Notebooks / JupyterLab
B: Notepad
C: VSCode

Jupyter Notebooks / JupyterLab

New cards

Which command is used to install packages in R?
A: install("package name")
B: package("package name")
C: install.packages("package name")
D: install.package("package name")

install.packages("package name")

New cards

True or False: The Jupyter Notebook kernel must be installed on a local server.
A: True
B: False

False

New cards

True or false? RStudio supports development in Python.
A: True
B: False

True

New cards

Fill in the blank: In Watson Studio, a ____________ is how you organize your resources to achieve a particular goal. Resources can include data, collaborators, and analytic assets like notebooks and models.
A: Project
B: Job
C: Asset
D: Notebook

Project

New cards

Fill in the blank: It's a best practice to remove or replace _____________ before publishing to GitHub.
A: Markdown text
B: Charts
C: Code cells
D: Credentials

Credentials
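
One common way to keep credentials out of a published notebook is to read them from the environment at run time instead of typing them into a cell. The sketch below assumes a hypothetical variable name, `EXAMPLE_API_TOKEN`; any name works as long as it is set outside the notebook.

```python
import os

def load_token(env_var="EXAMPLE_API_TOKEN"):
    """Read a secret from an environment variable rather than hard-coding it."""
    token = os.environ.get(env_var)
    if token is None:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return token
```

Because the secret never appears in the notebook source, there is nothing to scrub before the file goes to GitHub. Cell outputs should still be checked in case a token was printed.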

New cards

Which of the following do you need to create in order to publish a notebook to your GitHub repository?
A: Apps
B: Profile
C: Access token
D: Login credential

Access token

New cards

Fill in the blank: If you'd like to schedule a notebook in Watson Studio to run at a different time you can create a(n) _____________.
A: Asset
B: Job
C: API
D: Markdown cell

Job

New cards

Fill in the blank: On the environments tab you can define the _________________.
A: Hardware size.
B: Software configuration.
C: Runtime configuration for notebook editor.
D: Runtime configuration for flow editor.
E: All of the above.

All of the above.

New cards

Fill in the blank: When sharing a read only version of a notebook, you can choose to share __________________.
A: Only text and output.
B: All content, excluding sensitive code cells.
C: All content including code.
D: A permalink.
E: All of the above.

All of the above.

New cards

Fill in the blank: When working in a Jupyter Notebook, before returning to a project, it's important to ________________________.
A: Insert to code.
B: Insert cells.
C: Run cells.
D: Save your notebook.

Save your notebook.

New cards

Fill in the blank: Before running a notebook, it's a best practice to ____________ to describe what the notebook does.
A: Refresh your page.
B: Insert a cell at the top of the notebook.
C: Delete notebook cells.
D: Insert a cell at the bottom of the notebook.

Insert a cell at the top of the notebook.

New cards

Fill in the blank: In the _____________ tab you can define the hardware size and software configuration for the runtime associated with Watson Studio tools such as notebooks.
A: Assets
B: Environments
C: Overview
D: Settings

Environments

New cards

Fill in the blank: IBM Cloud uses ______________ as a way for you to organize your account resources in customizable groupings so that you can quickly assign users access to more than one resource at a time.
A: Resource groups
B: Services
C: Projects
D: Catalogs

Resource groups

New cards

Which products (of those we covered) allow you to build data pipelines using graphical user interface and no coding?
A: Only IBM SPSS Statistics.
B: Only IBM SPSS Modeler.
C: OpenScale
D: IBM SPSS Modeler and Modeler Flows in Watson Studio.
E: All of the above.

IBM SPSS Modeler and Modeler Flows in Watson Studio.

New cards

Which features of Data Refinery help save hours and days of data preparation?
A: Flexibility of an intuitive user interface and coding templates, with powerful operations to shape and clean data.
B: Data visualization and profiles to spot the difference and guide data preparation steps.
C: Incremental snapshots of the results allowing the user to gauge success with each iterative change.
D: Saving, editing and fixing the steps provide ability to iteratively fix the steps in the flow.
E: All of the above.

All of the above.

New cards

Watson Knowledge Catalog provides what functionality?
A: Catalog data and ML assets, help to find relevant assets, keep track of asset lineage, enforce data governance.
B: Build data and water pipelines.
C: Catalog all books mentioning Doctor Watson and Sherlock Holmes.
D: Process data, build and deploy models.
E: Create data and deploy models into production.

Catalog data and ML assets, help to find relevant assets, keep track of asset lineage, enforce data governance.

New cards

Fill in the blank: PMML, PFA, and ONNX are __________________.
A: Open standards for predictive model serialization, exchange, and deployment.
B: Abbreviations for machine learning algorithm names.
C: Passwords for some super-secret system.
D: Robots that are plotting to take over the planet.
E: Codes for getting rid of undesired data or models.

Open standards for predictive model serialization, exchange, and deployment.

New cards

Which node must be used in Modeler flows before any modeling node?
A: Output node
B: Type node
C: Derive node
D: Auto Numeric node
E: All of the above

Type node

New cards

Fill in the blank: Auto Classification node can be used for data with ______________.
A: No target variables.
B: A categorical target variable.
C: A continuous target variable.
D: Any target variable.
E: All of the above

A categorical target variable.

New cards

Fill in the blank: Auto Numeric node can be used for data with __________________.
A: No target variables.
B: A categorical target variable.
C: A continuous target variable.
D: Any target variable.
E: All of the above.

A continuous target variable.

New cards

IBM SPSS Modeler evolved from which product?
A: SPSS
B: IBM DB2
C: Netezza
D: Oracle
E: Clementine

Clementine

New cards

Fill in the blank: IBM SPSS Statistics syntax can be created using ___________.
A: IBM SPSS Modeler streams.
B: Watson Studio Modeler flows.
C: Graphical user interface of IBM SPSS Statistics product or syntax editor.
D: OpenScale
E: AutoAI

Graphical user interface of IBM SPSS Statistics product or syntax editor.

New cards

AutoAI provides which of the following services?
A: Monitoring for fairness, bias, and model drift.
B: Automatic finding of optimal data preparation steps, model selection, and hyperparameter optimization.
C: Cataloging data and model assets.
D: Creating SPSS syntax.
E: All of the above.

Automatic finding of optimal data preparation steps, model selection, and hyperparameter optimization.

New cards

OpenScale provides which of the following services?
A: Creating SPSS syntax.
B: Automatic finding of optimal data preparation steps, model selection, and hyperparameter optimization.
C: Cataloging data and model assets.
D: Monitoring for fairness, bias, and model drift.
E: All of the above.

Monitoring for fairness, bias, and model drift.

New cards

Predictive Model Markup Language (PMML) was created by which entity?
A: Microsoft
B: The Data Mining Group
C: Oracle
D: IBM
E: SPSS

The Data Mining Group

New cards

Data Refinery provides which of the following services?
A: Catalog the data assets.
B: Monitor for bias and model drift.
C: Visualize and prepare data.
D: Automatically build models.
E: All of the above.

Visualize and prepare data.

New cards

IBM SPSS Modeler includes what kind of models?
A: Classification models (for data with a categorical target).
B: Regression models (for data with a continuous target).
C: Clustering models (for data with no target variables).
D: Other kinds of models.
E: All of the above.

All of the above.

New cards

Open Neural Network eXchange (ONNX) was originally created for what models?
A: Clustering models
B: Decision trees
C: Support Vector Machines (SVM).
D: Deep learning models.
E: Regression models

Deep learning models.

New cards

Fill in the blank: If you'd like to schedule a notebook in Watson Studio to run at a different time, you can create a(n) ________.
A: API
B: asset
C: markdown cell
D: job

job

New cards

Fill in the blank: In the __________ tab you can define the hardware size and software configuration for the runtime associated with Watson Studio tools such as Notebook.
A: assets
B: environments
C: overview
D: settings

environments

New cards

Fill in the blank: It's a best practice to remove or replace _____________ before publishing to GitHub.
A: charts
B: code cells
C: credentials
D: markdown text

credentials

New cards

What does SQL stand for?
A: Strong Query Language
B: Structured Quick Language
C: Structured Query Language
D: Structured Quadrant Language

Structured Query Language
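
As a quick taste of what a structured query looks like, the sketch below loads a few rows into an in-memory SQLite table and asks for the highest-scoring names. The table, column names, and sample values are invented for illustration only.

```python
import sqlite3

def top_names(rows, limit):
    """Load (name, score) rows into an in-memory table and query it with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
        conn.executemany("INSERT INTO scores VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT name FROM scores ORDER BY score DESC LIMIT ?", (limit,)
        )
        return [name for (name,) in cursor]
    finally:
        conn.close()
```

The query describes the result wanted (which column, in what order, how many rows) and leaves the retrieval strategy to the database engine.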

New cards

Which of these is a machine learning or deep learning library for Python?
A: Requests
B: Scikit-learn
C: Pandas
D: NumPy

Scikit-learn

New cards

What problem(s) are targeted by supervised machine learning?
A: Clustering problem
B: Regression problem
C: Classification problem
D: Regression and Classification problems

Regression and Classification problems
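
Both problem types learn from labelled examples; the difference is whether the target is a number (regression) or a category (classification). The toy nearest-neighbour predictor below handles either, assuming a single numeric feature. The function name and the sample data are hypothetical.

```python
def nearest_neighbor_predict(train, query):
    """Return the target of the training example whose feature is closest to query.

    train is a list of (feature, target) pairs; target may be numeric or a label.
    """
    _, target = min(train, key=lambda pair: abs(pair[0] - query))
    return target
```

With label targets such as `[(1.0, "ham"), (9.0, "spam")]` the function acts as a classifier; with numeric targets such as `[(1.0, 10.0), (2.0, 20.0)]` it acts as a very crude regressor.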

New cards

What is data governance?
A: Deleting data stored within a central repository
B: Creating processes and controls around the access of data
C: Storing data in a central repository
D: An implementation of data processes and controls

Creating processes and controls around the access of data

New cards

Which are the two most used open source tools for data science?
A: Notepad
B: RStudio
C: Jupyter Notebooks / JupyterLab
D: Spyder
E: VSCode

RStudio
Jupyter Notebooks / JupyterLab

New cards

How is the R programming language different from Python?
A: It was built by statisticians and their specific language
B: Its primary objective is deployment and production
C: It is a general purpose language

It was built by statisticians and their specific language

New cards

What type of environment is RStudio?
A: Software Development Kit (SDK)
B: API
C: Integrated Development Environment (IDE)
D: Framework

Integrated Development Environment (IDE)

New cards

What does the acronym "Jupyter" mean in Jupyter Notebooks?
A: Julia, Python and Ruby
B: Javascript, Python and R
C: Julia, PHP and R
D: Julia, Python and R

Julia, Python and R

New cards

Which feature in Watson Studio helps to keep track of and discover relevant Machine Learning assets?
A: Watson Knowledge Catalog
B: All of the above
C: AutoAI
D: Modeler Flows
E: OpenScale

Watson Knowledge Catalog