What are cookies?
Types of malware that can damage computers
Small files stored on computers that contain information about users
Programs that enable users to access websites
Pieces of code that store information about a website
Small files stored on computers that contain information about users
Fill in the blank: For data analytics projects, _____ data is typically preferred because users know it originated within the organization.
second-party
third-party
multi-party
first-party
first-party
A grocery store chain purchases customer data from a credit card company. The grocer uses this data to identify its most loyal customers and offer them special promotions and discounts. What type of data is being used in this scenario?
First-party
Multi-party
Third-party
Second-party
Second-party
In data analytics, what term refers to all possible data values in a dataset?
Source
Representation
Population
Sample
Population
An entertainment website displays a star rating for a movie based on user reviews. Users can select from one to five whole stars to rate the movie. The star rating is an example of what type of data? Select all that apply.
Continuous
Discrete
Ordinal
Nominal
Discrete
Ordinal
What type of data is the height of a skyscraper?
Discrete
Qualitative
Nominal
Continuous
Continuous
In data analytics, what is the term for data that is generated from, and lives, outside of an organization?
Peripheral
Outer
Internal
External
External
What are the key characteristics of unstructured data? Select all that apply.
Fits neatly into rows and columns
Unorganized
May have an internal structure
Clearly identifiable construction
Unorganized
May have an internal structure
Fill in the blank: A data model is used to organize _____ and how they relate to one another.
data visualizations
database structures
data elements
spreadsheet fields
data elements
When discussing structured databases, data analysts refer to the data contained in a row as a record. How do they refer to the data contained in a column?
Field
Subject
Character
Point
Field
Fill in the blank: A data type is a specific kind of data _____ that tells what kind of value the data is.
attribute
frame
model
point
attribute
What are the key characteristics of a text, or string, data type? Select all that apply.
Contains textual information
Only two possible values
Sequence of characters and punctuation
Has numerical percentages
Contains textual information
Sequence of characters and punctuation
In a data table, where are fields contained?
Rows
Columns
Favorites
Charts
Columns
When using long data, each subject has data in multiple rows. This is because each row represents what?
Data in different formats
True or false data points
One observation per subject
Multiple values
One observation per subject
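A minimal sketch of the idea behind long data, using a made-up table (the country names, years, and values are illustrative assumptions): each subject starts with one wide row and ends up with one row per observation.

```python
# Hypothetical wide table: one row per subject, one column per year.
wide_rows = [
    {"country": "A", "2020": 10, "2021": 12},
    {"country": "B", "2020": 7, "2021": 9},
]

# Long format: one row per (subject, observation) pair, so each
# subject now appears in multiple rows.
long_rows = [
    {"country": row["country"], "year": year, "value": row[year]}
    for row in wide_rows
    for year in ("2020", "2021")
]

for row in long_rows:
    print(row)
```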
What strategy do data professionals use in order to ensure unbiased sampling?
Use random sampling during data collection
Write survey questions that encourage specific responses
Store data in a spreadsheet
Skew results in a certain direction
Use random sampling during data collection
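As a rough illustration of random sampling, the sketch below draws 50 members from a hypothetical population of 1,000 made-up customer IDs. The population, sample size, and seed are all assumptions chosen for the example.

```python
import random

# Hypothetical population: 1,000 made-up customer IDs.
population = [f"customer_{i:04d}" for i in range(1000)]

# A fixed seed only makes the example repeatable; it is not needed
# for real data collection.
rng = random.Random(42)

# sample() draws without replacement, so every member has the same
# chance of selection and nobody is picked twice.
sample = rng.sample(population, k=50)

print(len(sample), "of", len(population), "customers selected")
```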
Fill in the blank: Bias is a _____ preference in favor of or against a person, group of people, or thing.
conscious or subconscious
sensible or insensible
fair or unfair
standard or substandard
conscious or subconscious
Which of the following are examples of sampling bias? Select all that apply.
A teacher gives higher grades to essays written in their own writing style.
A clinical study includes three times more men than women.
A survey of students does not include homeschooled students.
An election poll only interviews people with college degrees.
A clinical study includes three times more men than women.
A survey of students does not include homeschooled students.
An election poll only interviews people with college degrees.
What is the term for the tendency to search for or interpret information in a way that validates pre-existing beliefs?
Sampling bias
Confirmation bias
Observer bias
Interpretation bias
Confirmation bias
Which of the following terms are also ways of describing observer bias? Select all that apply.
Perception bias
Research bias
Spectator bias
Experimenter bias
Research bias
Spectator bias
Experimenter bias
Fill in the blank: Data is considered _____ when it is accurate, complete, and unbiased information that has been vetted and proven fit for use.
original
current
comprehensive
reliable
reliable
Which of the following are usually good data sources? Select all that apply.
Vetted public datasets
Social media sites
Governmental agency data
Academic papers
Vetted public datasets
Governmental agency data
Academic papers
To determine if a data source is cited, ask which of the following questions? Select all that apply.
When was this data last refreshed?
Who created this dataset?
Is this dataset from a credible organization?
Has this dataset been properly cleaned?
When was this data last refreshed?
Who created this dataset?
Is this dataset from a credible organization?
A junior data analyst learns that the dataset they have been given is six years old. After looking into this further, they also discover that the age of the data is making the information irrelevant to their project. What good data source principle have they used to evaluate the dataset?
Comprehensive
Original
Reliable
Current
Current
What are data ethics?
Established methods for ensuring data is clean, well-organized, and appropriate for a project
Long-standing techniques for confirming that data is always used to benefit society
Approved strategies data professionals use to safeguard the privacy and security of a dataset
Well-founded standards of right and wrong that dictate how data is collected, shared, and used
Well-founded standards of right and wrong that dictate how data is collected, shared, and used
What concept states that all data-processing activities and algorithms should be completely explainable and understood by the individual who provides their data?
Ownership
Currency
Privacy
Transaction transparency
Transaction transparency
A data analyst removes personally identifying information from a dataset. What task are they performing?
Data anonymization
Data sorting
Data collection
Data visualization
Data anonymization
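Anonymization is often described in terms of hashing, masking, and blanking. The sketch below shows one possible version of each on a made-up record. The field names, the salt, and the helper functions `hash_value` and `mask_value` are hypothetical, and real projects usually need stronger guarantees than this.

```python
import hashlib

# Hypothetical patient record; every value is made up.
record = {"name": "Jane Doe", "phone": "555-0100", "diagnosis": "asthma"}

def hash_value(value: str, salt: str = "example-salt") -> str:
    """Hashing: replace the value with a fixed-length SHA-256 digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

def mask_value(value: str, visible: int = 4) -> str:
    """Masking: hide all but the last `visible` characters."""
    return "*" * (len(value) - visible) + value[-visible:]

anonymized = {
    "name": hash_value(record["name"]),
    "phone": mask_value(record["phone"]),
    "diagnosis": record["diagnosis"],
}

# Blanking: remove the identifying values altogether.
blanked = {**record, "name": "", "phone": ""}

print(anonymized["phone"])
```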
Before completing a survey, an individual acknowledges reading information about how and why the data they provide will be used. What is this concept called?
Privacy
Discretion
Currency
Consent
Consent
Fill in the blank: Openness refers to _____ access, usage, and sharing of data.
protected
limited
free
disclosed
free
What is the preferred method for open data to be made available?
A convenient and modifiable internet download
A secure password-protected file
A compressed file format that keeps file size small
A print copy that is easily shared by anyone
A convenient and modifiable internet download
What are the main benefits of open data? Select all that apply.
Combines data from different fields of knowledge
Good data is more widely available
Restricts data access to certain groups of people
Increases the amount of data available for purchase
Combines data from different fields of knowledge
Good data is more widely available
What are the key aspects of universal participation? Select all that apply.
Certain groups of people must share their private data.
No one can place restrictions on data to discriminate against a person or group.
Everyone must be able to use, reuse, and redistribute open data.
All corporations are allowed to sell open data.
No one can place restrictions on data to discriminate against a person or group.
Everyone must be able to use, reuse, and redistribute open data.
Freedom from inappropriate use of your data is an element of which aspect of data ethics?
Consent
Transparency
Privacy
Currency
Privacy
A data professional working on a project about commuters researches the origin of a dataset to confirm it was created by a reputable source, such as a government transportation agency. Which aspect of good data are they prioritizing?
Original
Comprehensive
Cited
Reliable
Cited
A hospital system wants to protect the personally identifiable information of its patients, such as names and medical records. They ask their data team to anonymize the data. What techniques might they use to achieve this goal? Select all that apply.
Hashing
Masking
Sorting
Blanking
Hashing
Masking
Blanking
A manager in charge of selling a particular product interprets any ambiguous customer feedback about the product as being positive. What type of bias does this represent?
Confirmation
Sampling
Interpretation
Observer
Interpretation
Which data ethics principle gives an individual the right to know why their data is collected and how long it will be stored?
Consent
Anonymization
Credibility
Privacy
Consent
A university surveys its student-athletes about their experience in college sports. The survey only includes student-athletes with scholarships. What type of bias does this scenario describe?
Sampling
Observer
Interpretation
Confirmation
Sampling
Fill in the blank: The data ethics principle of transaction transparency states that an individual has the right to understand all of the _____ and algorithms used on their data.
free access
data-processing activities
raw data
financial transactions
data-processing activities
A government agency allows any business, nonprofit, or citizen to access its databases and reuse or redistribute the data. What type of data is described in this scenario?
Closed
Allowable
Free
Open
Open
An investor with a background working in the tech industry interprets any pitch from a tech startup as being more promising than others, even if the information is confusing and ambiguous. What type of bias does this scenario describe?
Sampling
Interpretation
Observer
Confirmation
Interpretation
A magazine conducts research about people's reading preferences. They only include respondents who currently subscribe. What type of bias does this scenario describe?
Confirmation
Interpretation
Sampling
Observer
Sampling
Fill in the blank: The data ethics principle of _____ states that an individual has the right to understand all of the data-processing activities and algorithms used on their data.
transaction transparency
consent
ownership
currency
transaction transparency
A financial institution publishes data about stock prices and market trends, which any business, nonprofit, or citizen can access, reuse, or redistribute through its online databases. What type of data is described in this scenario?
Open
Allowable
Free
Closed
Open
Fill in the blank: A relational database contains a series of _____ that can be connected to form relationships.
tables
cells
fields
spreadsheets
tables
What is the term for an identifier that references a database column in which each value is unique?
Field
Relation
Primary key
Foreign key
Primary key
What process do data professionals use to eliminate data redundancy, increase data integrity, and reduce complexity in a database?
Composition
Normalization
Manipulation
Iteration
Normalization
Fill in the blank: When using a relational database, data analysts write _____ to request data from the related tables.
relationships
keys
queries
programs
queries
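To make the relational ideas concrete, here is a small sketch using Python's built-in `sqlite3` module. The `customers` and `orders` tables and their contents are made up for illustration. Each table has a primary key, `orders.customer_id` acts as a foreign key, and a query joins the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# A query requests data from the related tables through the shared key.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# A primary key must be unique, so reusing customer_id 1 is rejected.
try:
    conn.execute("INSERT INTO customers VALUES (1, 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(rows, duplicate_rejected)
```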
A large company has several databases across its many departments. What kind of metadata describes how many locations contain a certain piece of data?
Structural
Administrative
Descriptive
Representative
Structural
A large metropolitan high school gives each of its students an ID number to differentiate them in its database. What kind of metadata are the ID numbers?
Administrative
Structural
Descriptive
Representative
Descriptive
An international nonprofit organization wants to merge third-party data with its own data. Which of the following actions will help make this process successful? Select all that apply.
Use metadata to standardize the datasets.
Replace the incoming data's metadata with its own company metadata.
Use metadata to evaluate the third-party data's quality and credibility.
Alter the internal metadata to more closely reflect the incoming metadata.
Use metadata to standardize the datasets.
Use metadata to evaluate the third-party data's quality and credibility.
Fill in the blank: Data _____ is a process data professionals use to ensure the formal management of their organization's data assets.
sourcing
governance
organization
storage
governance
What are some key benefits of open-data initiatives? Select all that apply.
Limit opportunities for collaboration
Make government activities more transparent
Support innovation and economic growth
Help educate citizens about important issues
Make government activities more transparent
Support innovation and economic growth
Help educate citizens about important issues
What type of file saves data in a table format?
Calculated spreadsheet values (.csv)
Comma-separated values (.csv)
Cell-structured variables (.csv)
Compatible scientific variables (.csv)
Comma-separated values (.csv)
Bringing data from a .csv file into a spreadsheet is an example of what process?
Filing data
Importing data
Editing data
Normalizing data
Importing data
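Outside a spreadsheet, the same import step can be sketched with Python's standard `csv` module. The CSV text below is a made-up stand-in for a real .csv file.

```python
import csv
import io

# Made-up CSV content; in practice this would come from a file on disk.
csv_text = "car_id,returned\n101,2024-03-02\n102,2024-03-09\n"

# DictReader uses the first line as the header row.
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

# Every imported value arrives as a string until it is converted.
for row in rows:
    print(row["car_id"], row["returned"])
```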
In Google Sheets, what function enables a data analyst to specify a range of cells in one spreadsheet to be duplicated in another?
SPECIFY
DUPLICATE
IMPORTRANGE
CELLRANGE
IMPORTRANGE
What is the process for arranging data into a meaningful order to make it easier to understand, analyze, and visualize?
Filtering
Prioritizing
Reframing
Sorting
Sorting
A data analyst is reviewing a national database of real estate sales. They are only interested in sales of condominiums. How can the analyst narrow their scope?
Sort by condominium sales
Filter out non-condominium sales
Filter out condominium sales
Sort by non-condominium sales
Filter out non-condominium sales
A data analyst works for a rental car company. They have a spreadsheet that lists car ID numbers and the dates cars were returned. How should they sort the spreadsheet to find the most recently returned cars?
By return date, in ascending order
By return date, in descending order
By car numerical ID, in ascending order
By car numerical ID, in descending order
By return date, in descending order
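A short sketch of the same sort in Python, with made-up car IDs and dates: sorting by return date in descending order puts the most recent return first.

```python
from datetime import date

# Hypothetical rental records.
returns = [
    {"car_id": 101, "returned": date(2024, 3, 2)},
    {"car_id": 102, "returned": date(2024, 3, 9)},
    {"car_id": 103, "returned": date(2024, 2, 27)},
]

# reverse=True gives descending order: newest return date first.
most_recent_first = sorted(returns, key=lambda r: r["returned"], reverse=True)

for r in most_recent_first:
    print(r["car_id"], r["returned"])
```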
Fill in the blank: To keep a header row at the top of a spreadsheet, highlight the row and select _____ from the View menu.
set
lock
freeze
pin
freeze
Which statement is true about sampling, irrespective of sample size?
The sample standard deviation (Stdev) is the same as the population Stdev.
The sample distribution approximates a normal distribution.
The sample bias is reduced if the sample being selected is representative of the population.
The sample mean approaches the population mean.
The sample bias is reduced if the sample being selected is representative of the population.
A data analyst is performing analysis on data stored in a big data platform with state-of-the-art analysis tools. Insufficient sample size is rendering the current analysis ineffective. What is the primary challenge of a larger sample size?
Collecting a larger sample size is more expensive.
Analyzing a larger sample size is complex.
Storing a larger sample size is difficult.
Cleansing a larger sample size is complicated.
Collecting a larger sample size is more expensive.
In the next six months, an analyst is expected to analyze and present the effects of monthly promotions on sales of a new product released one month ago. Which solution for insufficient data should this analyst pursue?
Look for a new data set
Identify trends with available data
Speak with stakeholders and adjust the objective
Wait for more data
Wait for more data
A team leader is assigned the task of evaluating the schema of a data set as part of data cleansing. How would the team leader define a schema to the analyst collaborating on the project prior to commencing cleansing?
How well two or more data sets work together
A way of describing how something is organized
A process of combining two or more data sets into a single data set
A way of matching fields in separate databases
A way of describing how something is organized
An analyst performing data cleansing on invoice data would like to select and view rows that have an amount paid that is greater than $100. Which spreadsheet functionality should the analyst use?
COUNTIF
Conditional formatting
Filter
Remove duplicates
Filter
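The filter can be sketched in Python on made-up invoice rows (the invoice labels and amounts are assumptions). As with a spreadsheet filter, the original rows are left in place and only the view is narrowed.

```python
# Hypothetical invoice rows.
invoices = [
    {"invoice": "A-1", "amount_paid": 80.0},
    {"invoice": "A-2", "amount_paid": 150.0},
    {"invoice": "A-3", "amount_paid": 100.0},
    {"invoice": "A-4", "amount_paid": 240.5},
]

# Keep only rows where the amount paid is strictly greater than 100.
over_100 = [row for row in invoices if row["amount_paid"] > 100]

for row in over_100:
    print(row["invoice"], row["amount_paid"])
```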
Why is clean data critical for data analysis?
It ensures data drives the decision the analyst intends to communicate.
It ensures data used for analysis reflects operational reality.
It ensures data is structured to enable effective analysis.
It ensures data is visualized to make decisions.
It ensures data used for analysis reflects operational reality.
What does an UPDATE statement do after execution in Structured Query Language (SQL)?
Modifies values in certain cells of a table based on conditions
Inserts new rows into the table based on data provided in the query
Removes tables based on the query
Creates a temporary table to be downloaded for analysis and visualization
Modifies values in certain cells of a table based on conditions
An analyst wants to extract data from a table by filtering for certain conditions prior to performing data cleansing. Which Structured Query Language (SQL) statement will perform this function?
LENGTH statement
DISTINCT statement
TRIM statement
WHERE statement
WHERE statement
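A minimal sketch of UPDATE and WHERE working together, run against an in-memory SQLite database through Python's `sqlite3` module. The table and its rows are made up, and other SQL dialects may differ in details such as case sensitivity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT, state TEXT);
    INSERT INTO customers VALUES
        (1, 'Austin', 'tx'),
        (2, 'Denver', 'CO'),
        (3, 'Dallas', 'tx');
""")

# UPDATE changes existing values; WHERE limits which rows are touched.
cursor = conn.execute("UPDATE customers SET state = 'TX' WHERE state = 'tx'")
rows_changed = cursor.rowcount

# In a SELECT, WHERE filters which rows are returned.
texas = conn.execute(
    "SELECT id, city FROM customers WHERE state = 'TX' ORDER BY id"
).fetchall()

print(rows_changed, texas)
```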
An analyst wants to combine the fields for city and state into a single field prior to extraction for data cleansing. Which Structured Query Language (SQL) statement will perform this function?
CONCAT statement
CAST statement
ORDER BY statement
COALESCE statement
CONCAT statement
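The sketch below combines city and state into one field. It runs on SQLite because that ships with Python, and it uses the standard `||` concatenation operator rather than `CONCAT`, since older SQLite versions do not provide a `CONCAT` function. In MySQL, for example, the equivalent would be `CONCAT(city, ', ', state)`. The table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT, state TEXT);
    INSERT INTO customers VALUES (1, 'Austin', 'TX'), (2, 'Denver', 'CO');
""")

# || joins the two fields (plus a separator) into a single column.
locations = conn.execute(
    "SELECT city || ', ' || state AS location FROM customers ORDER BY id"
).fetchall()

print(locations)
```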
An analyst wants to present a high-level summarized version of the data at the end of data cleansing. Which spreadsheet functionality should the analyst use?
Conditional formatting
Pivot table
Find and replace
Filters
Pivot table
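The summarizing step a pivot table performs can be approximated in plain Python. This sketch totals made-up sales amounts by region. The data and field names are assumptions, and a real pivot table offers far more than a single grouped sum.

```python
from collections import defaultdict

# Hypothetical detail rows.
sales = [
    {"region": "East", "amount": 120.0},
    {"region": "West", "amount": 80.0},
    {"region": "East", "amount": 30.0},
    {"region": "West", "amount": 20.0},
]

# Group by region and sum the amounts, like a one-dimensional pivot.
totals = defaultdict(float)
for row in sales:
    totals[row["region"]] += row["amount"]

summary = dict(totals)
print(summary)
```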
What is the definition of verification in the data cleaning process?
A process of ensuring the degree to which a set of measures is equivalent across systems
A process of chronologically listing the modifications made to a set of data files during the data cleansing process
A process to confirm how accurate and reliable a data set is following data cleaning
A process to report on the results of data cleansing efforts to help build trust in the data and cleansing
A process to confirm how accurate and reliable a data set is following data cleaning
What is an outcome of the verification step of data cleansing?
Compares dirty data with clean data
Ensures the data collected and cleansed will address the original purpose
Chronologically documents how the data set evolved during the project
Helps build trust in the cleansing process and data
Ensures the data collected and cleansed will address the original purpose
An analyst used a column of a table to uniquely identify each record within a table. Which tool did they use?
Normalization
Field
Foreign key
Primary key
Primary key
Which tool is used by data analysts to store and organize data, making it easier for them to manage and access information?
Primary key
Table
Database
Foreign key
Database
Which element of a Notepad file would be considered data as opposed to underlying metadata?
File contents
File description
File date
File size
File contents
What is an example of administrative metadata for a digital file?
File size
File name
File contents
File permission
File permission
What is an acceptable syntax for the SELECT keyword in MySQL?
"select"
select
SELECT_Keyword
'select'
select
An expert in query languages searched for month_name = DEC using Vertica. The data set contains variations of the word December, such as dec, Dec, etc. What will the output of this search query be?
It will return all entries that match DEC only.
It will return all entries that match dec only.
It will return all entries such as dec, Dec, DEC.
It will return all entries that match Dec only.
It will return all entries that match DEC only.
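The effect of a case-sensitive match can be illustrated in Python, whose `==` on strings is also case-sensitive. This does not run against Vertica; it only mirrors the behavior the card describes, using made-up values.

```python
# Made-up month values with inconsistent capitalization.
months = ["dec", "Dec", "DEC", "December"]

# Case-sensitive comparison: only the exact spelling matches.
exact = [m for m in months if m == "DEC"]

# Normalizing case first catches every three-letter variation.
normalized = [m for m in months if m.upper() == "DEC"]

print(exact, normalized)
```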
What is the general rule regarding the suggested length of each line in a query to maintain indentation best practices?
Less than or equal to 50 characters
Greater than or equal to 100 characters
Less than or equal to 100 characters
Greater than or equal to 50 characters
Less than or equal to 100 characters
What is a feature of the filtering process when applied to spreadsheets?
Filtering hides the data temporarily.
Filtering orders the data temporarily.
Filtering removes the data permanently.
Filtering orders the data meaningfully.
Filtering hides the data temporarily.
Which process utilizes logical and descriptive names for files, making them easier to find and use?
Safety measures
Foldering
Normalization
Data validation
Foldering
Which process may restrict data analysis needs and should be balanced with data access needs?
Data security
Data storage
Data organization
Data normalization
Data security
What is the best practice for naming folders and subfolders to organize data?
Use special character names.
Use descriptive names.
Use numeric names.
Use spaces in the names.
Use descriptive names.
Which file name follows formatting conventions?
SalesReport 2021
SalesReport*2021
SalesReport_2021
SalesReport!2021
SalesReport_2021
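One possible way to check names against such a convention is a simple pattern. The rule below (letters, digits, underscores, and hyphens only) is an assumption for illustration, not an official standard.

```python
import re

# Assumed rule: letters, digits, underscores, and hyphens only.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")

candidates = [
    "SalesReport 2021",
    "SalesReport*2021",
    "SalesReport_2021",
    "SalesReport!2021",
]

# fullmatch requires the whole name to fit the pattern.
valid = [name for name in candidates if SAFE_NAME.fullmatch(name)]

print(valid)
```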
Which of the following items are examples of structured data? Select all that apply.
Price list
Scanned medical images
Data table
Audio recording
Price list
Data table
Fill in the blank: The number of points scored in a basketball game is an example of _____ data.
discrete
open
nominal
continuous
discrete
Which of the following statements accurately describe first-, second-, and third-party data? Select all that apply.
When using third-party data, it's important to confirm its accuracy.
Second-party data is sold by a trusted partner to another party.
Third-party data is collected by an individual or group using their own resources.
A key benefit of using first-party data is that the user knows where it came from.
When using third-party data, it's important to confirm its accuracy.
Second-party data is sold by a trusted partner to another party.
A key benefit of using first-party data is that the user knows where it came from.
What is the most likely reason why a data analyst would use historical data instead of gathering new data?
The data is unknown
The data is constantly changing
The project has a very short time frame
The project is unimportant
The project has a very short time frame
A political scientist needs to poll all voters in Seoul, South Korea, in order to predict the outcome of an election. Because it would be impossible to collect data from every single person in the city, the political scientist polls a part of the population that is representative of the whole. What does this scenario describe?
Using a population
Choosing a data type
Using a sample
Choosing quantitative data
Using a sample
Which of the following items are examples of continuous data? Select all that apply.
Duration of a customer service call
Favorite social media platform
Number of employees at a company
Temperature of a swimming pool
Duration of a customer service call
Temperature of a swimming pool
Which of the following questions would enable a data professional to collect nominal qualitative data?
How many books do you own?
How many years of experience do you have?
Is this your first time dining at this restaurant?
What is your height?
Is this your first time dining at this restaurant?
A data scientist at a tech company records whether users have accepted their company's terms of service or not. What data type is being collected in this scenario?
Text
Boolean
String
Numerical
Boolean
On very short notice, a data analyst is asked to create a report for stakeholders. Because of the challenging time frame, what type of data might yield the best results?
Theoretical
Fabricated
Unclean
Historical
Historical
Which example shows the use of primary data?
U.S. census data used by a university
A company's survey data of its customers' satisfaction
Data purchased from a market firm containing customer profiles
Data from a published journal cited in a student's research paper
A company's survey data of its customers' satisfaction
What is the difference between raw data and information?
Raw data is universal, while information is specific.
Raw data is organized, while information is unstructured.
Raw data is corrupt, while information is secure.
Raw data is unorganized, while information is structured.
Raw data is unorganized, while information is structured.
Which role does an analyst have in collecting second-party data?
Contracts with data aggregators
Acquires internal data
Contracts with an external entity
Surveys sample populations
Contracts with an external entity
An analyst needs to show the geographic distribution of customers in the United States by region. Which visual representation should this analyst use?
Sales reports
Field descriptions
Charts
A data dictionary
Charts
Which type of structured data does an analyst use?
Social media posts
User-created content
Videos of store traffic
Store inventory
Store inventory
What is an example of conceptual data modeling that an analyst uses?
Using mathematical models for predictive analysis of experiments
Defining how individual records are uniquely identified in a database
Defining the business requirements for a new database
Using table names, column names, and data types for the database
Defining the business requirements for a new database
What leads to confirmation bias in data collection?
People experience the same circumstance differently.
People view the same object differently.
People search to verify preexisting beliefs.
People perceive ambiguous situations in a positive or negative way.
People search to verify preexisting beliefs.
How does an analyst ensure that a data source is reliable?
It includes data needed to answer the research question.
It is accurate, complete, and unbiased information.
It is validated against the original source.
It includes data that is current and relevant to the study.
It is accurate, complete, and unbiased information.