A complete vocabulary set based on the CompTIA Data+ Study Guide Glossary, covering database management, statistical analysis, and data governance.
acceptable use agreement
an agreement that describes not only how data can be used, but also for what purpose
accountability
when data governance plans are being followed and there are accountability measures in place
ad hoc report
a report that is generated in response to a one-time request
Advanced Encryption Standard (AES)
a Federal Information Processing Standards (FIPS)-approved cryptographic algorithm that can be used to protect electronic data
aggregate functions
functions that are written for all or a group of records, as opposed to a single record
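As a minimal sketch of the idea, the snippet below computes one aggregate over all records and one per group. The `records` data and the region names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sales records: (region, amount)
records = [("east", 100), ("west", 250), ("east", 50), ("west", 150)]

# Aggregate over all records: a single result for the whole set
total = sum(amount for _, amount in records)

# Aggregate per group: one result per region (similar to SQL GROUP BY)
totals_by_region = defaultdict(int)
for region, amount in records:
    totals_by_region[region] += amount

print(total)
print(dict(totals_by_region))
```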
aggregated data
data that has already been compiled and summarized for the purposes of analysis and reporting
alternative hypothesis
the assumption that a relationship exists between two variables
append
to combine data from one data set with another data set
appendix
a part of the narrative that provides additional details related to the report or process that are not essential to the main content
application programming interface (API)
a library of programming utilities used, for example, to enable software developers to access functions of the TCP/IP network stack under a particular operating system
ascending and descending order
a method of sorting in which fields are sorted with the minimum on top (for ascending) or maximum on top (for descending)
automated validation
using the power of software to ensure data achieves a validated result
bar chart
a chart that displays information, listing the categories on the y-axis and the discrete values on the x-axis
bins / binning
defined intervals or "buckets" used to group continuous numerical data into discrete categories
binning (or bucketing/discretization)
the data preprocessing technique of assigning individual data points to defined intervals (bins)
which type of graph is binning associated with?
histograms
bin size calculation
To determine a good bin size, take the range of the data (largest value minus the smallest value) and divide it by the number of bins, typically between 5 and 15: $\text{bin size} = \frac{\text{largest value} - \text{smallest value}}{\text{number of bins}}$
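A short sketch of the bin size calculation, using made-up values and an assumed choice of 5 bins. It also counts how many values land in each bin, which is the data a histogram would plot.

```python
# Hypothetical data set and bin count, chosen only for illustration
values = [2, 5, 7, 8, 12, 15, 18, 21, 22]
num_bins = 5

# Bin size = (largest value - smallest value) / number of bins
lowest = min(values)
bin_width = (max(values) - lowest) / num_bins

# Assign each value to a bin; the largest value goes in the last bin
counts = [0] * num_bins
for v in values:
    index = min(int((v - lowest) / bin_width), num_bins - 1)
    counts[index] += 1

print(bin_width)
print(counts)
```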
bubble chart
a visual that plots points on an x-axis and y-axis similar to a scatter plot, but with the addition of the size of the dot representing a third variable
captioning
designating more meaningful names for fields in a report or dashboard
cardinality
how many occurrences of one entity can be associated with occurrences of another entity
cascade delete
referential integrity setting that deletes all related records when the primary key is deleted
cascade update
referential integrity setting that updates all related records when the primary key is changed
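The two cascade settings can be tried out with Python's built-in `sqlite3` module. This is a sketch only: the `customers` and `orders` tables are hypothetical, and other database systems configure referential integrity through their own syntax or settings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id)
            ON DELETE CASCADE ON UPDATE CASCADE
    )
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace')")
conn.execute("INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2)")

# Cascade update: changing the primary key rewrites the related orders
conn.execute("UPDATE customers SET id = 100 WHERE id = 1")
updated = conn.execute("SELECT customer_id FROM orders ORDER BY id").fetchall()

# Cascade delete: deleting the customer removes its related orders
conn.execute("DELETE FROM customers WHERE id = 100")
remaining = conn.execute("SELECT id, customer_id FROM orders").fetchall()

print(updated)
print(remaining)
```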
causal relationship
a relationship in which one variable is proven to have an effect on another
chi-square statistic
a value that compares the size of the difference between the expected result and the actual result
chi-square test
a test used to determine if a difference exists between groups; produces the chi-square statistic
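The chi-square statistic is commonly computed as the sum of (observed - expected)² / expected across categories. The sketch below applies that formula to made-up counts; turning the statistic into a test decision would additionally need the degrees of freedom and a significance level, which are not covered here.

```python
# Hypothetical observed counts versus the counts we expected to see
observed = [18, 22, 20]
expected = [20, 20, 20]

# Sum of squared differences, each scaled by the expected count
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(round(chi_square, 4))
```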
column chart
a chart that displays information, listing the categories on the x-axis and the discrete values on the y-axis
conceptual data model
the conceptual view of what should exist in a data system and how it could be related
confidence interval
a calculation of values that describes the certainty or uncertainty of an estimate made on the analysis
continuous data
a characteristic of quantitative data that identifies data that can be measured and can take on any value
cross validation
determining whether data collected across different methods is consistent and accurate
custom sorts
sorting in which you build the data set to include both the value and the sort order you need for your visualization
correlation
the statistical association between two (or more) variables that tells us whether, when one variable changes, the other(s) tend to change too
data at rest
data that is being stored
data custodian
the person who manages the system on which the data assets are stored
data dictionary
a document that serves as the authority on all definitions that have been agreed upon for the organization, as well as key metrics
data governance
a large umbrella term for a framework used to govern data in an organization
data in transit
data that is actively being transferred
data in use
data that has been transmitted and is now present in memory or being queried
data lake
a technology for storing large amounts of structured and unstructured types of information in their original format
data lakehouse
a data management system that combines the best of both data warehousing and data lakes
data mart
a subset of the data warehouse that is dedicated to a specific department or group
data owner
the person who holds the ultimate responsibility for maintaining the confidentiality, integrity, and availability of the information asset
data steward
the person who is primarily responsible for data quality
data validation
the process of confirming the type, structure, and accurate representation of the data
data verification
the process of confirming that the data is accurate or true
data warehouse
a technology that is dedicated to the storage of company data from a wide range of sources for reporting and decision-making purposes
discrete data
a characteristic of quantitative data that identifies data that can be counted and can only take on a certain number of values
delimited files
files in which some form of character separates each field of data from the other data fields
delta load
the method of loading new data into a data system and updating any existing data that has changed since the last load
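A delta load can be pictured as an insert-or-update pass over the rows that arrived since the last load. The sketch below uses plain dictionaries keyed by a record ID; the `delta_load` helper and the sample rows are hypothetical and stand in for whatever a real loading tool does.

```python
def delta_load(target, changes):
    """Insert new rows and update changed ones; return (inserted, updated)."""
    inserted = updated = 0
    for key, row in changes.items():
        if key not in target:
            inserted += 1
        elif target[key] != row:
            updated += 1
        else:
            continue  # row is unchanged, nothing to load
        target[key] = row
    return inserted, updated

# Hypothetical target system and the rows extracted since the last load
target = {1: {"name": "Ada", "city": "London"}, 2: {"name": "Grace", "city": "NYC"}}
changes = {
    2: {"name": "Grace", "city": "Arlington"},  # changed row
    3: {"name": "Edgar", "city": "San Jose"},   # new row
}

result = delta_load(target, changes)
print(result)
print(target)
```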
denormalized data
data that has not gone through a normalization process and contains repetitive data
dependent variable
the variable we are measuring when comparing two groups
dimension table
a table that holds attributes or the categorical information that supports the fact tables
domain integrity
the acceptable values for a field
duplicated data
data that is repeated within the same data set
dynamic report
a report that is connected to the data and can be refreshed on demand or regularly updated automatically; also known as real-time report
ELT (Extract, Load, Transform)
the process that occurs when moving data from source systems to data lakes, which hold the data in preparation for transformation
empirical rule
the tendency of most data points in a normal distribution to fall within three standard deviations of the mean, on either the positive or negative side of the curve
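The empirical rule is often quoted as roughly 68%, 95%, and 99.7% of values falling within one, two, and three standard deviations of the mean. A quick check against the standard normal distribution, using Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

# Share of the distribution within k standard deviations of the mean
shares = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```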
entity integrity
the unique identifier of a record as defined using a primary key field
ETL (Extract, Transform, Load)
the process that occurs when moving data from source systems to data warehouses by extracting data from the source, transforming the data, and then loading it to the warehouse
exploratory analysis
analysis that determines the main characteristics of a data set
Extensible Markup Language (XML)
a system for structuring documents so that they are human- and machine-readable; information within the document is placed within tags, which describe how information within the document is structured
fact table
a table that holds the "facts" about a particular business process or event and contains keys to relate to the other tables
field definitions
descriptive information about what each field contains, intended to clarify field names that may be ambiguous
flat files
delimited files that are exported out of a system
full load
the method of loading all data into a data system for the very first time
gap analysis
the study of a present state, desired state, and the gaps that exist between the two
goodness of fit
a chi-square test applied to a single variable to analyze how well the observed values match the expected values
hard-coded filters
filters that are coded into the view or the visual
Hyper Text Markup Language (HTML)
a system of coded tags that identify the structure of the document files used for web pages
imputing
replacing missing or invalid data with an estimated value
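One common way to impute is to fill gaps with the mean of the known values. The sketch below assumes missing entries are stored as `None`; the ages are made up, and the median or another estimate could be swapped in.

```python
from statistics import mean

# Hypothetical column with two missing entries
ages = [34, None, 29, 41, None, 36]

# Estimate the fill value from the entries that are present
known = [a for a in ages if a is not None]
fill_value = mean(known)

# Replace each missing entry with the estimate
imputed = [fill_value if a is None else a for a in ages]

print(imputed)
```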
independent variable
the variable that is different between two groups that we are comparing
index field
a field that applies a unique number to a record
indexing
a field property setting that tells the database that a field needs to be indexed
infographic
any combination of visuals, artwork, photos, and language that tells the story of your data in a compelling and graphically appealing way
inline append
an append query that combines data sets until all are combined
intellectual property (IP)
intangible products of human thought and ingenuity
interactive filters
filters that allow the consumer to adjust a slicer or filter option on a dashboard to narrow down the data they want to see
intermediate append
an append query that creates a combined data set but also retains the separate data sets
JSON
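JavaScript Object Notation; a lightweight, text-based format for exchanging data, structured as key-value pairs and ordered lists (not to be confused with JavaScript, the programming language used to interact with websites)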
key performance indicators (KPIs)
measurements/goals that are established to help identify whether a business is achieving its objectives
key value pair
a type of non-relational structure that establishes a unique identifier or key field and maps it to a value
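A Python dictionary behaves like a small in-memory key-value store, so it makes a convenient illustration. The key format (`user:1001`) and the stored values are invented for this sketch.

```python
# Each unique key maps to exactly one value
store = {}
store["user:1001"] = {"name": "Ada", "plan": "pro"}
store["user:1002"] = {"name": "Grace", "plan": "free"}

# Look up a value by its key; a missing key returns None here
found = store.get("user:1001")
missing = store.get("user:9999")

# Writing to an existing key replaces its value instead of adding a duplicate
store["user:1002"] = {"name": "Grace", "plan": "pro"}

print(found)
print(missing)
print(len(store))
```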
legend
a labeling element that lets you know which color represents which value in a visual
lifecycle of data
the five stages of the life of data: create, store, use, archive, and destroy
line graph
a graph that consists of either a single horizontal line or a group of multiple lines that represent different data points at different times; also known as run chart
link analysis
analysis that helps us determine how a single data point links to other data points
logical data model
a more detailed view of the conceptual model that includes data fields and the relationships between them
logical functions
functions that check if a condition is met and return a result based on whether or not the condition is met
masking
the act of hiding the original value of data by showing something else in its place; also known as anonymization
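A simple form of masking replaces all but the last few characters with a placeholder. The `mask` helper below is a hypothetical illustration, not a production-grade anonymization routine, and the card number is a dummy value.

```python
def mask(value, visible=4, mask_char="*"):
    """Hide all but the last `visible` characters of a string."""
    if len(value) <= visible:
        # Too short to reveal anything safely, so hide everything
        return mask_char * len(value)
    hidden = len(value) - visible
    return mask_char * hidden + value[hidden:]

masked = mask("4111111111111111")
print(masked)
```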
master data management
tools and processes that are used to create the single source of truth or the "golden record" for the data that is considered critical at the organization
mean
the average of a set of numbers, calculated by adding all the values and then dividing that sum by the total number of values: $\text{mean} = \frac{\text{sum of values}}{\text{total number of values}}$
measures of central tendency
mathematical functions used to find the center of a data set, including the mean, median, and mode
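The three measures can be compared side by side with Python's `statistics` module. The scores are made-up sample data.

```python
from statistics import mean, median, mode

# Hypothetical test scores
scores = [70, 85, 85, 90, 100]

average = mean(scores)       # sum of the values divided by how many there are
middle = median(scores)      # middle value of the sorted list
most_common = mode(scores)   # value that appears most often

print(average, middle, most_common)
```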
measures of dispersion
mathematical functions used to determine the distribution of a data set, also known as measures of variability (e.g., standard deviation, variance, range)
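A brief sketch of three common dispersion measures on made-up data. It uses the population versions of variance and standard deviation (`pvariance`, `pstdev`); for a sample drawn from a larger population, `variance` and `stdev` would be the usual choice instead.

```python
from statistics import pstdev, pvariance

# Hypothetical data set
values = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(values) - min(values)  # spread between the extremes
variance = pvariance(values)             # mean squared distance from the mean
std_dev = pstdev(values)                 # square root of the variance

print(value_range, variance, std_dev)
```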
median
the middle number within a group of sorted numbers
memorandum of understanding (MOU)
an agreement that establishes the rules of engagement between two parties and defines roles and expectations
merge fields function
a function used to combine different fields to create and display a single consolidated field; also known as CONCATENATE function
mockup
to draw out a potential layout
mode
the number that shows up most often in a data set