SaaS (Software as a Service)
a product that is run and managed by the service provider.
PaaS (Platform as a Service)
focused on the deployment and management of your apps.
IaaS (Infrastructure as a Service)
the building blocks for cloud IT; provides access to networking features, computers, and storage space.
database administrator
configures and maintains a database.
responsibilities:
database management
manages security, granting user access
backups
monitors performance
data engineer
design and implement data tasks related to the storage of big data.
responsibilities:
data pipelines and processes
data ingestion and storage
prepare data for analytics
prepare data for analytical processing
data analyst
analyzes business data to reveal important information.
responsibilities:
provides insights into the data
visual reporting
modeling data for analysis
combines data for visualization
data
units of information that could be in the form of numbers, text, machine code, images, videos, audio, or a physical form
data documents
defines the collective form in which data exists
data sets
logical grouping of units of data that are generally closely related and/or share some data structure
data structures
structured data
data types
a single unit of data that tells a compiler or interpreter how data is supposed to be used
batch and streaming data
how do we move our data around?
relational and non-relational
how do we access, search, and query our data?
data modeling
how do we prepare and design our data?
schemas and schemaless
how do we structure our data for search?
data integrity and data corruption
how do we trust our data?
normalized and de-normalized
how do we trade quality vs. speed?
schema
a formal language which describes the data structure of a database
schemaless
when the primary “cell” of a database can accept many data types
query
a request for data results (reads) or to perform operations like inserting, updating, or deleting data within a database
data result
the data returned from a query
querying
the act of performing a query
query language
a scripting or programming language designed as the format for submitting a request or action to a database; used to manage and manipulate data.
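The query cards above can be illustrated with a small sketch, assuming SQL as the query language and Python's built-in sqlite3 module as the database. The books table and its rows are made up for this example.

```python
import sqlite3

# In-memory database; the "books" table is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")

# Write queries: insert, update, and delete rows.
conn.executemany(
    "INSERT INTO books (title, year) VALUES (?, ?)",
    [("Dune", 1965), ("Neuromancer", 1984), ("Snow Crash", 1992)],
)
conn.execute("UPDATE books SET year = 1966 WHERE title = ?", ("Dune",))
conn.execute("DELETE FROM books WHERE title = ?", ("Snow Crash",))

# Read query: the rows that come back are the data result.
rows = conn.execute("SELECT title, year FROM books ORDER BY year").fetchall()
print(rows)
```

Each call to execute is one act of querying; the SELECT is a read, while the other statements perform operations on the data.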
batch processing
when a collection of data is gathered and sent to be processed as a group, rather than as each item arrives
stream processing
when data is processed as soon as it arrives to enable real-time analytics and immediate responses.
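A minimal sketch of the difference between the two processing styles, assuming a list of numeric readings and an average as the computation. The function names process_batch and process_stream are hypothetical.

```python
from typing import Iterable, Iterator, List


def process_batch(readings: List[float]) -> float:
    """Batch: wait for the whole collection, then process it in one go."""
    return sum(readings) / len(readings)


def process_stream(readings: Iterable[float]) -> Iterator[float]:
    """Stream: update the running average as each reading arrives."""
    total = 0.0
    for count, value in enumerate(readings, start=1):
        total += value
        yield total / count


readings = [10.0, 20.0, 30.0]
batch_result = process_batch(readings)            # one answer at the end
stream_results = list(process_stream(readings))   # one answer per arrival
print(batch_result, stream_results)
```

The batch version produces a single result once all data is in; the stream version has a usable result after every reading.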
tables
a logical grouping of rows and columns
views
a result set of a stored query on data stored in memory
materialized view
a result set of a stored query, saved on disk
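A rough sketch of the two kinds of view, using Python's built-in sqlite3 module. SQLite has no materialized views, so the stored copy here is emulated with CREATE TABLE ... AS SELECT; that substitution, and the sales table, are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("east", 100), ("west", 50)])

# View: only the query is stored; it is re-run on every read.
conn.execute("CREATE VIEW v_total AS SELECT SUM(amount) AS total FROM sales")

# Emulated materialized view: the query result itself is stored.
conn.execute("CREATE TABLE m_total AS SELECT SUM(amount) AS total FROM sales")

# New data arrives after both were created.
conn.execute("INSERT INTO sales VALUES ('east', 25)")

view_total = conn.execute("SELECT total FROM v_total").fetchone()[0]
stored_total = conn.execute("SELECT total FROM m_total").fetchone()[0]
print(view_total, stored_total)
```

The view reflects the new row immediately, while the stored result stays at its old value until it is refreshed.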
indexes
a copy of data that is sorted by one or multiple columns for faster reads at the cost of storage
constraints
rules applied to writes that can ensure data integrity
trigger
a function that is triggered on specific database events
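Constraints and triggers can be sketched together with Python's built-in sqlite3 module. The accounts and audit_log tables and the trigger name log_balance are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Constraint: a CHECK rule that rejects writes with a negative balance.
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("CREATE TABLE audit_log (account_id INTEGER, new_balance INTEGER)")

# Trigger: runs automatically after every successful balance update.
conn.execute(
    """
    CREATE TRIGGER log_balance AFTER UPDATE OF balance ON accounts
    BEGIN
        INSERT INTO audit_log VALUES (NEW.id, NEW.balance);
    END
    """
)

conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
conn.execute("UPDATE accounts SET balance = 40 WHERE id = 1")

try:
    conn.execute("UPDATE accounts SET balance = -5 WHERE id = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the constraint blocked the invalid write

log = conn.execute("SELECT account_id, new_balance FROM audit_log").fetchall()
print(rejected, log)
```

Only the valid update reaches the audit log, because the rejected write never completes.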
primary key
one or multiple columns that uniquely identify a row in a table
foreign key
a column that holds the primary key of another table, establishing a relationship and maintaining referential integrity in relational databases
relational databases
establish relationships to other tables through foreign keys that reference another table's primary key
one-to-one
a type of relationship in relational databases where each row in one table is linked to a single row in another table
one-to-many
a type of relationship in relational databases where a row in one table can be related to many rows in another table
many-to-many
a type of relationship in relational databases where multiple rows in one table can be related to multiple rows in another table
many-to-many (via join/junction table)
A relationship in relational databases where multiple records in one table are associated with multiple records in another table, typically facilitated through a join or junction table.
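The key and relationship cards above can be sketched with Python's built-in sqlite3 module. The students, courses, and enrollments tables are hypothetical; enrollments plays the role of the junction table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE courses (id INTEGER PRIMARY KEY, title TEXT);
    -- Junction table: each row links one student to one course.
    CREATE TABLE enrollments (
        student_id INTEGER REFERENCES students(id),
        course_id INTEGER REFERENCES courses(id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO students VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO courses VALUES (10, 'SQL'), (20, 'Python');
    INSERT INTO enrollments VALUES (1, 10), (1, 20), (2, 10);
    """
)

# Follow the foreign keys through the junction table.
pairs = conn.execute(
    "SELECT s.name, c.title FROM enrollments e "
    "JOIN students s ON s.id = e.student_id "
    "JOIN courses c ON c.id = e.course_id "
    "ORDER BY s.name, c.title"
).fetchall()

# Referential integrity: student 3 does not exist, so this write is refused.
try:
    conn.execute("INSERT INTO enrollments VALUES (3, 10)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(pairs, orphan_rejected)
```

One student maps to many courses and one course to many students, yet each individual table only ever holds one-to-many links into the junction table.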
row-store
data organized in rows
traditional relational databases are row-store
good for general purpose databases
suited for online transaction processing
good when needing all columns in a row
not the best at analytics or massive amounts of data
column-store
data is organized into columns
NoSQL or SQL-like databases
great for vast amounts of data
suited for online analytical processing
good when only a few columns needed
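A toy illustration of the two layouts, using plain Python structures rather than a real storage engine. The sample records are made up.

```python
# Row-store layout: each record is kept together.
rows = [
    {"id": 1, "name": "Ana", "amount": 30},
    {"id": 2, "name": "Ben", "amount": 45},
    {"id": 3, "name": "Cy", "amount": 25},
]

# Column-store layout: each column is kept together.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Transaction-style access: all columns of one row, a single lookup in the row layout.
record = rows[1]

# Analytics-style access: one column across all rows, a single list in the column layout.
total_amount = sum(columns["amount"])

print(record, total_amount)
```

Fetching a full record from the column layout would mean touching every column list, which is why each layout suits a different workload.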
database indexes
a data structure that improves the speed of reads from the database table by storing the same or partial redundant data in a more efficient logical order
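One way to see an index at work is to compare query plans before and after creating it. This sketch assumes Python's built-in sqlite3 module; the users table and the index name idx_users_email are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

query = "SELECT name FROM users WHERE email = ?"
params = ("user500@example.com",)


def plan(sql, args):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, args).fetchall()
    return " ".join(row[-1] for row in rows)


plan_before = plan(query, params)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = plan(query, params)   # the plan now mentions the index

print(plan_before)
print(plan_after)
```

The index costs extra storage and slows writes slightly, in exchange for faster lookups on the indexed column.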
data integrity
the maintenance and assurance of data accuracy and consistency over its entire life cycle
goal of data integrity
ensures data is recorded exactly as intended
data corruption
the act or state of data not being in its intended state, which results in data loss or malfunction
normalized data
a schema designed to store non-redundant and consistent data
denormalized data
a schema that combines data so that accessing data is fast and efficient.
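The trade-off between the two designs can be sketched with plain Python dictionaries standing in for tables. The customers and orders data and the helper order_with_customer are hypothetical.

```python
# Normalized: the customer name is stored once and referenced by id.
customers = {1: {"name": "Ana"}}
orders = [
    {"order_id": 100, "customer_id": 1},
    {"order_id": 101, "customer_id": 1},
]


def order_with_customer(order):
    """Reading a normalized order needs an extra lookup (a join)."""
    name = customers[order["customer_id"]]["name"]
    return {**order, "customer_name": name}


# Denormalized: the name is copied into every order, so reads need no lookup.
orders_denorm = [order_with_customer(order) for order in orders]

# The customer is renamed in the one normalized place.
customers[1]["name"] = "Ana M."

normalized_names = [order_with_customer(o)["customer_name"] for o in orders]
denormalized_names = [o["customer_name"] for o in orders_denorm]
print(normalized_names, denormalized_names)
```

The normalized reads pick up the new name everywhere, while the denormalized copies stay stale until each one is updated.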
pivot table
a table that summarizes the data of a more extensive table from a: database, spreadsheet, or business intelligence (BI) tool
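A minimal sketch of building a pivot table by hand, assuming a list of (region, quarter, amount) sales records invented for this example.

```python
from collections import defaultdict

# The "more extensive table": one row per sale.
sales = [
    ("east", "Q1", 100),
    ("east", "Q2", 150),
    ("west", "Q1", 80),
    ("east", "Q1", 20),
]

# Pivot: regions become rows, quarters become columns, amounts are summed.
pivot = defaultdict(lambda: defaultdict(int))
for region, quarter, amount in sales:
    pivot[region][quarter] += amount

summary = {region: dict(quarters) for region, quarters in pivot.items()}
print(summary)
```

Spreadsheets and BI tools do the same grouping and aggregation behind a drag-and-drop interface.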
data consistency
when data is kept in two different places, whether the copies exactly match or not
matters when you have duplicates of your data in many places and need to keep them up-to-date
strongly consistent
every time you request data (query) you can expect consistent data to be returned within a certain amount of time
will never return old data, but will have to wait at least 2 seconds for query to return
eventually consistent
when you request data, what is returned may be inconsistent for a short window (e.g. 2 seconds)
you get back whatever data is currently in the database, which could be old or new; if you wait a little longer it will be up-to-date
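The two consistency models can be sketched with a toy store that has a primary copy and a lagging replica. The Store class and its method names are hypothetical, and real systems replicate in the background rather than on demand.

```python
class Store:
    """Toy store: writes land on the primary and reach the replica later."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # writes not yet copied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        """Copy all outstanding writes to the replica."""
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

    def read_eventual(self, key):
        # Eventually consistent: answer right away with whatever the replica has.
        return self.replica.get(key)

    def read_strong(self, key):
        # Strongly consistent: wait for replication to finish, then answer.
        self.replicate()
        return self.replica.get(key)


store = Store()
store.write("price", 10)
store.replicate()
store.write("price", 12)                # not yet on the replica

stale = store.read_eventual("price")    # fast, but returns the old value
fresh = store.read_strong("price")      # waits, then returns the new value
later = store.read_eventual("price")    # the replica has caught up by now
print(stale, fresh, later)
```

The eventual read is immediate but may be out of date; the strong read pays for freshness with the wait.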
synchronous
continuous stream of data that is synchronized by a timer or clock (guarantee of time)
guaranteed consistency at time of access, slower access times
asynchronous
continuous stream of data separated by start and stop bits (no guarantee of time)
faster access times, no guaranteed consistency
Non-relational data
a non-table form of storing data and will be optimized for different kinds of data structures
data source
where data originally comes from; an analytics tool may be connected to multiple data sources to create a visualization or report
data store
a repository for persistently storing and managing collections of unstructured and semi-structured data
database
a data store that holds semi-structured and structured data, used to manage, retrieve, and manipulate data. It often consists of tables that relate to one another.
data warehouse
a relational or non-relational database designed for analytical workloads; generally a column-oriented data store optimized for querying and reporting.
data mart
allows different teams or departments to have control over their own dataset
a subset of a data warehouse
generally stores under 100 GB and has a single business focus
data lake
a centralized storage repository that holds vast amounts of raw data in its native format until it's needed for analysis, and can store structured, semi-structured, and unstructured data.
data lakehouse
combines the best features of a data warehouse and a data lake to provide a unified analytics platform that supports both structured and unstructured data.
data lakehouses compared to data warehouse
support video, audio, and text files
support data science and ML workloads
support for both streaming and ELT
work with open-source formats
data resides in data lake or blob stores
data lakehouses compared to data lake
perform BI tasks well
easier to set up and maintain
has management features to prevent the data lake from turning into a data swamp
more performant than a data lake
offer better data governance and quality management
data structures
data that is organized in a specific storage format, which enables easy access and modification
unstructured
a bunch of loose data that has no organization or possible relation
semi-structured
data that can be browsed or searched (with limitations)
structured
data that can be easily browsed or searched