1/84
Looks like no tags are added yet.
Name | Mastery | Learn | Test | Matching | Spaced |
|---|
No study sessions yet.
Relational,
Dimensional,
Object-Oriented,
Fact-Based,
Time-Based,
NoSQL
most common schemes used to represent data (6)
Dr. Edward Codd,
1970,
relational theory
First articulated by _____ in _____, _____ provides a systematic way to organize data so that they reflected their meaning.
relational theory
This approach had the additional effect of reducing redundancy in data storage.
Information Engineering(IE),
Integration Definition for Information Modeling (IDEF1X),
Barker Notation,
Chen Notation
several different kinds of notation to express the association between entities in relational modeling (4)
IE syntax
The most common form (relational modeling) is _____, with its familiar tridents or ‘crow’s feet’ to depict cardinality.
dimensional modeling,
General Mills,
Dartmouth College,
1960’s
The concept of _____ started from a joint research project conducted by _____ and _____ in the _____
dimensional modeling
data is structured to optimize the query and analysis of large amounts of data.
operational systems
_____ that support transaction processing are optimized for fast processing of individual transactions.
Overwrite (Type 1),
New Row (Type 2),
New Column (Type 3)
three main types of change
ORC
three main types of change are sometimes known by _____
Overwrite (Type 1)
The new value overwrites the old value in place.
New Row (Type 2)
The new values are written in a new row, and the
old row is marked as not current.
New Column (Type 3)
Multiple instances of a value are listed in columns on the same row, and a new value means writing the values in the series one spot down to make space at the front for the new value.
The last value is discarded.
Snowflaking
is the term given to normalizing the flat, single-table, dimensional structure in a star schema into the respective component hierarchical or network structures.
grain
stands for the meaning or description of a single row of data in a fact table;
this is the most detail any row will have.
Conformed dimensions
are built with the entire organization in mind instead of just a particular project;
this allows these dimensions to be shared across dimensional models, due to containing consistent terminology and values.
Conformed facts
use standardized definitions of terms across individual marts.
Unified Modeling Language
is a graphical language for modeling software.
has a variety of notations of which one (the class model) concerns databases.
Unified Modeling Language class model
specifies classes(entity types) and their relationship types
class
has Operations or Methods (also called its “behavior”).
Class behavior
is only loosely connected to business logic because it still needs to be sequenced and timed.
table
In ER terms, the _____ has stored procedures/triggers.
Public,
Internally Visible,
Private
Class Operations can be (3)
Public
Externally visible
Internally Visible
Visible to children Objects
Private
Hidden
Fact-Based Modeling
a family of conceptual modeling languages, originated in the late 1970s.
Fact-based languages
view the world in terms of objects, the facts that relate or characterize those objects, and each role that each object plays in each fact.
Fact-based models
do not use attributes, reducing the need for intuitive or expert judgment by expressing the exact relationships between objects (both entities and values).
Object-Role Modeling (ORM)
is a model-driven engineering approach that starts with typical examples of required information or queries presented in any external formulation familiar to users, and then verbalizes these examples at the conceptual level, in terms of simple facts expressed in a controlled natural language
Fully Communication Oriented Modeling (FCO-IM)
is similar in notation and approach to ORM.
Time-based patterns
are used when data values must be associated in chronological order and with specific time values.
Data Vault
is a detail-oriented, time-based, and uniquely linked set of normalized tables that support one or more functional areas of business.
It is a hybrid approach, encompassing the best of breed between third normal form and star schema.
are designed specifically to meet the needs of enterprise data warehouses.
hubs,
links,
satellites
three types of entities
hub
The Data Vault design is focused around the functional areas of business with the _____ representing the primary key.
links
provide transaction integration between the hubs.
satellites
provide the context of the hub primary key
Anchor Modeling
is a technique suited for information that changes overtime in both structure and content.
It provides graphical notation used for conceptual modeling similar to traditional data modeling, with extensions for working with temporal data.
anchors,
attributes,
ties,
knots
Anchor Modeling has four basic modeling concepts:
NoSQL
is a name for the category of databases built on non-relational technology.
document,
key-value,
column-oriented,
graph
four main types of NoSQL databases
document
Instead of taking a business subject and breaking it up into multiple relational structures, document databases frequently store the business subject in one structure called a _____
Key-value databases
allow an application to store its data in only two columns (‘key’ and ‘value’), with the feature of storing both simple (e.g., dates, numbers, codes) and complex information (unformatted text, video, music, documents, photos) stored within the ‘value’ column.
column-oriented
Out of the four types of NoSQL databases, _____ is closest to the RDBMS.
Both have a similar way of looking at data as rows and values.
graph database
is designed for data whose relations are well represented as a set of nodes with an undetermined number of connections between these nodes.
Standards Planning and Requirements Committee (SPARC)
Levels of Detail In 1975, the American National Standards Institute’s _____ published their three-schema approach to database management.
Conceptual,
External,
Internal
Levels of Detail In 1975, the American National SPARC published their three-schema approach to database management.
The three key components were:
Conceptual
This embodies the ‘real world’ view of the enterprise being modeled in the database.
It represents the current ‘best model’ or ‘way of doing business’ for the enterprise.
External
The various users of the database management system operate on subsets of the total enterprise model that are relevant to their particular needs.
These subsets are represented as ‘external schemas’.
Internal
The ‘machine view’ of the data is described by the internal schema.
This schema describes the stored representation of the enterprise’s information
conceptual data model
captures the high-level data requirements as a collection of related concepts.
It contains only the basic and critical business entities within a given realm and function, with a description of each entity and the relationships between entities.
logical data model
A _____ is a detailed representation of data requirements, usually in support of a specific usage context, such as application requirements.
are still independent of any technology or specific implementation constraints.
often begins as an extension of a conceptual data model.
relational logical data model
the conceptual data model is extended by adding attributes.
dimensional logical data model
is in many cases a fully-attributed perspective of the dimensional conceptual data model
logical relational data model
captures the business rules of a business process
logical dimensional
captures the business questions to determine the health and performance of a business process.
physical data model (PDM)
represents a detailed technical solution, often using the logical data model as a starting point and then adapted to work within a set of hardware, software, and network tools.
physical data model,
structures
Because _____ accommodates technology limitations,
_____ are often combined (denormalized) to improve retrieval performance
Canonical Model
A variant of a physical scheme is a _____, used for data in motion between systems.
This model describes the structure of data being passed between systems as packets or messages.
Enterprise Service Bus (ESB),
Enterprise Application Integration (EAI)
When sending data through web services, an ______ or through _____, the canonical model describes what data structure the sending service and any receiving services should use.
view
is a virtual table
provide a means to look at data from one or many tables that contain or reference the actual attributes
are used to simplify queries, control data access, and rename columns, without the redundancy and loss of referential integrity due to denormalization.
standard view
runs SQL to retrieve data at the point when an attribute in the view is requested.
instantiated/materialized view
runs at a predetermined time
Partitioning
refers to the process of splitting a table
It is performed to facilitate archiving and to improve retrieval performance
Vertically split
To reduce query sets, create subset tables that contain subsets of columns
Horizontally split
To reduce query sets, create subset tables using the value of a column as the differentiator.
Denormalization
is the deliberate transformation of normalized logical data model entities into physical tables with redundant or duplicate data structures.
can also be used to enforce user security by segregating data into multiple views or copies of tables according to access needs.
This process does introduce a risk of data errors due to duplication.
Combining data from multiple other tables in advance,
Creating smaller, pre-filtered copies of data,
Pre-calculating and storing costly data calculations
There are several reasons to denormalize data.
The first is to improve performance by:
______ to avoid costly run-time joins
______ to reduce costly run-time calculations and/or table scans of large tables
______ to avoid run-time system resource competition
collapsing,
combining
In dimensional data modeling, denormalization is called _____ or _____.
star schema
If each dimension is collapsed into a single structure, the resulting data model is called a _____
snowflake
If the dimensions are not collapsed, the resulting data model is called a _____
normalization
is the process of applying rules in order to organize business complexity into stable data structures.
The basic goal of _____ is to keep each attribute in only one place to eliminate redundancy and the inconsistencies that can result from redundancy.
First normal form (1NF),
Second normal form (2NF),
Third normal form (3NF),
Boyce/Codd normal form (BCNF),
Fourth normal form (4NF),
Fifth normal form (5NF)
Normalization levels (6)
First normal form (1NF)
Ensures each entity has a valid primary key,
and every attribute depends on the primary key
removes repeating groups, and
ensures each attribute is atomic (not multi-valued).
includes the resolution of many-to-many relationships with an additional entity often called an associative entity.
Second normal form (2NF)
Ensures each entity has the minimal primary key and that
every attribute depends on the complete primary key.
Third normal form (3NF)
Ensures each entity has no hidden primary keys and that
each attribute depends on no attributes outside the key (“the key, the whole key and nothing but the key”).
Fourth normal form (4NF)
Resolves all many-to-many-to-many relationships (and beyond) in pairs until they cannot be broken down into any smaller pieces.
Boyce / Codd normal form (BCNF)
Resolves overlapping composite candidate keys.
candidate key
is either a primary or an alternate key
Composite
means more than one (i.e., two or more attributes in an entity’s primary or alternate keys)
overlapping
means there are hidden business rules between the keys.
Fifth normal form (5NF)
Resolves inter-entity dependencies into basic pairs, and all join dependencies use parts of primary keys.
Abstraction
is the removal of details in such a way as to broaden applicability to a wide class of situations while preserving the important properties and essential nature from concepts or subjects.
is the concept of exposing only the required essential characteristics and behavior with respect to a context.
Generalization
groups the common attributes and relationships of entities into super type entities
specialization
separates distinguishing attributes within an entity into subtype entities.
is usually based on attribute values within an entity instance.