1/216
Looks like no tags are added yet.
Name | Mastery | Learn | Test | Matching | Spaced |
|---|
No study sessions yet.
Data modeling
is the process of discovering, analyzing, and scoping data requirements, and then representing and communicating these data requirements in a precise form called the data model.
data model
is a precise form used to capture, communicate, and structure data requirements that have been discovered, analyzed, and scoped during the data modeling process.
Relational,
Dimensional,
Object-Oriented,
Fact-Based,
Time-Based,
NoSQL
data modeling and design most commonly used schemes (6)
conceptual,
logical,
physical
Models of these schemes exist at three levels of detail
entities,
relationships,
facts,
keys,
attributes
Examples of components (5)
Business Drivers
Data models are critical to effective management of data.
They:
Provide a common vocabulary around data
Capture and document explicit knowledge about an organization’s data and systems
Serve as a primary communications tool during projects
Provide the starting point for customization, integration, or even replacement of an application
Formalization,
Scope definition,
Knowledge retention/documentation
Goals and Principles
Confirming and documenting understanding of different perspectives facilitates (3)
Formalization
A data model documents a concise definition of data structures and relationships.
It enables assessment of how data is affected by implemented business rules, for current as-is states or desired target states.
Scope definition
A data model can help explain the boundaries for data context and implementation of purchased application packages, projects, initiatives, or existing systems.
Knowledge retention/documentation
A data model can preserve corporate memory regarding a system or project by capturing knowledge in an explicit form.
It serves as documentation for future projects to use as the as-is version.
system development lifecycle (SDLC)
Data modeling is most frequently performed in the context of systems development and maintenance efforts, known as the _____
model
is a representation of something that exists or a pattern for something to be made.
can contain one or more diagrams
Data model
describes an organization’s data as the organization understands it, or as the organization wants it to be.
contains a set of symbols with text labels that attempts visually to represent data requirements as communicated to the data modeler, for a specific set of data that can range in size from small, for a project, to large, for an organization.
Category information,
Resource information,
Business event information,
Detail transaction information
Types of Data that are Modeled (4)
Category information
Data used to classify and assign types to things.
Customers classified by market categories or business sectors; products classified by color, model, size, etc.; orders classified by whether they are open or closed.
Resource information
Basic profiles of resources needed conduct operational processes
Product, Customer, Supplier, Facility, Organization, and Account
Business event information
Data created while operational processes are in progress
Customer Orders, Supplier Invoices, Cash Withdrawal, and Business Meetings
Detail transaction information
often produced through point-of-sale systems (either in stores or online).
Entity,
Relationship,
Attribute,
Domain
Data Model Components (4)
entity
is a thing about which an organization collects information.
are sometimes referred to as the nouns of an organization.
can be thought of as the answer to a fundamental question – who, what, when, where, why, or how – or to a combination of these questions.
entity-type
a type of something is being represented
Entity instances
are the occurrences or values of a particular entity
rectangles or rectangles with rounded edges,
names inside
In data models, entities are generally depicted as _____ or _____ with their _____
Entity definitions
are essential contributors to the business value of any data model.
They are core Metadata.
High quality definitions
clarify the meaning of business vocabulary and provide rigor to the business rules governing entity relationships.
Clarity,
Accuracy,
Completeness
High quality data definitions exhibit essential characteristics (3)
Clarity
The definition should be easy to read and grasp.
Accuracy
The definition is a precise and correct description of the entity.
Completeness
All of the parts of the definition are present.
Relationship
are shown as lines on the data modeling diagram.
relationship,
navigation path,
edge or link
In relational schemes the term ____ is often used,
dimensional schemes the term _____ is often used, and in
NoSQL schemes terms such as _____ are used, for example.
Relationship aliases
can also vary based on level of detail.
relationship,
constraint or reference
A relationship at the conceptual and logical levels is called a _____,
but a relationship at the physical level may be called by other names, such as _____ or _____, depending on the database technology.
cardinality
In a relationship between two entities, _____ captures how many of one entity (entity instances) participates in the relationship with how many of the other entity.
is represented by the symbols that appear on both ends of a relationship line.
Without _____, the most one can say about a relationship is that two entities are connected in some way.
Data rules
are specified and enforced through cardinality.
arity
The number of entities in a relationship is the ‘_____’ of the relationship.
unary,
binary,
ternary
The most common arity relationships (3)
unary
also known as a recursive or self-referencing
unary relationship
involves only one entity
one-to-many recursive relationship
describes a hierarchy
many-to-many relationship
describes a network or graph
hierarchy
In a _____, an entity instance has at most one parent (or higher-level entity).
relational modeling
In _____, child entities are on the many side of the relationship, with parent entities on the one side of the relationship.
network
In a _____, an entity instance can have more than one parent.
binary
arity of two
binary relationship
A _____, the most common on a traditional data model diagram, involves two entities.
ternary
arity of three
ternary relationship
is a relationship that includes three entities.
foreign key
is used in physical and sometimes logical relational data modelling schemes to represent a relationship.
may be created implicitly when a relationship is defined between two entities, depending on the database technology or data modeling tool, and whether the two entities involved have mutual dependencies.
identifier
also called a key
is a set of one or more attributes that uniquely defines an instance of an entity.
simple,
surrogate,
compound,
composite
construction-type keys (4)
simple key
is one attribute that uniquely identifies an entity instance.
Ex. Universal Product Codes (UPCs) and Vehicle Identification Numbers(VINs).
surrogate key
is also an example of a simple key.
is a unique identifier for a table.
Often a counter and always system-generated without intelligence, a _____ is an integer whose meaning is unrelated to its face value.
compound key
is a set of two or more attributes that together uniquely identify an entity instance.
Ex. Phone number (area code + exchange + local number).
composite key
contains one compound key and at least one other simple or compound key or non-key attribute.
super key,
candidate key,
business key,
primary key,
alternate key
Function-type Keys (5)
super key
is any set of attributes that uniquely identify an entity instance.
candidate key
A _____ is a minimal set of one or more attributes (i.e., a simple or compound key) that identifies the entity instance to which it belongs.
business key
A _____ is one or more attributes that a business professional would use to retrieve a single entity instance.
primary key
A _____ is the candidate key that is chosen to be the unique identifier for an entity.
alternate key
An _____ can still be used to find specific entity instances.
primary key,
alternate keys
Often the _____ is a surrogate key and
the _____ are business keys.
independent entity
is one where the primary key contains only attributes that belong to that entity.
dependent entity
is one where the primary key contains at least one attribute from another entity.
domain
is the complete set of possible values that an attribute can be assigned.
provides a means of standardizing the characteristics of the attributes.
valid values,
invalid values
All values inside the domain are _____.
Those outside the domain are referred to as _____.
constraints
One can restrict a domain with additional rules, called _____
rules
can relate to format, logic, or both.
data type
Domains that specify the standard types of data one can have in an attribute assigned to that domain.
data format
Domains that use patterns including templates and masks, such as are found in postal codes and phone numbers, and character limitations (alphanumeric only, alphanumeric with certain special characters allowed, etc.) to define valid values.
List
Domains that contain a finite set of values.
These are familiar to many people from functionality like dropdown lists.
Range
Domains that allow all values of the same data type that are between one or more minimum and/or maximum values.
Some ranges can be open-ended.
Rule-based
Domains defined by the rules that values must comply with in order to be valid.
These include rules comparing values to calculated values or other attribute values in a relation or set.
Relational,
Dimensional,
Object-Oriented,
Fact-Based,
Time-Based,
NoSQL
most common schemes used to represent data (6)
Dr. Edward Codd,
1970,
relational theory
First articulated by _____ in _____, _____ provides a systematic way to organize data so that they reflected their meaning.
relational theory
This approach had the additional effect of reducing redundancy in data storage.
Information Engineering(IE),
Integration Definition for Information Modeling (IDEF1X),
Barker Notation,
Chen Notation
several different kinds of notation to express the association between entities in relational modeling (4)
IE syntax
The most common form (relational modeling) is _____, with its familiar tridents or ‘crow’s feet’ to depict cardinality.
dimensional modeling,
General Mills,
Dartmouth College,
1960’s
The concept of _____ started from a joint research project conducted by _____ and _____ in the _____
dimensional modeling
data is structured to optimize the query and analysis of large amounts of data.
operational systems
_____ that support transaction processing are optimized for fast processing of individual transactions.
Overwrite (Type 1),
New Row (Type 2),
New Column (Type 3)
three main types of change
ORC
three main types of change are sometimes known by _____
Overwrite (Type 1)
The new value overwrites the old value in place.
New Row (Type 2)
The new values are written in a new row, and the
old row is marked as not current.
New Column (Type 3)
Multiple instances of a value are listed in columns on the same row, and a new value means writing the values in the series one spot down to make space at the front for the new value.
The last value is discarded.
Snowflaking
is the term given to normalizing the flat, single-table, dimensional structure in a star schema into the respective component hierarchical or network structures.
grain
stands for the meaning or description of a single row of data in a fact table;
this is the most detail any row will have.
Conformed dimensions
are built with the entire organization in mind instead of just a particular project;
this allows these dimensions to be shared across dimensional models, due to containing consistent terminology and values.
Conformed facts
use standardized definitions of terms across individual marts.
Unified Modeling Language
is a graphical language for modeling software.
has a variety of notations of which one (the class model) concerns databases.
Unified Modeling Language class model
specifies classes(entity types) and their relationship types
class
has Operations or Methods (also called its “behavior”).
Class behavior
is only loosely connected to business logic because it still needs to be sequenced and timed.
table
In ER terms, the _____ has stored procedures/triggers.
Public,
Internally Visible,
Private
Class Operations can be (3)
Public
Externally visible
Internally Visible
Visible to children Objects
Private
Hidden
Fact-Based Modeling
a family of conceptual modeling languages, originated in the late 1970s.