Chapter 2_Data Modeling and Design

0.0(0)
studied byStudied by 0 people
0.0(0)
full-widthCall Kai
learnLearn
examPractice Test
spaced repetitionSpaced Repetition
heart puzzleMatch
flashcardsFlashcards
GameKnowt Play
Card Sorting

1/216

encourage image

There's no tags or description

Looks like no tags are added yet.

Study Analytics
Name
Mastery
Learn
Test
Matching
Spaced

No study sessions yet.

217 Terms

1
New cards

Data modeling

is the process of discovering, analyzing, and scoping data requirements, and then representing and communicating these data requirements in a precise form called the data model.

2
New cards

data model

is a precise form used to capture, communicate, and structure data requirements that have been discovered, analyzed, and scoped during the data modeling process.

3
New cards

Relational,

Dimensional,

Object-Oriented,

Fact-Based,

Time-Based,

NoSQL

data modeling and design most commonly used schemes (6)

4
New cards

conceptual,

logical,

physical

Models of these schemes exist at three levels of detail

5
New cards

entities,

relationships,

facts,

keys,

attributes

Examples of components (5)

6
New cards

Business Drivers

Data models are critical to effective management of data.

They:

  • Provide a common vocabulary around data

  • Capture and document explicit knowledge about an organization’s data and systems

  • Serve as a primary communications tool during projects

  • Provide the starting point for customization, integration, or even replacement of an application

7
New cards

Formalization,

Scope definition,

Knowledge retention/documentation

Goals and Principles

  • Confirming and documenting understanding of different perspectives facilitates (3)

8
New cards

Formalization

  • A data model documents a concise definition of data structures and relationships.

  • It enables assessment of how data is affected by implemented business rules, for current as-is states or desired target states.

9
New cards

Scope definition

A data model can help explain the boundaries for data context and implementation of purchased application packages, projects, initiatives, or existing systems.

10
New cards

Knowledge retention/documentation

  • A data model can preserve corporate memory regarding a system or project by capturing knowledge in an explicit form.

  • It serves as documentation for future projects to use as the as-is version.

11
New cards

system development lifecycle (SDLC)

Data modeling is most frequently performed in the context of systems development and maintenance efforts, known as the _____

12
New cards

model

  • is a representation of something that exists or a pattern for something to be made.

  • can contain one or more diagrams

13
New cards

Data model

  • describes an organization’s data as the organization understands it, or as the organization wants it to be.

  • contains a set of symbols with text labels that attempts visually to represent data requirements as communicated to the data modeler, for a specific set of data that can range in size from small, for a project, to large, for an organization.

14
New cards

Category information,

Resource information,

Business event information,

Detail transaction information

Types of Data that are Modeled (4)

15
New cards

Category information

  • Data used to classify and assign types to things.

  • Customers classified by market categories or business sectors; products classified by color, model, size, etc.; orders classified by whether they are open or closed.

16
New cards

Resource information

  • Basic profiles of resources needed conduct operational processes

  • Product, Customer, Supplier, Facility, Organization, and Account

17
New cards

Business event information

  • Data created while operational processes are in progress

  • Customer Orders, Supplier Invoices, Cash Withdrawal, and Business Meetings

18
New cards

Detail transaction information

often produced through point-of-sale systems (either in stores or online).

19
New cards

Entity,

Relationship,

Attribute,

Domain

Data Model Components (4)

20
New cards

entity

  • is a thing about which an organization collects information.

  • are sometimes referred to as the nouns of an organization.

  • can be thought of as the answer to a fundamental question – who, what, when, where, why, or how – or to a combination of these questions.

21
New cards

entity-type

a type of something is being represented

22
New cards

Entity instances

are the occurrences or values of a particular entity

23
New cards

rectangles or rectangles with rounded edges,

names inside

In data models, entities are generally depicted as _____ or _____ with their _____

24
New cards

Entity definitions

  • are essential contributors to the business value of any data model.

  • They are core Metadata.

25
New cards

High quality definitions

clarify the meaning of business vocabulary and provide rigor to the business rules governing entity relationships.

26
New cards

Clarity,

Accuracy,

Completeness

High quality data definitions exhibit essential characteristics (3)

27
New cards

Clarity

The definition should be easy to read and grasp.

28
New cards

Accuracy

The definition is a precise and correct description of the entity.

29
New cards

Completeness

All of the parts of the definition are present.

30
New cards

Relationship

are shown as lines on the data modeling diagram.

31
New cards

relationship,

navigation path,

edge or link

  1. In relational schemes the term ____ is often used,

  2. dimensional schemes the term _____ is often used, and in

  3. NoSQL schemes terms such as _____ are used, for example.

32
New cards

Relationship aliases

can also vary based on level of detail.

33
New cards

relationship,

constraint or reference

  1. A relationship at the conceptual and logical levels is called a _____,

  2. but a relationship at the physical level may be called by other names, such as _____ or _____, depending on the database technology.

34
New cards

cardinality

  • In a relationship between two entities, _____ captures how many of one entity (entity instances) participates in the relationship with how many of the other entity.

  • is represented by the symbols that appear on both ends of a relationship line.

  • Without _____, the most one can say about a relationship is that two entities are connected in some way.

35
New cards

Data rules

are specified and enforced through cardinality.

36
New cards

arity

The number of entities in a relationship is the ‘_____’ of the relationship.

37
New cards

unary,

binary,

ternary

The most common arity relationships (3)

38
New cards

unary

also known as a recursive or self-referencing

39
New cards

unary relationship

involves only one entity

40
New cards

one-to-many recursive relationship

describes a hierarchy

41
New cards

many-to-many relationship

describes a network or graph

42
New cards

hierarchy

In a _____, an entity instance has at most one parent (or higher-level entity).

43
New cards

relational modeling

In _____, child entities are on the many side of the relationship, with parent entities on the one side of the relationship.

44
New cards

network

In a _____, an entity instance can have more than one parent.

45
New cards

binary

arity of two

46
New cards

binary relationship

A _____, the most common on a traditional data model diagram, involves two entities.

47
New cards

ternary

arity of three

48
New cards

ternary relationship

is a relationship that includes three entities.

49
New cards

foreign key

  • is used in physical and sometimes logical relational data modelling schemes to represent a relationship.

  • may be created implicitly when a relationship is defined between two entities, depending on the database technology or data modeling tool, and whether the two entities involved have mutual dependencies.

50
New cards

identifier

  • also called a key

  • is a set of one or more attributes that uniquely defines an instance of an entity.

51
New cards

simple,

surrogate,

compound,

composite

construction-type keys (4)

52
New cards

simple key

  • is one attribute that uniquely identifies an entity instance.

  • Ex. Universal Product Codes (UPCs) and Vehicle Identification Numbers(VINs).

53
New cards

surrogate key

  • is also an example of a simple key.

  • is a unique identifier for a table.

  • Often a counter and always system-generated without intelligence, a _____ is an integer whose meaning is unrelated to its face value.

54
New cards

compound key

  • is a set of two or more attributes that together uniquely identify an entity instance.

  • Ex. Phone number (area code + exchange + local number).

55
New cards

composite key

contains one compound key and at least one other simple or compound key or non-key attribute.

56
New cards

super key,

candidate key,

business key,

primary key,

alternate key

Function-type Keys (5)

57
New cards

super key

is any set of attributes that uniquely identify an entity instance.

58
New cards

candidate key

A _____ is a minimal set of one or more attributes (i.e., a simple or compound key) that identifies the entity instance to which it belongs.

59
New cards

business key

A _____ is one or more attributes that a business professional would use to retrieve a single entity instance.

60
New cards

primary key

A _____ is the candidate key that is chosen to be the unique identifier for an entity.

61
New cards

alternate key

An _____ can still be used to find specific entity instances.

62
New cards

primary key,

alternate keys

  1. Often the _____ is a surrogate key and

  2. the _____ are business keys.

63
New cards

independent entity

is one where the primary key contains only attributes that belong to that entity.

64
New cards

dependent entity

is one where the primary key contains at least one attribute from another entity.

65
New cards

domain

  • is the complete set of possible values that an attribute can be assigned.

  • provides a means of standardizing the characteristics of the attributes.

66
New cards

valid values,

invalid values

  1. All values inside the domain are _____.

  2. Those outside the domain are referred to as _____.

67
New cards

constraints

One can restrict a domain with additional rules, called _____

68
New cards

rules

can relate to format, logic, or both.

69
New cards

data type

Domains that specify the standard types of data one can have in an attribute assigned to that domain.

70
New cards

data format

Domains that use patterns including templates and masks, such as are found in postal codes and phone numbers, and character limitations (alphanumeric only, alphanumeric with certain special characters allowed, etc.) to define valid values.

71
New cards

List

  • Domains that contain a finite set of values.

  • These are familiar to many people from functionality like dropdown lists.

72
New cards

Range

  • Domains that allow all values of the same data type that are between one or more minimum and/or maximum values.

  • Some ranges can be open-ended.

73
New cards

Rule-based

  • Domains defined by the rules that values must comply with in order to be valid.

  • These include rules comparing values to calculated values or other attribute values in a relation or set.

74
New cards

Relational,

Dimensional,

Object-Oriented,

Fact-Based,

Time-Based,

NoSQL

most common schemes used to represent data (6)

75
New cards

Dr. Edward Codd,

1970,

relational theory

First articulated by _____ in _____, _____ provides a systematic way to organize data so that they reflected their meaning.

76
New cards

relational theory

This approach had the additional effect of reducing redundancy in data storage.

77
New cards

Information Engineering(IE),

Integration Definition for Information Modeling (IDEF1X),

Barker Notation,

Chen Notation

several different kinds of notation to express the association between entities in relational modeling (4)

78
New cards

IE syntax

The most common form (relational modeling) is _____, with its familiar tridents or ‘crow’s feet’ to depict cardinality.

79
New cards

dimensional modeling,

General Mills,

Dartmouth College,

1960’s

The concept of _____ started from a joint research project conducted by _____ and _____ in the _____

80
New cards

dimensional modeling

data is structured to optimize the query and analysis of large amounts of data.

81
New cards

operational systems

_____ that support transaction processing are optimized for fast processing of individual transactions.

82
New cards

Overwrite (Type 1),

New Row (Type 2),

New Column (Type 3)

three main types of change

83
New cards

ORC

three main types of change are sometimes known by _____

84
New cards

Overwrite (Type 1)

The new value overwrites the old value in place.

85
New cards

New Row (Type 2)

  • The new values are written in a new row, and the

  • old row is marked as not current.

86
New cards

New Column (Type 3)

  • Multiple instances of a value are listed in columns on the same row, and a new value means writing the values in the series one spot down to make space at the front for the new value.

  • The last value is discarded.

87
New cards

Snowflaking

is the term given to normalizing the flat, single-table, dimensional structure in a star schema into the respective component hierarchical or network structures.

88
New cards

grain

  • stands for the meaning or description of a single row of data in a fact table;

  • this is the most detail any row will have.

89
New cards

Conformed dimensions

  • are built with the entire organization in mind instead of just a particular project;

  • this allows these dimensions to be shared across dimensional models, due to containing consistent terminology and values.

90
New cards

Conformed facts

use standardized definitions of terms across individual marts.

91
New cards

Unified Modeling Language

  • is a graphical language for modeling software.

  • has a variety of notations of which one (the class model) concerns databases.

92
New cards

Unified Modeling Language class model

specifies classes(entity types) and their relationship types

93
New cards

class

has Operations or Methods (also called its “behavior”).

94
New cards

Class behavior

is only loosely connected to business logic because it still needs to be sequenced and timed.

95
New cards

table

In ER terms, the _____ has stored procedures/triggers.

96
New cards

Public,

Internally Visible,

Private

Class Operations can be (3)

97
New cards

Public

Externally visible

98
New cards

Internally Visible

Visible to children Objects

99
New cards

Private

Hidden

100
New cards

Fact-Based Modeling

a family of conceptual modeling languages, originated in the late 1970s.