Chapter 2_Data Modeling and Design (Part 2)

0.0(0)
studied byStudied by 1 person
0.0(0)
full-widthCall Kai
learnLearn
examPractice Test
spaced repetitionSpaced Repetition
heart puzzleMatch
flashcardsFlashcards
GameKnowt Play
Card Sorting

1/84

encourage image

There's no tags or description

Looks like no tags are added yet.

Study Analytics
Name
Mastery
Learn
Test
Matching
Spaced

No study sessions yet.

85 Terms

1
New cards

Relational,

Dimensional,

Object-Oriented,

Fact-Based,

Time-Based,

NoSQL

most common schemes used to represent data (6)

2
New cards

Dr. Edward Codd,

1970,

relational theory

First articulated by _____ in _____, _____ provides a systematic way to organize data so that they reflected their meaning.

3
New cards

relational theory

This approach had the additional effect of reducing redundancy in data storage.

4
New cards

Information Engineering(IE),

Integration Definition for Information Modeling (IDEF1X),

Barker Notation,

Chen Notation

several different kinds of notation to express the association between entities in relational modeling (4)

5
New cards

IE syntax

The most common form (relational modeling) is _____, with its familiar tridents or ‘crow’s feet’ to depict cardinality.

6
New cards

dimensional modeling,

General Mills,

Dartmouth College,

1960’s

The concept of _____ started from a joint research project conducted by _____ and _____ in the _____

7
New cards

dimensional modeling

data is structured to optimize the query and analysis of large amounts of data.

8
New cards

operational systems

_____ that support transaction processing are optimized for fast processing of individual transactions.

9
New cards

Overwrite (Type 1),

New Row (Type 2),

New Column (Type 3)

three main types of change

10
New cards

ORC

three main types of change are sometimes known by _____

11
New cards

Overwrite (Type 1)

The new value overwrites the old value in place.

12
New cards

New Row (Type 2)

  • The new values are written in a new row, and the

  • old row is marked as not current.

13
New cards

New Column (Type 3)

  • Multiple instances of a value are listed in columns on the same row, and a new value means writing the values in the series one spot down to make space at the front for the new value.

  • The last value is discarded.

14
New cards

Snowflaking

is the term given to normalizing the flat, single-table, dimensional structure in a star schema into the respective component hierarchical or network structures.

15
New cards

grain

  • stands for the meaning or description of a single row of data in a fact table;

  • this is the most detail any row will have.

16
New cards

Conformed dimensions

  • are built with the entire organization in mind instead of just a particular project;

  • this allows these dimensions to be shared across dimensional models, due to containing consistent terminology and values.

17
New cards

Conformed facts

use standardized definitions of terms across individual marts.

18
New cards

Unified Modeling Language

  • is a graphical language for modeling software.

  • has a variety of notations of which one (the class model) concerns databases.

19
New cards

Unified Modeling Language class model

specifies classes(entity types) and their relationship types

20
New cards

class

has Operations or Methods (also called its “behavior”).

21
New cards

Class behavior

is only loosely connected to business logic because it still needs to be sequenced and timed.

22
New cards

table

In ER terms, the _____ has stored procedures/triggers.

23
New cards

Public,

Internally Visible,

Private

Class Operations can be (3)

24
New cards

Public

Externally visible

25
New cards

Internally Visible

Visible to children Objects

26
New cards

Private

Hidden

27
New cards

Fact-Based Modeling

a family of conceptual modeling languages, originated in the late 1970s.

28
New cards

Fact-based languages

view the world in terms of objects, the facts that relate or characterize those objects, and each role that each object plays in each fact.

29
New cards

Fact-based models

do not use attributes, reducing the need for intuitive or expert judgment by expressing the exact relationships between objects (both entities and values).

30
New cards

Object-Role Modeling (ORM)

is a model-driven engineering approach that starts with typical examples of required information or queries presented in any external formulation familiar to users, and then verbalizes these examples at the conceptual level, in terms of simple facts expressed in a controlled natural language

31
New cards

Fully Communication Oriented Modeling (FCO-IM)

is similar in notation and approach to ORM.

32
New cards

Time-based patterns

are used when data values must be associated in chronological order and with specific time values.

33
New cards

Data Vault

  • is a detail-oriented, time-based, and uniquely linked set of normalized tables that support one or more functional areas of business.

  • It is a hybrid approach, encompassing the best of breed between third normal form and star schema.

  • are designed specifically to meet the needs of enterprise data warehouses.

34
New cards

hubs,

links,

satellites

three types of entities

35
New cards

hub

The Data Vault design is focused around the functional areas of business with the _____ representing the primary key.

36
New cards

links

provide transaction integration between the hubs.

37
New cards

satellites

provide the context of the hub primary key

38
New cards

Anchor Modeling

  • is a technique suited for information that changes overtime in both structure and content.

  • It provides graphical notation used for conceptual modeling similar to traditional data modeling, with extensions for working with temporal data.

39
New cards

anchors,

attributes,

ties,

knots

Anchor Modeling has four basic modeling concepts:

40
New cards

NoSQL

is a name for the category of databases built on non-relational technology.

41
New cards

document,

key-value,

column-oriented,

graph

four main types of NoSQL databases

42
New cards

document

Instead of taking a business subject and breaking it up into multiple relational structures, document databases frequently store the business subject in one structure called a _____

43
New cards

Key-value databases

allow an application to store its data in only two columns (‘key’ and ‘value’), with the feature of storing both simple (e.g., dates, numbers, codes) and complex information (unformatted text, video, music, documents, photos) stored within the ‘value’ column.

44
New cards

column-oriented

  • Out of the four types of NoSQL databases, _____ is closest to the RDBMS.

  • Both have a similar way of looking at data as rows and values.

45
New cards

graph database

is designed for data whose relations are well represented as a set of nodes with an undetermined number of connections between these nodes.

46
New cards

Standards Planning and Requirements Committee (SPARC)

Levels of Detail In 1975, the American National Standards Institute’s _____ published their three-schema approach to database management.

47
New cards

Conceptual,

External,

Internal

Levels of Detail In 1975, the American National SPARC published their three-schema approach to database management.

The three key components were:

48
New cards

Conceptual

  • This embodies the ‘real world’ view of the enterprise being modeled in the database.

  • It represents the current ‘best model’ or ‘way of doing business’ for the enterprise.

49
New cards

External

  • The various users of the database management system operate on subsets of the total enterprise model that are relevant to their particular needs.

  • These subsets are represented as ‘external schemas’.

50
New cards

Internal

  • The ‘machine view’ of the data is described by the internal schema.

  • This schema describes the stored representation of the enterprise’s information

51
New cards

conceptual data model

  • captures the high-level data requirements as a collection of related concepts.

  • It contains only the basic and critical business entities within a given realm and function, with a description of each entity and the relationships between entities.

52
New cards

logical data model

  • A _____ is a detailed representation of data requirements, usually in support of a specific usage context, such as application requirements.

  • are still independent of any technology or specific implementation constraints.

  • often begins as an extension of a conceptual data model.

53
New cards

relational logical data model

the conceptual data model is extended by adding attributes.

54
New cards

dimensional logical data model

is in many cases a fully-attributed perspective of the dimensional conceptual data model

55
New cards

logical relational data model

captures the business rules of a business process

56
New cards

logical dimensional

captures the business questions to determine the health and performance of a business process.

57
New cards

physical data model (PDM)

represents a detailed technical solution, often using the logical data model as a starting point and then adapted to work within a set of hardware, software, and network tools.

58
New cards

physical data model,

structures

  1. Because _____ accommodates technology limitations,

  2. _____ are often combined (denormalized) to improve retrieval performance

59
New cards

Canonical Model

  • A variant of a physical scheme is a _____, used for data in motion between systems.

  • This model describes the structure of data being passed between systems as packets or messages.

60
New cards

Enterprise Service Bus (ESB),

Enterprise Application Integration (EAI)

When sending data through web services, an ______ or through _____, the canonical model describes what data structure the sending service and any receiving services should use.

61
New cards

view

  • is a virtual table

  • provide a means to look at data from one or many tables that contain or reference the actual attributes

  • are used to simplify queries, control data access, and rename columns, without the redundancy and loss of referential integrity due to denormalization.

62
New cards

standard view

runs SQL to retrieve data at the point when an attribute in the view is requested.

63
New cards

instantiated/materialized view

runs at a predetermined time

64
New cards

Partitioning

  • refers to the process of splitting a table

  • It is performed to facilitate archiving and to improve retrieval performance

65
New cards

Vertically split

To reduce query sets, create subset tables that contain subsets of columns

66
New cards

Horizontally split

To reduce query sets, create subset tables using the value of a column as the differentiator.

67
New cards

Denormalization

  • is the deliberate transformation of normalized logical data model entities into physical tables with redundant or duplicate data structures.

  • can also be used to enforce user security by segregating data into multiple views or copies of tables according to access needs.

  • This process does introduce a risk of data errors due to duplication.

68
New cards

Combining data from multiple other tables in advance,

Creating smaller, pre-filtered copies of data,

Pre-calculating and storing costly data calculations

There are several reasons to denormalize data.

  • The first is to improve performance by:

    1. ______ to avoid costly run-time joins

    2. ______ to reduce costly run-time calculations and/or table scans of large tables

    3. ______ to avoid run-time system resource competition

69
New cards

collapsing,

combining

In dimensional data modeling, denormalization is called _____ or _____.

70
New cards

star schema

If each dimension is collapsed into a single structure, the resulting data model is called a _____

71
New cards

snowflake

If the dimensions are not collapsed, the resulting data model is called a _____

72
New cards

normalization

  • is the process of applying rules in order to organize business complexity into stable data structures.

  • The basic goal of _____ is to keep each attribute in only one place to eliminate redundancy and the inconsistencies that can result from redundancy.

73
New cards

First normal form (1NF),

Second normal form (2NF),

Third normal form (3NF),

Boyce/Codd normal form (BCNF),

Fourth normal form (4NF),

Fifth normal form (5NF)

Normalization levels (6)

74
New cards

First normal form (1NF)

  • Ensures each entity has a valid primary key,

  • and every attribute depends on the primary key

  • removes repeating groups, and

  • ensures each attribute is atomic (not multi-valued).

  • includes the resolution of many-to-many relationships with an additional entity often called an associative entity.

75
New cards

Second normal form (2NF)

  • Ensures each entity has the minimal primary key and that

  • every attribute depends on the complete primary key.

76
New cards

Third normal form (3NF)

  • Ensures each entity has no hidden primary keys and that

  • each attribute depends on no attributes outside the key (“the key, the whole key and nothing but the key”).

77
New cards

Fourth normal form (4NF)

Resolves all many-to-many-to-many relationships (and beyond) in pairs until they cannot be broken down into any smaller pieces.

78
New cards

Boyce / Codd normal form (BCNF)

Resolves overlapping composite candidate keys.

79
New cards

candidate key

is either a primary or an alternate key

80
New cards

Composite

means more than one (i.e., two or more attributes in an entity’s primary or alternate keys)

81
New cards

overlapping

means there are hidden business rules between the keys.

82
New cards

Fifth normal form (5NF)

Resolves inter-entity dependencies into basic pairs, and all join dependencies use parts of primary keys.

83
New cards

Abstraction

  • is the removal of details in such a way as to broaden applicability to a wide class of situations while preserving the important properties and essential nature from concepts or subjects.

  • is the concept of exposing only the required essential characteristics and behavior with respect to a context.

84
New cards

Generalization

groups the common attributes and relationships of entities into super type entities

85
New cards

specialization

  • separates distinguishing attributes within an entity into subtype entities.

  • is usually based on attribute values within an entity instance.