Diagram,
Definitions,
Issues and outstanding questions,
Lineage
Plan for Data Modelling (4)
Diagram
A data model contains one or more _____.
is the visual that captures the requirements in a precise form
Definitions
_____ for entities, attributes, and relationships are essential to maintaining the precision of a data model.
Issues and outstanding questions
Frequently the data modeling process raises issues and questions that may not be addressed during the data modeling phase.
Lineage
For physical and sometimes logical data models, it is important to know the data _____, that is, where the data comes from.
data modeler,
accuracy
Two reasons why lineage is important to capture during data modeling.
the _____ will obtain a very strong understanding of the data requirements and therefore is in the best position to determine the source attributes.
determining the source attributes can be an effective tool to validate the _____ of the model and the mapping (i.e., a reality check)
analysis and modeling work
To build the models, modelers often rely heavily on previous _____.
They may study existing data models and databases, refer to published standards, and incorporate any data requirements.
Forward engineering
is the process of building a new application beginning with the requirements
CDM
is completed first to understand the scope of the initiative and the key terminology within that scope
Conceptual Data Modelling
CDM stands for _____
LDM
Then the _____ is completed to document the business solution
Logical Data Modelling
LDM stands for _____
PDM
followed by the _____ to document the technical solution
Select Scheme,
Select Notation,
Complete Initial CDM,
Incorporate Enterprise Terminology,
Obtain Sign-off
Creating the CDM involves the following steps: (5)
Complete Initial CDM
Collect the highest-level concepts (nouns) that exist for the organization.
Then collect the activities (verbs) that connect these concepts.
Relationships can go both ways, or involve more than two concepts
Analyze Information Requirements,
Analyze Existing Documentation,
Add Associative Entities,
Add Attributes,
Assign Domains,
Assign Keys
Logical Modelling consists of the following (6)
Analyze Information Requirements
Business processes require information products that are themselves the output from other business processes.
The names of these information products often identify an essential business vocabulary that serves as the basis for data modeling.
Analyze Existing Documentation
It can often be a great jump-start to use pre-existing data artifacts, including already built data models and databases
Associative entities
are used to describe Many-to-Many (or Many-to-Many-to-Many, etc.) relationships.
They take the identifying attributes from the entities involved in the relationship and put them into a new entity that just describes the relationship between the entities.
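A minimal sketch of an associative entity (hypothetical Student and Course tables, SQLite DDL via Python; all names here are illustrative, not from the source):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # Two base entities with a many-to-many relationship between them.
    conn.execute("CREATE TABLE Student (StudentId INTEGER PRIMARY KEY, Name TEXT)")
    conn.execute("CREATE TABLE Course (CourseId INTEGER PRIMARY KEY, Title TEXT)")
    # The associative entity takes the identifying attributes of both
    # entities and describes only the relationship between them.
    conn.execute("""
        CREATE TABLE Enrollment (
            StudentId  INTEGER REFERENCES Student (StudentId),
            CourseId   INTEGER REFERENCES Course (CourseId),
            EnrolledOn TEXT,
            PRIMARY KEY (StudentId, CourseId)
        )""")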
Add Attributes
An attribute in a logical data model should be atomic.
It should contain one and only one piece of data (fact) that cannot be divided into smaller pieces.
atomic
An attribute in a logical data model should be _____.
It should contain one and only one piece of data (fact) that cannot be divided into smaller pieces.
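As a sketch of atomicity (hypothetical Person table, SQLite via Python): a single full-name column carries two facts, while atomic attributes carry one fact each:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # Non-atomic: one column holds two facts (given and family name),
    # so neither can be constrained, sorted, or matched on its own.
    conn.execute("CREATE TABLE PersonNonAtomic (PersonId INTEGER PRIMARY KEY, FullName TEXT)")
    # Atomic: each attribute holds exactly one indivisible fact.
    conn.execute("""
        CREATE TABLE Person (
            PersonId  INTEGER PRIMARY KEY,
            FirstName TEXT,
            LastName  TEXT
        )""")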
Assign Domains
Allow for consistency in format and value sets within and across projects.
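One way a physical design can approximate a domain (a sketch assuming SQLite CHECK constraints; the YesNoIndicator domain and table are hypothetical): every attribute drawn from the domain is constrained to the same value set:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # Both attributes draw on the same 'YesNoIndicator' domain, giving a
    # consistent format and value set within and across tables.
    conn.execute("""
        CREATE TABLE Employee (
            EmployeeId   INTEGER PRIMARY KEY,
            IsActive     TEXT CHECK (IsActive IN ('Y', 'N')),
            IsContractor TEXT CHECK (IsContractor IN ('Y', 'N'))
        )""")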
key attribute
helps identify one unique entity instance from all others, either fully (by itself) or partially (in combination with other key elements)
Non-key attributes
describe the entity instance but do not help uniquely identify it.
Subtype absorption,
Supertype partition
Logical abstraction entities (supertypes and subtypes) become separate objects in the physical database design using one of two methods.
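A sketch of the two methods (the method names come from the card above; the hypothetical Party supertype with Person and Organization subtypes is illustrative, SQLite via Python):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # Subtype absorption: subtype attributes become nullable columns in a
    # single supertype table, with a type indicator.
    conn.execute("""
        CREATE TABLE Party (
            PartyId   INTEGER PRIMARY KEY,
            PartyType TEXT CHECK (PartyType IN ('PERSON', 'ORG')),
            BirthDate TEXT,  -- Person only; NULL for organizations
            TaxNumber TEXT   -- Organization only; NULL for persons
        )""")
    # Supertype partition: the supertype's attributes are repeated in a
    # separate table for each subtype.
    conn.execute("CREATE TABLE Person (PartyId INTEGER PRIMARY KEY, Name TEXT, BirthDate TEXT)")
    conn.execute("CREATE TABLE Organization (PartyId INTEGER PRIMARY KEY, Name TEXT, TaxNumber TEXT)")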
Add Attribute Details
Add details to the physical model, such as the
technical name of each table and column (relational databases),
or file and field (non-relational databases),
or schema and element (XML databases).
Add Reference Data Objects
Small reference data value sets in the logical data model can be implemented in a physical model in three common ways:
Create a matching separate code table
Create a master shared code table
Embed rules or valid codes into the appropriate object’s definition
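A sketch of the three options (hypothetical order-status codes, SQLite via Python):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # 1. Matching separate code table: one table per code set.
    conn.execute("CREATE TABLE OrderStatusCode (Code TEXT PRIMARY KEY, Description TEXT)")
    # 2. Master shared code table: many small code sets in one table,
    #    distinguished by a code-type column.
    conn.execute("""
        CREATE TABLE MasterCode (
            CodeType    TEXT,
            Code        TEXT,
            Description TEXT,
            PRIMARY KEY (CodeType, Code)
        )""")
    # 3. Embed the valid codes directly in the object's definition.
    conn.execute("""
        CREATE TABLE CustomerOrder (
            OrderId INTEGER PRIMARY KEY,
            Status  TEXT CHECK (Status IN ('OPEN', 'SHIPPED', 'CLOSED'))
        )""")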
Assign Surrogate Keys
Assign unique key values that are not visible to the business and have no meaning or relationship with the data with which they are matched.
This is an optional step and depends primarily on whether the natural key is large and composite, and whether its attributes are assigned values that could change over time.
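A minimal surrogate-key sketch (hypothetical Customer table, SQLite via Python): the surrogate carries no business meaning, and the large composite natural key is retained as a unique constraint:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE Customer (
            CustomerKey INTEGER PRIMARY KEY,   -- surrogate key, invisible to the business
            TaxId       TEXT,
            CountryCode TEXT,
            UNIQUE (TaxId, CountryCode)        -- composite natural key, values may change
        )""")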
Denormalizing or adding redundancy
can improve performance so much that it outweighs the cost of the duplicate storage and synchronization processing.
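A common denormalization sketch (hypothetical Customer and Order tables, SQLite via Python): copying CustomerName onto each order removes a join from read paths, at the cost of duplicate storage and synchronization:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, CustomerName TEXT)")
    # Denormalized: CustomerName is copied onto each order so reads skip
    # the join; updates must now keep both copies synchronized.
    conn.execute("""
        CREATE TABLE CustomerOrder (
            OrderId      INTEGER PRIMARY KEY,
            CustomerId   INTEGER REFERENCES Customer (CustomerId),
            CustomerName TEXT  -- redundant copy for read performance
        )""")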
index
is an alternate path for accessing data in the database to optimize query (data retrieval) performance
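A minimal example (hypothetical order table, SQLite via Python): the index gives queries that filter on CustomerId an access path other than a full table scan:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE CustomerOrder (OrderId INTEGER PRIMARY KEY, CustomerId INTEGER, OrderDate TEXT)")
    # Alternate access path: queries filtering on CustomerId can use the
    # index instead of scanning the whole table.
    conn.execute("CREATE INDEX IX_Order_CustomerId ON CustomerOrder (CustomerId)")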
Partition for Performance
Great consideration must be given to the partitioning strategy of the overall data model (dimensional), especially when facts contain many optional dimensional keys (sparse).
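As a sketch only (SQLite has no declarative partitioning, so this shows PostgreSQL-style range partitioning as plain DDL strings inside Python; the fact table and date range are hypothetical):

    # PostgreSQL-style declarative range partitioning of a fact table by date.
    ddl = """
    CREATE TABLE SalesFact (
        SaleDate    DATE NOT NULL,
        CustomerKey INTEGER,
        Amount      NUMERIC
    ) PARTITION BY RANGE (SaleDate);

    CREATE TABLE SalesFact2024 PARTITION OF SalesFact
        FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
    """
    print(ddl)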
Views
can be used to control access to certain data elements or to embed common join conditions or filters to standardize common objects or queries.
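A minimal view sketch (hypothetical Employee table, SQLite via Python): the view controls access by exposing only non-sensitive columns and embeds a standard filter:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY, Name TEXT, Salary REAL, DeptId INTEGER)")
    # Consumers query the view, never the Salary column, and always get
    # the standard filter applied.
    conn.execute("""
        CREATE VIEW EmployeeDirectory AS
        SELECT EmployeeId, Name, DeptId
        FROM Employee
        WHERE DeptId IS NOT NULL
    """)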
Reverse engineering
is the process of documenting an existing database.
PDM
is completed first to understand the technical design of an existing system
LDM
to document the business solution that the existing system meets
CDM
to document the scope and key terminology within the existing system.
Review the Data Models
Models require quality control.
Continuous improvement practices should be employed.
Techniques such as time-to-value, support costs, and data model quality validators such as the Data Model Scorecard® (Hoberman, 2009) can all be used to evaluate the model for correctness, completeness, and consistency.
Maintain the Data Models
Updates to the data model need to be made when requirements change and frequently when business processes change.
Within a specific project, often when one model level needs to change, a corresponding higher level of model needs to change.
Many data modeling tools help automate this process of comparing physical with logical.
Data modeling tools
are software that automate many of the tasks the data modeler performs.
Entry-level data modeling tools
provide basic drawing functionality, including a data modeling palette, so that the user can easily create entities and relationships.
database data definition language (DDL).
More sophisticated data modeling tools support forward engineering from conceptual to logical to physical to database structures, allowing the generation of _____
Lineage Tools
are software tools that allow the capture and maintenance of the source structures for each attribute on the data model.
These tools enable impact analysis; that is, one can use them to see if a change in one system or part of a system has effects in another system.
Data Profiling Tool
can help explore the data content, validate it against existing metadata, and identify data quality gaps/deficiencies, as well as deficiencies in existing data artifacts, such as logical and physical models, DDL, and model descriptions.
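A toy profiling pass in plain Python (hypothetical Customer table; the 'F'/'M' domain is assumed for illustration): row counts, NULL counts, and out-of-domain values expose gaps between the data and its documented metadata:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customer (CustomerId INTEGER, Gender TEXT)")
    conn.executemany("INSERT INTO Customer VALUES (?, ?)",
                     [(1, 'F'), (2, 'M'), (3, None), (4, 'X')])

    total = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
    nulls = conn.execute("SELECT COUNT(*) FROM Customer WHERE Gender IS NULL").fetchone()[0]
    # Non-NULL values outside the documented domain reveal quality gaps.
    bad = conn.execute(
        "SELECT COUNT(*) FROM Customer WHERE Gender NOT IN ('F', 'M')").fetchone()[0]
    print(f"rows={total}, null Gender={nulls}, out-of-domain Gender={bad}")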
Metadata Repository
is a software tool that stores descriptive information about the data model, including the diagram and accompanying text such as definitions, along with Metadata imported from other tools and processes.
Data Model Patterns
are reusable modeling structures that can be applied to a wide class of situations.
There are elementary, assembly, and integration data model patterns.
Elementary patterns are the ‘nuts and bolts’ of data modeling.
Industry data models
are data models pre-built for an entire industry, such as healthcare, telecom, insurance, banking, or manufacturing.
These models are often both broad in scope and very detailed.
Some contain thousands of entities and attributes.
They can be purchased through vendors or obtained through industry groups such as ARTS (for retail), SID (for communications), or ACORD (for insurance).
ISO 11179 Metadata Registry
is an international standard for representing Metadata in an organization.
It contains several sections related to data standards, including naming attributes and writing definitions.
Best Practices in Naming Conventions
Publish data model and database naming standards for each type of modeling object and database object.
Naming standards are particularly important for entities, tables, attributes, keys, views, and indexes.
Names should be unique and as descriptive as possible.
Naming standards should minimize name changes across environments.
Names should not reflect a specific environment.
Performance and ease of use,
Reusability,
Integrity,
Security,
Maintainability
Best Practices in Database Design (5)
Performance and ease of use
Ensure quick and easy access to data by approved users in a usable and business-relevant form, maximizing the business value of both applications and data.
Reusability
The database structure should ensure that, where appropriate, multiple applications can use the data and that the data can serve multiple purposes.
Avoid coupling a database, data structure, or data object to a single application.
Integrity
The data should always have a valid business meaning and value, regardless of context, and should always reflect a valid state of the business.
Security
True and accurate data should always be immediately available to authorized users, but only to authorized users.
Maintainability
Perform all data work at a cost that yields value by ensuring that the cost of creating, storing, maintaining, using, and disposing of data does not exceed its value to the organization.
Data Model and Design Quality Management
Data analysts and designers act as intermediaries between information consumers (the people with business requirements for data) and the data producers who capture the data in usable form.
Data professionals must balance the data requirements of the information consumers and the application requirements of data producers.
Data models and database designs should be a reasonable balance between the short-term needs and the long-term needs of the enterprise.
standard data modeling and database design deliverables,
standard names, acceptable abbreviations, and abbreviation rules for uncommon words,
standard naming formats,
standard methods,
metadata properties captured in data modeling and database design,
expectations and requirements,
data modeling tools,
preparing for and leading design reviews,
versioning,
discouraged
Data modeling and database design standards should include the following: (11)
A list and description of _____
A list of _____ that apply to all data model objects
A list of _____ for all data model objects, including attribute and column class words
A list and description of _____ for creating and maintaining these deliverables
A list and description of data modeling and database design roles and responsibilities
A list and description of all _____, including both business metadata and technical metadata.
Metadata quality _____
Guidelines for how to use _____
Guidelines for _____
Guidelines for _____ of data models
Practices that are _____
Review Data Model and Database Design Quality
Project teams should conduct requirements reviews and design reviews of the conceptual data model, logical data model, and physical database design.
The agenda for review meetings should include items for reviewing the starting model (if any), the changes made to the model and any other options that were considered and rejected, and how well the new model conforms to any modeling or architecture standards in place.
Conduct design reviews with a group of subject matter experts representing different backgrounds, skills, expectations, and opinions
In reviews where there is no approval, the modeler must rework the design to resolve the issues.
Manage Data Model Versioning and Integration
Data models and other design specifications require careful change control, just like requirements specifications and other SDLC deliverables.
Each change should note:
Why the project or situation required the change
What and How the object(s) changed, including which tables had columns added, modified, or removed, etc.
When the change was approved and when the change was made to the model
Who made the change
Where the change was made
Data Model Scorecard
provides 11 data model quality metrics: one for each of ten categories that make up the Scorecard and an overall score across all ten categories.