Module 6 - Data Modeling
Data Modeling Overview
Definition:
A technique for determining what data and relationships should be stored in a database.
The resulting data model is a graphical representation of the database that communicates its design.
Goals:
Identify the facts to be stored in the database.
Requires collaboration between client and analyst.
Process:
Iterative, involving trial and revision.
The data model serves as a working document.
Building Blocks of Data Modeling
Entity:
A thing about which data is stored; the basic building block.
Attribute:
Describes an entity; an attribute name is singular and unique within the model.
Relationship:
Describes the linkage between two entities, defined by relationship descriptors:
1:1 (one-to-one)
1:m (one-to-many)
m:m (many-to-many)
Identifier:
Uniquely distinguishes an instance of an entity.
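The four building blocks can be sketched in code. The example below is a minimal illustration, assuming the model is mapped to relational tables with Python's built-in sqlite3 module. The NATION and STOCK entities, their attributes, and the sample rows are hypothetical and are not taken from these notes.

```python
import sqlite3

# Hypothetical entities for illustration: NATION and STOCK.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Entity NATION: nation_code is the identifier, the other columns are attributes.
conn.execute("""
    CREATE TABLE nation (
        nation_code   TEXT PRIMARY KEY,
        nation_name   TEXT NOT NULL,
        exchange_rate REAL NOT NULL
    )
""")

# Entity STOCK: the foreign key records a 1:m relationship
# (one nation has many stocks, each stock belongs to one nation).
conn.execute("""
    CREATE TABLE stock (
        stock_code  TEXT PRIMARY KEY,
        firm_name   TEXT NOT NULL,
        nation_code TEXT NOT NULL REFERENCES nation(nation_code)
    )
""")

conn.execute("INSERT INTO nation VALUES ('UK', 'United Kingdom', 1.0)")
conn.executemany(
    "INSERT INTO stock VALUES (?, ?, ?)",
    [("FC", "Freedonia Copper", "UK"), ("PT", "Patagonian Tea", "UK")],
)

# One nation instance is linked to many stock instances.
stocks_in_uk = conn.execute(
    "SELECT COUNT(*) FROM stock WHERE nation_code = 'UK'"
).fetchone()[0]

# The identifier rejects a second instance with the same value.
try:
    conn.execute("INSERT INTO nation VALUES ('UK', 'Duplicate', 2.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(stocks_in_uk, duplicate_rejected)
```

Placing the foreign key on the "many" side is one common way to record a 1:m relationship; other mappings are possible.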
Quality of Data Models
Well-formed Data Model:
All construction rules are obeyed.
No ambiguity is present.
All entities, attributes, and relationships are defined.
Names are meaningful and understood by the client.
High Fidelity Image:
Accurately describes the real world it represents.
Relationships are correctly established.
Completeness and understandability are key.
The model must make sense to the client.
Quality Improvement Considerations
Assess:
Level of detail of the model.
Whether all exceptions are handled.
Overall accuracy of the model.
Modality & Cardinality
Modality:
Also known as optionality; defines the minimum number of instances in a relationship.
Cardinality:
Indicates the range of instances in a relationship, written as minimum,maximum.
The minimum of the range is the modality, so the two are read together.
Cardinality Examples:
0,1: Optional (zero or one instance)
0,n: Optional (zero or many instances)
1,1: Mandatory (exactly one instance)
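The three ranges above can be treated as (minimum, maximum) pairs. The sketch below is one possible encoding, not a standard API: the Cardinality class, its method names, and the use of None for "n" are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cardinality:
    """A (minimum, maximum) range; a maximum of None stands for 'n' (many)."""
    minimum: int
    maximum: Optional[int]

    @property
    def is_optional(self) -> bool:
        # Modality: a minimum of 0 means participation is optional.
        return self.minimum == 0

    def allows(self, count: int) -> bool:
        # True when `count` related instances fall inside the range.
        if count < self.minimum:
            return False
        return self.maximum is None or count <= self.maximum


ZERO_OR_ONE = Cardinality(0, 1)      # 0,1
ZERO_OR_MANY = Cardinality(0, None)  # 0,n
EXACTLY_ONE = Cardinality(1, 1)      # 1,1

print(ZERO_OR_ONE.allows(2), ZERO_OR_MANY.allows(50), EXACTLY_ONE.is_optional)
```

A check such as EXACTLY_ONE.allows(0) returns False, which corresponds to a mandatory relationship with no related instance.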
Types of Entities
Independent:
Can stand alone; often prominent in the client's mind.
Dependent:
Relies on another entity for existence and identification.
Can become independent with an arbitrary identifier.
Associative:
Results from m:m relationships; holds current or historical data.
Aggregate:
Formed when several entities have similar attributes that differ only by a prefix or suffix; the shared attributes are gathered into one entity.
Subordinate:
Stores data about an entity that can vary among instances.
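As an illustration of the associative type, the sketch below resolves an m:m relationship into an entity of its own. It assumes a relational mapping with sqlite3. The SALE, ITEM, and LINE_ITEM names and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE sale (sale_no INTEGER PRIMARY KEY, sale_date TEXT NOT NULL);
    CREATE TABLE item (item_no INTEGER PRIMARY KEY, item_name TEXT NOT NULL);
    -- Associative entity: one row per (sale, item) pairing, with its own data.
    -- It is also dependent: it is identified through the two entities it links.
    CREATE TABLE line_item (
        sale_no  INTEGER NOT NULL REFERENCES sale(sale_no),
        item_no  INTEGER NOT NULL REFERENCES item(item_no),
        quantity INTEGER NOT NULL,
        PRIMARY KEY (sale_no, item_no)
    );
""")

conn.executemany("INSERT INTO sale VALUES (?, ?)",
                 [(1, "2024-01-05"), (2, "2024-01-06")])
conn.executemany("INSERT INTO item VALUES (?, ?)",
                 [(10, "Hat"), (11, "Map")])
conn.executemany("INSERT INTO line_item VALUES (?, ?, ?)",
                 [(1, 10, 2), (1, 11, 1), (2, 10, 5)])

# A sale has many items ...
items_in_sale_1 = [row[0] for row in conn.execute(
    "SELECT item_name FROM line_item JOIN item USING (item_no) "
    "WHERE sale_no = 1 ORDER BY item_name")]

# ... and an item appears in many sales.
sales_with_hat = [row[0] for row in conn.execute(
    "SELECT sale_no FROM line_item WHERE item_no = 10 ORDER BY sale_no")]

print(items_in_sale_1, sales_with_hat)
```

The quantity column shows why the associative entity is more than a link: it holds data that belongs to the pairing rather than to either side.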
Generalization
Describes a relationship between more general and more specific elements.
For each subtype, the primary key is also a foreign key that references the supertype's primary key.
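The key rule for subtypes can be sketched as follows, assuming a relational mapping with sqlite3. VEHICLE (supertype) and TRUCK (subtype) are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Supertype: the more general element.
    CREATE TABLE vehicle (
        vehicle_id INTEGER PRIMARY KEY,
        make       TEXT NOT NULL
    );
    -- Subtype: its primary key is also a foreign key to the supertype.
    CREATE TABLE truck (
        vehicle_id INTEGER PRIMARY KEY REFERENCES vehicle(vehicle_id),
        payload_kg INTEGER NOT NULL
    );
""")

conn.execute("INSERT INTO vehicle VALUES (1, 'Volvo')")
conn.execute("INSERT INTO truck VALUES (1, 9000)")

# A subtype row without a matching supertype row is rejected.
try:
    conn.execute("INSERT INTO truck VALUES (2, 5000)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# General and specific attributes are combined by joining on the shared key.
row = conn.execute(
    "SELECT make, payload_kg FROM vehicle JOIN truck USING (vehicle_id)"
).fetchone()

print(orphan_rejected, row)
```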
UML Aggregation Concepts
Aggregation:
A part-whole relationship.
Shared Aggregation:
Enables multiple entities to own the same entity.
Composite Aggregation:
One entity exclusively owns the other.
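The difference between the two kinds of aggregation can be sketched with plain Python objects. This is an informal illustration only: the Track, Playlist, Room, and Building classes are hypothetical, and Python itself does not enforce ownership.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Track:
    title: str


@dataclass
class Playlist:
    # Shared aggregation: a playlist refers to tracks that other
    # playlists may also contain.
    name: str
    tracks: List[Track] = field(default_factory=list)


@dataclass
class Room:
    name: str


@dataclass
class Building:
    # Composite aggregation: rooms are created through the building
    # and belong to that building alone.
    name: str
    rooms: List[Room] = field(default_factory=list)

    def add_room(self, room_name: str) -> Room:
        room = Room(room_name)
        self.rooms.append(room)
        return room


song = Track("Intro")
morning = Playlist("Morning", [song])
evening = Playlist("Evening", [song])

# The same Track instance is part of two wholes.
shared = morning.tracks[0] is evening.tracks[0]

hall = Building("Hall")
hall.add_room("Lobby")
room_names = [room.name for room in hall.rooms]

print(shared, room_names)
```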
Hints on Data Modeling
The model may expand and contract as it is revised; invent an identifier when an entity has no suitable one.
Ensure identifiers have a single purpose: identification.
Create an attribute to record the order when instances must be ordered; the model itself implies no ordering.
Choose names carefully; be aware of synonyms and homonyms.
Maintain clarity by labeling relationships and ensuring data model accuracy.
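The hint about ordering can be illustrated with a small sketch. The Chapter class and its chapter_no attribute are hypothetical. A Python set is used because, like a set of entity instances, it carries no order of its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chapter:
    title: str
    chapter_no: int  # attribute that records the intended order


# A set has no inherent order, so the order must come from an attribute.
chapters = {
    Chapter("Modeling", 2),
    Chapter("Queries", 3),
    Chapter("Introduction", 1),
}

ordered_titles = [c.title for c in sorted(chapters, key=lambda c: c.chapter_no)]
print(ordered_titles)
```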
Meaningful vs Non-Meaningful Identifiers
Meaningful Identifiers:
Allow inferences to be drawn about the entity's attributes; can be recognizable.
Advantages: Easy to remember.
Disadvantages: Can lead to identifier exhaustion as realities change.
Non-Meaningful Identifiers:
Serve the sole purpose of uniquely identifying the entity; avoid data management issues.
Attributes, not the identifier, describe characteristics of the entity.
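The trade-off can be sketched in code. The DEPT-YEAR-SEQ identifier format, the three-digit sequence width, and all function names below are assumptions made for illustration; they are not taken from these notes.

```python
import itertools


def parse_meaningful(identifier: str) -> dict:
    """Infer attributes from a hypothetical DEPT-YEAR-SEQ identifier."""
    dept, year, seq = identifier.split("-")
    return {"dept": dept, "year": int(year), "seq": int(seq)}


def next_meaningful(dept: str, year: int, last_seq: int, width: int = 3) -> str:
    """Build the next meaningful identifier; fails when the sequence runs out."""
    seq = last_seq + 1
    if seq >= 10 ** width:
        # Identifier exhaustion: the fixed format has no room left.
        raise OverflowError(f"no {width}-digit sequence numbers left for {dept}-{year}")
    return f"{dept}-{year}-{seq:0{width}d}"


# Non-meaningful identifier: a counter whose only job is to be unique.
_counter = itertools.count(1)


def next_surrogate() -> int:
    return next(_counter)


print(parse_meaningful("MKT-2023-042"))
print(next_meaningful("MKT", 2023, 42))
```

With the meaningful format, a department that is renamed or outgrows 999 entries forces identifiers to change or the format to be redesigned. The surrogate counter is unaffected because it encodes nothing about the entity.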
Key Takeaways
A high-fidelity data model handles all exceptions.
Identifiers only need to distinguish an instance.
Data modeling skills develop over time.