Module 6 - Data Modeling

Data Modeling Overview

Definition:
- A technique for modeling data determining what data and relationships should be stored in a database.
- Graphical representation of a database, effectively communicating a database design.
Goals:
- Identify the facts to be stored in the database.
- Requires collaboration between client and analyst.
Process:
- Iterative, involving trial and revision.
- The data model serves as a working document.

Building Blocks of Data Modeling

Entity:
- A thing about which data is stored; the basic building block.
Attribute:
- Describes an entity; singular and unique within the model.
Relationship:
- Describes the linkage between two entities, defined by relationship descriptors:
  - 1:1 (one-to-one)
  - 1:m (one-to-many)
  - m:m (many-to-many)
Identifier:
- Uniquely distinguishes an instance of an entity.

Quality of Data Models

Well-formed Data Model:
- All construction rules are obeyed.
- No ambiguity is present.
- All entities, attributes, and relationships are defined.
- Names are meaningful and understood by the client.
High Fidelity Image:
- Accurately describes the real-world it represents.
- Relationships are correctly established.
- Completeness and understandability are key.
- The model must make sense to the client.

Quality Improvement Considerations

Assess:
- Level of detail of the model.
- All exceptions must be handled.
- Overall accuracy of the model.

Modality & Cardinality

Modality:
- Also known as optionality; defines the minimum number of instances in a relationship.
Cardinality:
- Indicates the range of instances in a relationship.
- Links with modality.
Cardinality Examples:
- 0,1: Optional (zero or one instances)
- 0,n: Zero or many instances
- 1,1: Mandatory (exactly one instance)

Types of Entities

Independent:
- Can stand alone; often prominent in the client's mind.
Dependent:
- Relies on another entity for existence and identification.
- Can become independent with an arbitrary identifier.
Associative:
- Results from m:m relationships; holds current or historical data.
Aggregate:
- Formed from multiple entities with a common prefix or suffix.
Subordinate:
- Entity data that varies among instances.

Generalization

Describes a relationship between more general and more specific elements.
- For each subtype, the primary key must represent the supertype's key as a foreign key.

UML Aggregation Concepts

Aggregation:
- A part-whole relationship.
- Shared Aggregation:
  - Enables multiple entities to own the same entity.
- Composite Aggregation:
  - One entity exclusively owns the other.

Hints on Data Modeling

The model may expand and contract; include identifiers when necessary.
Ensure identifiers have a singular purpose—identification.
Create attributes when instance ordering is required.
Choose names carefully; be aware of synonyms and homonyms.
Maintain clarity by labeling relationships and ensuring data model accuracy.

Meaningful vs Non-Meaningful Identifiers

Meaningful Identifiers:
- Draw inferences about entity attributes; can be recognizable.
- Advantages: Easy to remember;
- Disadvantages: Can lead to identifier exhaustion as realities change.
Non-Meaningful Identifiers:
- Serve the sole purpose of uniquely identifying the entity; avoid data management issues.
- Attributes describe characteristics of the entity.

Key Takeaways

A high-fidelity data model handles all exceptions.
Identifiers only need to distinguish an instance.
Data modeling skills develop over time.