Conceptual Database Design – Key Concepts and Patterns
3.1 Introduction to Database Design
Objective: produce a database's logical and physical schemas from requirements.
Design is part of broader computer system design for an organization.
Lifecycle and Phases
Feasibility study & requirements: evaluate costs/benefits; plan resources; collect requirements.
Analysis: determine what is needed; describe domain and conceptual schema.
Design & realization: determine how to obtain what is needed; define architecture; implement code and docs.
Verification: ensure program correct, complete, efficient.
Maintenance: bug fixes and updates.
Software lifecycle (Spiral model): Analysis → Collection of requirements → Design and realization → Verification and maintenance; iterations with versions.
1. Conceptual Database Design – Scope
Scenario-based understanding of overall system; DBMS choice (e.g., PostgreSQL) and application load.
From Analysis (WHAT) to Design & realization (HOW): leads to Conceptual Schema, Logical Schema, Physical Schema.
The ER Model: Core Constructs
A data model is defined by constructs it admits; each construct has pragmatic, syntactic, and semantic meanings.
In ER, the schema is graphical; instances provide extensional data.
Core constructs: Entities, Attributes, Relationships; Roles; IS-A (inheritance) and Generalizations; Identification and Cardinality constraints; Other constraints.
2. ER Model vs UML
ER focuses on data representation; UML class diagrams are similar but include operations.
3. Entities and Instances
An entity is a class of objects of interest with autonomous existence and shared properties.
An entity has a unique name; represented as a rectangle in diagrams.
An entity denotes a set of objects; an instance is a single object; the schema describes the structure (intensional), instances describe actual data (extensional).
Abstraction: the conceptual schema models entities, not individual instances.
4. Entities: Semantics and Notation
Extensional level: an entity E has an extension: instances(I,E) = {e1,e2,…}.
An instance of E is the object itself, not just a value.
5. Entity Attributes
Local property of an entity; each instance has a value from a domain D.
Domain is usually omitted in diagrams but defined in the data dictionary.
Attribute notation: A for entity E, connected via a circle labeled with A's name.
Domain and totality: A is a function from instances(I,E) to D; if total, every instance has a value: and If an attribute is multi-valued, it is a relation rather than a function.
Composite attributes: can be a record type (e.g., address with street, number, postalCode).
Attributes may have explicit domains in the data dictionary or diagram.
6. Entity Attribute Semantics
At extensional level, A is a total function: If domain is multi-valued, A is a relation: multiple values per entity.
7. Relationships
A relationship relates two or more entities; degree = number of entities involved.
Binary example: Residence (Person, City); Affiliation (Employee, Department).
A relationship R with entities E1,..,En has an extensional level:
In binary case: instances(I,R) ⊆ instances(I,E) × instances(I,F).
A relationship is a mathematical relation between entities.
8. Relationship Attributes
A relationship attribute is a local property of a relationship.
For an instance r ∈ ext{instances}(I,R), attribute A(r) ∈ D.
Relationship attributes are not part of intrinsic nature of the relationship; they do not distinguish instances by themselves.
Semantics: and often total: one value per relationship instance.
9. N-ary Relationships
N-ary relationships: R relates E1,..,En; extensional level:
Example: Supplier – Department – Product – many-to-many with possibly extra attributes.
10. Roles in Relationships
Roles specify how an entity participates in a relationship; required when the same entity participates multiple times or to improve semantics.
If role names are omitted, the role defaults to the entity name.
Role semantics affect the extensional interpretation of relationships.
11. ISA and Generalization (Inheritance)
ISA: a subentity is a subset of a parent entity; subentity instances are also instances of parent.
Generalization: grouping several subentities under a parent by a common criterion.
Completeness: complete generalization means union of subentities equals parent; not complete means some parent instances are not in any subentity.
Disjointness: subentities can be disjoint or overlapping depending on constraints.
In ER, inheritance is transitive and can lead to multiple inheritance if a child participates in multiple ISA/generalizations.
Example: Person ⊇ Student, Worker; Student and Worker may share instances (overlap) and can have their own properties.
Special note: Inheritance applies to both entities and relationships (with rules similar to those for entities).
12. ISA vs Generalization: Key Differences
ISA: parent-child class relationship; subentities are subsets of parent.
Generalization: broader grouping; can be complete or not; disjointness varies.
Interplay with disjointness/completeness affects instance distribution and possible empty classes.
13. Identification Constraints (Keys)
An identifier (key) for an entity E is a set of properties (attributes/relationships) that uniquely identify instances of E.
Internal identifiers: formed by attributes of E; External identifiers: include attributes and roles of relationships; can involve 1+ roles.
Essential vs derived identifiers: essential identifiers cannot be reduced without losing identification power; derived identifiers come from constraints (e.g., 1-1 participation or identifiers shared across relationships).
Visual notation: identifiers are shown by lines connecting the components to a dot; for relationships, lines connect to the rhombus.
External constraints may define additional identifications (e.g., a supplier’s identifier derived from a relationship).
14. External-ish and Additional Constraints
External constraints go beyond the diagram; examples include tenure, salary bounds, or domain-specific rules (e.g., director must supervise within department for >= 5 years).
They are not part of the ER diagram itself but are important for data integrity.
15. Modeling Patterns and Transformations
Patterns help resolve common modeling problems and improve design quality:
T-MVAe: transform a simple multi-valued attribute of an entity into a relationship (and create domain entity).
T-CAe: transform a composite attribute into a separate entity linked by a relationship.
T-RtoE: transform a relationship into an entity (with sub-cases for ISA/relationships).
T-MVAr: transform a multi-valued attribute of a relationship into a relationship (and domain entity).
T-Rn: transform a relationship with a mandatory multi-valued attribute into a higher-degree relationship+
(adds an additional component).
These transformations preserve semantic equivalence, but may affect modeling clarity and maintainability.
16. Dynamic Aspects and History
Dynamic representation options:
Historicized entities: model “state” of an entity via a StateOfE entity connected through a HasState relationship; each StateOfE holds changing properties; changes create new state instances.
Alternative: a single StateOfPerson entity that records all changing properties; current values come from the latest state.
For events, a SubjectTo pattern can model event-category occurrences (e.g., maintenance) with appropriate date and related properties.
17. Data Dictionary and Documentation
ER schemas are complemented by a data dictionary listing:
Entities and their identifiers
Relationships and their identifiers
Attributes and their domains
External constraints
Documentation aids readability and maintenance.
18. Quality Control in Conceptual Schema
Four main quality factors: Correctness, Minimality, Readability, Adequacy.
Correctness: model reality accurately and completely.
Minimality: avoid unnecessary redundancies; document unavoidable ones.
Readability: clear diagrams; prefer explicit constraints over external constraints when possible.
Adequacy: alignment with the model and the specifications.
19. Common Pitfalls (from exercises)
Attribute vs entity mis-modeling (e.g., categorizing a city or region as an entity without necessity).
Missing roles in relationships with multiple participation.
Incorrect generalization/inheritance (disjointness/completeness misapplied).
Incorrect identifiers (internal vs external) and compliant constraints.
Incorrect handling of multi-valued attributes in transformed schemas.
20. Quick Takeaways for Last-Minute Review
Distinguish between schema (intensional) and instance (extensional).
Understand when to model a concept as an entity vs an attribute vs a relationship.
Use ISA and generalization with attention to disjointness and completeness; ensure inheritance propagates constraints.
Use identifiers (keys) properly; leverage derived identifiers where appropriate.
Apply modeling patterns to simplify schemas while preserving semantics.
Document constraints and maintain a data dictionary for clarity.
Consider dynamic/historicization when tracking changes over time.
ext{Relationship (binary): } R ext{ between }E,F: ext{instances}(I,R)\n ext{is a subset of } ext{instances}(I,E) imes ext{instances}(I,F).}