Conceptual Database Design – Key Concepts and Patterns

3.1 Introduction to Database Design

  • Objective: produce a database's logical and physical schemas from requirements.

  • Design is part of broader computer system design for an organization.

Lifecycle and Phases

  • Feasibility study & requirements: evaluate costs/benefits; plan resources; collect requirements.

  • Analysis: determine what is needed; describe domain and conceptual schema.

  • Design & realization: determine how to obtain what is needed; define architecture; implement code and docs.

  • Verification: ensure program correct, complete, efficient.

  • Maintenance: bug fixes and updates.

  • Software lifecycle (Spiral model): Analysis → Collection of requirements → Design and realization → Verification and maintenance; iterations with versions.

1. Conceptual Database Design – Scope

  • Scenario-based understanding of overall system; DBMS choice (e.g., PostgreSQL) and application load.

  • From Analysis (WHAT) to Design & realization (HOW): leads to Conceptual Schema, Logical Schema, Physical Schema.

The ER Model: Core Constructs

  • A data model is defined by constructs it admits; each construct has pragmatic, syntactic, and semantic meanings.

  • In ER, the schema is graphical; instances provide extensional data.

  • Core constructs: Entities, Attributes, Relationships; Roles; IS-A (inheritance) and Generalizations; Identification and Cardinality constraints; Other constraints.

2. ER Model vs UML

  • ER focuses on data representation; UML class diagrams are similar but include operations.

3. Entities and Instances

  • An entity is a class of objects of interest with autonomous existence and shared properties.

  • An entity has a unique name; represented as a rectangle in diagrams.

  • An entity denotes a set of objects; an instance is a single object; the schema describes the structure (intensional), instances describe actual data (extensional).

  • Abstraction: the conceptual schema models entities, not individual instances.

4. Entities: Semantics and Notation

  • Extensional level: an entity E has an extension: instances(I,E) = {e1,e2,…}.

  • An instance of E is the object itself, not just a value.

5. Entity Attributes

  • Local property of an entity; each instance has a value from a domain D.

  • Domain is usually omitted in diagrams but defined in the data dictionary.

  • Attribute notation: A for entity E, connected via a circle labeled with A's name.

  • Domain and totality: A is a function from instances(I,E) to D; if total, every instance has a value: A:extinstances(I,E)<br>ightarrowDA: ext{instances}(I,E) <br>ightarrow D and oralleextinextinstances(I,E),A(e)extisdefinedinD.orall e \, ext{in } ext{instances}(I,E), \, A(e) ext{ is defined in } D. If an attribute is multi-valued, it is a relation rather than a function.

  • Composite attributes: can be a record type (e.g., address with street, number, postalCode).

  • Attributes may have explicit domains in the data dictionary or diagram.

6. Entity Attribute Semantics

  • At extensional level, A is a total function: A:extinstances(I,E)<br>ightarrowD.A : ext{instances}(I,E) <br>ightarrow D. If domain is multi-valued, A is a relation: multiple values per entity.

7. Relationships

  • A relationship relates two or more entities; degree = number of entities involved.

  • Binary example: Residence (Person, City); Affiliation (Employee, Department).

  • A relationship R with entities E1,..,En has an extensional level: extinstances(I,R)<br>extinstances(I,E1)imesextimesextinstances(I,En)ext{instances}(I,R) <br>ext{instances}(I,E1) imes ext{…} imes ext{instances}(I,En)

  • In binary case: instances(I,R) ⊆ instances(I,E) × instances(I,F).

  • A relationship is a mathematical relation between entities.

8. Relationship Attributes

  • A relationship attribute is a local property of a relationship.

  • For an instance r ∈ ext{instances}(I,R), attribute A(r) ∈ D.

  • Relationship attributes are not part of intrinsic nature of the relationship; they do not distinguish instances by themselves.

  • Semantics: A:extinstances(I,R)<br>ightarrowD.A: ext{instances}(I,R) <br>ightarrow D. and often total: one value per relationship instance.

9. N-ary Relationships

  • N-ary relationships: R relates E1,..,En; extensional level: extinstances(I,R)2˘283extinstances(I,E1)imes<br>extimesextinstances(I,En).ext{instances}(I,R) \u2283 ext{instances}(I,E1) imes <br>ext{…} imes ext{instances}(I,En).

  • Example: Supplier – Department – Product – many-to-many with possibly extra attributes.

10. Roles in Relationships

  • Roles specify how an entity participates in a relationship; required when the same entity participates multiple times or to improve semantics.

  • If role names are omitted, the role defaults to the entity name.

  • Role semantics affect the extensional interpretation of relationships.

11. ISA and Generalization (Inheritance)

  • ISA: a subentity is a subset of a parent entity; subentity instances are also instances of parent.

  • Generalization: grouping several subentities under a parent by a common criterion.

  • Completeness: complete generalization means union of subentities equals parent; not complete means some parent instances are not in any subentity.

  • Disjointness: subentities can be disjoint or overlapping depending on constraints.

  • In ER, inheritance is transitive and can lead to multiple inheritance if a child participates in multiple ISA/generalizations.

  • Example: Person ⊇ Student, Worker; Student and Worker may share instances (overlap) and can have their own properties.

  • Special note: Inheritance applies to both entities and relationships (with rules similar to those for entities).

12. ISA vs Generalization: Key Differences

  • ISA: parent-child class relationship; subentities are subsets of parent.

  • Generalization: broader grouping; can be complete or not; disjointness varies.

  • Interplay with disjointness/completeness affects instance distribution and possible empty classes.

13. Identification Constraints (Keys)

  • An identifier (key) for an entity E is a set of properties (attributes/relationships) that uniquely identify instances of E.

  • Internal identifiers: formed by attributes of E; External identifiers: include attributes and roles of relationships; can involve 1+ roles.

  • Essential vs derived identifiers: essential identifiers cannot be reduced without losing identification power; derived identifiers come from constraints (e.g., 1-1 participation or identifiers shared across relationships).

  • Visual notation: identifiers are shown by lines connecting the components to a dot; for relationships, lines connect to the rhombus.

  • External constraints may define additional identifications (e.g., a supplier’s identifier derived from a relationship).

14. External-ish and Additional Constraints

  • External constraints go beyond the diagram; examples include tenure, salary bounds, or domain-specific rules (e.g., director must supervise within department for >= 5 years).

  • They are not part of the ER diagram itself but are important for data integrity.

15. Modeling Patterns and Transformations

  • Patterns help resolve common modeling problems and improve design quality:

    • T-MVAe: transform a simple multi-valued attribute of an entity into a relationship (and create domain entity).

    • T-CAe: transform a composite attribute into a separate entity linked by a relationship.

    • T-RtoE: transform a relationship into an entity (with sub-cases for ISA/relationships).

    • T-MVAr: transform a multi-valued attribute of a relationship into a relationship (and domain entity).

    • T-Rn: transform a relationship with a mandatory multi-valued attribute into a higher-degree relationship+
      (adds an additional component).

  • These transformations preserve semantic equivalence, but may affect modeling clarity and maintainability.

16. Dynamic Aspects and History

  • Dynamic representation options:

    • Historicized entities: model “state” of an entity via a StateOfE entity connected through a HasState relationship; each StateOfE holds changing properties; changes create new state instances.

    • Alternative: a single StateOfPerson entity that records all changing properties; current values come from the latest state.

  • For events, a SubjectTo pattern can model event-category occurrences (e.g., maintenance) with appropriate date and related properties.

17. Data Dictionary and Documentation

  • ER schemas are complemented by a data dictionary listing:

    • Entities and their identifiers

    • Relationships and their identifiers

    • Attributes and their domains

    • External constraints

  • Documentation aids readability and maintenance.

18. Quality Control in Conceptual Schema

  • Four main quality factors: Correctness, Minimality, Readability, Adequacy.

  • Correctness: model reality accurately and completely.

  • Minimality: avoid unnecessary redundancies; document unavoidable ones.

  • Readability: clear diagrams; prefer explicit constraints over external constraints when possible.

  • Adequacy: alignment with the model and the specifications.

19. Common Pitfalls (from exercises)

  • Attribute vs entity mis-modeling (e.g., categorizing a city or region as an entity without necessity).

  • Missing roles in relationships with multiple participation.

  • Incorrect generalization/inheritance (disjointness/completeness misapplied).

  • Incorrect identifiers (internal vs external) and compliant constraints.

  • Incorrect handling of multi-valued attributes in transformed schemas.

20. Quick Takeaways for Last-Minute Review

  • Distinguish between schema (intensional) and instance (extensional).

  • Understand when to model a concept as an entity vs an attribute vs a relationship.

  • Use ISA and generalization with attention to disjointness and completeness; ensure inheritance propagates constraints.

  • Use identifiers (keys) properly; leverage derived identifiers where appropriate.

  • Apply modeling patterns to simplify schemas while preserving semantics.

  • Document constraints and maintain a data dictionary for clarity.

  • Consider dynamic/historicization when tracking changes over time.

extEntityattribute:A:extinstances(I,E)<br>ightarrowD,exttotal:oralle(eextinextinstances(I,E))<br>ightarrowextexistsdextwithA(e)=d.ext{Entity attribute: } A: ext{instances}(I,E)<br>ightarrow D, ext{ total: } orall e\big(e ext{ in } ext{instances}(I,E)\big)<br>ightarrow ext{exists }d ext{ with }A(e)=d.
ext{Relationship (binary): } R ext{ between }E,F: ext{instances}(I,R)\n ext{is a subset of } ext{instances}(I,E) imes ext{instances}(I,F).}
extNary:extinstances(I,R)<br>riangleleftextinstances(I,E<em>1)imesextimesextinstances(I,E</em>N).ext{N-ary: } ext{instances}(I,R)<br>riangleleft ext{instances}(I,E<em>1) imes ext{…} imes ext{instances}(I,E</em>N).
extCardinalityconstraintonroleU:extForeextinextinstances(I,E),<br>extx(extmincardinality)extcountR(e,U)extmaxcardinality).ext{Cardinality constraint on role }U: ext{ For }e ext{ in } ext{instances}(I,E),<br>ext{ x}\big( ext{min cardinality}\big) \le ext{count}_{R}(e,U) \le ext{max cardinality} \big).