Conceptual Modeling of Data and ER Model

Database Design Process

  • Data Modeling: The database design process starts with data modeling.
  • RDBMS Relations Translator: Transforms the high-level model into RDBMS relations.
  • High-Level Model: A conceptual representation of the database.

Database Design: A 6-Step Program

  1. Requirements Analysis:
    • Data requirements and critical operations are identified.
  2. Conceptual DB Design:
    • A high-level description of data and constraints using the ER model.
  3. Logical DB Design:
    • Conversion into a schema.
    • Selection of a DBMS (Relational DBMS is most popular).
  4. Schema Refinement:
    • Normalization to eliminate redundancy.
  5. Physical DB Design:
    • Consideration of workloads, indexes, and data clustering.
  6. Application/Security Design

Overview of Entity-Relationship (ER) Model

  • A visual model to specify:
    • What information the database must hold.
    • Relationships among components of that information.
  • Proposed by Peter Chen in 1976.
  • Topics Covered:
    • Basic elements (entities, relationships, roles).
    • Constraints.
    • Weak entity sets.
    • Multi-way relationships.
    • Subclass/superclass relationships.
    • Aggregation.

ER Model

  • Entity: A real-world object distinguishable from other objects (e.g., an employee, John Yates).

    • Described using a set of attributes.
  • Entity Set: A collection of similar entities (e.g., all employees).

    • Entities in an entity set have the same set of attributes.
    • Each entity set has a key (underlined).
    • Each attribute has a domain.

    Illustration: An entity set Employees with attributes ssn, name, and salary. ssn is the key.

  • Analogy to Object-Oriented (OO) Terminology:

    • Entity set ≈ Class
    • Entity ≈ Instance

Entity Set - Attributes

  • Single-Valued vs. Multi-Valued:

    • Single-valued Example: salary of an employee.
    • Multi-valued Example: phone number of an employee (can have multiple).
  • Atomic vs. Composite:

    • Atomic Example: salary of a person.
    • Composite Example: address of a person (street, city, etc.).
  • Stored vs. Derived:

    • Derived Attributes: Can be derived from other attributes or entities.
      • Example: age can be derived from date of birth.
    • Stored Attributes: All other attributes that are not derived.

    Illustration: Employees entity set with attributes phone (multi-valued), salary (single-valued), age (derived), number (atomic), and address (composite, with city and street components).

Entity Set: Keys and Domains

  • Key Attributes of an Entity:

    • A key is a set of attributes that uniquely identifies an entity in an entity set.
    • Denoted in ER diagrams by underlining the attributes that form a key.
    • Examples:
      • {ssno} as a key for the Employees entity set.
      • {emp_id} as another key for the Employees entity set.
    • Multiple keys may exist; one is chosen as the primary key (underlined), and others are candidate keys.

    *Illustration: Employees entity set with attributes name, emp_id, and ssn (underlined, indicating it's a key).
    Departments entity set with attributes dname, budget, and dept_id (underlined, indicating it's a key).
    Classroom entity set with attributes building, capacity, and roomNumber (composite key).

  • Domains of Attributes:

    • A domain is associated with each simple attribute.
    • The value of the attribute for the entity is constrained to be within the domain.

ER Model - Relationships

  • Relationship: Association among two or more entities (e.g., John Yates works in the EECS department).

  • Relationship Set: Collection of similar relationships.

    • Relationships may have their own attributes.
    • Can be Binary, Ternary, 4-nary, … relationship sets

    Illustration: Employees entity set related to Departments entity set via the Works_In relationship, with attribute since. Attributes lot, name, ssn for Employees, Works_In has since, dname, budget, did for Departments

ER Model – Relationships (cont.)

  • A binary relationship involves two entities.

  • A ternary relationship involves three entities.

  • An n-ary relationship set R relates n entity sets E1 … En; each relationship in R involves entities e1 ∈ E1, …, en ∈ En.

    Illustrations of Binary (Employees WorksIn Departments) and Ternary (Employees WorksIn Departments Division) Relationship Sets.

ER Model – Relationships (cont.)

  • The same entity set can participate in different relationship sets, or in different “roles” in the same relationship set.

  • The function that an entity plays in a relationship is called its role.

    • Roles are normally not explicitly specified unless the meaning of the relationship needs clarification.
    • Roles are needed when an entity set is related to itself via a relationship.

    *Illustration: Employees supervises other Employees (roles: subordinate, supervisor) through the supervises relationship with attribute since. Employees Works_In Departments includes dname, budget and did

Constraints on Relationship Sets

  • Limit the possible combinations of entities that may participate in the corresponding relationship set.

    1. Key Constraints (or Multiplicity):

      • Example: An employee is allowed to manage more than one department, but each department has at most one manager (one-to-many).
    2. Participation Constraints:

      • Not all employees are managers (partial participation).
      • Each department needs to have a manager (total participation).

      Illustration: EMPLOYEES MANAGES DEPARTMENTS with dates. Partial and total participation.

Key Constraints (Multiplicity)

  • Consider binary relationship set R between entity sets A and B.

    • One-to-One: An entity in A is associated with at most one entity in B, and vice versa.
    • Many-to-One: An entity in A is associated with at most one entity in B, and an entity in B is associated with many entities in A.
    • Many-to-Many: An entity in A is associated with many entities in B, and vice versa.

    Examples: married-to (one-to-one), works-in (many-to-one), customer-accounts (many-to-many).

Key Constraints (Multiplicity)(cont.)

  • Many-to-many: An employee can work in many departments; a department can have many employees.
  • One-to-many: Each department has at most one manager, but one employee can manage multiple departments.
  • Multiplicity in ER diagram is represented by an arrow pointing from “many” to “one.”
  • worksin relationship: arrow pointing from worksin to departments.
  • manages relationship: arrow pointing from manages to department.

Relationship Attribute in a Many to One Relationship

  • In a Many-to-One relationship, relationship attributes can be repositioned to the entity set on the many side.
    Illustrations on moving since between relations between employee, manages, and department

Relationship Attribute in a One to One Relationship

  • In a One-to-one relationship, relationship attributes can be shifted to either of the entity sets.
    Illustrations on moving since between relations between employee, manages, and department

Participation Constraints

  • Total vs. Partial participation.
  • Example: Suppose each professor must teach at least one course. Then, participation of entity set professor in the relationship teaches with the entity set courses is total.
  • Total Participation: Each entity in entity set A is constrained to be related to other entities via relationship R1.
  • Partial Participation: Not every entity in entity set B is constrained to be related to other entities via relationship R1.
  • In ER diagram, total participation represented using a double line between the relationship and the entity set that totally participates in the relationship.
    Illustrations using entity sets A B with relation R1

Example-1

  • Keys: ssn, dID
  • Key constraint: Each department has a single manager.
  • Participation constraints:
    • Each employee works in at least one department.
    • Each department has at least one employee.
    • Each department has a manager.
      Illustrations using entity sets Departments and Employees and relation Manages

Example-2

  • Keys: ssn#, loanid, branchid
  • Key constraint: Each loan belongs to a single branch.
  • Participation constraints:
    • Each customer must be a customer of at least one branch.
    • Each loan must belong to some branch.

Weak Entity Sets

  • Entity sets that do not have sufficient attributes to form a key are called weak entity sets.
  • A weak entity can be identified uniquely only with the primary key of another (owner) entity.
    • Owner entity set and weak entity set must participate in a one-to-many relationship set (one owner, many weak entities).
    • Weak entity set must have total participation in this identifying relationship set.
  • A weak entity set may have a partial key that distinguishes between weak entities related to the same strong entity.
  • key of weak entity set = key of owner entity set(s) + partial key

Weak Entity Sets - Example

  • Transaction is a weak entity set related to accounts (owner) via “log” identifying relationship.
  • trans# is the partial key that distinguishes different transactions on the same account.
  • In ER diagram:
    • Weak entity is identified with double rectangle.
    • The relationships set connecting the weak entity set to the owner entity is depicted by a double diamond.
    • “partial key” is identified with dashed underline.
  • Key of “transaction” = (acctnumber, trans#)
    Illustrations using entity sets customer, custacct, account, and transaction

Weak Entity Sets (cont.)

  • What if we add acctnumber to “transaction”?
    • The acctnumber information in “transaction” will be redundant.
  • What if we add acctnumber to “transaction” and remove relationship “log”?
    • The relationship between “account” and “transaction” becomes implicit in an attribute, which is not desirable.

A Chain of Weak Entity Sets

  • Names of states are unique and form the key.
  • Names of cities are unique only within a state (e.g., 24 Springfields within United States).
  • Names of streets are unique within a city. Multiple cities could have streets with the same name.
  • Example: a weak entity set might itself participate as owner in an identifying relationship with another weak entity set.
    Illustrations using entity sets state, city, street, and relations located_in

A Weak Entity Set with Multiple Owner Entity Sets

  • Reviewers review movies and assign a rating for each movie – thumbs up or thumbs down.
  • Review is a weak entity set whose owner sets correspond to both the movie and the reviewer entity sets.
  • Key for the review entity set = key of movie + key of reviewer
    Illustrations about relationships between movie, reviewer, and review

Weak Entity Sets

  • Caution! Don’t Overuse Weak Entity Sets
    • Beginning database designers often tend to make all entity sets weak, supported by all other entity sets to which they are linked.
    • In reality, we usually create unique IDs for entity sets.
  • Examples include Social Security numbers, automobile VINs, registration numbers, etc.

Multiway Relationships

  • Usually binary relationships (connecting two entity sets) suffice.
  • However, there are some cases where three or more entity sets must be connected by one relationship.
  • Similar to binary relationships, key and participation constraints can be defined over multiway relationships.
    Illustration - Relations between branch, customer, and account

Key Constraint over Ternary Relationships

  • Interpretation:
    • Each customer is associated to a single account and a single branch.

Representing Ternary Relationship Using Binary Relationships

  • A multiway relationship can be converted to a collection of binary, one-to-many relationships.
  • First Attempt: Can the CAB relationship set be represented using the ER diagram on the right?
    • These binary relationships AB, CA, and CB do not correctly capture the information represented by the ternary relationship.
      *Illustration - Attempts to represent the ternary relationship *

Representing Ternary Relationship Using Binary Relationships

  • Introduce a new entity set whose entities we may think of as the tuples of the multiway relationship.
  • The CAB relationship is represented as a weak entity set that depends upon the customer, branch, and account entity sets.
  • This schema using binary relationship fully captures the ternary relationship.
    *Illustration - Attempts to represent the ternary relationship *

Representing Ternary Relationship Using Binary Relationships

  • Previous mapping technique works for many-many-many relationship.
  • How to convert the many-many-1, many-1-1, 1-1-1 ternary relationships into binary relationships?
    • In general, it is always possible to convert any ternary (or multiway relationship) into a collection of binary relationships without losing information!!
    • However, the conversions can be quite complex and resulting unnatural schemas.

Limitations of the Basic ER Model Studied So Far

  • How to represent different account types in the ER model:
  • Possible Approach:
    • Associate an attribute -- account-type with the accounts entity set.
    • Problems:
      • Different attributes may be associated with the account depending on its type:
        • checking: overdraft amount
        • savings: interest rate
      • Depending upon its type, savings and checking accounts may participate in different relationships.

Limitations of the Basic ER Model Studied So Far

  • How to represent this in the ER model:
  • Approach 2:
    • entity sets: checking, savings, and accounts.
    • relationships: 1-1 between checking and accounts, and 1-1 between savings and accounts
    • Problems:
      • Not intuitive: checking and savings are represented as entities different from accounts, even though they are accounts.
      • Redundancy of information: info about accounts represented both in checking / savings as well as account entity set
      • Potential Errors: Same account could be erroneously associated with both checking as well as savings.

Subclass/Superclass Relationships

  • “savings” and “checking” are subclasses of the account entity set.
  • “account” is a superclass of savings and checking entity sets.
  • An entity in a subclass has to belong to the superclass as well
    • Every savings account is also an account
    • Every checking account is also an account
  • Attribute and Relationship Inheritance:
    • As in OO languages, subclasses inherit all the attributes of the superclass. Similarly, subclasses inherit all relationships in which the superclass participates.
  • The key of the entity set corresponding to the subclass is the same as the key for the superclass.
  • ISA relationship is one-to-one, although we don’t draw two arrows on the two sides.

Specialization and Generalization

  • Specialization: process of classifying a class of objects into more specialized subclasses
    • E.g., during design, we begin with an account entity set. We then specialize the account set into different types of accounts.
  • Generalization: Reverse of specialization -- it is a process of synthesis of two or more (lower level) entity sets to produce a higher-level entity set.
    • E.g., during design, we have identified a car, a sports utility vehicle, and a truck. We generalize these classes to create an automobile entity set.

Types of Class/Subclass Relationships

  • Disjoint vs. Overlapping:
    • If the subclasses of the entity set do not overlap, then it is disjoint.
      • default
    • Else, overlapping.
      • denoted by a ‘o’ next to ISA triangle
  • Total vs. Partial (also called “covering”):
    • If each entity in a superclass belongs to at least one of the subclasses, then total.
      • denoted by a double line from superclass to ISA triangle
    • Else, partial

Superclass/Subclass Lattice

  • Class/Subclass relationships might form a hierarchy (tree) or a lattice

Back to Limitations of Basic ER Model

  • Suppose projects are sponsored by one or more departments.
  • A department that sponsors a project might assign an employee to monitor the sponsorship.
    Illustrations of bad designs.

Limitations of Basic ER Model

  • The “monitors” should be a relationship that associates “Sponsorship” to “employees.”
  • Treat the relationship set sponsors and the entity sets projects and departments as a higher-level entity set-- an aggregate entity set.
  • Permit relationships between aggregate entity sets and other entity sets.

Aggregation

  • Without introducing redundancy, the following diagram represents:
    • A project is sponsored by a particular department.
    • A sponsorship (project, department combination) may have an associated monitoring employee.

Review of ER Model

  • Basic Model:
    • Entities: strong, weak
    • Attributes associated with entity sets and relationships
    • Relationships: binary, ternary, …
    • Role of entity sets in a relationship
    • Constraints on the entity set: domain constraints, key constraint
    • Constraint on relationships: key constraints, participation --total vs partial.
  • Extended Model:
    • Notion of superclass and subclass
    • Superclass/subclass relationships: disjoint vs overlapping, total vs partial
    • Notion of aggregation

Schema Design Issues

  • Observation: There may be multiple ER schemas describing the same target database.
  • Decisions that need to be made:
    • whether to use an attribute or entity set to represent an object
    • whether to model a concept as a relationship or an entity set
    • whether to use a ternary relationship or a set of binary ones
    • whether to use a strong entity set or a weak entity set
    • whether using generalization/specializations is appropriate
    • whether using aggregates is appropriate
  • Unfortunately, there are no straightforward answers to these questions.
  • No two design teams will come up with the same design.
  • However, there are some simple design principles that should be followed during ER design.

Entity vs. Attribute Example-1

  • Should “dependent” be an attribute of Employees or an entity (connected to Employees by a relationship)?
  • Depends upon the use we want to make of dependent information, and the semantics of the data:
    • If we have several dependents per employee, “dependent” must be an entity (since attributes cannot be set-valued).
    • If “dependent” has its own attributes, “dependent” must be modeled as an entity (since attribute values are atomic).

Entity vs. Attribute Example-2

  • Works_In4 does not allow an employee to work in a department for more than one period.
  • We want to record several values of the descriptive attributes for each instance of this relationship.
    • Accomplished by introducing new entity set, “Duration.”

Subclass Relationship Example-3

  • Assume budgets are associated with managers, not departments.
  • 1st model: Not all employees are managers, so most employees won’t have budgets.

ER Design Example-1

  • We wish to design a database consistent with the following facts:
    1. A train has a unique number and has an engineer.
    2. Trains are either local trains or express trains, but never both.
    3. Stations are either express stops or local stops, but never both.
    4. A station has a name (assumed unique) and an address.
    5. All local trains stop at all stations.
    6. For each train and each station the train stops at, there is a time.
    7. Express trains stop only at express stations.
      Does not capture all the constraints.

ER Design Example-1 Alternative

  • Better Design. It captures express trains and local stations.

ER Design Example-2

  • Notown Records would like to store information about musicians who perform on its albums (as well as other company data) in a database.
    1. Each musician that records at Notown has an SSN, a name, an address, and a phone number. Poorly paid musicians often share the same address. No phone has more than one address, but musicians that share an address might have different phones.
    2. Each instrument used in songs recorded at Notown has a name (e.g., guitar, synthesizer, flute), a musical key (e.g., C, B-flat, E-flat), and an instrument number which is unique for each instrument name.
    3. Each album recorded on the Notown label has a unique identification number, a title, and a copyright date.
    4. Each song recorded at Notown has a title and an author.
    5. Each musician may play several instruments, and a given instrument may be played by several musicians.
    6. Each album has a number of songs on it, but no song may appear on more than one album. The song titles in an album are assumed to be unique.
    7. Each song is performed by one or more musicians, and a musician may perform a number of songs.
    8. Each album has exactly one musician who acts as its producer. A musician may produce several albums, of course.

Textbook notation

  • Symbols used in ER diagrams

Alternative ER notations

  • Alternative symbols used in ER diagrams