S

ER Modeling (Advanced) - L2

Online Class Protocol

  • Mute microphone.

  • Use chat for questions or raise virtual hand.

  • Turn off video if internet is poor.

  • Wait for the lecturer to start.

Lecture Room Protocol

  • If you have any symptoms, even very mild, you MUST NOT stay in class. Go home, self-isolate and seek medical advice immediately.

Acknowledgement of Country

  • UNSW Business School acknowledges the Bidjigal (Kensington campus) and Gadigal (City campus) people, the traditional custodians of the land.

  • Acknowledges Aboriginal and Torres Strait Islander Elders, past and present.

  • Recognizes their ongoing leadership and contributions to business, education, and industry.

Database Design Process

  • Data vs. Information

  • Data stored in databases

  • Database management system (DBMS)

  • Database design defines database structure

  • Big Data

Conceptual Model

  • Entity-Relationship Modelling technique

    • Chen’s notation for high-level conceptual model initially

    • Crow’s Foot as the design standard

  • Entity type and entity instance

  • Attribute and value

  • Relationship (Degree, Connectivity, Cardinality)

Logical Model

  • Converting Conceptual model to detailed Logical Model (using Crow’s Foot) ready for DB implementation

  • Advanced topics

    • Relationship strength

    • Composite entity

    • Relationship degree

    • Supertype and Subtype

    • Selecting Primary Key

Relational Model

  • Apply Normalisation to convert Logical Model to Relational Model

SQL Data Definition & Manipulation Language

  • Use Relational Model to implement database with SQL

  • Data Definition Language defines the tables

  • Data Manipulation Language queries/updates the tables

Relevant Chapters

  • Chapter 4. Entity Relationship (ER) Modeling 4-1 to 4-3

  • Chapter 5 Advanced Data Modeling 5-1 to 5-3

Recap: Connectivity of Relationships

  • Crow’s Foot implies connectivity.

  • Examples:

    • Person married to 0:M persons? (Incorrect)

    • Performer associated with 0:M songs?

    • Song associated with 0:M performers?

  • Can a person marry oneself?

  • How to capture Adele recording the same song in 2 different years?

Fixing Connectivity

  • A person can be married to 0:1 person.

  • A performer can be associated with 1:M songs.

  • A song can be associated with 1:M performers.

  • Can a person marry oneself? (Constraints needed to rule this out)

  • How to capture Adele recording the same song in 2 different years? (Use composite entity)

Relationship Strength

Existence Dependence

  • Weak entity: Cannot exist without another parent entity.

    • E.g., CLASS can exist only with a COURSE.

Existence Independence

  • Strong entity: Can exist apart from parent entities.

    • E.g., CLASS can exist without a COURSE.

Strong (identifying) relationships (solid line)

  • Primary key of the (related or child) entity contains a primary key component of the parent entity

Weak (non-identifying) relationship (dashed line)

  • Primary key of the (related or child) entity does not contain a primary key component of the parent entity

  • CRS_CODE is the primary key of COURSE table

  • CRS_CODE is a foreign key of CLASS table

  • CLASS can be defined in isolation, e.g., a one-time seminar with no reference to a formal catalog of courses

  • Primary key of CLASS is partially or totally derived from the parent entity

Weak Entity Example: EMPLOYEE vs. DEPENDENT

  • Parent entity: EMPLOYEE (may not have any DEPENDENT)

  • Weak entity: DEPENDENT (relies on EMPLOYEE)

  • Weak entity’s primary key (PK) is partially or entirely derived from the parent entity

  • (EMPNUM, DEPNUM) is the PK; part of it (EMP_NUM) is from entity EMPLOYEE

  • A dependent must have ONE employee working in the company. If both parents work in the same company, the child must connect to one only

CLASS and COURSE relationship analysis

  • Option 1: Artificial CLASS_CODE column (auto-incremented integer) as PK

  • Option 2: Composite key (CRSCODE, CLASSSECTION) as PK

  • If CLASS can only exist with a valid COURSE, option 2 might be preferred.

  • Other tables referencing CLASS table (e.g., ENROLMENT) only need to store PK CLASSCODE instead of complex composite PK (CRSCODE, CLASS_SECTION)

  • Simple artificial key is more efficient for indexing and searching

  • PK of CLASS, strictly speaking, does not contain PK of COURSE. Hence the relationship is, by definition, weak non-identifying, and shall be a dashed line. Constraint is needed - CRS_CODE in CLASS table cannot be empty and must be a valid value in COURSE table

Composite Entity

  • Builds a bridge between the original entities

  • Contains attributes that are singular to the relationship between those instances

  • Composed of the PKs of the original entities

  • Existence-dependent on the original entities

  • May contain additional attributes

Objective of Composite Entity

  • M:N relationships (many-to-many relationships) should be avoided and resolved.

  • Relational databases can only handle 1:M relationships (one-to-many relationships).

  • M:N relationships should be decomposed (broken down) to 1:M relationships by creating a composite entity.

  • A composite entity (bridge entity, associative entity) is an entity type that associates the instances of one or more entity types. It contains attributes that are peculiar (singular) to the relationship between those entity instances.

Modeling M:N Relationship

  • M:N relationships between STUDENT and CLASS cannot be implemented by Relational databases

  • Need to break down the relationship to several 1:M relationships

First attempt to resolve M:N relationship between STUDENT and CLASS

  • What are the issues?

    • Must manipulate text data of CLASS_CODE

      • Difficult to retrieve a student’s class codes in a comma-separated list

      • Difficult to remove or add a class for a student

      • Error-prone to update a student’s class codes

      • Filtering all students of a CLASS requiring text searching; hard to track the total student count in each CLASS

      • If a class code changes, updating it for all students is inefficient and risks data inconsistency

    • Not possible to track historical enrollments (e.g., which term the student took a specific class)

    • Difficult to generate reports or analyze class enrollments by course, term, or student

Correct approach to resolve M:N relationship between STUDENT and CLASS

  • The correct approach is to model ENROLMENT as a composite entity

  • How is this better?

    • Easy to retrieve all class enrolments for a student

    • Each enrolment is stored as a separate row, so modifying enrolment of a student does not affect other records

    • Reduce data redundancy of CLASS_CODE in STUDENT table.

    • Easy to track class capacity by filtering for CLASS_CODE in ENROLMENT table.

    • BONUS: a TERM column can be added to ENROLMENT table to track historical data

Example of a composite entity ENROLMENT

  • What is the PK of ENROLMENT?

    • STUNUM and CLASSCODE as a composite key

  • Where do we get values for PK of ENROLMENT table?

    • STUNUM and CLASSCODE values come from other tables

  • Any additional column added to this relationship between STUDENT and CLASS?

    • ENROLL_GRADE

Relationship Degree

  • Unary / Recursive relationship: association is maintained within a single entity

  • Binary relationship: two entities are associated

  • Ternary relationship: three entities are associated

  • Here a composite entity is created to capture the ternary relationship

Strong vs. Weak Relationship

  • Does PK of PRESCRIPTION contain DOCTORID, PATIENTID, DRUG_ID (all are Foreign keys)? Yes

  • PK of EMPLOYEE is EMP_ID

  • Does PK of EMPLOYEE contain MANAGER_ID (FK)? No

  • Must CLASS exist with a PROFESSOR? Yes

  • Since this is a strong relationship, the PK of CLASS might be (PROFID, CLASSNUM), derived from PK of PROFESSOR.

Composite entity to model marriage

  • Technically the PERSON table shall be on both ends of the relationship line as a parent on each side

  • To remove redundancy, one entity, PERSON, acts as both parents of MARRIAGE entity. And two relationship lines are drawn in parallel.

  • ID is put into the bridge table, MARRIAGE, twice. For unary relationship implementations, at least one copy of the attribute must be renamed to avoid duplicate names. In this case, one copy of ID was renamed to SPOUSE_ID.

  • A person can be linked to 0 or 1 ongoing marriage

  • A marriage can be linked to 1 and only 1 person, on each side of the relationship

  • Constraint: ID cannot be equal to SPOUSE_ID

Composite entity to model performer vs. song

  • RECORDING is the composite entity with strong relationships on both sides

  • RECORDING uses PERFORMERID (FK) and SONGID (FK) as a composite primary key

  • 1 PERFORMER can be associated with 1 or many RECORDING

  • 1 RECORDING can be associated with one and exactly one PERFORMER (assuming simplified scenario of no collaboration)

  • 1 SONG can be associated with 1 or many RECORDING (recorded many times with different RELEASE_DATE)

  • 1 RECORDING can be associated with 1 and only one song

Supertype and Subtype

  • How to model airline employees if all employee attributes are the same except:

    • Some employees have a pilot license

    • Some have a mechanic certificate

    • Some have CPA qualification as an accountant

  • Supertype: more generic entity type

  • Subtype: more specific entity type compared to its supertype

    • A subtype entity inherits all attributes of the supertype

    • A subtype has additional, specific attributes

    • An instance of a subtype is also an instance of the supertype

      • A pilot, mechanic, or accountant “is an” employee

    • An instance of the supertype may or may not be an instance of one or more subtypes

      • An employee may not be a pilot

Processes of Generalisation and Specialisation create a Specialisation Hierarchy

Specialisation Hierarchy
  • Depicts arrangement of higher-level supertypes and lower-level subtypes

  • Relationships are described in terms of “is-a” relationships

  • Subtype exists within the context of a supertype; every subtype must have one supertype to be directly related to

  • Supertype can have many subtypes

Generalisation Process
  • Defining a general entity type from a set of specialised entity types

  • bottom-up process from subtypes to supertypes

Specialisation Process
  • Defining one or more subtypes of the supertype

  • top-down process from supertypes to subtypes

In the airline example, how do we find all attributes common to all employees (generalisation), and attributes that are different among employees (specialisation)?

What can Specialisation Hierarchy do to help our database design later?

Support inheritance
  • All subtypes inherit primary key attribute from the supertype

  • Subtype inherits attributes from the supertype

  • Subtypes inherit all relationships in which supertype participates

  • At the implementation level, supertype and its subtypes maintain a 1:1 relationship

Define subtype discriminator
  • Attribute in the supertype determining which subtype the supertype instance is related to

  • E.g., EMP_TYPE (can be pilot, mechanic or accountant)

  • Default comparison condition is the equality comparison

Define disjoint/overlapping constraints
  • Can an employee be a pilot and a mechanic at the same time?

Define complete/partial constraints
  • Must an employee be one of the three (pilot, mechanic or accountant)?

Disjoint and Overlapping Constraints

Disjoint (non-overlapping) subtypes
  • Contain a unique subset of the supertype entity set

  • The value of the subtype discriminator attribute in the supertype cannot be more than 1 value at the same time

  • E.g., VOLUNTEER has MemType = “V”; STAFF has MemType = “S”;

Overlapping subtypes
  • Contain non-unique subsets of the supertype entity set

  • Implementation requires the use of one discriminator attribute for each subtype

  • E.g., discriminator StoryIsMovie is required for MOVIE subtype; discriminator StoryIsBook is required for BOOK subtype

  • StoryIsMovie and StoryIsBook can both be TRUE

Partial/total completeness show different behaviours with disjoint/overlapping constraints

Disjoint Constraint
  • Partial

    • Supertype has optional subtypes.

    • Subtype discriminator can be null (the instance is not of any designed subtype)

    • Subtype sets are unique.

  • Total

    • Every supertype instance must be a member of only one subtype.

    • Subtype discriminator canNOT be null.

    • Subtype sets are unique.

Overlapping Constraint
  • Partial

    • Supertype has optional subtypes.

    • Subtype discriminators can be null (the instance is not of any designed subtype)

    • Subtype sets are NOT unique.

  • Total

    • Every supertype instance is a member of at least one subtype.

    • Subtype discriminators canNOT be null.

    • Subtype sets are NOT unique.

Example: Specialisation Hierarchy with disjoint and partial constraints

  • ‘d’ means disjoint, e.g., an employee cannot a pilot and a mechanic concurrently

  • One line indicates partial constraint, i.e., there are other employee types beyond P, M, A

  • Pilot, Mechanic, Accountant all have these attributes via inheritance, EMP_LNAME…

  • P, M, A subtypes can have dependents, just like supertype EMPLOYEE

  • Only Accountant has ACTCPADATE, while other subtypes do not

  • The value of EMP_TYPE can be “P”, “M”, “A”

Specialisation Hierarchy

  • Overlapping: a student can also be an employee

  • Total completeness: Everyone must be an employee, a student or both
    *Partial completeness: ADMIN and PROF do not form the whole picture

  • Total completeness: Everyone must be either a Graduate or Undergraduate

  • Overlapping: an admin can also be a professor
    *Disjoint: student cannot be graduate and undergrad at the same time

Selecting Primary Key

Guideline for selecting Primary Key

  • Must be unique: Ensures each record is uniquely identifiable. StudentID in STUDENT table.

  • Cannot be NULL: Every record must have a primary key value. EmployeeID must have a value for each entry.

  • Non-intelligent: Should not carry embedded meaning; avoid semantic data. Use zID instead of Name.

  • No change over time: Should remain constant; avoid attributes that are likely to change. Avoid EmployeeName, which could change due to various reasons.

  • Preferably single-attribute: Simplifies database design and improves performance. Use OrderID instead of a composite key (CustomerID, OrderDateTime)

  • Preferably numeric: Easier to input, avoids errors, and supports auto-incrementing. Use 12345 for CustomerID, instead of “ABCD”

  • Artificially created by org: Identifier should be generated internally rather than using external or personal data. UNSW's zID for students

  • Security-compliant: Avoids sensitive data that could breach privacy regulations. Use generated StudentID instead of Driver’s licence

Natural keys

Natural key

  • Real-world, generally accepted identifier for real-world objects

  • Familiar to end users and in day-to-day business vocabulary

  • Most natural keys are acceptable as primary key

  • Natural keys would make good PK, but what if there is no natural key?

Case 1: the entity is a strong independent entity but just lacks a natural key, e.g. UNSW student
  • Using student’s passport or driver’s license as PK is a breach of data privacy

  • Primary key UNSW zID artificially created

Case 2: A weak entity tied to a parent entity, e.g., CLASS has a strong identifying relationship with the parent entity COURSE
  • Using composite key with FK CRSCODE together with a new attribute CLASSSECTION

Case 3: A composite entity with identifiers from the reference entities, e.g., ENROLMENT linking STUDENT and CLASS
  • Use composite PK (CLASSCODE, STUNUM), both being FK

  • Each PK combination is allowed only once in the M:N relationship ENROLMENT

Revisiting Composite PK vs. Surrogate PK

Case 2: A weak entity tied to a parent entity, e.g., CLASS has a strong identifying relationship with the parent entity COURSE

  • Using Composite PK (CRSCODE, CLASSSECTION)

    • Cons: lengthy PK already; even longer PK if inherited by another weak entity, e.g., ENROLMENT entity would have (CRSCODE, CLASSSECTION, STU_NUM) as PK

  • Create a Surrogate PK CLASS_CODE

    • Pros: short PK

    • Caveat: we must create a unique constraint on CRSCODE and CLASSSECTION, with no duplicate combinations

Case 3: A composite entity with identifiers from the reference entities, e.g., ENROLMENT linking STUDENT and CLASS

  • Use Composite PK (CLASSCODE, STUNUM), both being FK

    • Each PK combination is allowed only once in the M:N relationship

If ENROLMENT will be parent entity of more weak entities, we may want to create a Surrogate PK ENROLL_CODE to have a short PK in ENROLMENT table
  • Caveat: we must create a unique constraint on CLASSSCODE and STUNUM, with no duplicate combinations

Recap

  • Strong and weak entity

  • Composite entity

  • Supertype and Subtype

  • Convert Logical Model to Relational Model

Next lecture

  • Relational Modelling and SQL I