Mute microphone.
Use chat for questions or raise virtual hand.
Turn off video if internet is poor.
Wait for the lecturer to start.
If you have any symptoms, even very mild, you MUST NOT stay in class. Go home, self-isolate and seek medical advice immediately.
UNSW Business School acknowledges the Bidjigal (Kensington campus) and Gadigal (City campus) people, the traditional custodians of the land.
Acknowledges Aboriginal and Torres Strait Islander Elders, past and present.
Recognizes their ongoing leadership and contributions to business, education, and industry.
Data vs. Information
Data stored in databases
Database management system (DBMS)
Database design defines database structure
Big Data
Entity-Relationship Modelling technique
Chen’s notation for high-level conceptual model initially
Crow’s Foot as the design standard
Entity type and entity instance
Attribute and value
Relationship (Degree, Connectivity, Cardinality)
Converting Conceptual model to detailed Logical Model (using Crow’s Foot) ready for DB implementation
Advanced topics
Relationship strength
Composite entity
Relationship degree
Supertype and Subtype
Selecting Primary Key
Apply Normalisation to convert Logical Model to Relational Model
Use Relational Model to implement database with SQL
Data Definition Language defines the tables
Data Manipulation Language queries/updates the tables
Chapter 4. Entity Relationship (ER) Modeling 4-1 to 4-3
Chapter 5 Advanced Data Modeling 5-1 to 5-3
Crow’s Foot implies connectivity.
Examples:
Person married to 0:M persons? (Incorrect)
Performer associated with 0:M songs?
Song associated with 0:M performers?
Can a person marry oneself?
How to capture Adele recording the same song in 2 different years?
A person can be married to 0:1 person.
A performer can be associated with 1:M songs.
A song can be associated with 1:M performers.
Can a person marry oneself? (Constraints needed to rule this out)
How to capture Adele recording the same song in 2 different years? (Use composite entity)
Weak entity: Cannot exist without another parent entity.
E.g., CLASS can exist only with a COURSE.
Strong entity: Can exist apart from parent entities.
E.g., CLASS can exist without a COURSE.
Primary key of the (related or child) entity contains a primary key component of the parent entity
Primary key of the (related or child) entity does not contain a primary key component of the parent entity
CRS_CODE is the primary key of COURSE table
CRS_CODE is a foreign key of CLASS table
CLASS can be defined in isolation, e.g., a one-time seminar with no reference to a formal catalog of courses
Primary key of CLASS is partially or totally derived from the parent entity
Parent entity: EMPLOYEE (may not have any DEPENDENT)
Weak entity: DEPENDENT (relies on EMPLOYEE)
Weak entity’s primary key (PK) is partially or entirely derived from the parent entity
(EMPNUM, DEPNUM) is the PK; part of it (EMP_NUM) is from entity EMPLOYEE
A dependent must have ONE employee working in the company. If both parents work in the same company, the child must connect to one only
Option 1: Artificial CLASS_CODE column (auto-incremented integer) as PK
Option 2: Composite key (CRSCODE, CLASSSECTION) as PK
If CLASS can only exist with a valid COURSE, option 2 might be preferred.
Other tables referencing CLASS table (e.g., ENROLMENT) only need to store PK CLASSCODE instead of complex composite PK (CRSCODE, CLASS_SECTION)
Simple artificial key is more efficient for indexing and searching
PK of CLASS, strictly speaking, does not contain PK of COURSE. Hence the relationship is, by definition, weak non-identifying, and shall be a dashed line. Constraint is needed - CRS_CODE in CLASS table cannot be empty and must be a valid value in COURSE table
Builds a bridge between the original entities
Contains attributes that are singular to the relationship between those instances
Composed of the PKs of the original entities
Existence-dependent on the original entities
May contain additional attributes
M:N relationships (many-to-many relationships) should be avoided and resolved.
Relational databases can only handle 1:M relationships (one-to-many relationships).
M:N relationships should be decomposed (broken down) to 1:M relationships by creating a composite entity.
A composite entity (bridge entity, associative entity) is an entity type that associates the instances of one or more entity types. It contains attributes that are peculiar (singular) to the relationship between those entity instances.
M:N relationships between STUDENT and CLASS cannot be implemented by Relational databases
Need to break down the relationship to several 1:M relationships
What are the issues?
Must manipulate text data of CLASS_CODE
Difficult to retrieve a student’s class codes in a comma-separated list
Difficult to remove or add a class for a student
Error-prone to update a student’s class codes
Filtering all students of a CLASS requiring text searching; hard to track the total student count in each CLASS
If a class code changes, updating it for all students is inefficient and risks data inconsistency
Not possible to track historical enrollments (e.g., which term the student took a specific class)
Difficult to generate reports or analyze class enrollments by course, term, or student
The correct approach is to model ENROLMENT as a composite entity
How is this better?
Easy to retrieve all class enrolments for a student
Each enrolment is stored as a separate row, so modifying enrolment of a student does not affect other records
Reduce data redundancy of CLASS_CODE in STUDENT table.
Easy to track class capacity by filtering for CLASS_CODE in ENROLMENT table.
BONUS: a TERM column can be added to ENROLMENT table to track historical data
What is the PK of ENROLMENT?
STUNUM and CLASSCODE as a composite key
Where do we get values for PK of ENROLMENT table?
STUNUM and CLASSCODE values come from other tables
Any additional column added to this relationship between STUDENT and CLASS?
ENROLL_GRADE
Unary / Recursive relationship: association is maintained within a single entity
Binary relationship: two entities are associated
Ternary relationship: three entities are associated
Here a composite entity is created to capture the ternary relationship
Does PK of PRESCRIPTION contain DOCTORID, PATIENTID, DRUG_ID (all are Foreign keys)? Yes
PK of EMPLOYEE is EMP_ID
Does PK of EMPLOYEE contain MANAGER_ID (FK)? No
Must CLASS exist with a PROFESSOR? Yes
Since this is a strong relationship, the PK of CLASS might be (PROFID, CLASSNUM), derived from PK of PROFESSOR.
Technically the PERSON table shall be on both ends of the relationship line as a parent on each side
To remove redundancy, one entity, PERSON, acts as both parents of MARRIAGE entity. And two relationship lines are drawn in parallel.
ID is put into the bridge table, MARRIAGE, twice. For unary relationship implementations, at least one copy of the attribute must be renamed to avoid duplicate names. In this case, one copy of ID was renamed to SPOUSE_ID.
A person can be linked to 0 or 1 ongoing marriage
A marriage can be linked to 1 and only 1 person, on each side of the relationship
Constraint: ID cannot be equal to SPOUSE_ID
RECORDING is the composite entity with strong relationships on both sides
RECORDING uses PERFORMERID (FK) and SONGID (FK) as a composite primary key
1 PERFORMER can be associated with 1 or many RECORDING
1 RECORDING can be associated with one and exactly one PERFORMER (assuming simplified scenario of no collaboration)
1 SONG can be associated with 1 or many RECORDING (recorded many times with different RELEASE_DATE)
1 RECORDING can be associated with 1 and only one song
How to model airline employees if all employee attributes are the same except:
Some employees have a pilot license
Some have a mechanic certificate
Some have CPA qualification as an accountant
Supertype: more generic entity type
Subtype: more specific entity type compared to its supertype
A subtype entity inherits all attributes of the supertype
A subtype has additional, specific attributes
An instance of a subtype is also an instance of the supertype
A pilot, mechanic, or accountant “is an” employee
An instance of the supertype may or may not be an instance of one or more subtypes
An employee may not be a pilot
Depicts arrangement of higher-level supertypes and lower-level subtypes
Relationships are described in terms of “is-a” relationships
Subtype exists within the context of a supertype; every subtype must have one supertype to be directly related to
Supertype can have many subtypes
Defining a general entity type from a set of specialised entity types
bottom-up process from subtypes to supertypes
Defining one or more subtypes of the supertype
top-down process from supertypes to subtypes
In the airline example, how do we find all attributes common to all employees (generalisation), and attributes that are different among employees (specialisation)?
All subtypes inherit primary key attribute from the supertype
Subtype inherits attributes from the supertype
Subtypes inherit all relationships in which supertype participates
At the implementation level, supertype and its subtypes maintain a 1:1 relationship
Attribute in the supertype determining which subtype the supertype instance is related to
E.g., EMP_TYPE (can be pilot, mechanic or accountant)
Default comparison condition is the equality comparison
Can an employee be a pilot and a mechanic at the same time?
Must an employee be one of the three (pilot, mechanic or accountant)?
Contain a unique subset of the supertype entity set
The value of the subtype discriminator attribute in the supertype cannot be more than 1 value at the same time
E.g., VOLUNTEER has MemType = “V”; STAFF has MemType = “S”;
Contain non-unique subsets of the supertype entity set
Implementation requires the use of one discriminator attribute for each subtype
E.g., discriminator StoryIsMovie is required for MOVIE subtype; discriminator StoryIsBook is required for BOOK subtype
StoryIsMovie and StoryIsBook can both be TRUE
Partial
Supertype has optional subtypes.
Subtype discriminator can be null (the instance is not of any designed subtype)
Subtype sets are unique.
Total
Every supertype instance must be a member of only one subtype.
Subtype discriminator canNOT be null.
Subtype sets are unique.
Partial
Supertype has optional subtypes.
Subtype discriminators can be null (the instance is not of any designed subtype)
Subtype sets are NOT unique.
Total
Every supertype instance is a member of at least one subtype.
Subtype discriminators canNOT be null.
Subtype sets are NOT unique.
‘d’ means disjoint, e.g., an employee cannot a pilot and a mechanic concurrently
One line indicates partial constraint, i.e., there are other employee types beyond P, M, A
Pilot, Mechanic, Accountant all have these attributes via inheritance, EMP_LNAME…
P, M, A subtypes can have dependents, just like supertype EMPLOYEE
Only Accountant has ACTCPADATE, while other subtypes do not
The value of EMP_TYPE can be “P”, “M”, “A”
Overlapping: a student can also be an employee
Total completeness: Everyone must be an employee, a student or both
*Partial completeness: ADMIN and PROF do not form the whole picture
Total completeness: Everyone must be either a Graduate or Undergraduate
Overlapping: an admin can also be a professor
*Disjoint: student cannot be graduate and undergrad at the same time
Must be unique: Ensures each record is uniquely identifiable. StudentID in STUDENT table.
Cannot be NULL: Every record must have a primary key value. EmployeeID must have a value for each entry.
Non-intelligent: Should not carry embedded meaning; avoid semantic data. Use zID instead of Name.
No change over time: Should remain constant; avoid attributes that are likely to change. Avoid EmployeeName, which could change due to various reasons.
Preferably single-attribute: Simplifies database design and improves performance. Use OrderID instead of a composite key (CustomerID, OrderDateTime)
Preferably numeric: Easier to input, avoids errors, and supports auto-incrementing. Use 12345 for CustomerID, instead of “ABCD”
Artificially created by org: Identifier should be generated internally rather than using external or personal data. UNSW's zID for students
Security-compliant: Avoids sensitive data that could breach privacy regulations. Use generated StudentID instead of Driver’s licence
Real-world, generally accepted identifier for real-world objects
Familiar to end users and in day-to-day business vocabulary
Most natural keys are acceptable as primary key
Natural keys would make good PK, but what if there is no natural key?
Using student’s passport or driver’s license as PK is a breach of data privacy
Primary key UNSW zID artificially created
Using composite key with FK CRSCODE together with a new attribute CLASSSECTION
Use composite PK (CLASSCODE, STUNUM), both being FK
Each PK combination is allowed only once in the M:N relationship ENROLMENT
Using Composite PK (CRSCODE, CLASSSECTION)
Cons: lengthy PK already; even longer PK if inherited by another weak entity, e.g., ENROLMENT entity would have (CRSCODE, CLASSSECTION, STU_NUM) as PK
Create a Surrogate PK CLASS_CODE
Pros: short PK
Caveat: we must create a unique constraint on CRSCODE and CLASSSECTION, with no duplicate combinations
Use Composite PK (CLASSCODE, STUNUM), both being FK
Each PK combination is allowed only once in the M:N relationship
Caveat: we must create a unique constraint on CLASSSCODE and STUNUM, with no duplicate combinations
Strong and weak entity
Composite entity
Supertype and Subtype
Convert Logical Model to Relational Model
Relational Modelling and SQL I