Data Modeling using ER Model
Database Management System - Data Modeling using ER Model (Unit 1.2)
Syllabus UNIT-I
- Introduction to DBMS and its characteristics.
- DBMS architecture, client/server architecture, data models, schema and instances, and data independence.
- Data Modeling using Entity Relationship Model: Basic introduction about entities, entity types, entity sets.
- ER diagram notations, attributes and keys.
- Types of attributes: composite, derived, and multivalued attributes.
- Types of keys: Super Key, candidate key, primary key.
- Relationships, relation types, weak entities, enhanced E-R, specialization, and generalization.
Text Books:
- TB1. R. Elmarsi and SB Navathe, \"Fundamentals of Database Systems\", Pearson, 5th Ed.
- TB2. Singh S.K., \"Database System Concepts, design and application\", Pearson Education
- TB3. Ramakrishnan and Gherke, \"Database Management Systems\", TMH.
- TB4. Bipin Desai, \"An Introduction to Database Systems\", Galgotia Publications, 1991.
Reference Books:
- RBI. Abraham Silberschatz, Henry Korth, S. Sudarshan, \"Database Systems Concepts\", 6* Edition, McGraw Hill, 2010.
- RB2. Jim Melton, Alan Simon, \"Understanding the new SQL: A complete Guide\", Morgan Kaufmann Publishers, 1993.
- RB3. A. K. Majumdar, P. Battacharya, \"Database Management Systems\", TMH, 2017.
Outline
- Data Modeling using Entity Relationship Modelling: Introduction
- Entity, types, set
- Attribute, types of attributes
- Relationship: types, degree, roles, constraints: cardinality ratio and participation
- Concept of keys
- Types of Key: Super key, Candidate key, Primary key, Unique key, Alternate key, Composite Key, Foreign Key
Data Modeling Using the Entity-Relationship Model - INTRODUCTION
- Conceptual Database Design
- Requirements Collection and Analysis: Produces a description of the users' requirements.
- Conceptual Design: Produces a conceptual schema for the database, including detailed descriptions of entity types, relationship types, and constraints. All these are expressed in terms provided by the data model being used.
- Implementation: Transforms the conceptual schema into a representational/implementational model supported by the DBMS.
- Physical Design: Decides upon the internal storage structures and access paths (indexes).
Entity-Relationship (ER) Model - INTRODUCTION
- The ER model is a high-level conceptual data model used at the conceptual design phase of database design.
- It is a graphical approach for database design.
- Introduced by Peter Chen in 1976.
- Represents real-world situations using concepts commonly used by people.
- Allows defining a representation of the real world at a logical level.
- ER model focuses on the logical structure of data by indicating the grouping of data into entities.
- Supports a top-down approach where details can be given in successive stages.
- ER Diagram is used to represent the conceptual design.
Features of ER Model
- Graphical Representation for Better Understanding: Easy to understand and is used by developers to communicate with the stakeholders.
- ER Diagram: Used as a visual tool for representing the model.
- Database Design: Helps database designers to build the database and is widely used in database design.
Advantages of ER Model
- Simple: Conceptually easy to build if the relationships between attributes and entities are known.
- Effective Communication Tool: Used widely by database designers for communicating their ideas.
- Easy Conversion to any Model: Maps well to the relational model and can be easily converted to the relational model by converting the ER model to a table. Can also be converted to other models like network model, hierarchical model, etc.
Disadvantages of ER Model
- No industry standard for notation: No industry standard for developing an ER model. Different developers might use different notations.
- Hidden information: Some information might be lost or hidden in the ER model because it is a high-level view.
Components of ER Model
The ER Model is based on 3 main concepts:
- Entity
- Attributes
- Relationship
Components of ER Model - Entity
- An entity in the ER model is a thing or an object that exists in the real world and is distinguished from other entities.
- An entity can represent a physical object (e.g., house, person, automobile, widget), job, academic course, business, or a less tangible concept (e.g., company, transaction).
- An entity is represented by a set of attributes.
- Examples of entities: Person (Employee, Student, Patient), Place (Store, Building), Object (Machine, product, Car), Event (Sale, Registration, Renewal), Concept (Account, Course)
Components of ER Model - Attribute
- An attribute is a property or characteristic of an entity type.
- Each attribute has a value drawn from some domain (set of meaningful values).
- In ER diagrams, attributes are placed in an ellipse with a line connecting it to its associated entity.
- Attributes may also be associated with relationships.
- An attribute is associated with exactly one entity or relationship.
- A particular entity will have a value for each of its attributes.
- The attribute values that describe each entity become a major part of the data stored in the database.
Notations used in ER Diagram
| Symbol | Meaning |
|---|---|
| Rectangle | Entity/Strong Entity |
| Double Rectangle | Weak Entity |
| Ellipse/Oval | Attribute |
| Underlined Oval | Key Attribute |
| Double Ellipse/Oval | Multi-valued Attributes |
| Dotted Ellipse/Oval | Derived Attributes |
| Diamond | Relationship Types |
| Double Diamond | Identifying Relationship Types |
| Line | Links attributes to entity types and entity types with relationship types |
ER Diagram Symbols Meaning
- Rectangle symbol represents entity/strong entity
- Double rectangle symbol represents Weak entity
- Ellipse/Oval symbol represents attributes
- Primary key attributes are oval symbol with underlined
- Double Ellipses/ovals represent multi-valued attributes
- Dotted Ellipse/oval represent derived attributes
- Diamonds symbol represents relationship types
- Double diamonds symbol represents identifying relationship types
- Line symbol links attributes to entity types and entity types with other relationship types
Example of ER Diagram
- Entities: Person, place, object, event, or concept about which data is to be maintained (e.g., Car, Student)
- Relation: Association between the instances of one or more entity types (e.g., Blue Car Belongs to Student Jack)
- Attribute: Property or characteristic of an entity (e.g., Color of car entity, Name of Student entity)
Example ERD
- An employee works in a department
- A student studies various subject
- An event in the sport day of a school will be cancelled if there are no participants
Example 1 of ER Model
Faculty entity information requirements:
Characteristics: FacultyId (unique), Name (firstname and lastname), designation, dateofjoining, Qualification (multiple qualification), dateofbirth, Age (calculated using dateofbirth), Totalnoofexperience (derived by dateofjoining)
Entity name: Faculty
Example of ER Model
- University database entities: Student, Course, and Faculty.
- Student: Rollno, name, D_O_B, marks, address, and phone. Rollno uniquely identifies each student.
- Course: Course_id, cname, duration.
- Faculty: id, fname, qualification, date_of_joining, address, phone.
- Entity name: Student, Course, Faculty
Entity Types, Entity Sets, Keys, and Value Sets
- Entity type: Defines a collection (or set) of entities that have the same attributes. Each entity type in the database is described by its name and attributes.
- Entity set: The collection of all entities of a particular entity type in the database at any point in time.
- An entity type is represented in ER diagrams as a rectangular box enclosing the entity type name.
- The collection of entities of a particular entity type are grouped into an entity set, which is also called the extension of the entity type.
Types of Entity
| Feature | Strong Entity | Weak Entity |
|---|---|---|
| Primary Key | Has a primary key. | Does not have sufficient attributes to form a primary key. It always depends on a strong entity. |
| Representation | Represented by a rectangle symbol. | Represented by a double rectangle symbol. |
| Underline Symbol | Contains a primary key represented by the underline symbol. | Contains a Partial Key which is represented by a dashed underline symbol. |
| Key Attribute | Primary Key is one of its attributes which helps to identify its member. | In a weak entity, a combination of partial key and primary key of the strong entity set helps to identify its member. |
| Relationship with other entities | In the ER diagram, the relationship between two strong entities is shown by using a diamond symbol. | The relationship between one strong and a weak entity is shown by using the double diamond symbol called identifying relationship. |
| Connecting line in Relationship diagram | The connecting line of the strong entity with the relationship is single. | The line connecting the weak entity for identifying the relationship is double. |
Value Sets (Domains) of Attributes
Each attribute of an entity type is associated with a value set or domain of values, which specifies the set of values that may be assigned to that attribute.
For example, in the EMPLOYEE entity, which has 3 attributes: Name, Age, and Salary. Then
- the domain for age attribute is {55, 40, 25} for 3 entity sets e1, e2, and e3
- the domain for salary attribute is {80k, 30k, 20k} for 3 entity sets e1, e2, and e3
- the domain for name attribute is {\"John Smith\", \" Fred Brown\", \"Judy Clark\"} for 3 entity sets e1, e2, and e3
Types of Attributes
- Key attribute
- Simple/Atomic vs Composite
- Single-valued versus Multivalued
- Stored versus Derived.
- Complex attributes
- Null values
Key Attributes of an Entity Type
- An entity type usually has an attribute whose values uniquely identifies each individual entity in the collection.
- Such an attribute is called a key attribute, and its values can be used to identify each entity uniquely.
- In a STUDENT entity, rollno is the key attribute because no 2 students have the same roll numbers.
- In ER diagrammatic notation, each key attribute has its name underlined inside the oval.
Simple/Atomic Vs Composite Attributes
- Simple attribute: Cannot be further subdivided. It contains an atomic value. Examples: Rollno, name, dob are the simple attributes in the Student entity. Represented by an ellipse.
- Composite attributes: Can be divided into smaller subparts, which represent more basic attributes with independent meanings. Example, the Address attribute of the student entity can be treated as a composite attribute as it can be sub-divided into Street, City, State, and Country.
Single-Valued versus Multivalued Attribute
- Single-valued: Attributes have a single value for a particular entity. For example, Roll_no, name is a single-valued attribute of STUDENT.
- Multivalued: An attribute that may have more than one value for a given instance, e.g., STUDENT may have more than one phone. Represented by an ellipse with double lines.
Stored versus Derived Attributes
- Derived attributes: whose values can be determined from the values of other related attributes. Derived attributes are shown with dotted ellipse/oval shape
- Stored attribute: attributes which stores the values.
Complex attributes
- A complex attribute is an attribute that is both composite and multi valued.
Null attribute
- In some cases, a particular entity might not have an applicable value for a particular attribute or that value may be unknown. In such cases, we use a special attribute value (non-value), called null.
Initial conceptual Design for the COMPANY Database
Based on the requirements, we can identify four initial entity types in the COMPANY database: DEPARTMENT, PROJECT, EMPLOYEE, DEPENDENT
Relationships, Relationship Types, Roles, and Structural Constraints
- Relationship: An interaction/association between two or more entities based on a key attribute.
- Relationship type R among n entity types E1, E2, …, En defines a set of associations or a relationship set-among entities from these entity types. Mathematically the relational set R is a set of relationship instances ri
- Relationship instance ri: Each ri associates n individual entities (e1, e2, …, en), Each entity ej in ri is a member of entity set Ej, Relationships uniquely identified by keys of participating entities.
Relationship Degree, Role Names, and Recursive Relationships
- Types of Relationship:
- UNARY/Recursive Relationship: when a single entity is associated with itself in a relationship.
- BINARY Relationship: when two entities are associated in a relationship.
- TERNARY Relationship: when three entities are associated in a relationship.
- N-ary Relationship: when N entities are associated in a relationship where N>3.
- Degree of a relationship is the total number of participating entity types.
- UNARY relationship type is of degree 1.
- BINARY relationship type is of degree 2.
- TERNARY relationship type is of degree 3.
- N-ary relationship type is of degree N.
Relationship Degree, Role Names, and Recursive Relationships (contd.)
- Number of employees that work for a department.
- Entity types: EMPLOYEE and DEPARTMENT
- Employees e1, e3, and e6 work for department d1; employees e2 and e4 work for department d2 and e5 and e7 work for d3.
Types of Relationship
- Unary Relationship examples:
- SUPERVISION
- Binary Relationship examples:
- BORROWER
- Ternary relationship examples:
- SUPPLY
Role Names and Recursive Relationships
- Each entity type in a relationship plays a particular role.
- The role name specifies the role that a participating entity type plays in the relationship and explains what the relationship means.
- In a recursive relationship also called Unary relationship type, both participations are the same entity type but in different roles.
- For example, SUPERVISION relationships between EMPLOYEE (in the role of supervisor or boss) and (another) EMPLOYEE (in the role of subordinate or worker).
- In ER diagrams, it is needed to display the role names to distinguish participations.
Constraints on Relationship Types
- Relationship types have certain constraints that limit the possible combinations of entities that may participate in a relationship.
- If the company has a rule that each employee must work for exactly one department, then this constraint needs to be described in the schema.
- If we have the entities Doctor and Patient, the organization may have a rule that a patient cannot be seen by more than one doctor. This constraint needs to be described in the schema.
Constraints on Relationship Types
There are two main types of relationship constraints:
- Cardinality ratio constraints (generally for binary relationships)
- Participation constraints / Existence Dependencies
Cardinality Ratio for Binary Relationship
- Binary relationships are relationships between exactly two entities.
- The cardinality ratio specifies the maximum number of relationship instances that an entity can participate in.
- Possible cardinality ratios for binary relationship types: 1:1, 1:n, n:1, m:n.
- Cardinality ratios are shown on ER diagrams by displaying 1, m, and n on the diamonds.
- One to One (1:1): An entity in A is associated with exactly one entity in B, and one entity in B is associated with exactly one entity in A.
- One to Many (1:n): An entity in A is associated with any number of entities in B. An entity in B can be associated with at most one entity in A.
- Many to One (n:1): An entity in A is associated with at most one entity in B. An entity in B can be associated with any number of entities in A.
- Many to Many (m:n): An entity in A is associated with any number of entities in B. An entity in B can be associated with any number of entities in A.
Participation Constraints and Existence Dependencies
- Specifies whether the existence of an entity depends on its being related to another entity via the relationship type.
- Specifies the minimum number of relationship instances that each entity can participate in.
There are two types of participation constraints:
- Total Participation
- Partial Participation
- Total Participation specifies that each entity in the entity set must compulsorily participate in at least one relationship instance in that relationship set (mandatory participation). Represented using a double line.
- Partial Participation: If only a part of the set of entities participate in a relationship. Represented using a single line.
Example
COMPANY Database Data Requirements:
- The company is organized into departments. Each department has a unique name, a unique number, and a particular employee who manages the department. Track the start date when that employee began managing the department. A department may have several locations.
- A department controls a number of projects, each of which has a unique name, a unique number, and a single location.
- The database will store each employee’s name, Social Security number, address, salary, sex (gender), and birth date. An employee is assigned to one department but may work on several projects, which are not necessarily controlled by the same department. It is required to keep track of the current number of hours per week that an employee works on each project, as well as the direct supervisor of each employee (who is another employee).
- The database will keep track of the dependents of each employee for insurance purposes, including each dependent’s first name, sex, birth date, and relationship to the employee.
Steps for designing an ER Diagram
- Identify the entity/entities
- Identify the attributes that belong to each entity
- Identify key attributes of each entity and mark them with underline
- Identify the relationship between the entities
- Identify relationship cardinality ratio and participation between the entities
Concept of keys
- In RDBMS, databases are used to store massive amounts of data which is stored across multiple tables.
- A table contains a number of rows and columns.
- Columns represent attributes/fieldnames.
- Rows represent records/tuples. And each table might be running into thousands of rows. There may be many duplicate records with redundant information. So how do we deal with that?
- Here is a concept called KEYS. Key and its purpose
- In a table, a key is a column/attribute or set of attributes/columns that helps us in uniquely identifying the rows of a table.
- It is also used to establish and identify relationships between tables.
Types of keys
- Super key
- Candidate Key
- Primary key
- Unique key
- Alternative key
- Composite key
- Foreign key
Super key
- A super key is a set of one or more attributes whose values can be used to uniquely identify a record/tuple within a table.
- Also, it is a combination of all possible attributes that can be uniquely identifies each row in a table/relation
- A table may have more than one super key
Candidate Key
- A candidate key is a set of one or more attributes whose values can be used to uniquely identify a /row/record/tuple within a table/relation
- A candidate key is a minimal Super key with no redundant attribute.
- It is called the minimal super key because we can select a candidate key from the set of Super Keys.
- It should contain the minimum no of attributes that is more appropriate or more powerful.
- Candidate keys are a subset of Super keys.
Primary Key
- A primary key is an attribute or a set of attributes that can be used to uniquely identifies each row/record/tuple of a table/relation.
- The primary key is one of the candidate keys chosen by the database designer to uniquely identify each row.
- The value of the primary key is always unique (means not duplicate)
- The value of the Primary key can never be null, i.e., it cannot contain a null value.
- Primary key = Unique + Not Null
Unique Key
- A unique key is an attribute or a set of attributes of a relation schema used to uniquely identify each row in a relation.
- It may contain a null value.
Alternate Key
- The candidate key other than the primary key is called the Alternate Key.
Composite Key
- A composite key is a candidate key or primary key that consists of more than one attribute.
- Example: project (pno, pname, location) and primary key:{pno+pname} then it is called a composite key
Foreign Key
- A Foreign key is a key in a table that refers to some other table where it is the Primary key to establish the relation between two tables.
Key difference between SQL Keys
- SQL keys are used to uniquely identify rows in a table.
- SQL keys can either be a single column or a group of columns.
- The super key is a single key or a group of multiple keys that can uniquely identify tuples in a table.
- Super keys can contain redundant attributes that might not be important for identifying tuples.
- Candidate keys are a subset of Super keys. They contain only those attributes that are required to uniquely identify tuples.
- All Candidate keys are Super keys. But the vice-versa is not true.
- The primary key is a Candidate key chosen to uniquely identify tuples in the table.
- Primary key values should be unique and non-null.
- There can be multiple Super keys and Candidate keys in a table, but there can be only one Primary key in a table.
- Alternate keys are those Candidate keys that were not chosen to be the Primary key of the table.
- A composite key is a Candidate key that consists of more than one attribute.
- A foreign key is an attribute/field which is a Primary key in its parent table but is included as an attribute in the child table.
- Foreign keys may accept non-unique and null values.