Week 1 - Principles, Origins and Objectives
Learning outcomes:
Understand the principles, origins and objectives of Database Management software
Understand the algebraic and logical foundations of the Relational Model
Introduction to databases (Bitesize)
Data: the raw, unprocessed facts (e.g. 29, John, Birkbeck)
Information: processed, organised data (e.g. John is 29 years old and studies at Birkbeck)
A database is an organised electronic collection of data. It can be stored locally or on the cloud. The database should be organised in a way that makes the data easy to access, retrieve and update. In essence, it is a digital filing system.
A database is persistent, meaning it still exists when a computer is turned off.
A field is a piece of information in a database, e.g. an address in an address book. These fields come together to create a record (e.g. the name, address and phone number of a business in an address book) which can then be stored in a table. Each record has a primary key – an identifier for each record (e.g. a hospital number).
Principles, origins and objectives
Database Management System – a software system that allows users to define (set constraints, specify data type, structure), construct (store data), manipulate (query and update the database) and share data. This way a large amount of data can be managed through one software.
Why do we need databases? It’s useful to have data that is persistent. Imagine if a computer crashes then all the data is lost! A regular file may be too small and inefficient for the vast amount of data a database may hold (some have terabytes of data). Databases also tend to be a shared resource – using one reduces the likelihood of data being overwritten by another user.
For example, imagine a new student about to enrol on a course. Without a database the data for the student might be held in record files maintained by different departments
Academic Dept Student Record Name: A Student Home Address: 26 Acacia Ave Next of Kin: A Partner | Registry Student Record Name: A Student Home Address: 26 Acacia Ave Enrolment Date: 29 Sept 2018 |
Fees Office Student Record Name: A Student Home Address: 28 Acacia Ave Fees Balance: 1200.00 | Exams Office Student Record Name: A Student Home Address: 28 Acasia Ave Exam Entry Date: 11 Dec 2018 |

Why this sucks – redundancy (waste of space), varies and not shared
A DBMS has 3 levels of abstraction (hiding irrelevant details from the user): physical/internal, logical/conceptual and view/external.
Physical – the lowest level. It tells us how data is actually stored in memory.
Logical – the type of data that is stored and the relationship between the data.
View – the level users see.
Why do we need a Database Management System (DBMS)?
Provides a query language – a specialised language for making queries and retrieving information from databases. One example is Structured Query Language (SQL). This abstracts the code behind data storage and retrieval, making it more user-friendly
Saves time with constraints – the rules or conditions applied to database data, to enforce data quality by limiting the kind of data that can go into a table. These can be automatically checked by a DBMS
Reduces maintenance time with data independence – The ability to make changes at one level without affecting a higher level. In other words, changes to the data structure/storage don’t affect applications that use the data. The two main types are logical (changing the logical structure e.g. adding an email column to an employee database) and physical (changing how data is physically stored e.g. moving from an HDD to an SSD).
Provides concurrent access – multiple users/applications can access the database simultaneously without causing inconsistencies. This reduces response time and increases efficiency
Recovery – if there is a failure (e.g. due to power loss) there is a lower risk of data loss. Transaction logs record changes made to the database and can be used to roll back.
Security – a DBMS provides many security features to prevent unauthorised access to a database
Can also reduce redundancy (the same data being stored in multiple places) but this isn’t always wanted!
Some disadvantages of using a DBMS are the cost and complexity. It would need to be purchased and implemented/managed by the appropriate staff.
History of database systems
Humans have been storing information for a very long time.
The first computerised databases came about in the 1960s. There were 2 popular data models: IMS and CODASYL.
IMS is a hierarchal model meaning data is in a tree-like structure. The parent (or root) node can have multiple child nodes connected to it. While the child has one parent. An example for this is a directory.
CODASYL is a network model. Here, child nodes can have multiple parent nodes though in this case they are called owners. This allows more flexibility between data points.

In the 70s the relational model was introduced. Here, data is organised into tables (relations). Each row represents an entity and each column is an attribute. This underpins many of today’s most popular DBMS.
There is now also Not only SQL (NoSQL). The schema (the blueprint for how the database is organised, includes logical constraints like data type and the relationship between entities) is flexible and the database scalable. However schemas change infrequently as doing so requires taking the database offline.
Entity Relationship Model
The ER model is a conceptual model for designing databases. It is useful to have before moving to a lower-level model (e.g. relational). It contains entities, attributes and relationships. The diagram of this model is called an ERD.
What’s required in an ERD?
Entities and entity types (sets): an entity is a “thing” that exists and can be uniquely identified (e.g. a person). An entity type (or set) is a related group of things. Entity types can be thought of as similar to a class in OOP – they have attributes and properties related to the database e.g. entity type ‘Student’ would have various students and their attributes such as name, course, ID number etc. In the ERD they are represented as a rectangle
Attributes – the properties of entity types. Usually composed of simple values e.g. an integer or string
Relationship type – an association between 2 or more entity types. They can be many-to-one/one-to-many (e.g. many employees in one department, represented by a line), one-to-one (e.g. one manager heads one department, represented by a line with crow’s feet) or many-to-many (e.g. a student has many lecturers and a lecturer has many students, represented by a line with crow’s feet on both sides)

Department & employee are the entity types. They have a one-to-many relationship.
Any Employee ‘objects’ have ID, last name and first name as attributes
Remember – the relational model organises the data into tables (relations). Table definitions are defined in the schema

Each entity becomes a table
Each attribute is a column
Each instance is a row.
The third table is a bridge table. They are optional in one-to-many (we could put the department name in the employee table) but required in many-to-many. It links the other two tables together with foreign key references (e.g. employee 4, Linda Hurst, belongs to sales).
ERDs continued
ERD entities often have a business context (e.g. people/roles, tangible and intangible objects). Relationship is how they relate to each other in a system. They are useful for
· Database design. Altering a DBMS directly can be risky so drawing an ER can help to identify and correct flaws before executing changes in the database.
· Debugging – having a full picture of the database shema with an ERD makes it easier to locate entities, view their attributes and identify relations, therefore making it easier to analyse a database to reveal problems
· Creation – some ERD tools can automatically generate a database turning the diagram into an actual reflection of the database structure
· Requirements – they are more user-friendly and an be used to determine the requirements of a database
An entity in ERD is a definable thing or concept in a system. Each entity becomes a relation (table) in a relational database.
An attribute is a property or characteristic of an entity. It is also known as a column. In ERDs they are given a name that describes the property and a type that describes the kind of attribute it is such as int or char. It is important the type is supported by the target DBMS
A primary key is a special attribute that uniquely identifies a record (or row) in the database table, therefore there cannot be duplicates.

A foreign key is a reference to the primary key in a different (foreign) table. For example, the department ID
Here we have a DEPARTMENT entity and TEACHER entity. The ID in the teacher table is the primary key. The department ID in the teacher table is the foreign key (hence it can be duplicated, or even null) – in the department table it is the primary key (where it is unique).
A relationship between two entities means that they are somehow related. Cardinality defines the number of relationships an entity has with another. The three common ones are one-to-many/many-to-one, one-to-one and many-to-many.
2: Introduction to the relational model
What we’re doing here is in the logical layer.
Reminder: the schema is the structure of the database, formed by multiple relations but not the data itself.
![]() |
Relation: an individual table in a relational database. It is composed of rows (tuples) and columns (attributes)
This relational database has 4 relations: SUPPLIER, PART, PROJECT and SUPPLY. Suppliers supply a certain quantity of parts to projects. The bridge table on the right shows us how these entities relate. For example supplier S1 (Smith with a status of 20 and in London) supplies part P1 (red nuts weighing 12, from London) to project J1 (Sorter in Paris). There is a many-to-many relationship (as one entity can be linked to many other entities and vice versa).

Each relation (table) is formed of tuples (rows) and named attributes (columns). The number of tuples in a relation is the cardinality and the number of attributes is the degree. The tuples contain a value for each attribute. The domain is a set of legal values that attributes draw their actual values from (e.g. like an enum or positive integers).
The set of attribute names for a relation is called the relation scheme.
The scheme is sometimes written as
Relation (Attribute1: Degree 1, A2: D2, … An: Dn)
e.g. SUPPLIER (S#: S#, SNAME: NAME, STATUS: STATUS, CITY: CITY)
Bear in mind that a schema is fixed, while the data itself is time-varying.
This can be simplified to
SUPPLIER(S#, SNAME, STATUS, CITY)
Therefore a user sees a relational database as a collection of time-varying relations of assorted degrees.
Relational Model formalisation
We can look at these in terms of set theory.
| Set | Tuple |
Elements | Unique | Can have duplicates |
Ordered | No (so {1, 3, 2} = {2, 3, 1}) | Yes |
Symbol | { } | ( ) |
A relation is a subset of the Cartesian product of a list of domains. Let’s break that down:
The Cartesian product (A x B) is a set of all possible ordered pairs {a, b}. for example if A = {1, 2, 3} and B = {5} then A x B = {(1, 5), (2, 5), (3, 5)}
Now imagine 4 students and their university course:
Name | Course |
Anna | 1 |
Bob | 1, 2 |
Carol | 3, 4 |
Dave | None |
The Cartesian product of the domains would show us every possible combination i.e. {(Anna, 1), (Anna, 2), (Anna, 3) …}
But we don’t want every possible pair! Only a subset of them. So the subsets for our relation would be
Relation R = {(Anna, 1), (Bob, 1), (Bob, 2), (Carol, 3), (Carol, 4)}
Which shows us that a relation is a set of tuples that are a subset of the Cartesian product of a list of domains.
In terms of the relational model the Cartesian product of all domains (D1, D2, …, Dn, n being the number of attributes) is the set of all n-tuples (v1, v2, …, vn) such that v1 is in D1, v2 is in D2 etc
For example, let’s consider the CONSISTS_OF relation

It has 2 attributes: DEP_NAME drawn from a domain of department names (D1) and EMP_ID drawn from a domain of employee IDs (D2).
D1 = {SALES, TECHNICAL, ACCOUNTS, ADMIN}
D2 = {1, 2, 3, 4, 5}
Then D1 x D2 = { (SALES, 1), (SALES, 2), (SALES, 3), (SALES, 4), (SALES, 5), (TECHNICAL, 1), (TECHNICAL, 2), (TECHNICAL, 3), (TECHNICAL, 4), (TECHNICAL, 5), (ACCOUNTS, 1), (ACCOUNTS, 2), (ACCOUNTS, 3), (ACCOUNTS, 4), (ACCOUNTS, 5), (ADMIN, 1), (ADMIN, 2), (ADMIN, 3), (ADMIN, 4), (ADMIN, 5)}
Because there are 5 tuples in D1 and 5 tuples in D2 we get 25 sets in the Cartesian product.
Since a relation is a set of tuples, within any given relation there are no duplicate tuples and the tuples are unordered. The attributes are also ordered (e.g. (SALES, 1) is not the same tuple as (1, SALES))
Relational Integrity Constraints
Since a database models a part of the real world, there needs to be constraints so a database can’t contain values that would be impossible in real-world situations.
A DBMS would enforce database integrity constraints to ensure impossible values do not occur. For the Suppliers-Parts-Projects database for example constraints could include part weights must be above 0 or every supplier supplies at least one part. In terms of data the constraints also help us to (1) prevent a deleted supplier from appearing in the bridge table and (2) prevent adding a non-existent supplier.
Those constraints are specific to the S-P-P database, but some authors argue that the relational model has a few database-independent constraints. Let’s look at 2 rules in particular: entity integrity rule and the referential integrity rule. Both of these involve keys and null values.
Keys
A key in DBMS is used to uniquely identify a tuple. A (candidate) key of a relation is one or more attributes of relation R that satisfy the following two time-independent properties: (1) no two distinct tuples of R have the same value for each of the key attributes and the first property does not hold for any proper subset (a subset that is not equal to the original set, they are smaller) of the key attributes. Property 1 means key attributes are unique, while 2 ensures a key is the minimal group of attributes that act as an identifier.
For example, consider this table
STUDENT_ID | EMAI | NAME |
01 | Alice | |
02 | Bob | |
03 | Claire |
{STUDENT_ID} and {EMAIL} are unique. {STUDENT_ID x EMAIL} is also unique however it is not a candidate key. This is because proper subsets of {STUDENT_ID x EMAIL} include {STUDENT_ID} and {EMAIL} – which have already been identified as unique. Meaning {STUDENT_ID x EMAIL} is not the minimum number of attributes.
Consider the SUPPLIER table. We’d need a way to uniquely identify tuples. S# does this so it could be a key. SNAME could do this but there is the chance of a new supplier having the same name. This does not make it time-independent so it wouldn’t be an ideal key. If a table doesn’t have a column to identify tuples then it’s good to make one! In S-P-P the S#, P# and J# do this.
Types of keys
Composite (or concatenated): a key that consists of more than one attribute. It acts as a primary key if there is no primary key in the table. A relation may have many composite keys.
Candidate: the minimum set of attributes that can uniquely identify a tuple. There can be more than one in a relation.
Nonkey: an attribute that is not part of any candidate keys. For example the COLOUR attribute in the PART table
Primary: a candidate key that is chosen by the database designer to identify tuples. It cannot contain any null values or duplicates. The remaining keys are also known as the alternate keys.
Foreign: a key that references a candidate key in another table or attribute. It is key (ha-ha) to the relationship between tables as it acts as a pointer. It can be composite. The foreign key doesn’t have to be a candidate key itself.
EMP_ID | LAST_NAME | FIRST_NAME | MANAGER_ID |
1 | JONES | BILL | 3 |
2 | CONNORS | MARY |
|
3 | HURST | LINDA | 2 |
EMP_ID is the primary key and MANAGER_ID is a foreign key (it references the primary key)
Note that the second tuple has a null MANAGER_ID.
NULL values
These are missing pieces of information and can occur for many reasons. For example in the S-P-P database we might know of a new supplier based in Wigan, but we don’t yet know the name or status. Here they would be represented as null.
We can also use them to represent the fact that a value is not relevant in a particular context. For example in an EMPLOYEE database that lists managers under the attribute MANAGER_ID the tuple representing the highest-level manager could be null in that column, as they wouldn’t have a manager.
Remember that null isn’t the same as zero. If someone’s age was missing from a database we wouldn’t want to list it as zero! Null values can be a pain to deal with; hence a lot of forms have fields that must be completed.
Classes of Integrity Constraint rules
Entity integrity: primary key values can uniquely identify tuples in a relation
Referential integrity: each non-null foreign key value must have matching candidate key values in some relation
