M

Database Lecture Notes Review

Option A: Databases

  • 45 Hours

A.1 Basic Concepts

  • 5 hours

A.2 The Relational Database Model

  • 15 hours

A.3 Further Aspects of Database Management

  • 10 hours

A.1.1 Differences between Data and Information

  • Computers store data, which can be numeric, text, Boolean, etc., but has no intrinsic meaning to humans.

  • Data must be interpreted to produce information.

  • Data becomes information when it is put into a context that gives it meaning.

  • Example: The numbers 32, 23, 11, 08, 40, 17 are data. They become information when given context:

    • Home and away scores for 6 soccer teams

    • Temperatures in Celsius for 6 cities

    • Ages of the last 6 people through the Eiffel Tower turnstiles

  • Information = Data + Context

  • Databases technically store data, not information, but the terminology is used loosely with the assumption that stored data is interpreted within an appropriate context.

Data

Information

Raw facts

Data with context

No context

Processed data

Just numbers and/or text

Value-added to data: Summarized, organized, analyzed

A.1.2 Differences between Information Systems and Databases

  • Information System (IS): Combination of information technology and people's activities to support operations, management, and decision-making.

  • An IS is a group of components that interact to produce information.

  • Databases are a component within an information system.

  • Components of Information Systems:

    • Processes

    • People

    • Hardware

    • Software

    • Network Communications

    • Data (Stored in databases)

A.1.3 Need for Databases

  • Saving huge amounts of data.

  • Sharing data.

  • Problems with not using a database and instead using spreadsheets or text files:

    1. Size of Data: Spreadsheets are fine for small data, but millions of records slow them down significantly.

    2. Ease of Updating Data: Multiple people can't edit the same file simultaneously.

    3. Accuracy: Data entry errors are possible due to lack of validation.

    4. Security: Text files and spreadsheets are not secure.

    5. Redundancy: Duplication of data is possible, leading to accuracy issues.

    6. Incomplete Data: Not all data might be entered due to a lack of validation, affecting data integrity.

A.2 The Relational Database Model

  • A relational database stores and provides access to related data points.

  • Based on the relational model, representing data in tables (relations).

  • Each row is a record/tuple with a unique ID called the Primary key.

  • Columns/fields hold attributes, establishing relationships among data points.

A.2.1 Define DBMS and RDBMS

  • DataBase Management System (DBMS): System software for creating and managing databases, providing users and programmers a systematic way to create, retrieve, update, and manage data.

  • Relational Database Management System (RDBMS): A DBMS based on the relational model. Data is stored in tables, which are broken up into fields.

  • Fields are the columns in a table.

  • A record is a row of data in a table.

Feature

DBMS

RDBMS

Data Storage

Data as files.

Data in tabular form.

Data Organization

Hierarchical or navigational form.

Tables with primary keys.

Normalization

Not present.

Present.

Security

No specific security for data manipulation.

Integrity constraints for ACID properties.

Relationships

No inherent relation between tables.

Relationships are stored in the form of tables.

Data Access

Requires uniform methods to access stored information.

Supports tabular structure and relationships for data access.

Distributed Database Support

Not supported.

Supported.

Scalability

For small organizations and small data, supports single user.

Designed for large amounts of data, supports multiple users.

Examples

File systems, XML, etc.

MySQL, PostgreSQL, SQL Server, Oracle, etc.

A.2.6 Data Dictionary

  • Inventory of data elements in a database or data model, with detailed descriptions of its format, data ownership, relationships, meaning, source, and usage.

A.2.7 Data Definition Language (DDL)

  • A syntax similar to a programming language used to define databases and modify data.

  • DDL commands can create (CREATE), remove (DELETE or DROP), or modify (UPDATE) tables.

  • DDLs in database applications are a subset of SQL.

  • Importance:

    • Helps define the database structure or schema.

    • DDL commands are interpreted by the DBMS, so changes are saved permanently.

A.2.8 Data Modeling

  • A database model determines the logical structure of a database, including relationships and constraints.

  • The relational model (table-based) is the most popular.

  • Results in:

    • Reduced costs

    • Higher quality

    • Clearer scope

    • Fewer errors

    • Managed risk

A.2.10 Relationships within Databases (Entity-Relationship Diagrams)

  • One-to-one

  • One-to-many

  • Many-to-many

  • ER Diagram basics

    • ER Diagram stands for Entity Relationship Diagram, also known as ERD

    • displays the relationship of entity sets stored in a database

    • explains the logical structure of databases

    • created based on three basic concepts: entities, attributes and relationships.

  • Why use ER Diagrams?

    • Helps define terms related to entity relationship modeling.

    • Preview of how all your tables should connect, what fields are going to be on each table.

    • Helps to describe entities, attributes, relationships.

    • ER diagrams are translatable into relational tables which allows you to build databases quickly.

    • ER diagrams can be used by database designers as a blueprint for implementing data in specific software applications.

    • The database designer gains a better understanding of the information to be contained in the database with the help of ERP diagram.

    • ERD Diagram allows you to communicate with the logical structure of the database to users.

  • ER Diagrams Symbols & Notations Following are the main components and its symbols in ER Diagrams:

    • Rectangles: represents entity types

    • Ellipses : Symbol represent attributes

    • Diamonds: represents relationship types

    • Lines: It links attributes to entity types and entity types with other relationship types

    • Primary key: attributes are underlined

  • An entity set with only simple attributes will require only one table in a relational model:

    • The attributes of the entity will be the fields/columns of the table.

    • The key attribute of the entity will become the primary key of the table.

  • Components of the ER Diagram

    • Entity: Person, place, object, event or concept about which data is to be maintained

Relationship Cardinality

  • Defines the numerical attributes of the relationship between two entities or entity sets.

Types of Cardinal Relationships

1.  One-to-One Relationships
2.  One-to-Many Relationships
3.  Many to One Relationships
4.  Many-to-Many Relationships
1. One to One Relationship
  • When a single instance of an entity is associated with a single instance of another entity then it is called one to one relationship. For example, a person has only one passport and a passport given to one person.

2. One to Many Relationship
  • When a single instance of an entity is associated with more than one instances of another entity then it is called one to many relationship. For example - a customer can place many orders but an order cannot be placed by many customers.

3. Many to One Relationship
  • When more than one instances of an entity is associated with a single instance of another entity then it is called many to one relationship. For example - many students can study in a single college but a student cannot study in many colleges at the same time.

4. Many to Many Relationship
  • When more than one instances of an entity is associated with more than one instances of another entity then it is called many to many relationship. For example, a can be assigned to many projects and a project can be assigned to many students.

Cases of Binary Relationships and Tables Required:

  • Case 1: Binary relationship with cardinality ratio 1:1 Here, two tables will be required. Either combine ‘R’ with ‘A’ or ‘B’.

    • Way-01:

      1. AR ( a1 , a2 , b1 )

      2. B ( b1 , b2 )

    • Way-02:

      1. A ( a1 , a2 )

      2. BR ( a1 , b1 , b2 )

  • Case 2: Binary relationship with cardinality ratio 1:m Here, two tables will be required.

    1. A ( a1 , a2 )

    2. B ( a1 , b1 , b2 )

  • Case 3: Binary relationship with cardinality ratio m:1 Here, two tables will be required.

    1. A( a1 , a2 , b1 )

    2. B ( b1 , b2 )

  • Case 4: Binary relationship with cardinality ratio m:n Here, three tables will be required.

    1. A ( a1 , a2 )

    2. R ( a1 , b1 )

    3. B ( b1 , b2 )
      Note that a1,b1 in R represent a composite Primary key and that R is a joint table.

One-to-One Relationships
  • In a one-to-one relationship, one record in a table is associated with one and only one record in another table. For example, in a school database, each student has only one student ID, and each student ID is assigned to only one person.

One-to-Many Relationships
  • In a one-to-many relationship, one record in a table can be associated with one or more records in another table. For example, each customer can have many sales orders.

Many-to-Many Relationships
  • A many-to-many relationship occurs when multiple records in a table are associated with multiple records in another table. For example, a many-to-many relationship exists between customers and products: customers can purchase various products, and products can be purchased by many customers.

  • To avoid this problem, you can break the many-to-many relationship into two one-to-many relationships by using a third table, called a join table. Each record in a join table includes a match field that contains the value of the primary keys of the two tables it joins. (In the join table, these match fields are foreign keys.)

  • A typical example of a many-to many relationship is one between students and classes. A student can register for many classes, and a class can include many students.

  • Converting ER Diagrams to Tables in a RDBMS
    After designing an ER Diagram,
    ER diagram is converted into the tables in relational model.
    This is because relational models can be easily implemented by RDBMS like MySQL , Oracle etc.
    Following rules are used for converting an ER diagram into the relations/tables.

  • To set up a join table for a many-to-many relationship:

A.1.9 Data Validation and Verification

  • Data validation

    • Checking input data to ensure it conforms with the data requirements of the system to avoid data errors.

  • Data verification

    • Ensuring the user types in what he or she intends, verifying they don't make a mistake when inputting data (e.g., typing email twice).

  • Why use data validation and data verification?

    • Checks for correctness, meaningfulness, and security of data that are input to the system.

A.2.3 Promoting Data Security with DBMS

  1. Access Rights: Controlled by PINs and passwords, granting read-only, read, or write access.

  2. Audit trails: Records of who accessed data and what changes are made.

  3. Data Locking: Locking protocols for concurrency control.

  4. Data Validation

  5. Data encryption: encryption algorithm transforms data within a database from a readable state into a ciphertext of unreadable characters.

  6. Backups: Clean copy of all data.

A.1.5 Database Transaction

  • Logical unit of work independently executed for data retrieval or updates.

  • Must exhibit four properties (ACID) to qualify as a transaction.

A.1.6 Concurrency

  • Defined as operating or occurring at the same time; running parallel

  • Database management systems (DBMS) concept that is used to address conflicts with simultaneous (same time) accessing or altering of data that can occur with a multi-user system.

  • Concurrency control, when applied to a DBMS (Database management system), is meant to coordinate simultaneous transactions while preserving data integrity.

  • If two or more users try to update the contents of a database simultaneously, locks and partitions are put into place to prevent it. Thus enabling greater concurrency.

ACID Properties

  • Atomicity: Either all operations in the transaction happen, or none do.

  • Consistency: Database must be consistent before and after the transaction

  • Isolation: Multiple transactions occur independently without interference.

  • Durability: Changes of a successful transaction occurs even if the system failure occurs.

A.1.8 Database Functions

  1. Query Functions:

    • Requesting data or information from a database table.

    • Filtering information to display specified data.

  2. Update Functions:

    • Modifying existing records in a table.

A.2.2 Functions and Tools of a DBMS

  1. Data Dictionary (Tool) Management

  2. data storage management

  3. data transformation and presentation

  4. multi-user access control

  5. data integrity management

  6. database communication interfaces

  7. security management

  8. transaction management

  9. Data Definition Language (Tool)

  10. backup and recovery management

A.2.4 Schema Definition

  • Schema refers to the organization of data as a blueprint of how the database is constructed.

  • Conceptual Schema → External View/Level

  • Logical Schema → data model (ER)

  • Physical Schema → Internal Level (Data Dictionary)

A.2.5 Levels of Schema

  • Conceptual/External Schema

    • Describes structure of the whole database.

    • Describes what data are to be stored and the relationships among the data.

    • Programmers and database administrators work at this level.

  • Logical Schema:

    • Defines the structure of the data itself and the relationships between the various attributes, tables, and entries.

  • Internal/Physical Schema:

    • Describes the physical storage structure of the database.

    • Describes how the database is stored in computer memory.

    • Used to describe complex low-level data structures in detail.

A.2.9 Database Terms

  • Table

  • Record

  • Field

  • Primary key

  • Secondary key

  • Foreign key

  • Candidate key

  • Composite primary key

  • Join

A.2.11 Redundant Data Issues

  • Data Integrity

  • Data Reliability

  • Space

  • Time

A.2.13 Differences between 1NF, 2NF, and 3NF

  • Normalization:

    • Process of reorganizing data in a database so that it meets two basic requirements:

    • 1. There is no redundancy of data, all data is stored in only one place.

    • 2. Data dependencies are logical, all related data items are stored together.

    • Important for many reasons, but chiefly because:

      • 1. It allows databases to take up as little disk space as possible,

      • 2. resulting in increased performance.

Normal Form (NF) Types

  • First normal form (1NF)

  • Second normal form (2NF)

  • Third normal form (3NF)

First normal form (1NF)
  • Tables in 1NF must adhere to some rules:

    1. Each cell (data item) must contain only a single (atomic) value, that cannot be broken down any further.

    2. Every column (field) in the table must be uniquely named.

    3. Each row/record is unique and has a primary key, there are no records with repeating data.

Second Normal Form (2NF)
  • 2NF Rules:

    1. Tables in 2NF must be in 1NF

    2. Non-key attributes must depend on every part of the primary key (full functional dependency). Partial dependencies are NOT allowed.

Third Normal Form (3NF)
  • 3NF Rules:

    1. Tables in 3NF must be in 2NF

    2. There are no non-key attributes that depend on another non-key attribute (no transitive functional dependencies)
      *

Key points

*   1NF has no repeating rows or columns.
*   2NF is based on full functional dependency.
*   3NF involves the removal of transitive dependencies

A.2.12 Referential Integrity

  • What is referential integrity?

    • refers to the relationship between tables

  • Why is it important in a normalized database?

    • Because each table in a database must have a primary key, this primary key can appear in other tables because of its relationship to data within those tables.

    • When a primary key from one table appears in another table, it is called a foreign key.

    • Prevents the entry of duplicate data

    • Prevents one table from pointing to a nonexistent field in another table.

    • Guaranteed consistency between "partnered" tables

    • Prevents the deletion of a record that contains a value referred to by a foreign key in another table

    • Prevents the addition of a record to a table that contains a foreign key unless there is a primary key in the linked table

    Consequences of Lack of Referential Integrity

  • a database can lead to incomplete data being returned, usually with no indication of an error. This could result in records being “lost” in the database, because they’re never returned in queries or reports.

A.3 Further Aspects of Database Management

  • 10 hours

A.3.1 Role of a Database Administrator

  • Database administrators (DBAs) use specialized software to store and organize data. The role may include capacity planning, installation, configuration, database design, migration, performance monitoring, security, troubleshooting, as well as backup and data recovery.

A.3.2 End-User Interaction with a Database

  • Each end-user has a defined set of jobs/functions to perform in an organization. These jobs and functions usually access multi-tables in the database. The raw view of the data is seldom required by end-users. In fact the raw data needs to be protected at all times from accidental damage. However the end-user is most likely to be using a given set of pre-defined queries on a regular basis. These queries usually end up as reports for middle or higher management and are used in the various decision making processes of the company. Therefore the interaction of a given end-user with a database is pre-defined by their role in the company and based on their role they will have a given ‘view’ of the database. Examples of methods of interaction with the DB:

    • GUI

    • API

    • QBE: Query By Example

    • SQL

A.3.3 Methods of Database recovery

  • Disaster Recovery

  • Partial Recovery

  • Catastrophic Recovery

  • High Availability

  • Crash Recovery: DBMS is a highly complex system with hundreds of transactions being executed every second. The durability and robustness of a DBMS depends on its complex architecture and its underlying hardware and system software. If it fails or crashes amid transactions, it is expected that the system should follow some sort of algorithm or techniques to recover lost data.

Failure Classification
  • To see where the problem has occurred, we generalize a failure into various categories, as follows: System Crash, Disk failure and Transaction failure.

1. System Crash
  • external to the system − that may cause the system to stop abruptly and cause the system to crash. For example, interruptions in power supply may cause the failure of underlying hardware or software failure. Examples may include operating system errors.

2. Disk Failure
  • In early days of technology evolution, it was a common problem where hard-disk drives or storage drives used to fail frequently. Disk failures include formation of bad sectors, unreachability to the disk, disk head crash or any other failure, which destroys all or a part of disk storage.
    *

Storage Structure categories:

*   Volatile storage
*   Non-volatile storage
3. Transaction failure
  • A transaction has to abort when it fails to execute or when it reaches a point from where it can’t go any further. This is called transaction failure where only a few transactions or processes are hurt.
    *

Reasons for a transaction failure could be:

*   Logical error
*   System error

*

Recovery and Atomicity
  • When a system crashes, it may have several transactions being executed and various files opened for them to modify the data items.

Recovery Techniques concepts
  • 1 Checkpoint

  • 2 Page Shadowing

  • 3 Log Files

Recovery Techniques (Cost vs Efficiency)
  • CheckPoint (cost & efficiency)

  • Page Shadowing (cost & efficiency)

  • Log Files (cost & efficiency)

A.3.4 Integrated database systems function
  • Data integration involves combining data from various sources and providing users with a unified view of these data in such a way that it is productive to their role in the organization. Commercial companies and scientific organizations are two examples which require data integration.

Advantages of Integrated DBMS
  1. Centralized Control 2. Data Integration 3. Improved Efficiency 4. Enhanced Data Security

A.3.5 Database Applications

  • Outline the use of databases in areas such as stock control, police records, health records, employees data Databases can be used in these situations because they can help maintain the privacy and security of the information stored in them. And they can help maintain the integrity of the data so it is consistent and doesn't change.

    • Stock control

    • Police records

    • Health records

    • Employee data

A.3.6 Ensuring Privacy

  • Suggest methods to ensure the privacy of the personal data and the responsibility of those holding personal data not to sell or divulge it in any way.

Keeping data secure

  • Making regular backups of files

  • Protecting yourself against viruses by running anti-virus software

  • Using a system of passwords so that access to data is restricted

  • Safe storage of important files stored on removable disks, eg locked away in a fireproof and waterproof safe

  • Allowing only authorized staff into certain computer areas

  • Secure Data Transmission: Use secure communication protocols, such as HTTPS or VPNs, to encrypt data during transmission. This prevents interception or eavesdropping by unauthorized entities.

  • Avoiding accidental deletion of files by write-protecting disks.

  • Using data encryption techniques to code data so that it makes no apparent sense.

The implications of large database systems:
  1. Data Volume: Large database systems handle vast amounts of data, which can present challenges in terms of storage, processing, and retrieval. Managing and analyzing such large volumes of data requires efficient infrastructure and optimized database design.

  2. Performance Considerations: As databases grow in size, the performance of queries, data retrieval, and data manipulation operations can be impacted. Ensuring efficient indexing, query optimization, and appropriate hardware resources is crucial for maintaining optimal performance.

  3. Scalability: Large database systems need to be scalable to accommodate increasing data volumes and user demands. Scaling may involve adding more hardware resources, such as storage or processing power, or implementing distributed database architectures.

  4. Data Complexity: Large databases often contain diverse and complex data structures, including multiple tables, relationships, and data types. Handling and managing this complexity requires advanced database management systems and skilled database administrators.

  5. Data Security: The security of large databases becomes critical due to the significant amount of sensitive data they store. Implementing robust security measures, such as access controls, encryption, and intrusion detection systems, is essential to protect against unauthorized access and data breaches.

The Data Protection Act
  • The Data Protection Act controls how your personal information is used by organisations, businesses, or the government.

  • They must make sure the information is:

  1. used fairly and lawfully

  2. used for limited, specifically stated purposes

  3. used in a way that is adequate, relevant and not excessive

  4. accurate

  5. kept for no longer than is absolutely necessary

  6. handled according to people’s data protection rights

  7. kept safe and secure

  8. not transferred outside the European Economic Area without adequate protection

Social and ethical issues related to privacy, security, and data integrity
  1. Privacy Concerns

  2. Data Breaches and Security Risks

  3. Data Ownership and Control

  4. Data Integrity and Trust

  5. Informed Consent

A.3.7 Database Interrogation
  • It is important in this day and age, where sudden terrorist activities can completely curtail the lives and livelihoods of large numbers of people that organizations such as National Security Agencies, International Police forces and other governmental agencies directly involved with the security of the borders and nationals of any given country to be able to monitor ‘trends’ in activities of the nationals of that country. This implies that the databases for such organizations must be open for questioning and investigations for the purposes of national security.

A.3.8 Data Matching vs. Data Mining Data Matching
  • the process of comparing two pieces of data by directly matching them against each other, to find any data belonging to the same entity, meaning duplication. An algorithm is used to determine whether two records are a match, a non-match or possibly a match. Data-matching can be useful for a number of purposes:

    • To find particular types of errors

    • To detect illegal behavior

    • To locate individuals

    • To find out if a given individual is eligible for benefits

    • To facilitate debt collection

Data Mining
  • the process of searching through large data sets to find patterns and relationships in the data. It is mainly used in marketing and sales to analyze the patterns of customers and common behavior. For example banks are the first users of data mining technology as it helps them with credit assessment.

    • Some of the uses of Data-mining within different contexts are:

    • To enable organizations to create successful and effective sales campaigns

    • To help companies with targeted marketing plans

    • To develop products to increase sales and profitability

    • Security Agencies to analyze calling patterns and hence detect terrorist activities