IBDP Computer Science Paper 2 - Comprehensive Database Learning Guide Notes

Introduction to Databases

A database is an organized collection of data that allows users to obtain and process information according to their requirements.
Databases provide a structured approach to storing, retrieving, and managing data, which is more efficient than traditional file systems.

Structured vs. Unstructured Data

Unstructured data: Information that is not organized in a predefined manner (e.g., data in word processors or text editors).
- Difficult to search, analyze, or manipulate efficiently.
- Example: Contact information in a text document without a specific format.
Structured data: Information stored in a predefined manner or structure, such as in databases.
- Organized into tables with consistent formats for easy searching, sorting, and analysis.
- Example: Contact information in a database table with fields for names, phone numbers, and addresses.

Advantages of Using Databases

Elimination of Data Duplication: Data is stored only once, saving storage space and preventing inconsistencies.
Data Consistency: Changes are made in one place and reflected for all users, ensuring everyone sees up-to-date information.
Efficient Data Retrieval: Designed for quick and efficient data retrieval, allowing users to find specific information easily.
Data Integrity: Enforces rules and constraints to maintain the accuracy and reliability of data, preventing invalid or inconsistent information from being stored.
Multi-user Access: Supports concurrent access by multiple users without conflicts.

Software for Databases

Database Management Systems (DBMS): Software for creating and managing databases.
- Examples:
  - Microsoft Access
  - MySQL
  - Oracle
  - IBM Db2
  - PostgreSQL
Programming Languages: Used to customize and interact with databases.
- Examples:
  - Python
  - JavaScript
  - Microsoft Visual Basic
  - Delphi
Query Languages: Used for retrieving and manipulating data in databases.
- SQL (Structured Query Language)

Structure of a Database

Tables: Data is stored in tables, each representing a specific type of entity or object.
Flat-file Database: A database with only one table.
Records: Rows in a table, each representing a single instance of the entity.
Fields: Columns in a table, representing different pieces of information about each entity.

Primary Keys

A primary key is a field (or combination of fields) that uniquely identifies each record in a table.
It can be automatically generated by the database system.
Essential for maintaining data integrity and establishing relationships between tables.

Validation Checks in Databases

Validation techniques ensure that the data entered into a database meets certain criteria and is accurate.
Common validation checks include:
- Length check: Ensures the data has the correct number of characters.
- Format check: Verifies that the data follows a specific pattern.
- Presence check: Confirms that required fields are not left blank.

Creating Simple Databases

Microsoft Access is a widely used tool for creating databases.
Steps:
1. Open Microsoft Access and create a blank database.
2. Select the table design view to design the table.
3. Define the primary key field and set its data type.
4. Add other fields with appropriate data types (Text, Number, Date/Time, Yes/No, Memo).
5. Set validation rules for fields as needed.
6. Save the table design and begin entering data.

Running Queries

Queries are used to retrieve specific information from a database based on certain criteria.
In Microsoft Access, queries can be created using the "Query Design" feature.
Steps:
1. Select the relevant fields.
2. Set a criterion for the desired information.
3. Run the query to display the results.

Relational Databases

What is a Relational Database?

A relational database is a type of database that is structured and allows users to identify and access data which is in relation to other data.
Relational databases distribute large sets of data over various tables and connect them using relationships.
This approach eliminates redundant data.

Relational Databases vs. Flat File Databases

Relational Database:
- Data is distributed across multiple tables connected through relationships.
- No redundant data as information is stored only once.
- More efficient use of memory space.
- Better data integrity and consistency.
- Supports complex queries across related data.
Flat File Database:
- Contains redundant data.
- Requires more memory space.
- Less efficient for complex data structures.
- Difficult to maintain as the database grows.
- Limited ability to represent relationships between different types of data.

Characteristics of Relational Databases

Entities: An object in a system for which information is stored. Each table typically represents one type of entity.
Attributes: Characteristics or properties of an entity. These become the fields or columns in a database table.
Records: Each row in a database table containing attributes about a specific entity.
Tables: Each table contains information about just one entity type.
Keys: Tables are linked using primary keys and foreign keys.

Types of Keys in Relational Databases

Primary Key:
- A unique identifier for each record in a table.
- Must have unique values with no duplicates.
- Cannot be NULL.
Secondary Key or Candidate Key:
- All keys that are not selected as the primary key are considered alternate or candidate keys.
- Can uniquely identify records in a table, just like the primary key.
- May consist of one or more columns (fields).
Foreign Key:
- An attribute in one table that refers to the primary key in another table.
- Helps connect two or more tables, establishing relationships between them.
- Essential for maintaining data integrity and preventing data redundancy

Example of Relational Database Structure

Consider a library database with three related tables:
- Book List Table: (Book ID, Title, Author, Publisher, Year, Genre)
- Member List Table: (Member ID, Name, Address, Phone, Email, Membership Type)
- Books Lent Table: (Transaction ID, Book ID, Member ID, Date Borrowed, Due Date, Return Status)

Normalization

Normalization is the process of dividing larger tables into smaller tables such that these smaller tables are related to each other.
The main purpose of normalization is to reduce data redundancy and improve data integrity.

Relationships Between Entities

One-to-One Relationship:
- One entity is related to exactly one other entity.
One-to-Many Relationship:
- One entity is related to multiple other entities.
Many-to-Many Relationship:
- Many entities are related to many other entities.
- Implemented using a junction table.

Working with Databases

Forms: User-friendly interfaces for inputting data.
Queries: Requests to access specific data from the database.
Reports: Formatted presentation of query results.
Modules: Programs or code components for custom operations.

Normalization

Introduction to Normalization

Database normalization is the process of organizing the attributes of a database to reduce or eliminate data redundancy.
This systematic approach to database design helps create efficient, reliable, and flexible database structures.

Issues Caused by Redundant Data

Data Inconsistency: Updating one entry but forgetting others leads to mismatched information.
Increased Storage Requirements: Repeated data takes up more disk space.
Complicated Data Maintenance: More effort is needed to update, insert, or delete records.
Poor Database Performance: Large redundant datasets slow down search and query performance.

Referential Integrity in Databases

Referential integrity refers to the accuracy and consistency of data within relationships.
It ensures that whenever a foreign key value is used, it must reference a valid, existing primary key in the parent table.
The importance of referential integrity includes:
- Preventing Orphan Records
- Maintaining Data Accuracy and Consistency
- Enforcing Logical Relationships
- Supporting Reliable Database Operations

Normal Forms

Normalization is typically carried out through a series of stages called normal forms.

First Normal Form (1NF)

Each column must contain atomic (indivisible) values
Each column must contain values of the same type
Each column must have a unique name
The order of rows and columns doesn't matter
No repeating groups or arrays of values are allowed

Second Normal Form (2NF)

Already be in 1NF
Have no partial dependencies—that is, no attributes (columns) should depend on only part of a composite primary key

Third Normal Form (3NF)

Already be in 2NF
Have no transitive dependencies—that is, non-key attributes should not depend on other non-key attributes

Problems with Unnormalized Databases

Insertion Anomalies: Difficulties arise when adding new data due to missing other data.
Update Anomalies: Changing data in one place requires changes in multiple places.
Deletion Anomalies: Removing data inadvertently deletes other valuable data.

Benefits of Normalization

Reduced Data Redundancy: Eliminates duplicate data.
Improved Data Integrity: Ensures that data remains accurate and consistent.
Better Database Organization: Creates a logical structure that reflects real-world entities and relationships.
Enhanced Query Performance: Overall performance often improves due to smaller table sizes and better indexing opportunities.
Greater Flexibility for Future Changes: Makes it easier to modify the database structure as requirements evolve.

Database Transactions and ACID Properties

Difference: Data vs Information

Data: Raw facts without context.
Information: Processed, meaningful data that provides context and value.

Database vs Information System

Database: Stores structured data.
Information System: Uses the database along with software and users to produce useful insights.

Why We Need a Database

Efficient data storage and organization
Fast access and retrieval
Support for multi-user environments
Maintenance of data integrity and security
Enablement of reporting, analysis, and decision-making

What is a Database Transaction?

A database transaction is a sequence of one or more operations that are treated as a single logical unit of work.
Must be atomic—either all operations are completed successfully, or none are.

Data Consistency

Data consistency means that all data across the database remains logical, correct, and synchronized across related tables.

Data Integrity

Data integrity refers to the accuracy, completeness, and reliability of data throughout its lifecycle.
Types of data integrity:
- Entity Integrity
- Referential Integrity
- Domain Integrity

What is Concurrency in Databases?

Concurrency refers to the ability of a database system to allow multiple users or processes to access or modify the data at the same time without interfering with each other.

Common Concurrency Problems and Solutions

Lost Updates
Dirty Reads
Non-repeatable Reads
Phantom Reads

What is ACID Property of a Database?

ACID is a set of four key properties that ensure reliable and consistent transactions in a database.

ACID Properties of Transactions

Atomicity – "All or Nothing"
Consistency – "Valid State Before and After"
Isolation – "Transactions Don't Interfere"
Durability – "Once Done, Stays Done"

Essential Database Functions

Query Function (Read/Retrieve Data)
Update Function (Modify Data)

Data Validation vs Verification

Data Validation: Ensures data format is correct and meets predefined criteria.
Data Verification: Ensures accuracy during data entry by comparing entered data against the original source.

Common Methods of Data Validation

Presence Check
Range Check
Format Check
Type Check
Length Check
Lookup Check
Uniqueness Check
Consistency Check

What is Data Verification?

Data verification is the process of ensuring that data entered into a system matches the original source accurately.

Methods of Data Verification:

Double Entry Verification
Visual Verification
Checksum Verification
Batch Totals

Database Schema and Data Types

Introduction to Data Types in Relational Databases

Data types define the kind of data that can be stored in a particular column of a table.
Choosing the correct data type is crucial for ensuring data integrity, optimizing storage, and enabling efficient data manipulation.

Different Data Types in Relational Databases

Numeric Data Types
- INTEGER
- REAL or FLOAT
- DECIMAL
String or Alphanumeric/Text Data Types
- VARCHAR(n)
- CHAR(n)
- TEXT
Date and Time Data Types
- DATE
- TIME
- DATETIME or TIMESTAMP
Boolean Data Type
- BOOLEAN

The Importance of Data Type Consistency

Consistent use of data types ensures that:
- Data is stored efficiently
- Operations perform correctly
- Application logic remains reliable
- Data validation is effective

The Potential Effects of Choosing the Wrong Data Type

Data Loss
Data Corruption
Storage Inefficiency
Performance Issues
Application Errors
Integrity Violations

Functions and Tools of a DBMS

Data Definition
Data Manipulation
Data Control
Data Integrity
Transaction Management
Recovery Management

Database Schema

A database schema is a blueprint or structure that defines how data is organized within a database.

Conceptual Schema

The conceptual schema is the high-level, abstract view of the entire database.

Logical Schema

The logical schema provides a more detailed view than the conceptual schema.

Physical Schema

The physical schema is the lowest level of abstraction, detailing how the data is physically stored on storage devices.

Data Dictionary

A Data Dictionary is a structured repository that defines and describes all the data elements in a database.

Data Definition Language (DDL)

Data Definition Language (DDL) is a subset of SQL used to define and modify the structure of database objects.

Importance of Data Modeling in the Design of a Database

Data modeling is the process of creating a visual representation of a database and the relationships between its elements.
Key Benefits of Data Modeling:
- Provides a Clear Structure
- Reduces Data Redundancy
- Improves Data Quality
- Enhances Performance
- Facilitates Communication
- Supports Scalability
- Enables Better Decision-Making

Database Security

Introduction to Database Security

Database security is the process of protecting a database from unauthorized access, misuse, or malicious attacks.

Why Database Security is Important

Data Confidentiality
Data Integrity
Data Availability

Different Methods to Ensure Database Security

Access Control
- Authentication
- Role-Based Access Control (RBAC)
Encryption
- Data at Rest Encryption
- Data in Transit Encryption
Database Auditing and Monitoring
- Activity Monitoring
- Audit Trails

Database Locking

Database locking is a mechanism used to control concurrent access to data in a database to ensure data integrity and consistency.

Database Locking Techniques

Row-Level Locking
Table-Level Locking
Page-Level Locking

Other Locking Techniques

Pessimistic Locking
Optimistic Locking

Real-World Database Security Scenarios

Banking System Transaction Security
User Authentication and Access Rights

Security Best Practices

Regular Security Audits
Keep Software Updated
Implement Strong Authentication
Encrypt Sensitive Data
Principle of Least Privilege
Regular Backups
Monitor Database Activity
Security Training