IBDP Computer Science Paper 2 - Comprehensive Database Learning Guide Notes

Introduction to Databases

  • A database is an organized collection of data that allows users to obtain and process information according to their requirements.
  • Databases provide a structured approach to storing, retrieving, and managing data, which is more efficient than traditional file systems.

Structured vs. Unstructured Data

  • Unstructured data: Information that is not organized in a predefined manner (e.g., data in word processors or text editors).
    • Difficult to search, analyze, or manipulate efficiently.
    • Example: Contact information in a text document without a specific format.
  • Structured data: Information stored in a predefined manner or structure, such as in databases.
    • Organized into tables with consistent formats for easy searching, sorting, and analysis.
    • Example: Contact information in a database table with fields for names, phone numbers, and addresses.

Advantages of Using Databases

  • Elimination of Data Duplication: Data is stored only once, saving storage space and preventing inconsistencies.
  • Data Consistency: Changes are made in one place and reflected for all users, ensuring everyone sees up-to-date information.
  • Efficient Data Retrieval: Designed for quick and efficient data retrieval, allowing users to find specific information easily.
  • Data Integrity: Enforces rules and constraints to maintain the accuracy and reliability of data, preventing invalid or inconsistent information from being stored.
  • Multi-user Access: Supports concurrent access by multiple users without conflicts.

Software for Databases

  • Database Management Systems (DBMS): Software for creating and managing databases.
    • Examples:
      • Microsoft Access
      • MySQL
      • Oracle
      • IBM Db2
      • PostgreSQL
  • Programming Languages: Used to customize and interact with databases.
    • Examples:
      • Python
      • JavaScript
      • Microsoft Visual Basic
      • Delphi
  • Query Languages: Used for retrieving and manipulating data in databases.
    • SQL (Structured Query Language)

Structure of a Database

  • Tables: Data is stored in tables, each representing a specific type of entity or object.
  • Flat-file Database: A database with only one table.
  • Records: Rows in a table, each representing a single instance of the entity.
  • Fields: Columns in a table, representing different pieces of information about each entity.

Primary Keys

  • A primary key is a field (or combination of fields) that uniquely identifies each record in a table.
  • It can be automatically generated by the database system.
  • Essential for maintaining data integrity and establishing relationships between tables.

Validation Checks in Databases

  • Validation techniques ensure that the data entered into a database meets certain criteria and is accurate.
  • Common validation checks include:
    • Length check: Ensures the data has the correct number of characters.
    • Format check: Verifies that the data follows a specific pattern.
    • Presence check: Confirms that required fields are not left blank.

Creating Simple Databases

  • Microsoft Access is a widely used tool for creating databases.
  • Steps:
    1. Open Microsoft Access and create a blank database.
    2. Select the table design view to design the table.
    3. Define the primary key field and set its data type.
    4. Add other fields with appropriate data types (Text, Number, Date/Time, Yes/No, Memo).
    5. Set validation rules for fields as needed.
    6. Save the table design and begin entering data.

Running Queries

  • Queries are used to retrieve specific information from a database based on certain criteria.
  • In Microsoft Access, queries can be created using the "Query Design" feature.
  • Steps:
    1. Select the relevant fields.
    2. Set a criterion for the desired information.
    3. Run the query to display the results.

Relational Databases

What is a Relational Database?

  • A relational database is a type of database that is structured and allows users to identify and access data which is in relation to other data.
  • Relational databases distribute large sets of data over various tables and connect them using relationships.
  • This approach eliminates redundant data.

Relational Databases vs. Flat File Databases

  • Relational Database:
    • Data is distributed across multiple tables connected through relationships.
    • No redundant data as information is stored only once.
    • More efficient use of memory space.
    • Better data integrity and consistency.
    • Supports complex queries across related data.
  • Flat File Database:
    • Contains redundant data.
    • Requires more memory space.
    • Less efficient for complex data structures.
    • Difficult to maintain as the database grows.
    • Limited ability to represent relationships between different types of data.

Characteristics of Relational Databases

  • Entities: An object in a system for which information is stored. Each table typically represents one type of entity.
  • Attributes: Characteristics or properties of an entity. These become the fields or columns in a database table.
  • Records: Each row in a database table containing attributes about a specific entity.
  • Tables: Each table contains information about just one entity type.
  • Keys: Tables are linked using primary keys and foreign keys.

Types of Keys in Relational Databases

  • Primary Key:
    • A unique identifier for each record in a table.
    • Must have unique values with no duplicates.
    • Cannot be NULL.
  • Secondary Key or Candidate Key:
    • All keys that are not selected as the primary key are considered alternate or candidate keys.
    • Can uniquely identify records in a table, just like the primary key.
    • May consist of one or more columns (fields).
  • Foreign Key:
    • An attribute in one table that refers to the primary key in another table.
    • Helps connect two or more tables, establishing relationships between them.
    • Essential for maintaining data integrity and preventing data redundancy

Example of Relational Database Structure

  • Consider a library database with three related tables:
    • Book List Table: (Book ID, Title, Author, Publisher, Year, Genre)
    • Member List Table: (Member ID, Name, Address, Phone, Email, Membership Type)
    • Books Lent Table: (Transaction ID, Book ID, Member ID, Date Borrowed, Due Date, Return Status)

Normalization

  • Normalization is the process of dividing larger tables into smaller tables such that these smaller tables are related to each other.
  • The main purpose of normalization is to reduce data redundancy and improve data integrity.

Relationships Between Entities

  • One-to-One Relationship:
    • One entity is related to exactly one other entity.
  • One-to-Many Relationship:
    • One entity is related to multiple other entities.
  • Many-to-Many Relationship:
    • Many entities are related to many other entities.
    • Implemented using a junction table.

Working with Databases

  • Forms: User-friendly interfaces for inputting data.
  • Queries: Requests to access specific data from the database.
  • Reports: Formatted presentation of query results.
  • Modules: Programs or code components for custom operations.

Normalization

Introduction to Normalization

  • Database normalization is the process of organizing the attributes of a database to reduce or eliminate data redundancy.
  • This systematic approach to database design helps create efficient, reliable, and flexible database structures.

Issues Caused by Redundant Data

  • Data Inconsistency: Updating one entry but forgetting others leads to mismatched information.
  • Increased Storage Requirements: Repeated data takes up more disk space.
  • Complicated Data Maintenance: More effort is needed to update, insert, or delete records.
  • Poor Database Performance: Large redundant datasets slow down search and query performance.

Referential Integrity in Databases

  • Referential integrity refers to the accuracy and consistency of data within relationships.
  • It ensures that whenever a foreign key value is used, it must reference a valid, existing primary key in the parent table.
  • The importance of referential integrity includes:
    • Preventing Orphan Records
    • Maintaining Data Accuracy and Consistency
    • Enforcing Logical Relationships
    • Supporting Reliable Database Operations

Normal Forms

  • Normalization is typically carried out through a series of stages called normal forms.
First Normal Form (1NF)
  • Each column must contain atomic (indivisible) values
  • Each column must contain values of the same type
  • Each column must have a unique name
  • The order of rows and columns doesn't matter
  • No repeating groups or arrays of values are allowed
Second Normal Form (2NF)
  • Already be in 1NF
  • Have no partial dependencies—that is, no attributes (columns) should depend on only part of a composite primary key
Third Normal Form (3NF)
  • Already be in 2NF
  • Have no transitive dependencies—that is, non-key attributes should not depend on other non-key attributes

Problems with Unnormalized Databases

  • Insertion Anomalies: Difficulties arise when adding new data due to missing other data.
  • Update Anomalies: Changing data in one place requires changes in multiple places.
  • Deletion Anomalies: Removing data inadvertently deletes other valuable data.

Benefits of Normalization

  • Reduced Data Redundancy: Eliminates duplicate data.
  • Improved Data Integrity: Ensures that data remains accurate and consistent.
  • Better Database Organization: Creates a logical structure that reflects real-world entities and relationships.
  • Enhanced Query Performance: Overall performance often improves due to smaller table sizes and better indexing opportunities.
  • Greater Flexibility for Future Changes: Makes it easier to modify the database structure as requirements evolve.

Database Transactions and ACID Properties

Difference: Data vs Information

  • Data: Raw facts without context.
  • Information: Processed, meaningful data that provides context and value.

Database vs Information System

  • Database: Stores structured data.
  • Information System: Uses the database along with software and users to produce useful insights.

Why We Need a Database

  • Efficient data storage and organization
  • Fast access and retrieval
  • Support for multi-user environments
  • Maintenance of data integrity and security
  • Enablement of reporting, analysis, and decision-making

What is a Database Transaction?

  • A database transaction is a sequence of one or more operations that are treated as a single logical unit of work.
  • Must be atomic—either all operations are completed successfully, or none are.

Data Consistency

  • Data consistency means that all data across the database remains logical, correct, and synchronized across related tables.

Data Integrity

  • Data integrity refers to the accuracy, completeness, and reliability of data throughout its lifecycle.
  • Types of data integrity:
    • Entity Integrity
    • Referential Integrity
    • Domain Integrity

What is Concurrency in Databases?

  • Concurrency refers to the ability of a database system to allow multiple users or processes to access or modify the data at the same time without interfering with each other.

Common Concurrency Problems and Solutions

  • Lost Updates
  • Dirty Reads
  • Non-repeatable Reads
  • Phantom Reads

What is ACID Property of a Database?

  • ACID is a set of four key properties that ensure reliable and consistent transactions in a database.
ACID Properties of Transactions
  • Atomicity – "All or Nothing"
  • Consistency – "Valid State Before and After"
  • Isolation – "Transactions Don't Interfere"
  • Durability – "Once Done, Stays Done"

Essential Database Functions

  • Query Function (Read/Retrieve Data)
  • Update Function (Modify Data)

Data Validation vs Verification

  • Data Validation: Ensures data format is correct and meets predefined criteria.
  • Data Verification: Ensures accuracy during data entry by comparing entered data against the original source.
Common Methods of Data Validation
  • Presence Check
  • Range Check
  • Format Check
  • Type Check
  • Length Check
  • Lookup Check
  • Uniqueness Check
  • Consistency Check
What is Data Verification?
  • Data verification is the process of ensuring that data entered into a system matches the original source accurately.
Methods of Data Verification:
  • Double Entry Verification
  • Visual Verification
  • Checksum Verification
  • Batch Totals

Database Schema and Data Types

Introduction to Data Types in Relational Databases

  • Data types define the kind of data that can be stored in a particular column of a table.
  • Choosing the correct data type is crucial for ensuring data integrity, optimizing storage, and enabling efficient data manipulation.

Different Data Types in Relational Databases

  • Numeric Data Types
    • INTEGER
    • REAL or FLOAT
    • DECIMAL
  • String or Alphanumeric/Text Data Types
    • VARCHAR(n)
    • CHAR(n)
    • TEXT
  • Date and Time Data Types
    • DATE
    • TIME
    • DATETIME or TIMESTAMP
  • Boolean Data Type
    • BOOLEAN

The Importance of Data Type Consistency

  • Consistent use of data types ensures that:
    • Data is stored efficiently
    • Operations perform correctly
    • Application logic remains reliable
    • Data validation is effective

The Potential Effects of Choosing the Wrong Data Type

  • Data Loss
  • Data Corruption
  • Storage Inefficiency
  • Performance Issues
  • Application Errors
  • Integrity Violations

Functions and Tools of a DBMS

  • Data Definition
  • Data Manipulation
  • Data Control
  • Data Integrity
  • Transaction Management
  • Recovery Management

Database Schema

  • A database schema is a blueprint or structure that defines how data is organized within a database.
Conceptual Schema
  • The conceptual schema is the high-level, abstract view of the entire database.
Logical Schema
  • The logical schema provides a more detailed view than the conceptual schema.
Physical Schema
  • The physical schema is the lowest level of abstraction, detailing how the data is physically stored on storage devices.

Data Dictionary

  • A Data Dictionary is a structured repository that defines and describes all the data elements in a database.

Data Definition Language (DDL)

  • Data Definition Language (DDL) is a subset of SQL used to define and modify the structure of database objects.

Importance of Data Modeling in the Design of a Database

  • Data modeling is the process of creating a visual representation of a database and the relationships between its elements.
  • Key Benefits of Data Modeling:
    • Provides a Clear Structure
    • Reduces Data Redundancy
    • Improves Data Quality
    • Enhances Performance
    • Facilitates Communication
    • Supports Scalability
    • Enables Better Decision-Making

Database Security

Introduction to Database Security

  • Database security is the process of protecting a database from unauthorized access, misuse, or malicious attacks.

Why Database Security is Important

  • Data Confidentiality
  • Data Integrity
  • Data Availability

Different Methods to Ensure Database Security

  • Access Control
    • Authentication
    • Role-Based Access Control (RBAC)
  • Encryption
    • Data at Rest Encryption
    • Data in Transit Encryption
  • Database Auditing and Monitoring
    • Activity Monitoring
    • Audit Trails

Database Locking

  • Database locking is a mechanism used to control concurrent access to data in a database to ensure data integrity and consistency.
Database Locking Techniques
  • Row-Level Locking
  • Table-Level Locking
  • Page-Level Locking
Other Locking Techniques
  • Pessimistic Locking
  • Optimistic Locking

Real-World Database Security Scenarios

  • Banking System Transaction Security
  • User Authentication and Access Rights

Security Best Practices

  • Regular Security Audits
  • Keep Software Updated
  • Implement Strong Authentication
  • Encrypt Sensitive Data
  • Principle of Least Privilege
  • Regular Backups
  • Monitor Database Activity
  • Security Training