IBDP Computer Science Paper 2 - Comprehensive Database Learning Guide Notes
Introduction to Databases
- A database is an organized collection of data that allows users to obtain and process information according to their requirements.
- Databases provide a structured approach to storing, retrieving, and managing data, which is more efficient than traditional file systems.
Structured vs. Unstructured Data
- Unstructured data: Information that is not organized in a predefined manner (e.g., data in word processors or text editors).
- Difficult to search, analyze, or manipulate efficiently.
- Example: Contact information in a text document without a specific format.
- Structured data: Information stored in a predefined manner or structure, such as in databases.
- Organized into tables with consistent formats for easy searching, sorting, and analysis.
- Example: Contact information in a database table with fields for names, phone numbers, and addresses.
Advantages of Using Databases
- Elimination of Data Duplication: Data is stored only once, saving storage space and preventing inconsistencies.
- Data Consistency: Changes are made in one place and reflected for all users, ensuring everyone sees up-to-date information.
- Efficient Data Retrieval: Designed for quick and efficient data retrieval, allowing users to find specific information easily.
- Data Integrity: Enforces rules and constraints to maintain the accuracy and reliability of data, preventing invalid or inconsistent information from being stored.
- Multi-user Access: Supports concurrent access by multiple users without conflicts.
Software for Databases
- Database Management Systems (DBMS): Software for creating and managing databases.
- Examples:
- Microsoft Access
- MySQL
- Oracle
- IBM Db2
- PostgreSQL
- Programming Languages: Used to customize and interact with databases.
- Examples:
- Python
- JavaScript
- Microsoft Visual Basic
- Delphi
- Query Languages: Used for retrieving and manipulating data in databases.
- SQL (Structured Query Language)
Structure of a Database
- Tables: Data is stored in tables, each representing a specific type of entity or object.
- Flat-file Database: A database with only one table.
- Records: Rows in a table, each representing a single instance of the entity.
- Fields: Columns in a table, representing different pieces of information about each entity.
Primary Keys
- A primary key is a field (or combination of fields) that uniquely identifies each record in a table.
- It can be automatically generated by the database system.
- Essential for maintaining data integrity and establishing relationships between tables.
Validation Checks in Databases
- Validation techniques ensure that the data entered into a database meets certain criteria and is accurate.
- Common validation checks include:
- Length check: Ensures the data has the correct number of characters.
- Format check: Verifies that the data follows a specific pattern.
- Presence check: Confirms that required fields are not left blank.
Creating Simple Databases
- Microsoft Access is a widely used tool for creating databases.
- Steps:
- Open Microsoft Access and create a blank database.
- Select the table design view to design the table.
- Define the primary key field and set its data type.
- Add other fields with appropriate data types (Text, Number, Date/Time, Yes/No, Memo).
- Set validation rules for fields as needed.
- Save the table design and begin entering data.
Running Queries
- Queries are used to retrieve specific information from a database based on certain criteria.
- In Microsoft Access, queries can be created using the "Query Design" feature.
- Steps:
- Select the relevant fields.
- Set a criterion for the desired information.
- Run the query to display the results.
Relational Databases
What is a Relational Database?
- A relational database is a type of database that is structured and allows users to identify and access data which is in relation to other data.
- Relational databases distribute large sets of data over various tables and connect them using relationships.
- This approach eliminates redundant data.
Relational Databases vs. Flat File Databases
- Relational Database:
- Data is distributed across multiple tables connected through relationships.
- No redundant data as information is stored only once.
- More efficient use of memory space.
- Better data integrity and consistency.
- Supports complex queries across related data.
- Flat File Database:
- Contains redundant data.
- Requires more memory space.
- Less efficient for complex data structures.
- Difficult to maintain as the database grows.
- Limited ability to represent relationships between different types of data.
Characteristics of Relational Databases
- Entities: An object in a system for which information is stored. Each table typically represents one type of entity.
- Attributes: Characteristics or properties of an entity. These become the fields or columns in a database table.
- Records: Each row in a database table containing attributes about a specific entity.
- Tables: Each table contains information about just one entity type.
- Keys: Tables are linked using primary keys and foreign keys.
Types of Keys in Relational Databases
- Primary Key:
- A unique identifier for each record in a table.
- Must have unique values with no duplicates.
- Cannot be NULL.
- Secondary Key or Candidate Key:
- All keys that are not selected as the primary key are considered alternate or candidate keys.
- Can uniquely identify records in a table, just like the primary key.
- May consist of one or more columns (fields).
- Foreign Key:
- An attribute in one table that refers to the primary key in another table.
- Helps connect two or more tables, establishing relationships between them.
- Essential for maintaining data integrity and preventing data redundancy
Example of Relational Database Structure
- Consider a library database with three related tables:
- Book List Table: (Book ID, Title, Author, Publisher, Year, Genre)
- Member List Table: (Member ID, Name, Address, Phone, Email, Membership Type)
- Books Lent Table: (Transaction ID, Book ID, Member ID, Date Borrowed, Due Date, Return Status)
Normalization
- Normalization is the process of dividing larger tables into smaller tables such that these smaller tables are related to each other.
- The main purpose of normalization is to reduce data redundancy and improve data integrity.
Relationships Between Entities
- One-to-One Relationship:
- One entity is related to exactly one other entity.
- One-to-Many Relationship:
- One entity is related to multiple other entities.
- Many-to-Many Relationship:
- Many entities are related to many other entities.
- Implemented using a junction table.
Working with Databases
- Forms: User-friendly interfaces for inputting data.
- Queries: Requests to access specific data from the database.
- Reports: Formatted presentation of query results.
- Modules: Programs or code components for custom operations.
Normalization
Introduction to Normalization
- Database normalization is the process of organizing the attributes of a database to reduce or eliminate data redundancy.
- This systematic approach to database design helps create efficient, reliable, and flexible database structures.
Issues Caused by Redundant Data
- Data Inconsistency: Updating one entry but forgetting others leads to mismatched information.
- Increased Storage Requirements: Repeated data takes up more disk space.
- Complicated Data Maintenance: More effort is needed to update, insert, or delete records.
- Poor Database Performance: Large redundant datasets slow down search and query performance.
Referential Integrity in Databases
- Referential integrity refers to the accuracy and consistency of data within relationships.
- It ensures that whenever a foreign key value is used, it must reference a valid, existing primary key in the parent table.
- The importance of referential integrity includes:
- Preventing Orphan Records
- Maintaining Data Accuracy and Consistency
- Enforcing Logical Relationships
- Supporting Reliable Database Operations
- Normalization is typically carried out through a series of stages called normal forms.
- Each column must contain atomic (indivisible) values
- Each column must contain values of the same type
- Each column must have a unique name
- The order of rows and columns doesn't matter
- No repeating groups or arrays of values are allowed
- Already be in 1NF
- Have no partial dependencies—that is, no attributes (columns) should depend on only part of a composite primary key
- Already be in 2NF
- Have no transitive dependencies—that is, non-key attributes should not depend on other non-key attributes
Problems with Unnormalized Databases
- Insertion Anomalies: Difficulties arise when adding new data due to missing other data.
- Update Anomalies: Changing data in one place requires changes in multiple places.
- Deletion Anomalies: Removing data inadvertently deletes other valuable data.
Benefits of Normalization
- Reduced Data Redundancy: Eliminates duplicate data.
- Improved Data Integrity: Ensures that data remains accurate and consistent.
- Better Database Organization: Creates a logical structure that reflects real-world entities and relationships.
- Enhanced Query Performance: Overall performance often improves due to smaller table sizes and better indexing opportunities.
- Greater Flexibility for Future Changes: Makes it easier to modify the database structure as requirements evolve.
Database Transactions and ACID Properties
- Data: Raw facts without context.
- Information: Processed, meaningful data that provides context and value.
- Database: Stores structured data.
- Information System: Uses the database along with software and users to produce useful insights.
Why We Need a Database
- Efficient data storage and organization
- Fast access and retrieval
- Support for multi-user environments
- Maintenance of data integrity and security
- Enablement of reporting, analysis, and decision-making
What is a Database Transaction?
- A database transaction is a sequence of one or more operations that are treated as a single logical unit of work.
- Must be atomic—either all operations are completed successfully, or none are.
Data Consistency
- Data consistency means that all data across the database remains logical, correct, and synchronized across related tables.
Data Integrity
- Data integrity refers to the accuracy, completeness, and reliability of data throughout its lifecycle.
- Types of data integrity:
- Entity Integrity
- Referential Integrity
- Domain Integrity
What is Concurrency in Databases?
- Concurrency refers to the ability of a database system to allow multiple users or processes to access or modify the data at the same time without interfering with each other.
Common Concurrency Problems and Solutions
- Lost Updates
- Dirty Reads
- Non-repeatable Reads
- Phantom Reads
What is ACID Property of a Database?
- ACID is a set of four key properties that ensure reliable and consistent transactions in a database.
ACID Properties of Transactions
- Atomicity – "All or Nothing"
- Consistency – "Valid State Before and After"
- Isolation – "Transactions Don't Interfere"
- Durability – "Once Done, Stays Done"
Essential Database Functions
- Query Function (Read/Retrieve Data)
- Update Function (Modify Data)
Data Validation vs Verification
- Data Validation: Ensures data format is correct and meets predefined criteria.
- Data Verification: Ensures accuracy during data entry by comparing entered data against the original source.
Common Methods of Data Validation
- Presence Check
- Range Check
- Format Check
- Type Check
- Length Check
- Lookup Check
- Uniqueness Check
- Consistency Check
What is Data Verification?
- Data verification is the process of ensuring that data entered into a system matches the original source accurately.
Methods of Data Verification:
- Double Entry Verification
- Visual Verification
- Checksum Verification
- Batch Totals
Database Schema and Data Types
Introduction to Data Types in Relational Databases
- Data types define the kind of data that can be stored in a particular column of a table.
- Choosing the correct data type is crucial for ensuring data integrity, optimizing storage, and enabling efficient data manipulation.
Different Data Types in Relational Databases
- Numeric Data Types
- INTEGER
- REAL or FLOAT
- DECIMAL
- String or Alphanumeric/Text Data Types
- Date and Time Data Types
- DATE
- TIME
- DATETIME or TIMESTAMP
- Boolean Data Type
The Importance of Data Type Consistency
- Consistent use of data types ensures that:
- Data is stored efficiently
- Operations perform correctly
- Application logic remains reliable
- Data validation is effective
The Potential Effects of Choosing the Wrong Data Type
- Data Loss
- Data Corruption
- Storage Inefficiency
- Performance Issues
- Application Errors
- Integrity Violations
- Data Definition
- Data Manipulation
- Data Control
- Data Integrity
- Transaction Management
- Recovery Management
Database Schema
- A database schema is a blueprint or structure that defines how data is organized within a database.
Conceptual Schema
- The conceptual schema is the high-level, abstract view of the entire database.
Logical Schema
- The logical schema provides a more detailed view than the conceptual schema.
Physical Schema
- The physical schema is the lowest level of abstraction, detailing how the data is physically stored on storage devices.
Data Dictionary
- A Data Dictionary is a structured repository that defines and describes all the data elements in a database.
Data Definition Language (DDL)
- Data Definition Language (DDL) is a subset of SQL used to define and modify the structure of database objects.
Importance of Data Modeling in the Design of a Database
- Data modeling is the process of creating a visual representation of a database and the relationships between its elements.
- Key Benefits of Data Modeling:
- Provides a Clear Structure
- Reduces Data Redundancy
- Improves Data Quality
- Enhances Performance
- Facilitates Communication
- Supports Scalability
- Enables Better Decision-Making
Database Security
Introduction to Database Security
- Database security is the process of protecting a database from unauthorized access, misuse, or malicious attacks.
Why Database Security is Important
- Data Confidentiality
- Data Integrity
- Data Availability
Different Methods to Ensure Database Security
- Access Control
- Authentication
- Role-Based Access Control (RBAC)
- Encryption
- Data at Rest Encryption
- Data in Transit Encryption
- Database Auditing and Monitoring
- Activity Monitoring
- Audit Trails
Database Locking
- Database locking is a mechanism used to control concurrent access to data in a database to ensure data integrity and consistency.
Database Locking Techniques
- Row-Level Locking
- Table-Level Locking
- Page-Level Locking
Other Locking Techniques
- Pessimistic Locking
- Optimistic Locking
Real-World Database Security Scenarios
- Banking System Transaction Security
- User Authentication and Access Rights
Security Best Practices
- Regular Security Audits
- Keep Software Updated
- Implement Strong Authentication
- Encrypt Sensitive Data
- Principle of Least Privilege
- Regular Backups
- Monitor Database Activity
- Security Training