Database Notes

ECII/ECSI/ECCI 1206 DATABASE SYSTEMS PRINCIPLES

Database Management Systems (DBMS)

  • Definition:
      - A Database Management System (DBMS) is the technology for efficiently storing and retrieving users' data with appropriate security measures.
  • Tutorial Content:
      - Basics of DBMS:
        - Architecture
        - Data models
        - Data schemas
        - Data independence
        - E-R model
        - Relational model
        - Relational database design
        - Storage and file structure

Programming Language Generations

  1. First Generation Language (1GL):
       - Machine language directly understood by processors, represented in binary (0s and 1s).
       - Example: Instructions as strings of 0s and 1s.

  2. Second Generation Language (2GL):
       - Assembly language, a type of low-level programming language that directly maps to machine instructions.
       - Example: ADD 12, 8
       - Assembler translates assembly language into machine language.

  3. Third Generation Language (3GL):
       - High-level programming languages, more abstract and closer to natural language.
       - Examples include PL/I, C, Java.
       - Java produces bytecode which is executed by the Java Virtual Machine (JVM).
       - Requires significant programming knowledge.

  4. Fourth Generation Language (4GL):
       - Closer to natural language and designed for database access.
       - Example: EXTRACT ALL CUSTOMERS WHERE "PREVIOUS PURCHASES" TOTAL MORE THAN $1000
       - Often includes SQL for database operations.

  5. Fifth Generation Language (5GL):
       - Visual or graphical programming languages for developing applications, often compiled with 3GL or 4GL.
       - Example: Development tools from companies like Microsoft and IBM.

Introduction to 4GLs Programming

  • Characteristics:
      - Approximates natural language for ease of communication with computers.
      - Compilation and interpretation required to convert into machine code.
  • SQL Development:
      - Developed by IBM for managing information in relational databases.
      - Adopted standards by ANSI and ISO.
      - Supports data independence and compatibility across systems.

What is a Database (DB)?

  • Definition:
      - A collection of related data that persists over time, typically managed using a DBMS.
  • Key Features:
      - Permanently stored, secured (passwords, encryption, etc.), expandable, and facilitates easier data retrieval and modification.

Database Advantages & Disadvantages

Advantages:
  • Reduced data redundancy.
  • Lower updating errors and increased consistency.
  • Enhanced data integrity and application independence.
  • Improved data access and security.
  • Decreased costs in data entry, storage, and retrieval.
  • Easier application development.
Disadvantages:
  • Complexity in design and required time.
  • High initial hardware and software costs.
  • Potential damage affecting multiple applications.
  • Extensive costs incurred during migration from file-based systems.
  • Requirement of initial training for users and programmers.

ACID Properties in Databases

  • ACID: A set of properties that ensure reliable database transactions.
      1. Atomicity: Transactions are all-or-nothing. No partial transactions exist.
      2. Consistency: Database remains in a consistent state post-transaction.
      3. Isolation: Simultaneous transactions do not impact each other’s execution.
      4. Durability: Completed transactions persist, even in failure scenarios.

Uses of Databases

  • Business: Inventory, payroll, order processing, shipping, accounts.
  • Education: Student records, grades, transfers, transcripts.
  • Non-Profit Organizations: Tracking donations, community service hours, clients served.
  • Personal: Managing household information such as expenses and relationships.

Types of Databases

  1. Text database: Simple file format with rows and columns.
  2. Desktop database: More complex than text databases (e.g., Microsoft Access).
  3. Relational databases (RDBMS): Store data in tables with relationships (e.g., SQL Server, MySQL).
  4. NoSQL and Object-oriented databases: Newer models that do not use the traditional table format.

Features of a Database

  1. Data Abstraction: Hides details about how data is stored.
       - Physical, logical, and view levels.
  2. Instances and Schemas: Current data snapshot vs. overall database structure.
  3. Data Independence: Changes can be made at one schema level without affecting others.
       - Physical data independence and logical data independence.

Characteristics of DBMS

  • User-Friendly: Multiple views for different user roles.
  • Security: User access control, data encryption.
  • Data Administration: Structuring and enforcing access protocols.
  • Concurrency Support: Multi-user access.

Overview of Database Architecture and Design

  1. Determining Data Storage: Understanding what data needs to be kept and its relationships.
  2. Database Structure and Configuration: Logical design and physical schema configurations.
ER Model: Concepts
  • Entities and Attributes: Core components of the database model, including essential characteristics and relationships.
  • Relations: How entities interact within the confines of a database schema.

Codd's 12 Rules for Relational Databases

  1. All data should be stored in tables.
  2. Data can be accessed using logical operations.
  3. NULLs must be treated systematically.
  4. Active online catalog access.
  5. A comprehensive data sub-language must exist.
  6. All views should be updatable.
  7. Supports high-level operations (insert, update, delete).
  8. Data independence must be maintained.
  9. Logical data independence should be enforced.
  10. Integrity should be independent of applications.
  11. Distribution independence must be upheld.
  12. The system must not allow security bypassing through low-level access.

Database Normalization Process

  • A systematic method of organizing data to reduce redundancy and enhance integrity.
Normal Forms
  1. 1NF (First Normal Form): All entries are atomic.
  2. 2NF (Second Normal Form): Eliminate partial dependencies.
  3. 3NF (Third Normal Form): Remove transitive dependencies.
  4. Boyce-Codd Normal Form (BCNF): Advanced form of 3NF.
  5. 4NF and 5NF: Address multi-valued dependencies.

Conclusion

  • The DBMS landscape is rapidly evolving with new trends, including cloud computing, AI integrations, NoSQL databases, and data privacy regulations guiding contemporary data management practices. Understanding and applying these principles ensure secure, efficient, and effective database systems.