Overview of database systems and their significance in organizing data.
Textbook: Database Systems Concepts Design Applications by S.K. Singh, latest edition (2011)
References:
Database System by Catherine Ricardo
Data Management Systems by Raghu Ramakrishnan and Johannes Gehrke
Database Systems: Design, Implementation, and Management by Carlos Coronel, Steven Morris, Peter Rob (10th edition, 2012)
An Introduction to Database Systems by Date (2006)
Introduction to ORACLE: SQL and PL/SQL, Student Guide.
Basics:
Introduction to Database Systems
The Entity Relationship Model
The Relational Model
Relational Algebra and Calculus
SQL:
Queries, Programming, Triggers, Query By Example (QBE)
Data Storage and Indexing:
File Organization and Indexes
Tree-Structured Indexing
Hash-Based Indexing
Query Evaluation:
External Sorting
Evaluation Of Relational Operators
Introduction to Query Optimization
A Typical Relational Query Optimizer
Database Design:
Schema Refinement and Normal Forms
Physical Database Design and Tuning
Security
Transaction Management:
Overview of Transaction Management
Concurrency Control
Crash Recovery
Advanced Topics:
Parallel and Distributed Databases
Internet Databases
Decision Support
Data Mining
Object Database Systems
Spatial Data Management
Deductive Databases
Definition: Data is the lowest level of abstraction from which information and knowledge are derived.
Forms of Data:
Numbers or text on paper
Bits and bytes stored in electronic memory
Facts stored in a person's mind.
Raw Data: A collection of unprocessed numbers, characters, or images from information-gathering devices.
Examples of Data Uses: Employee data for payroll calculations, communications, emergency information.
Sources of Data:
Memories
Digital folders
Spreadsheets
Paper piles
Filing cabinets
Other miscellaneous sources.
Definition: A file system organizes and stores data for easy retrieval and manipulation by a computer's operating system.
Storage Devices: Includes hard disks and CD-ROMs enabling physical file location maintenance.
Characteristics:
ASCII files with accounts separated by new lines.
Fields may be separated by specific symbols (e.g., #).
A simple script can be implemented to:
Scan through the accounts file
Search for a specific name (e.g., "Homer Simpson")
Print corresponding balance.
Historical perspective on data handling complexity.
Enhanced understanding of design complexity, helping avoid past issues in database system design.
Data Redundancy:
Decentralized approaches leading to data duplication.
Increased storage space and operational inefficiency.
Data Inconsistency:
Multiple file formats leading to varying data quality.
Difficulty Accessing Data:
Need for new programs for different tasks.
Data Isolation:
Spread across various files, making retrieval challenging.
Poor Data Control:
Multiple naming conventions leading to confusion.
Limited Data Sharing:
Private files dividing data access.
Inadequate Data Manipulation:
Limited interaction across different files.
Integrity Problems:
Structural changes requiring updates across numerous programs.
Atomicity and Durability Issues:
Inconsistent state post failures.
Definition: An organized collection of data used for multiple applications.
Characteristics: Related information stored together to meet organizational needs.
Variability in Size and Complexity: From small lists to extensive tax databases.
Importance across various sectors: banking, airline reservations, universities, sales, manufacturing, and human resources.
Approach that simplifies access to information stored in databases.
Definition: Software programs that enable users to create, maintain, and utilize databases.
Functions Include: Defining data types and structures, storing data, manipulating and querying for specific data.
Major companies and their products:
Oracle (Oracle 8i, 9i, ... 11i)
IBM (DB2, Universal Server)
Microsoft (Access, SQL Server)
Teradata, PostgreSQL, and others.
Minimal Data Redundancy: Centralized control reduces duplication.
Program Data Independence: Changes in data structure do not affect application programs.
Efficient Data Access: Utilizes techniques for faster data retrieval.
Improved Data Sharing: Central databases allow shared access among users.
Data Integrity and Security: Ensures accurate data entry and access control.
Enhanced Productivity: Standard functions automate database activities.
Levels of Abstraction:
Physical Level: Details how data is stored.
Logical Level: Structure and relationships of stored data.
View Level: Hides details, showing only part of the database.
Complexity: Multi-user systems may complicate design and management.
Need for Specialized Manpower: Continuous training required for database staff.
Cost of Management and Installation: Significant resources required for training, upgrades, and installations.
Backup & Recovery Needs: Must have procedures in place for data recovery after loss.
Data Administrator: Responsible for overall data control and database policies.
Database Administrator (DBA): Oversees technical aspects and user access to the database.
Includes schema definition, user authorization, physical organization modification, routine maintenance, and monitoring jobs.
Application Programmers: Develop applications for DB interactions.
Sophisticated Users: Directly formulate queries.
End Users: Use applications through GUIs.
Central repository for shared data managed by the DBMS to enhance consistency and control.
Example: ERP for integrating enterprise functions across departments.
SDLC (System Development Life Cycle): Detailed but slower development process.
Prototyping: Rapid iterative development for defining databases.
Information: Processed data that aids in decision-making.
Metadata: Data about the data, providing description and access assistance.
Data Items: Smallest data units with meaning.
Records: Grouping of related data items.
Files: Collections of related records, which can be of fixed or variable lengths.
Mini DBMS that manages metadata, aiding DBAs in maintaining and deriving information about databases.
Includes entities, attributes, relationships, and keys that define and structure the database.
Definition: A collection of operations for logical functions within a database.
ACID Properties:
Atomicity: All operations occur or none.
Consistency: Ensuring correct data transformations.
Isolation: Independence from other transactions.
Durability: Persistence of changes post-crash.