Chapter 1 DBMS

What is a Database?

A database is a collection of inter-related data organized for efficient retrieval, insertion, and deletion. It helps store data systematically, using structures like tables, schemas, views, and reports.

Example:

A college database may store information about various entities such as admin, staff, students, and faculty.

Database Management System (DBMS)

A DBMS is software designed to manage databases efficiently. It provides an interface to perform various operations such as:

Creating databases
Storing data
Updating data
Creating tables
Ensuring database security

Example of DBMS software:

MySQL
Oracle

Needs for DBMS

Traditional file-processing systems had several limitations, which led to the development of DBMS. Some of the key problems DBMS solves include:

Data Redundancy and Inconsistency: Elimination of duplicate data.
Difficulty in Accessing Data: Simplifies the retrieval of data.
Data Isolation: Overcomes challenges of multiple files and formats.
Integrity Problems: Ensures data correctness.
Atomicity of Updates: Ensures that updates are completed fully or not at all.
Concurrent Access by Multiple Users: Manages multiple users accessing data simultaneously.
Security Problems: Enhances data security and privacy.

Tasks Performed by DBMS

Data Definition:
- Creating, modifying, and removing structures in the database.
Data Updation:
- Inserting, modifying, and deleting records in the database.
Data Retrieval:
- Querying the database to extract information.
User Administration:
- Managing users, maintaining data integrity, ensuring security, and recovering corrupted data.

Characteristics of a DBMS (Database Management System)

Digital Repository: A DBMS uses a digital repository established on a server to store and manage information efficiently.
Logical View: It provides a clear and logical view of the processes that manipulate the data.
Automatic Backup and Recovery: A DBMS includes automatic backup and recovery mechanisms, ensuring data safety.
ACID Properties: The system adheres to ACID properties (Atomicity, Consistency, Isolation, Durability), ensuring data integrity even in the event of failure.
Data Relationship Management: It reduces the complexity of relationships between different datasets.
Data Manipulation: The DBMS supports data manipulation and processing efficiently.
Data Security: It provides robust data security to protect information from unauthorized access.

Advantages of a DBMS

Controls Data Redundancy: The system helps in controlling data redundancy by storing all data in a single database file, eliminating duplicate records.
Data Sharing: Authorized users can share data across multiple users in an organization.
Ease of Maintenance: The centralized nature of the database system makes it easy to maintain.
Time Reduction: It reduces development and maintenance time, improving efficiency.
Backup and Recovery: Automatic backup and recovery subsystems protect data from hardware or software failures, restoring it when needed.
Multiple User Interfaces: The system supports various user interfaces such as graphical user interfaces (GUIs) and application programming interfaces (APIs).

Disadvantages of a DBMS

Cost of Hardware and Software: A DBMS requires high-speed processors and large memory capacities, which can be expensive.
Large Size: The system occupies significant disk space and memory to run efficiently.
Complexity: DBMSs can introduce complexity, with additional requirements and configuration steps.
Higher Impact of Failure: If the database is damaged, such as through an electric failure or corruption, all stored data could be lost, especially in centralized systems.

ACID Properties of a DBMS

To maintain consistency, DBMS transactions follow ACID properties:

Atomicity: Ensures that a transaction is executed fully or not at all.
Consistency: Guarantees that the database remains in a consistent state before and after any transaction.
Isolation: Multiple transactions can occur simultaneously without leading to inconsistencies.
Durability: Ensures that the changes made by a successful transaction are permanently written to the disk.

What is a File Processing System?

A File Processing System is a method of organizing and storing files on a storage medium like a hard disk. It helps in managing and retrieving files when required. File processing systems typically consist of multiple files that are grouped into directories, which may contain other folders and files. These systems perform basic operations such as:

File management
File naming
Defining access rules

Limitations of a File Processing System

Data Redundancy: Data duplication often occurs as files are stored separately in different locations, leading to inefficiency.
Lack of Data Integrity: Ensuring consistency across various files can be difficult, leading to less reliable data.
Limited Data Sharing: Only one user can access the data at a time, making it hard for multiple users to collaborate.
No Backup and Recovery: File systems generally do not have automatic backup or recovery features, meaning that data can be lost permanently in case of failures.
Inconsistent Data: There is no mechanism to ensure consistent data across multiple files, making the data prone to errors.
No Efficient Query Processing: File systems lack a query processing feature, making data retrieval slow and inefficient.
Low Security: Security constraints in file systems are minimal, making sensitive data more vulnerable to unauthorized access.
Limited Scalability: As data grows, file systems become inefficient in handling larger datasets.
No Data Abstraction: File systems expose details of data storage and structure, making it harder to manage.

Difference Between File Processing System and DBMS

		AspectFile Processing SystemDBMS
Structure	Organizes files in storage mediums.	Manages a database using specialized software.
Data Redundancy	Redundant data is common.	Minimizes or eliminates data redundancy.
Backup and Recovery	No built-in backup or recovery mechanisms.	Provides automatic backup and recovery features.
Query Processing	Inefficient or no query processing.	Efficient query processing.
Consistency	Less data consistency.	Ensures data consistency through normalization.
Complexity	Less complex.	More complex due to advanced features.
Security Constraints	Provides minimal security.	Offers robust security mechanisms.
Cost	Less expensive.	More expensive due to advanced capabilities.
Data Independence	No data independence.	Data independence exists.
User Access	Only one user can access the data at a time.	Allows multiple users to access data simultaneously.
Sharing	Data sharing is difficult.	Centralized nature makes data sharing easier.
Data Abstraction	Exposes data storage details.	Hides internal storage details.
Integrity Constraints	Hard to implement integrity constraints.	Integrity constraints are easy to implement.

ANSI/SPARC Model

Definition: The ANSI-SPARC Architecture (American National Standards Institute, Standards Planning and Requirements Committee) is an abstract design standard for database management systems. This architecture helps simplify database access by separating it into three layers:

Internal Level: Deals with physical storage details.
Conceptual Level: Represents the logical structure of the database.
External Level: Concerns user-specific views of the database.

The ANSI-SPARC model forms the foundation for most modern databases, allowing for easier data management and interaction across different complexity levels.

External Level

The External Level is how individual users or user groups view the database.
At this level, only the data relevant to each user is presented.
It consists of multiple external views, each tailored to the specific needs of the users.
The external view includes only the entities, attributes, and relationships that the user is interested in.
Different users may have different representations of the same data.
- For example, one user might see names as (firstname, lastname), while another might view them as (lastname, firstname).

Conceptual Level

The Conceptual Level is the middle layer of the three-level architecture.
It provides a community view of the entire database, describing what data is stored, including entities, their attributes, and relationships.
This level ensures semantic consistency, defines security rules, and enforces integrity constraints.
It represents the logical structure of the entire database, independent of how the data is physically stored.
The conceptual level offers a complete view of the database that meets the organization’s requirements.

Internal Level

The Internal Level deals with the physical storage of the database on the computer.
It focuses on the physical implementation, optimizing storage space utilization, runtime performance, and ensuring data security through encryption.
This level interacts with the operating system to handle tasks like placing data in storage files, managing storage space, and retrieving data efficiently.

What is Data Independence?

Definition: Data independence is the ability to modify the schema at one level of a database system without altering the schema at the next higher level. This concept helps in maintaining the flexibility and adaptability of the database.

There are two types of data independence:

Logical Data Independence
Physical Data Independence

1. Logical Data Independence

Definition: Logical data independence is the ability to change the conceptual schema without affecting the external schema or the user’s view.
It separates the external level from the conceptual view.
Any changes made to the conceptual schema (such as adding or removing fields or tables) do not affect how users interact with the data.

2. Physical Data Independence

Definition: Physical data independence refers to the ability to change the internal schema (such as modifying storage structures) without altering the conceptual schema.
It separates the conceptual level from the internal level.
Changes to the physical storage of data (e.g., changes in file structure or storage size) do not impact the logical structure of the database.
Physical data independence happens at the physical interface level.

Client-Server Architecture

The DBMS design relies on its architecture to handle a large number of users and components, like PCs, web servers, and database servers, which are connected via a network.

Client/Server Architecture: This model connects multiple PCs and workstations via a network. The database server provides services, while clients request those services.

1-Tier Architecture

In this architecture, the database is directly available to the user.
Users can interact directly with the DBMS, making changes that impact the database immediately.
It is typically used for local application development, where developers can communicate directly with the database for quick responses.
There are no additional layers or tools for end users.

2-Tier Architecture

This architecture follows a basic client-server model.
Applications on the client side interact directly with the database on the server side using APIs such as ODBC or JDBC.
User interfaces and application programs run on the client side, while the server side handles query processing and transaction management.
The client-side application establishes a connection to the server-side database for communication.

3-Tier Architecture

This architecture introduces a middle layer between the client and server.
Clients cannot directly communicate with the database server. Instead, they interact with an application server, which communicates with the database system.
End users are unaware of the database's existence beyond the application server, and the database does not recognize individual users beyond the application.
The 3-Tier architecture is commonly used for large web applications that require an additional layer of interaction for security and scalability.

Database Administrator (DBA)

The Database Administrator (DBA) manages database access, security, and performance.
They create new accounts and passwords for users who need to access the database.
Security: The DBA ensures that only authorized users can access or modify the database.
Problem Resolution: Responsible for addressing issues like security breaches and poor system response times.
Backup and Recovery: The DBA monitors and manages database recovery and backup processes, ensuring data is not lost during system failures.
Super User Account: The DBA holds a special account called a system or super user account with elevated privileges.
DCL Operations: The DBA can perform Data Control Language (DCL) operations such as GRANT and REVOKE, allowing or restricting specific users' access to the database.
They handle damage control caused by hardware or software failures.

Database Users

Database users can be divided into five groups based on how they interact with the database:

Naive Users
Application Programmers
Sophisticated Users
Specialized Users
Stand-alone Users

1. Naive / Parametric End Users

Naive Users are unsophisticated users with no DBMS knowledge but who use database applications regularly.
Examples include railway ticket booking agents or clerks in a bank. They rely on pre-built database applications to perform their tasks.
A System Analyst ensures that the needs of these users are met by analyzing their requirements.

2. Sophisticated Users

These users include engineers, scientists, and business analysts who are familiar with databases but do not write application code.
They interact with the database by writing SQL queries directly to retrieve or modify data as needed.

3. Database Designers

Database Designers create the structure of the database, including tables, indexes, views, triggers, stored procedures, and constraints.
They are responsible for determining what data needs to be stored and how it should be related.
Designers work closely with various user groups to ensure that the database structure satisfies the requirements of all stakeholders.

4. Application Programmers

Also known as System Analysts or Software Engineers, Application Programmers write the code that allows users to interact with the database.
They use programming languages such as Visual Basic, Developer, C, FORTRAN, and COBOL to design, debug, test, and maintain applications.
Application programmers often create "canned transactions" for naive users, enabling them to interact with the database without needing any database knowledge.

5. Specialized Users

Specialized Users are typically sophisticated users who write advanced database applications that go beyond traditional data-processing tasks.
These applications include systems like Computer-Aided Design (CAD), knowledge-based systems, and expert systems, which require specialized handling.