DBMS - UNIT V_R22
UNIT – V: File Organization
File Organization: General Overview
File Organization: Refers to the logical relationships among records in a file, regarding identification and access methods.
What is a File?: A file is a collection of related information stored on secondary storage media (e.g., magnetic disks, tapes).
File Structure: The format of data and label blocks within a file.
Objectives:
Fast record selection.
Efficient insertion, deletion, and updates.
Prevents duplicate records.
Cost-effective data storage.
Types of File Organizations
Sequential File Organization: Files stored one after the other; can be implemented in two ways:
Pile File Method: Records are stored in the sequence they are inserted.
Sorted File Method: Records are inserted in sorted order (ascending or descending).
Heap File Organization: Stores records at the end of the file without sorting.
Hash File Organization: Uses a hash function to determine record storage.
B+ Tree File Organization: Advanced indexing method utilizing a tree structure.
Clustered File Organization: Groups related records/tables in the same file.
ISAM (Indexed Sequential Access Method): Combines sequential and indexed methods.
Sequential File Organization
Advantages
Efficient for large data sets.
Simple design.
Ideal for magnetic tape storage.
Disadvantages
Time-consuming to access specific records as it requires sequential scanning.
Inefficient for maintaining sorted order once records change.
Heap File Organization
Records are stored in blocks without ordering.
DBMS manages storage; searching involves traversing from the start.
Advantages
Faster fetching in small databases.
Effective for large data loads.
Disadvantages
Potential for unused memory blocks.
Less efficient for larger databases.
Hashing Techniques
Hash File Organization: Stores data in direct locations via hash functions for efficient retrieval.
Data Bucket: Storage location for records.
Hash Function: Maps search keys to addresses.
Hash Index: Uses the prefix of the hash value for addressing.
Hashing Types
Static Hashing: Fixed addresses for records based on hash function.
Buckets do not change; duplicates lead to bucket overflow, managed by open or close hashing strategies.
Dynamic Hashing: Addresses can change; buckets grow or shrink dynamically to manage data size efficiently.
B+ Tree File Organization
Uses a tree-like structure for indexing.
All records are stored at leaf nodes; internal nodes point to leaves.
Advantages
Efficient searching due to a balanced structure.
Supports range queries efficiently.
Clustered File Organization
Related tables are stored together to reduce retrieval cost.
Can be implemented via indexed or hash clusters.
ISAM
Combines sequential storage with an indexing system for fast access.
Data Dictionary Storage
Purpose: Stores metadata about database structures, helping in management and organization of data.
Components: Contains information about tables, indexes, views, and user permissions.
Access: Managed by DBMS and can be queried using SQL.
Indexing and Hashing
Basic Concepts
Indexing: Optimizes performance by minimizing disk accesses; indexes are data structures locating data quickly.
Hashing: Directly calculates address of a record without needing an index structure.
Ordered Indices
Maintains order of keys to facilitate fast access and support range queries.
Types: B-Tree, B+ Tree, Balanced Tree.
Comparison of Indexing and Hashing
Indexing: Efficient for various queries, especially range.
Hashing: Fast for exact match queries but not for ranges.
Conclusion
Understanding File Organization is key for optimizing data retrieval and ensuring efficiency in database management systems.