DBMS - UNIT V_R22

UNIT – V: File Organization

File Organization: General Overview

  • File Organization: Refers to the logical relationships among records in a file, regarding identification and access methods.

  • What is a File?: A file is a collection of related information stored on secondary storage media (e.g., magnetic disks, tapes).

  • File Structure: The format of data and label blocks within a file.

    • Objectives:

      • Fast record selection.

      • Efficient insertion, deletion, and updates.

      • Prevents duplicate records.

      • Cost-effective data storage.

Types of File Organizations

  • Sequential File Organization: Files stored one after the other; can be implemented in two ways:

    • Pile File Method: Records are stored in the sequence they are inserted.

    • Sorted File Method: Records are inserted in sorted order (ascending or descending).

  • Heap File Organization: Stores records at the end of the file without sorting.

  • Hash File Organization: Uses a hash function to determine record storage.

  • B+ Tree File Organization: Advanced indexing method utilizing a tree structure.

  • Clustered File Organization: Groups related records/tables in the same file.

  • ISAM (Indexed Sequential Access Method): Combines sequential and indexed methods.

Sequential File Organization

Advantages
  • Efficient for large data sets.

  • Simple design.

  • Ideal for magnetic tape storage.

Disadvantages
  • Time-consuming to access specific records as it requires sequential scanning.

  • Inefficient for maintaining sorted order once records change.

Heap File Organization

  • Records are stored in blocks without ordering.

  • DBMS manages storage; searching involves traversing from the start.

Advantages
  • Faster fetching in small databases.

  • Effective for large data loads.

Disadvantages
  • Potential for unused memory blocks.

  • Less efficient for larger databases.

Hashing Techniques

  • Hash File Organization: Stores data in direct locations via hash functions for efficient retrieval.

    • Data Bucket: Storage location for records.

    • Hash Function: Maps search keys to addresses.

    • Hash Index: Uses the prefix of the hash value for addressing.

Hashing Types
  • Static Hashing: Fixed addresses for records based on hash function.

    • Buckets do not change; duplicates lead to bucket overflow, managed by open or close hashing strategies.

  • Dynamic Hashing: Addresses can change; buckets grow or shrink dynamically to manage data size efficiently.

B+ Tree File Organization

  • Uses a tree-like structure for indexing.

  • All records are stored at leaf nodes; internal nodes point to leaves.

Advantages
  • Efficient searching due to a balanced structure.

  • Supports range queries efficiently.

Clustered File Organization

  • Related tables are stored together to reduce retrieval cost.

  • Can be implemented via indexed or hash clusters.

ISAM

  • Combines sequential storage with an indexing system for fast access.

Data Dictionary Storage

  • Purpose: Stores metadata about database structures, helping in management and organization of data.

  • Components: Contains information about tables, indexes, views, and user permissions.

  • Access: Managed by DBMS and can be queried using SQL.

Indexing and Hashing

Basic Concepts

  • Indexing: Optimizes performance by minimizing disk accesses; indexes are data structures locating data quickly.

  • Hashing: Directly calculates address of a record without needing an index structure.

Ordered Indices

  • Maintains order of keys to facilitate fast access and support range queries.

    • Types: B-Tree, B+ Tree, Balanced Tree.

Comparison of Indexing and Hashing

  • Indexing: Efficient for various queries, especially range.

  • Hashing: Fast for exact match queries but not for ranges.

Conclusion

  • Understanding File Organization is key for optimizing data retrieval and ensuring efficiency in database management systems.