Database Architecture Notes

Database Architecture

The Three-Level ANSI-SPARC Architecture

  • In 1978, the ANSI/SPARC committee proposed a generalized framework for database systems, distinguishing between the logical and physical representation of data.
  • This framework introduced a three-level architecture, representing three levels of abstraction for viewing the database.

Need for Abstraction

  • The primary objective of the three-level architecture is to separate each user’s view of the database from its physical representation.
  • Reasons for this separation:
    • Users should be able to access the same data but have different customized views.
    • Users should be able to change their view of the data without affecting other users.
    • Users should not have to deal directly with physical database storage details; interaction should be independent of storage considerations.
    • The Database Administrator (DBA) should be able to change database storage structures without affecting users’ views.
    • Changes to physical storage aspects (e.g., new storage device) should not affect the internal structure of the database.
    • The DBA should be able to change the conceptual structure without affecting all users.

External Level

  • Represents the users’ view of the database.
  • Describes the part of the database relevant to each user.
  • Consists of various external views; each user has a view of the ‘real world’ in a familiar format.
  • Includes only entities, attributes, and relationships of interest to the user.
  • Different views can have different representations of the same data.
  • May include derived or calculated data, which is not stored but created when needed.

Conceptual Level

  • Represents the community view of the database.
  • Describes what data is stored and the relationships among the data.
  • The middle level in the three-level architecture.
  • Contains the logical structure of the entire database, as seen by the DBA.
  • Provides a complete view of the organization's data requirements, independent of storage considerations.
  • The conceptual level represents:
    • All entities, their attributes, and relationships.
    • Constraints on the data.
    • Semantic information about the data.
    • Security and integrity information.
  • Supports each external view; any data available to a user must be contained in or derivable from this level.
  • Must not contain any storage-dependent details.

Internal Level

  • Represents the physical representation of the database on the computer.
  • Describes how the data is stored in the database.
  • Covers the physical implementation of the database to achieve optimal runtime performance and storage space utilization.
  • Covers the data structures and file organizations used to store data on storage devices.
  • Interfaces with the operating system access methods (file management techniques) to place and retrieve data.
  • Concerned with:
    • Storage space allocation for data and indexes.
    • Record descriptions for storage (with stored sizes for data items).
    • Record placement.
    • Data compression and data encryption techniques.

Data Independence

  • A major objective of the three-level architecture is data independence, meaning upper levels are unaffected by changes to lower levels.
  • Two kinds of data independence:
    • Logical data independence
    • Physical data independence
Logical Data Independence
  • Refers to the immunity of external schemas to changes in the conceptual schema.
  • Changes to the conceptual schema (e.g., addition/removal of entities, attributes, or relationships) should be possible without changing existing external schemas or rewriting application programs.
  • Users for whom changes are made need to be aware, but other users should not be affected.
Physical Data Independence
  • Refers to the immunity of the conceptual schema to changes in the internal schema.
  • Changes to the internal schema (e.g., different file organizations, storage structures, devices, modifying indexes, or hashing algorithms) should be possible without changing the conceptual or external schemas.
  • From the users’ viewpoint, the only noticeable effect may be a change in performance.

Data Models

  • A model is a representation of ‘real world’ objects, events, and their associations.
  • A data model represents the organization itself.
  • Should provide basic concepts and notations for database designers and end-users to communicate their understanding of organizational data.
  • Components:
    • A structural part, consisting of a set of rules for database construction.
    • A manipulative part, defining allowed operations on the data (updating, retrieving, changing structure).
    • A set of integrity constraints to ensure data accuracy.
  • Data models fall into three broad categories:
    • Object-based
    • Record-based
    • Physical data models
  • The first two are used at the conceptual and external levels; the latter is used at the internal level.

Object-Based Logical Model

  • Contains concepts such as entities, attributes, and relationships.
  • First developed by Peter Chen in 1976.
  • Common types:
    • Entity–Relationship
    • Functional
    • Object-Oriented
  • An entity is a distinct object (person, place, thing, concept, event) in the organization to be represented in the database.
  • An attribute is a property describing some aspect of the object we wish to record.
  • A relationship is an association between entities.
  • The object-oriented data model extends the definition of an entity to include both attributes (state) and actions (behavior).
  • The object encapsulates both state and behavior.
  • Object-based logical models are then transformed into record-based logical models.

Record-Based Model

  • Record-Based Model or Logical Data Model or simply Database Model is logical conversion of Entity Relationship Model

  • Visualizes the logical model, which in turn is the basis for the physical implementation (database tables, indexes, etc.).

  • In a record-based model, the database consists of fixed-format records, possibly of differing types.

  • Three principal types:

    • Relational data model
    • Network data model
    • Hierarchical data model
  • Choice of data model depends on the relationship between various pieces of data

  • Each model is best suited for a particular class of problems depending on:

    • Nature of the data relationships
    • The need for flexibility
    • The volume of changes to the database
  • The hierarchical and network data models were developed before the relational data model, so their links to traditional file processing concepts are more evident.

Hierarchical Model
  • The hierarchical DBMS is used to model one-to-many relationships, presenting data in a treelike structure.
  • Within each record, data elements are organized into segments.
  • To the user, each record looks like an organizational chart with one top-level segment called the root.
  • An upper segment is connected logically to a lower segment in a parent–child relationship.
  • A parent segment can have more than one child, but a child can have only one parent.
  • Hierarchical structures depict one-to-many relationships.
Network Model
  • In the network model, data is represented as collections of records, and relationships are represented by sets.
  • In network model parents can have multiple children, and a child can have more than one parent.
  • Network DBMS depict data logically as many-to-many relationships.
  • A 1:N, or one-to-many, relationship relates one instance of a record to many record instances using some pointer linking mechanism in these model.
Relational Database Model
  • The relational data model is based on the concept of mathematical relations.
  • In the relational model, data and relationships are represented as tables, each of which has a number of columns with a unique name.
  • Relational databases permit more flexible combination and easier retrieval of information than other models.
  • Data in one table can be related to any piece of data in another table as long as both tables share a common element.

Physical Database Model

  • Physical data models describe how data is stored in the computer, representing information such as record structures, record orderings, and access paths.
  • Its just Data Definition Language(DDL)
  • DDL provides the semantics for administering all the physical objects In your database that is stored in the data dictionary
  • The result of compilation of DDL Operation is a set of tables, views, roles, users and other objects as physical database