Database Notes

Data and Databases

  • Data is a collection of distinct units of information in various forms (text, numbers, media). It originates from the word 'datum.'

  • A database is an organized collection of data, structured in tables, rows, and columns for easy access and management.

  • Modern databases are managed by DBMS, and SQL is used to operate on the stored data.

Evolution of Databases

  • File-Based (1968): Data was maintained in flat files with various access methods (sequential, indexed, random).

  • Hierarchical Data Model (1968-1980): Files are related in a parent/child manner; IBM's IMS was a prominent example. Limitations included complex implementation.

  • Network Data Model: Files are related as owners and members. Limitations included system complexity and difficulty in design and maintenance.

  • Relational Database (1970-Present):

    • Proposed by E.F. Codd.

    • Uses instances (tables with rows/columns) and schemas (structure definition).

    • Employs mathematical concepts like set theory and predicate logic.

  • Cloud Database:

    • Store and manage data on a cloud platform, accessible over the Internet.

    • Offered as Database as a Service (DBaaS).

    • Examples: AWS, Snowflake Computing, Oracle Database Cloud Services, Microsoft SQL Server, Google Cloud Spanner.

NoSQL Database

  • NoSQL stands for "not only SQL".

  • Accommodates a variety of data models, useful for large, distributed datasets.

  • Examples: MongoDB, CouchDB (Document-based); Memcached, Redis (Key-value store); HBase, Big Table (Tabular).

Object-Oriented Databases

  • Contain data in the form of objects and classes, combining relational model features with object-oriented principles.

  • Properties:

    • Object-oriented programming properties: Objects, Classes, Inheritance, Polymorphism, Encapsulation

    • Relational database properties: Atomicity, Consistency, Integrity, Durability, Concurrency, Query processing

Graph Databases

  • A NoSQL database using a graphical representation of data (nodes and edges).

  • Beneficial for searching relationships between data in complex and dynamic schemas.

DBMS (Database Management System)

  • Software used to store and retrieve databases. Examples: Oracle, MySQL.

  • Provides an interface for operations like creation, deletion, modification.

  • Advantages: Controls redundancy, enables data sharing, provides backup and recovery.

  • Disadvantages: Size, Cost, Complexity

RDBMS (Relational Database Management System)

  • Based on the relational model introduced by E.F. Codd.

  • Components: Table, Record/Tuple, Field/Column, Instance, Schema, Keys.

  • Maintains data security, integrity, accuracy, and consistency.

Types of Databases

  • Centralized Database: Stores data at a centralized system, accessible from different locations.

  • Distributed Database: Data is distributed among different systems, connected via communication links.

    • Homogeneous DDB: Same OS, application, and hardware.

    • Heterogeneous DDB: Different OS, application, and hardware.

  • Relational Database: Based on the relational data model (rows and columns).

    • Properties: ACID (Atomicity, Consistency, Isolation, Durability).

  • NoSQL Database: Stores a wide range of data sets, not limited to tabular form.

    • Types: Key-value storage, Document-oriented, Graph Databases, Wide-column stores.

  • Cloud Database: Stored in a virtual environment and executes over a cloud platform.

  • Object-oriented Databases: Uses the object-based data model.

  • Hierarchical Databases: Stores data in a parent-children relationship (tree-like structure).

  • Network Databases: Data is represented as nodes connected via links.

  • Personal Database: Data stored on a user's system, designed for a single user.

  • Operational Database: Creates and updates the database in real-time.

  • Enterprise Database: Used by large organizations for managing massive amounts of data.

RDBMS Details

  • Data is represented in tuples (rows) within tables. Each table has a primary key.

  • Table/Relation: Stores data in rows and columns.

    • Properties: Unique name, no duplicate tuples, no specific order, atomic attributes.

  • Row/Record/Tuple: Contains specific information for each entry.

    • Properties: No identical tuples, same format and number of entries, order is irrelevant.

  • Column/Attribute: Contains information associated with a specific field.

    • Properties: Every attribute has a name, null values are permitted, default values can be specified, primary key.

  • Data Item/Cells: Smallest unit of data.

    • Properties: Atomic, drawn from the same domain.

  • Degree: Total number of attributes.

  • Cardinality: Total number of tuples.

  • Domain: Possible values each attribute can contain.

  • NULL Values: Indicates a blank field.

  • Data Integrity:

    • Entity integrity: No duplicate rows.

    • Domain integrity: Valid entries for a column.

    • Referential integrity: Prevents deletion of used rows.

    • User-defined integrity: Specific business rules.

DBMS vs RDBMS

  • DBMS stores data as files; RDBMS stores in tabular form.

  • RDBMS includes normalization and ACID properties.

  • RDBMS supports distributed databases and multiple users, designed for large data.

  • RDBMS is an extension of DBMS; many software products are compatible with both.

Terminology

  • Degree: The total number of attributes that comprise a relation

  • Cardinality: The total number of tuples in a relation at any one time