Database Notes
Data and Databases
Data is a collection of distinct units of information in various forms (text, numbers, media). It originates from the word 'datum.'
A database is an organized collection of data, structured in tables, rows, and columns for easy access and management.
Modern databases are managed by DBMS, and SQL is used to operate on the stored data.
Evolution of Databases
File-Based (1968): Data was maintained in flat files with various access methods (sequential, indexed, random).
Hierarchical Data Model (1968-1980): Files are related in a parent/child manner; IBM's IMS was a prominent example. Limitations included complex implementation.
Network Data Model: Files are related as owners and members. Limitations included system complexity and difficulty in design and maintenance.
Relational Database (1970-Present):
Proposed by E.F. Codd.
Uses instances (tables with rows/columns) and schemas (structure definition).
Employs mathematical concepts like set theory and predicate logic.
Cloud Database:
Store and manage data on a cloud platform, accessible over the Internet.
Offered as Database as a Service (DBaaS).
Examples: AWS, Snowflake Computing, Oracle Database Cloud Services, Microsoft SQL Server, Google Cloud Spanner.
NoSQL Database
NoSQL stands for "not only SQL".
Accommodates a variety of data models, useful for large, distributed datasets.
Examples: MongoDB, CouchDB (Document-based); Memcached, Redis (Key-value store); HBase, Big Table (Tabular).
Object-Oriented Databases
Contain data in the form of objects and classes, combining relational model features with object-oriented principles.
Properties:
Object-oriented programming properties: Objects, Classes, Inheritance, Polymorphism, Encapsulation
Relational database properties: Atomicity, Consistency, Integrity, Durability, Concurrency, Query processing
Graph Databases
A NoSQL database using a graphical representation of data (nodes and edges).
Beneficial for searching relationships between data in complex and dynamic schemas.
DBMS (Database Management System)
Software used to store and retrieve databases. Examples: Oracle, MySQL.
Provides an interface for operations like creation, deletion, modification.
Advantages: Controls redundancy, enables data sharing, provides backup and recovery.
Disadvantages: Size, Cost, Complexity
RDBMS (Relational Database Management System)
Based on the relational model introduced by E.F. Codd.
Components: Table, Record/Tuple, Field/Column, Instance, Schema, Keys.
Maintains data security, integrity, accuracy, and consistency.
Types of Databases
Centralized Database: Stores data at a centralized system, accessible from different locations.
Distributed Database: Data is distributed among different systems, connected via communication links.
Homogeneous DDB: Same OS, application, and hardware.
Heterogeneous DDB: Different OS, application, and hardware.
Relational Database: Based on the relational data model (rows and columns).
Properties: ACID (Atomicity, Consistency, Isolation, Durability).
NoSQL Database: Stores a wide range of data sets, not limited to tabular form.
Types: Key-value storage, Document-oriented, Graph Databases, Wide-column stores.
Cloud Database: Stored in a virtual environment and executes over a cloud platform.
Object-oriented Databases: Uses the object-based data model.
Hierarchical Databases: Stores data in a parent-children relationship (tree-like structure).
Network Databases: Data is represented as nodes connected via links.
Personal Database: Data stored on a user's system, designed for a single user.
Operational Database: Creates and updates the database in real-time.
Enterprise Database: Used by large organizations for managing massive amounts of data.
RDBMS Details
Data is represented in tuples (rows) within tables. Each table has a primary key.
Table/Relation: Stores data in rows and columns.
Properties: Unique name, no duplicate tuples, no specific order, atomic attributes.
Row/Record/Tuple: Contains specific information for each entry.
Properties: No identical tuples, same format and number of entries, order is irrelevant.
Column/Attribute: Contains information associated with a specific field.
Properties: Every attribute has a name, null values are permitted, default values can be specified, primary key.
Data Item/Cells: Smallest unit of data.
Properties: Atomic, drawn from the same domain.
Degree: Total number of attributes.
Cardinality: Total number of tuples.
Domain: Possible values each attribute can contain.
NULL Values: Indicates a blank field.
Data Integrity:
Entity integrity: No duplicate rows.
Domain integrity: Valid entries for a column.
Referential integrity: Prevents deletion of used rows.
User-defined integrity: Specific business rules.
DBMS vs RDBMS
DBMS stores data as files; RDBMS stores in tabular form.
RDBMS includes normalization and ACID properties.
RDBMS supports distributed databases and multiple users, designed for large data.
RDBMS is an extension of DBMS; many software products are compatible with both.
Terminology
Degree: The total number of attributes that comprise a relation
Cardinality: The total number of tuples in a relation at any one time