Lesson1-Intro

INTRODUCTION TO DATABASES LESSON 1

Introduction to Databases

  • Key Points of Discussion

    • Information management

    • Databases

    • Data model

    • Design of a database

    • Data independence

    • Data access

    • Advantages and disadvantages of a DBMS

Information Management

Overview

  • Information is recorded and exchanged in various forms.

  • The need for organizing and codifying information has led to different methods over time.

Computer Systems

  • Information Representation

    • Information is represented by data in computer systems.

    • Data are raw symbols needing interpretation for meaning.

  • Example

    • Data: "Laura Smith" and 424242

    • Information: Result of looking up a telephone number in a personal directory.

Data Characteristics

  • Data remain stable over time, unlike the processes managing them.

  • Example

    • Minimal variations in the data structures of bank applications over decades, while management procedures vary annually.

  • Data is a crucial resource for organizations managing them.

Databases

General Definition

  • A database is a collection of data representing information of interest for a computer system.

Technical Definition

  • A database is data managed by a DBMS (Database Management System).

Types of Databases

  • DBMS Types

    • Relational

    • NoSQL

      • Examples: MongoDB, PostgreSQL, Cassandra, MySQL

    • Object-Oriented

    • Document-based

      • Examples: IBM DB2, Microsoft SQL Server, Elasticsearch

    • Data Warehouse

    • Time Series

      • Examples: ORACLE, In-memory, InfluxDB

    • Graph

      • Example: Neo4j

    • Real-time

      • Example: Firebase, SQLite, Redis

Data Base Management System (DBMS)

Definition

  • A software system for managing large, shared, and persistent data collections ensuring reliability and privacy.

DBMS Characteristics

  • Key Features

    • Supports massive amounts of data (Giga/Tera/Petabytes) - too large for main memory.

    • Provides persistent storage - data persists after programs finish.

    • Allows efficient and convenient access to query data without searching the entire database.

    • Ensures secure, concurrent, and atomic access with user authorization and reliability guarantees against system failures.

Relational DBMS

  • Traditional term for DBMS, referring to systems designed specifically for relational databases.

    • Examples: Oracle, SQL Server, PostgreSQL, MySQL.

  • Utilizes SQL for data description and manipulation with ACID principles ensuring transaction reliability (Atomicity, Consistency, Isolation, Durability).

Applications of DB Technology

  • Domains of Use

    • Storage and retrieval (inventory applications)

    • Multimedia apps (YouTube, Spotify)

    • Biometric technology (fingerprints, retina scans)

    • Wearable technology (FitBit, Apple Watch)

    • GIS applications (Google Maps)

    • Sensor technology (nuclear reactor monitoring)

    • Big Data applications (Amazon)

    • IoT applications

DBMS vs. File System

  • File System

    • Stores data persistently but with simple access mechanisms and limited sharing capabilities.

  • DBMS Advantages

    • Extends file system functionalities by providing integrated services.

Comparison: File System vs DBMS

  • File System

    • Duplicates or redundancies in information storage.

    • Risks inconsistent data.

    • Strong coupling between applications and data.

  • DBMS Approach

    • More efficient, consistent, and maintainable than file systems.

    • Looser coupling, with robust querying and retrieval functionalities.

Data Model

Definition

  • A data model organizes and describes data structure that a computer understands.

  • Elements

    • Elementary data types (integers, characters)

    • Structuring mechanisms (record builders, arrays)

Types of Data Models

  • Relational Model

    • Most widespread; data is organized into homogeneous records represented as tables.

  • Historic Models

    • Hierarchical model, network model.

  • Recent Developments

    • Object model, XML, NoSQL databases.

Examples of Relational Model

  • Entities and Attributes

    • Teacher (Name, Department, Phone)

    • Courses (Code, Name, Teacher)

Schema and Instances

  • Schema

    • Defines the structure; relatively unchanging over time (table header).

  • Instance

    • Composed of actual data values (rows in the table) which are variable over time.

Other Data Models: NoSQL

  • A database forms collections, each with documents containing key-value fields, allowing heterogeneous document structures.

Data Representation

  • Independent from logical models; used during design, such as in entity-relationship models.

  • Describes data structures in DBMS context, relevant to programmers.

Database Design Process

Steps in Design

  • Application Requirements Analysis

    • Create a conceptual schema reflecting the requirements.

  • Conceptual Model Design

    • Translate into a logical schema.

  • Physical Design

    • Finalize with implementation details (file organization, indices).

Levels of Abstraction in DBMS

  • Defined user roles lead to different schema levels including External, Logical, and Internal schemas.

Three Layer Architecture

  • Schema Levels

    • Logical Schema: Database description using a logical model.

    • Internal Schema: Representation of logical schema in physical storage.

    • External Schema: User-specific views reflecting needs.

Data Independence Overview

  • Allows users/applications to ignore design complexities in the database construction.

  • Levels

    • Physical

    • Logical

Physical Independence

  • Users can interact with the DBMS without concern for data's physical structure changes.

Logical Independence

  • Users can interact with external levels without changes impacting logical levels.

  • New or altered views can be added without changing the logical schema.

Data Access

Languages

  • User-friendly interfaces enabling specific queries, using languages that allow for interactions without text-level querying.

  • Types

    • Data Definition Languages (DDL) for schema definition and authorizations.

    • Data Manipulation Languages (DML) for querying and updating instances.

Users & Roles

  • Database Administrator

    • Manages centralized control, performance, reliability, and authorizations.

  • Designers/Programmers

    • Define structure and development of programs accessing the database.

  • End Users

    • Use applications to perform tasks; casual users formulate queries ad hoc.

Transactions

  • Applications performing frequent, defined activities such as bookings and transfers, often using SQL within host languages.

Advantages and Disadvantages of DBMS

Advantages

  • Centralized data as a common resource reduces redundancies and ensures consistency.

  • Unified data model promotes standardization and economies of scale.

  • Achieves data independence.

Disadvantages

  • High costs associated with purchasing, hardware, software, and training.

  • Integration of various services increases overall complexity and may impact performance.

Categorization of DBMS

By Data Model

  • Hierarchical DBMS

    • Tree-like model; procedural DML.

  • Network DBMS

    • Uses a network structure; procedural DML.

  • Relational DBMS (RDBMS)

    • Most popular, SQL-based.

  • Object-Oriented DBMS (OODBMS)

    • Niche market due to complexity.

  • Object-Relational DBMS (ORDBMS)

    • Extended relational model using OO concepts.

  • XML DBMS

    • Native XML or XML-enabled systems.

  • NoSQL DBMS

    • Focused on big and unstructured data.

By Degree of Simultaneous Access

  • Single User vs. Multi User

    • Connection management through dispatcher/server instances.

By Architecture

  • Centralized

    • Data is on a centralized server.

  • Client-Server

    • Active client requesting services from passive servers.

  • n-Tier

    • Client, application server, and database server configurations.

  • Cloud

    • Hosted by third-party providers.

  • Federated

    • Uniform access to multiple data sources.

  • In-Memory

    • Data stored in internal memory for speed.

By Usage

  • OLTP (On-line Transaction Processing)

    • Focus on operational data and high transaction volumes.

  • OLAP (On-line Analytical Processing)

    • Used for strategy-related decisions with complex queries.

  • Big Data & Analytics

    • Flexible, schema-less structures for unstructured data.

  • Multimedia

    • Storage and query capabilities for various multimedia types.

  • Spatial Applications

    • Supports geospatial data and GIS data management.

  • Sensor Data Management

    • Effective handling of data from sensors.