Lesson1-Intro
INTRODUCTION TO DATABASES LESSON 1
Introduction to Databases
Key Points of Discussion
Information management
Databases
Data model
Design of a database
Data independence
Data access
Advantages and disadvantages of a DBMS
Information Management
Overview
Information is recorded and exchanged in various forms.
The need for organizing and codifying information has led to different methods over time.
Computer Systems
Information Representation
Information is represented by data in computer systems.
Data are raw symbols needing interpretation for meaning.
Example
Data: "Laura Smith" and 424242
Information: Result of looking up a telephone number in a personal directory.
Data Characteristics
Data remain stable over time, unlike the processes managing them.
Example
Minimal variations in the data structures of bank applications over decades, while management procedures vary annually.
Data is a crucial resource for organizations managing them.
Databases
General Definition
A database is a collection of data representing information of interest for a computer system.
Technical Definition
A database is data managed by a DBMS (Database Management System).
Types of Databases
DBMS Types
Relational
NoSQL
Examples: MongoDB, PostgreSQL, Cassandra, MySQL
Object-Oriented
Document-based
Examples: IBM DB2, Microsoft SQL Server, Elasticsearch
Data Warehouse
Time Series
Examples: ORACLE, In-memory, InfluxDB
Graph
Example: Neo4j
Real-time
Example: Firebase, SQLite, Redis
Data Base Management System (DBMS)
Definition
A software system for managing large, shared, and persistent data collections ensuring reliability and privacy.
DBMS Characteristics
Key Features
Supports massive amounts of data (Giga/Tera/Petabytes) - too large for main memory.
Provides persistent storage - data persists after programs finish.
Allows efficient and convenient access to query data without searching the entire database.
Ensures secure, concurrent, and atomic access with user authorization and reliability guarantees against system failures.
Relational DBMS
Traditional term for DBMS, referring to systems designed specifically for relational databases.
Examples: Oracle, SQL Server, PostgreSQL, MySQL.
Utilizes SQL for data description and manipulation with ACID principles ensuring transaction reliability (Atomicity, Consistency, Isolation, Durability).
Applications of DB Technology
Domains of Use
Storage and retrieval (inventory applications)
Multimedia apps (YouTube, Spotify)
Biometric technology (fingerprints, retina scans)
Wearable technology (FitBit, Apple Watch)
GIS applications (Google Maps)
Sensor technology (nuclear reactor monitoring)
Big Data applications (Amazon)
IoT applications
DBMS vs. File System
File System
Stores data persistently but with simple access mechanisms and limited sharing capabilities.
DBMS Advantages
Extends file system functionalities by providing integrated services.
Comparison: File System vs DBMS
File System
Duplicates or redundancies in information storage.
Risks inconsistent data.
Strong coupling between applications and data.
DBMS Approach
More efficient, consistent, and maintainable than file systems.
Looser coupling, with robust querying and retrieval functionalities.
Data Model
Definition
A data model organizes and describes data structure that a computer understands.
Elements
Elementary data types (integers, characters)
Structuring mechanisms (record builders, arrays)
Types of Data Models
Relational Model
Most widespread; data is organized into homogeneous records represented as tables.
Historic Models
Hierarchical model, network model.
Recent Developments
Object model, XML, NoSQL databases.
Examples of Relational Model
Entities and Attributes
Teacher (Name, Department, Phone)
Courses (Code, Name, Teacher)
Schema and Instances
Schema
Defines the structure; relatively unchanging over time (table header).
Instance
Composed of actual data values (rows in the table) which are variable over time.
Other Data Models: NoSQL
A database forms collections, each with documents containing key-value fields, allowing heterogeneous document structures.
Data Representation
Independent from logical models; used during design, such as in entity-relationship models.
Describes data structures in DBMS context, relevant to programmers.
Database Design Process
Steps in Design
Application Requirements Analysis
Create a conceptual schema reflecting the requirements.
Conceptual Model Design
Translate into a logical schema.
Physical Design
Finalize with implementation details (file organization, indices).
Levels of Abstraction in DBMS
Defined user roles lead to different schema levels including External, Logical, and Internal schemas.
Three Layer Architecture
Schema Levels
Logical Schema: Database description using a logical model.
Internal Schema: Representation of logical schema in physical storage.
External Schema: User-specific views reflecting needs.
Data Independence Overview
Allows users/applications to ignore design complexities in the database construction.
Levels
Physical
Logical
Physical Independence
Users can interact with the DBMS without concern for data's physical structure changes.
Logical Independence
Users can interact with external levels without changes impacting logical levels.
New or altered views can be added without changing the logical schema.
Data Access
Languages
User-friendly interfaces enabling specific queries, using languages that allow for interactions without text-level querying.
Types
Data Definition Languages (DDL) for schema definition and authorizations.
Data Manipulation Languages (DML) for querying and updating instances.
Users & Roles
Database Administrator
Manages centralized control, performance, reliability, and authorizations.
Designers/Programmers
Define structure and development of programs accessing the database.
End Users
Use applications to perform tasks; casual users formulate queries ad hoc.
Transactions
Applications performing frequent, defined activities such as bookings and transfers, often using SQL within host languages.
Advantages and Disadvantages of DBMS
Advantages
Centralized data as a common resource reduces redundancies and ensures consistency.
Unified data model promotes standardization and economies of scale.
Achieves data independence.
Disadvantages
High costs associated with purchasing, hardware, software, and training.
Integration of various services increases overall complexity and may impact performance.
Categorization of DBMS
By Data Model
Hierarchical DBMS
Tree-like model; procedural DML.
Network DBMS
Uses a network structure; procedural DML.
Relational DBMS (RDBMS)
Most popular, SQL-based.
Object-Oriented DBMS (OODBMS)
Niche market due to complexity.
Object-Relational DBMS (ORDBMS)
Extended relational model using OO concepts.
XML DBMS
Native XML or XML-enabled systems.
NoSQL DBMS
Focused on big and unstructured data.
By Degree of Simultaneous Access
Single User vs. Multi User
Connection management through dispatcher/server instances.
By Architecture
Centralized
Data is on a centralized server.
Client-Server
Active client requesting services from passive servers.
n-Tier
Client, application server, and database server configurations.
Cloud
Hosted by third-party providers.
Federated
Uniform access to multiple data sources.
In-Memory
Data stored in internal memory for speed.
By Usage
OLTP (On-line Transaction Processing)
Focus on operational data and high transaction volumes.
OLAP (On-line Analytical Processing)
Used for strategy-related decisions with complex queries.
Big Data & Analytics
Flexible, schema-less structures for unstructured data.
Multimedia
Storage and query capabilities for various multimedia types.
Spatial Applications
Supports geospatial data and GIS data management.
Sensor Data Management
Effective handling of data from sensors.