Database Systems and Data Management
Chapter 5: Database Systems and Data Management
General Introduction
Azhar Merchant BCIS 3610
The content is proprietary and must not be scanned, copied, or posted publicly.
Database Definition
A database is a well-designed, organized, and carefully managed collection of data.
Purpose:
To help an organization achieve its goals.
Contributes to organizational success.
Provides managers and decision-makers with timely, accurate, and relevant information built on data.
Multiple databases are common within organizations.
Data, Information, and Knowledge
Data: Raw factual elements.
Information:
Collection of organized and processed data that holds additional value beyond individual facts.
Knowledge:
Awareness and understanding derived from a set of information.
Demonstrates how information can be used to support specific tasks or decision-making.
Types of Data
Alphanumeric Data: Composed of numbers, letters, and other characters.
Audio Data: Sounds, noises, or tones.
Image Data: Graphics and pictures.
Video Data: Moving images or pictures.
The Value of Information
The value of information is directly linked to how it assists decision makers in achieving organizational goals.
Information aids efficient and effective task performance.
Benefits Gained from High-Quality Data
Improves Decision Making:
Reliance on high-quality data reduces guesswork and risk.
Increases Customer Satisfaction:
Good data leads to favorable outcomes and higher customer satisfaction.
Increases Sales:
Accurate consumer targeting and communication contribute to successful sales strategies (e.g., up-sell and cross-sell suggestions).
Improves Innovation:
Enhances worker efficiency, product/service quality, and customer experience.
Raises Productivity:
Employees can concentrate on core missions without the distraction of data errors.
Ensures Compliance:
Maintains adherence to necessary regulations.
Characteristics of High-Quality Data
Easily Accessible:
Data must be available to the right user at the right time and in the correct format.
Error Free and Accurate:
Follow the principle of "Garbage In – Garbage Out" by feeding correct data to avoid errors.
Complete Information:
Incomplete information may lead to poor decision-making.
Economical:
The cost of creating or collecting data must be reasonable.
Relevant:
Information should be applicable to its context (e.g., baseball data shouldn't be used by football companies).
Reliable:
Information must be trustworthy; sources should be identifiable.
Timely:
Data must be current; outdated data (e.g., not useful weather reports) can mislead.
Verifiable:
Data should be double-checked for accuracy and reliability.
The Data Hierarchy
Entity:
Represent an object (person, place, or thing) for which data is collected.
File:
A collection of entities.
Attribute:
A particular characteristic of an entity.
Domain:
The range of allowable values for a data attribute.
Data Item:
A specific value for a data attribute.
Record:
A collection of attributes about a specific entity.
Primary Key:
An attribute or a set of attributes that uniquely identifies a record.
Foreign Key:
An attribute in one table referring to the primary key in another table.
The Database Approach
Database Approach:
Ensures that multiple information systems share a pool of related data.
Database Management System (DBMS):
A group of programs to access and manage databases; provides an interface for users and applications.
Schema:
Describes the logical and physical structure of the database (tables, attributes, relationships).
Allows DBMS to reference and access requested data efficiently.
Database Activities
Creating and Modifying the Database
Data Definition Language (DDL):
Collection of commands defining and describing data and relationships in a database.
Data Dictionary:
A detailed description of the data stored in the database that adheres to standard definitions facilitating data sharing.
Storing and Retrieving Data
DBMS Functions:
Interfaces between application programs and databases to obtain data.
Concurrency Control:
Manages scenarios when multiple users or applications access the same record simultaneously.
Manipulating Data and Generating Reports
Query by Example (QBE):
A visual method for creating database queries.
Data Manipulation Language (DML):
Provides a means for users to modify and access data, perform queries, and produce reports.
Security Management
Data Protection:
Prevents unauthorized access, physical damage, and operational failures.
User Access Privileges:
Limits data access to roles necessary for job functions; only essential privileges granted.
Backup and Recovery:
Involves making backup copies for restoring databases after failures.
Data Cleansing
Definition:
Involves detecting, correcting, or deleting incomplete or inaccurate records in a database to enhance decision-making quality.
Methods:
Cross-checking against validated data sets and using data enhancement techniques to augment existing database information.
Database Design
Purpose of Database Design:
Organize data efficiently and enable quick access and easy modifications.
Reflect an organization's business processes.
Design Considerations:
Content and Access, Logical and Physical Structure, Security, and Response Time.
Data Modelling:
Tools used at both organizational (Enterprise Data Modelling) and specific application levels.
Identifies significant data entities, attributes, definitions, lengths, formats, and valid value domains.
Entity-Relationship (ER) Diagram:
A model used for analyzing and communicating data needs; represents entities, attributes, and relationships with graphical symbols.
Relational Databases
Characteristics of Relational Databases:
Fields (attributes) are organized into two-dimensional tables (relations). Rows represent entities, and columns represent attributes, uniquely identified by primary keys.
Data in columns can be of various types including integers, decimals, and text, constrained to specific types, lengths, or values.
Primary and foreign keys show table relationships. User queries allow operations on the database.
Data Manipulation in Relational Databases
Selecting:
Eliminates rows based on criteria.
Projecting:
Removes certain columns in a table.
Joining:
Merges two or more tables through common attributes to produce a new table.
Data Normalization:
Reduces data redundancy.
SQL Databases
SQL (Structured Query Language):
A programming language for database access and manipulation.
ACID Properties of SQL Databases:
Atomicity: Ensures complete transaction success or rollback if any part fails.
Consistency: Guarantees valid transitions between database states adhering to defined constraints.
Isolation: Ensures concurrent transactions operate invisibly to one another.
Durability: Guarantees permanence of data changes post-commitment, safeguarding against system failures.
Example SQL Commands:
SELECT Query: (Example - Use of
JOIN
)SELECT Client Name, Debt FROM Client WHERE Debt > 1000
Displays clients owing more than $1,000.
SELECT ClientName, Client Num, OrderNum FROM Client, Order WHERE Client.ClientNum = Order.ClientNum
Combines two tables based on the client number.
Popular Relational DBMS
Open-Source Relational DBMS: MySQL, PostgreSQL, MariaDB, etc.
Relational DBMS for Individuals: Microsoft Access, SQLite.
Enterprise Level DBMS: Oracle, IBM DB2, Microsoft SQL Server, etc.
Database as a Service (DaaS)
Definition: Databases stored on a service provider’s servers, accessed over the Internet, with administration managed by the provider.
Advantages: Eliminates the need for in-house database installation, maintenance, and monitoring.
Data Management
Definition: An integrated function defining processes for obtaining, certifying, storing, securing, and processing data, ensuring it meets users' needs for accessibility, reliability, and timeliness.
Data Governance
Definition: Outlines roles, responsibilities, and processes ensuring data can be trusted and utilized across the organization.
Ensures the identification of personnel accountable for fixing and preventing data issues.
Data Stewardship
Data Steward: A governance role responsible for ensuring data quality and fitness, typically filled by a non-IS employee managing critical data entities.
Data Lifecycle Management
Definition: A policies-based approach overseeing data flow from acquisition or creation to eventual deletion when outdated.
Summary
Well-designed and managed databases significantly aid in decision-making.
High-quality data yields multiple benefits, including improved decision making, increased customer satisfaction, increased sales, enhanced innovation, better productivity, and compliance.
A strong data management program, anchored by data governance, ensures the maintenance of high-quality data.