Database Systems and Data Management

Chapter 5: Database Systems and Data Management

General Introduction

  • Azhar Merchant BCIS 3610

  • The content is proprietary and must not be scanned, copied, or posted publicly.

Database Definition

  • A database is a well-designed, organized, and carefully managed collection of data.

  • Purpose:

    • To help an organization achieve its goals.

    • Contributes to organizational success.

    • Provides managers and decision-makers with timely, accurate, and relevant information built on data.

  • Multiple databases are common within organizations.

Data, Information, and Knowledge

  • Data: Raw factual elements.

  • Information:

    • Collection of organized and processed data that holds additional value beyond individual facts.

  • Knowledge:

    • Awareness and understanding derived from a set of information.

    • Demonstrates how information can be used to support specific tasks or decision-making.

Types of Data

  • Alphanumeric Data: Composed of numbers, letters, and other characters.

  • Audio Data: Sounds, noises, or tones.

  • Image Data: Graphics and pictures.

  • Video Data: Moving images or pictures.

The Value of Information

  • The value of information is directly linked to how it assists decision makers in achieving organizational goals.

  • Information aids efficient and effective task performance.

Benefits Gained from High-Quality Data

  • Improves Decision Making:

    • Reliance on high-quality data reduces guesswork and risk.

  • Increases Customer Satisfaction:

    • Good data leads to favorable outcomes and higher customer satisfaction.

  • Increases Sales:

    • Accurate consumer targeting and communication contribute to successful sales strategies (e.g., up-sell and cross-sell suggestions).

  • Improves Innovation:

    • Enhances worker efficiency, product/service quality, and customer experience.

  • Raises Productivity:

    • Employees can concentrate on core missions without the distraction of data errors.

  • Ensures Compliance:

    • Maintains adherence to necessary regulations.

Characteristics of High-Quality Data

  1. Easily Accessible:

    • Data must be available to the right user at the right time and in the correct format.

  2. Error Free and Accurate:

    • Follow the principle of "Garbage In – Garbage Out" by feeding correct data to avoid errors.

  3. Complete Information:

    • Incomplete information may lead to poor decision-making.

  4. Economical:

    • The cost of creating or collecting data must be reasonable.

  5. Relevant:

    • Information should be applicable to its context (e.g., baseball data shouldn't be used by football companies).

  6. Reliable:

    • Information must be trustworthy; sources should be identifiable.

  7. Timely:

    • Data must be current; outdated data (e.g., not useful weather reports) can mislead.

  8. Verifiable:

    • Data should be double-checked for accuracy and reliability.

The Data Hierarchy

  • Entity:

    • Represent an object (person, place, or thing) for which data is collected.

  • File:

    • A collection of entities.

  • Attribute:

    • A particular characteristic of an entity.

  • Domain:

    • The range of allowable values for a data attribute.

  • Data Item:

    • A specific value for a data attribute.

  • Record:

    • A collection of attributes about a specific entity.

  • Primary Key:

    • An attribute or a set of attributes that uniquely identifies a record.

  • Foreign Key:

    • An attribute in one table referring to the primary key in another table.

The Database Approach

  • Database Approach:

    • Ensures that multiple information systems share a pool of related data.

  • Database Management System (DBMS):

    • A group of programs to access and manage databases; provides an interface for users and applications.

  • Schema:

    • Describes the logical and physical structure of the database (tables, attributes, relationships).

    • Allows DBMS to reference and access requested data efficiently.

Database Activities

Creating and Modifying the Database
  • Data Definition Language (DDL):

    • Collection of commands defining and describing data and relationships in a database.

  • Data Dictionary:

    • A detailed description of the data stored in the database that adheres to standard definitions facilitating data sharing.

Storing and Retrieving Data
  • DBMS Functions:

    • Interfaces between application programs and databases to obtain data.

  • Concurrency Control:

    • Manages scenarios when multiple users or applications access the same record simultaneously.

Manipulating Data and Generating Reports
  • Query by Example (QBE):

    • A visual method for creating database queries.

  • Data Manipulation Language (DML):

    • Provides a means for users to modify and access data, perform queries, and produce reports.

Security Management
  • Data Protection:

    • Prevents unauthorized access, physical damage, and operational failures.

  • User Access Privileges:

    • Limits data access to roles necessary for job functions; only essential privileges granted.

  • Backup and Recovery:

    • Involves making backup copies for restoring databases after failures.

Data Cleansing
  • Definition:

    • Involves detecting, correcting, or deleting incomplete or inaccurate records in a database to enhance decision-making quality.

  • Methods:

    • Cross-checking against validated data sets and using data enhancement techniques to augment existing database information.

Database Design

  • Purpose of Database Design:

    • Organize data efficiently and enable quick access and easy modifications.

    • Reflect an organization's business processes.

  • Design Considerations:

    • Content and Access, Logical and Physical Structure, Security, and Response Time.

  • Data Modelling:

    • Tools used at both organizational (Enterprise Data Modelling) and specific application levels.

    • Identifies significant data entities, attributes, definitions, lengths, formats, and valid value domains.

  • Entity-Relationship (ER) Diagram:

    • A model used for analyzing and communicating data needs; represents entities, attributes, and relationships with graphical symbols.

Relational Databases
  • Characteristics of Relational Databases:

    • Fields (attributes) are organized into two-dimensional tables (relations). Rows represent entities, and columns represent attributes, uniquely identified by primary keys.

    • Data in columns can be of various types including integers, decimals, and text, constrained to specific types, lengths, or values.

    • Primary and foreign keys show table relationships. User queries allow operations on the database.

Data Manipulation in Relational Databases
  • Selecting:

    • Eliminates rows based on criteria.

  • Projecting:

    • Removes certain columns in a table.

  • Joining:

    • Merges two or more tables through common attributes to produce a new table.

  • Data Normalization:

    • Reduces data redundancy.

SQL Databases

  • SQL (Structured Query Language):

    • A programming language for database access and manipulation.

  • ACID Properties of SQL Databases:

    • Atomicity: Ensures complete transaction success or rollback if any part fails.

    • Consistency: Guarantees valid transitions between database states adhering to defined constraints.

    • Isolation: Ensures concurrent transactions operate invisibly to one another.

    • Durability: Guarantees permanence of data changes post-commitment, safeguarding against system failures.

  • Example SQL Commands:

    • SELECT Query: (Example - Use of JOIN)

    • SELECT Client Name, Debt FROM Client WHERE Debt > 1000

    • Displays clients owing more than $1,000.

    • SELECT ClientName, Client Num, OrderNum FROM Client, Order WHERE Client.ClientNum = Order.ClientNum

    • Combines two tables based on the client number.

Popular Relational DBMS

  • Open-Source Relational DBMS: MySQL, PostgreSQL, MariaDB, etc.

  • Relational DBMS for Individuals: Microsoft Access, SQLite.

  • Enterprise Level DBMS: Oracle, IBM DB2, Microsoft SQL Server, etc.

Database as a Service (DaaS)

  • Definition: Databases stored on a service provider’s servers, accessed over the Internet, with administration managed by the provider.

  • Advantages: Eliminates the need for in-house database installation, maintenance, and monitoring.

Data Management

  • Definition: An integrated function defining processes for obtaining, certifying, storing, securing, and processing data, ensuring it meets users' needs for accessibility, reliability, and timeliness.

Data Governance

  • Definition: Outlines roles, responsibilities, and processes ensuring data can be trusted and utilized across the organization.

  • Ensures the identification of personnel accountable for fixing and preventing data issues.

Data Stewardship

  • Data Steward: A governance role responsible for ensuring data quality and fitness, typically filled by a non-IS employee managing critical data entities.

Data Lifecycle Management

  • Definition: A policies-based approach overseeing data flow from acquisition or creation to eventual deletion when outdated.

Summary

  • Well-designed and managed databases significantly aid in decision-making.

  • High-quality data yields multiple benefits, including improved decision making, increased customer satisfaction, increased sales, enhanced innovation, better productivity, and compliance.

  • A strong data management program, anchored by data governance, ensures the maintenance of high-quality data.