J

DW_Lecture1.

Course Information

Course Resources

  • Blackboard is the primary platform for accessing essential course materials, including:

    • Class syllabus: outlines course schedule and policies.

    • Project description: provides guidelines and expectations for the project to be undertaken during the course.

    • Course slides: slides used in lectures for better understanding of the topics covered.

    • Turn project reports: facilitates the submission of project-related documents.

    • Announcements: course updates and important information will be communicated here.

  • Email will also serve as a communication channel for important announcements not covered on Blackboard.

Course Evaluation

The evaluation of students' performance will consist of the following components:

  • Midterm exam (30%): assesses understanding of the first half of the course material.

  • Project (30%): practical application of learned concepts to a real-world scenario.

  • Final exam (40%): a comprehensive evaluation of all course content covered.

Recommended Literature

Books

  • The Data Warehouse Toolkit by Ralph Kimball & Margy Ross

  • Building the Data Warehouse by William H. Inmon

  • Data Warehouse-Systeme by Andreas Bauer & Holger Günzel

  • The Data Warehouse ETL Toolkit by Ralph Kimball & Joe Caserta

  • OLAP Solutions by Erik Thomsen

  • Data Warehouses and OLAP by Robert Wrembel & Christian Koncilia

Introduction to Data Warehousing

Key Questions

  • Why do we need a data warehouse (DW)?

    • To consolidate data for effective analysis and reporting, eliminating the need for disparate systems.

  • What is a data warehouse?

    • A specialized database optimized for query and analysis.

  • Key concepts in data warehousing:

    • Understanding the difference between operational databases and analytical databases.

  • Applications and users of DW:

    • Ranging from corporate sectors to educational institutions, serving various functional needs.

Source Reference

  • Insights into internet data usage, specifically examining the implications of data consumption in an internet minute, highlighting the advancing digital footprint.

Perspectives on Data

Quotations

  • "We’re entering a new world in which data may be more important than software." – Tim O’Reilly

  • "Data is a precious thing and will last longer than the systems themselves." – Tim Berners-Lee

  • "Information is the oil of the 21st century." – Peter Sondergaard, Gartner

  • "Applications come, applications go. The data, however, lives forever... It really is about the data underneath these applications." – Tom Kyte, Oracle

Major Problems for Effective Decision Support

Issues Identified

  • Distributed Data:

    • Data is scattered across various systems, leading to inconsistency.

    • Solution: Accumulate data on a unified system for comprehensive analysis.

  • Different Data Structures:

    • Independent development of systems results in inconsistent data formats (e.g., zip codes, units).

    • Solution: A harmonization framework that standardizes data.

  • Historic Data:

    • Quick deletion or archiving of historical data restricts in-depth analysis.

    • Solution: Establish a centralized data storage solution supporting long-term retention for historical queries.

  • System Workload:

    • Increased workload can lead to system performance issues causing potential failures.

    • Solution: Implement dedicated systems optimized for complex analytical queries.

Why Data Warehousing?

  • Operative systems alone do not suffice for analytical purposes.

  • A separated system is crucial to prevent interference with daily business transactions and to maintain efficiency in operations.

Definition of a Data Warehouse (DW)

General Description

  • A data warehouse serves as a large repository designed for storing, organizing, and analyzing data to enhance decision-making processes.

Specifics of DW

  • Collective data repository: Incorporates snapshots of transactional data for evaluation.

  • Designed for reporting and long-term analysis: Typically starts at a size of 1TB, expanding to several PB to accommodate large volumes of data over time.

Key Characteristics

  • Subject-oriented:

    • Focus on specific subjects or events (e.g., customers, products) rather than on individual transactions.

  • Integrated:

    • Merges data from various operational systems, ensuring consistency in datasets.

  • Non-volatile:

    • Data remains static and read-only after being committed, with subsequent changes leading to the creation of new records.

  • Time-variant:

    • Records changes over time supporting solid historical reporting.

Key Definitions by Experts

  • Ralph Kimball: "A copy of transaction data specifically structured for query and analysis."

  • Bill Inmon: "A subject-oriented, integrated, non-volatile, time-variant collection of data supporting decision-making."

Characteristics of a Data Warehouse

  • Subject-Oriented: Data is organized around real-world subjects, facilitating easier access to relevant information.

  • Integrated: Brings together data from diverse source systems ensuring definitive consistency across datasets.

  • Non-Volatile: Maintains static data properties, focusing on creating new records in the event of updates rather than altering existing entries.

  • Time-Varying: Ensures the retention of historical data to enable a consistent overview across different time frames.

Typical Features of a Data Warehouse

  • Operates on scalable DBMS (Database Management Systems) like Oracle, IBM DB2, which can handle increased data loads over time.

  • Retains data over extensive periods sourced from various operational systems, ready for analysis as needed.

Data Processing Types

  • OLTP (Online Transaction Processing):

    • Designed for daily operations, offering real-time data reflecting current statuses.

  • OLAP (Online Analytical Processing):

    • Tailored for reporting and analytical queries generated from data warehouse repositories, enabling deep data analysis.

Applications of Data Warehousing

  • Investment and Insurance:

    • Analyzing market factors and customer data for improved financial decision-making.

  • Retail:

    • Tracking products efficiently and strategizing on pricing for maximizing sales.

  • Healthcare:

    • Managing treatment reports and sharing vital data securely across platforms.

  • Education:

    • Real-time access to data for informed operational and academic decision-making.

  • Banking:

    • Evaluating market conditions and tracking financial performance metrics.

  • Manufacturing:

    • Conducting trend analysis and accurate forecasting to optimize production.

  • Government:

    • Monitoring key records like taxation and healthcare for better public service management.

  • Telecommunications:

    • Making informed decisions regarding distribution and marketing strategies.