J

Ch1_Introduction_to_SW

Course Information

  • Course Resources

    • Blackboard is the main platform for class syllabi, project descriptions, slides, announcements, etc.

    • Email will also be utilized for announcements.

  • Course Evaluation

    • Midterm exam: 30%

    • Project: 30%

    • Final exam: 40%

Recommended Literature

  • Essential Texts

    • The Data Warehouse Toolkit by Ralph Kimball & Margy Ross

    • Building the Data Warehouse by William H. Inmon

    • Data Warehouse Systeme by Andreas Bauer & Holger Günzel

  • Further Readings

    • The Data Warehouse ETL Toolkit by Ralph Kimball & Joe Caserta

    • OLAP Solutions by Erik Thomsen

    • Data Warehouses and OLAP by Robert Wrembel & Christian Koncilia

Introduction to Data Warehousing

  1. Why a Data Warehouse is Needed

  2. Definition of a Data Warehouse

  3. Core Concepts in Data Warehousing

  4. Applications and Users of Data Warehouses

Importance of Data

  • Data may soon be more important than software (Tim O'Reilly)

  • Data is precious and lasts longer than systems (Tim Berners-Lee)

  • Information is likened to oil in the 21st century (Peter Sondergaard)

  • Data lives indefinitely beneath applications (Tom Kyte)

Major Challenges for Effective Decision Support

  1. Distributed Data

    • Problem: Data exists across various systems and technologies.

    • Solution: Consolidate data into a single system for analysis.

  2. Different Data Structures

    • Problem: Inconsistencies across systems regarding data types and modeling.

    • Solution: Implement a dedicated system to standardize data.

  3. Issues with Historic Data

    • Problem: Data is often deleted or archived after a short period.

    • Solution: Store all changes in a system suited for large data volumes.

  4. System Workload Challenges

    • Problem: Systems are not optimized for analytical workloads, risking performance issues.

    • Solution: Use dedicated systems designed for complex queries on large data sets.

What is a Data Warehouse?

  • A data warehouse is a system where data is stored and analyzed to inform decisions.

  • It is specifically designed to support business intelligence activities.

Characteristics of a Data Warehouse

  1. Large Repository

    • A data warehouse is a large-scale repository of organizational data.

    • It begins around 1 TB and can extend to several PB across multiple servers.

  2. Data Cleansing and Snapshots

    • Contains historical snapshots of operational data, obtained through ETL processes.

  3. Analytical Focus

    • Tailored for tactical and strategic reporting, capturing changes over time.

Definitions by Experts

  • Ralph Kimball:

    • Describes a data warehouse as a structured copy of transactional data for queries and analysis.

  • Bill Inmon:

    • Defines it as a subject-oriented, integrated, non-volatile, time-variant collection of data for decision support.

Functional Characteristics

  • Subject Oriented:

    • Data is organized around real-world events or objects (e.g., Customer, Product).

  • Integrated:

    • Data sourced from various operational systems, harmonized for consistency.

  • Non-Volatile:

    • Data is static and retained for future reporting; once committed, data isn't deleted or changed directly.

  • Time-Variant:

    • Tracks changes, enabling historical reporting and analysis over time.

Utilization of Data Warehouses

  • OLTP (Online Transaction Processing):

    • Used for operational data entry and transaction processing, reflecting current states.

  • OLAP (Online Analytical Processing):

    • Used for reporting and decision-making analytics based on DW data.

Typical Applications of Data Warehousing

  1. Investment and Insurance

    • Market analysis and customer behavior insights.

  2. Retail

    • Monitoring products, pricing strategies, and purchasing patterns.

  3. Healthcare

    • Generating treatment reports and collaborating with insurance and research departments.

  4. Education

    • Real-time analytics for informed decision-making regarding faculty and students.

  5. Banking

    • Monitoring resources and assessing product performance.

  6. Manufacturing and Distribution

    • Consolidating data for forecasting and trend analysis.

  7. Government

    • Analyzing tax information and health records.

  8. Telecommunications

    • Informing distribution and sales strategies.