Fundamentals of Business Intelligence (FBI) Lecture 2: Data Warehousing Notes
Botswana Accountancy College - Fundamentals of Business Intelligence (FBI) Lecture 2: Data Warehousing
Reference Information
Institution: Botswana Accountancy College
School: School of Computing & Information Systems
Degree: Business Intelligence & Data Analytics
Module: Fundamentals of Business Intelligence (FBI)
Lecturer: VM
Semester: 2, January 2017
Textbook: Business Intelligence: A Managerial Approach, Pearson Prentice Hall, 2014, 3rd Edition, ISBN-13: 978-0-13-234761-7
Copyright: © 2017, BAC
Learning Objectives
Explain the role of data warehouses in decision support.
Explain data integration and the Extraction, Transformation, and Load (ETL) processes.
Describe real-time (also known as right-time or active) data warehousing.
Understand data warehouse administration and security issues.
Opening Vignette: Isle of Capri Casinos
Company Background: Overview of Isle of Capri Casinos and its operations.
Problem Description: Identification of challenges faced due to lack of data infrastructure.
Proposed Solution: Introduction of an Enterprise Data Warehouse (EDW).
Results: Outcomes following the implementation of the EDW.
Case Questions: Discussion prompts related to the EDW implementation:
Importance of having an EDW at Isle.
Identification of business challenges or opportunities faced.
Process undertaken by Isle to realize the EDW, including potential challenges.
Benefits gained from the EDW implementation, along with additional potential benefits.
Discussion on the capacity of large enterprises in the gaming industry to succeed without robust data infrastructure.
Main Data Warehousing Topics
Definitions and characteristics of data warehousing.
Data Marts: Smaller segments of data warehouses.
Operational Data Stores (ODS), Enterprise Data Warehouse (EDW), and Metadata.
Data Warehousing Framework and Architecture.
ETL Process: Extraction, Transformation, Load operations.
Data Warehousing Development and Issues.
What is a Data Warehouse?
Definition: A physical repository where relational data is organized for enterprise-wide decision support, providing cleansed data in a standardized format.
Important Characteristics:
Integrated, subject-oriented databases tailored for decision support system (DSS) functions.
Features: Non-volatile and relevant to a specific moment in time.
Historical Perspective to Data Warehousing
Overview of the evolution of data warehousing from the 1970s to the 2010s:
1970s: Mainframe computers, simple data entry, routine reporting, primitive databases.
1980s: Introduction of mini and personal computers (PCs), business applications and distributer DBMS, adoption of relational databases, advancements in centralized data storage.
1990s: Inception of business data warehousing concepts.
2000s: Growth of data warehousing as a business intelligence tool, emergence of data warehouse appliances, increased focus on data mining and predictive modeling.
2010s: Rise of Big Data analytics, social media, cloud computing, and NoSQL databases.
Characteristics of Data Warehouses
Subject-oriented: Focused on specific subject areas of the business.
Integrated: Combines data from multiple sources.
Time-variant: Historical data is stored to reflect changes over time.
Nonvolatile: Data is stable and doesn’t change over time.
Summarized: Provides aggregated data that is easy to analyze.
Not normalized: Data structure is not strictly normalized to optimize retrieval.
Metadata: Information about the data stored, describing its content and management.
Web-based, relational/multi-dimensional structures: Supports varied data access methods.
Real-time or right-time/active data warehousing capabilities.
Data Mart Concept
Definition: A smaller, departmental data warehouse that stores relevant data for specific business functions.
Dependent Data Mart: A subset directly created from a data warehouse.
Independent Data Mart: A standalone data warehouse designed for a specific business unit or department.
Other Components of Data Warehousing
Operational Data Stores (ODS): Temporary storage for integration processes before data enters a data warehouse.
Enterprise Data Warehouse (EDW): Comprehensive data warehouse serving the needs of the entire enterprise.
Metadata: Describes contents and structure of data within the warehouse.
Application Case 2.1: TELCOs and Data Warehousing
Questions for Discussion:
Identification of main challenges faced by Telecommunications Companies (TELCOs).
Exploration of how data warehousing and analytics can help address these challenges.
Analysis of why TELCOs are particularly well-positioned to leverage data analytics.
Generic Data Warehouse Framework
Components of the Framework:
Data Sources including ERP, legacy systems, POS, and other databases.
ETL Process: Extract, Transform, Load operations detailed.
Enterprise Data Warehouse and Data Marts: Visualization of workflow from source to storage.
Applications: Business reporting and visualization tools as outputs from data marts.
Application Case 2.2: MultiCare Health System
Questions for Discussion:
Role of data warehousing in healthcare systems.
How MultiCare utilized data warehousing for improved health outcomes.
Data Warehouse Architecture
Three-Tier Architecture:
Data acquisition software (Back-end)
Data warehouse
Client software (Front-end)
Two-Tier Architecture: Combines data warehouse and application server into one tier.
Visual representations of data storage and processing architectures such as application servers, database servers, etc.
Issues in Choosing DW Architecture
Considerations in selecting architecture:
Database management system (DBMS) compatibility.
Use of parallel processing or data partitioning.
Data migration tool requirements.
Facilities for data retrieval and analysis.
Web-Based Data Warehouse Architecture
Elements involved:
Web Server, Client Access (Web browser), Application Server connected to the Data Warehouse.
Utilization of Internet, Intranet or Extranet resources.
Alternative Data Warehouse Architectures
Independent Data Marts Architecture: Characterized by separate data marts without direct integration.
Hub and Spoke Architecture: Centralizes data warehouse with links to departmental data marts.
Bus Architecture: Integrates data marts through common dimensions.
Each architecture presents its distinct advantages and disadvantages.
Factors Affecting Architecture Selection
Information interdependence across organizational units.
Upper management’s information needs.
Urgency for data warehousing.
Nature of end-user tasks.
Constraints on resources.
Strategic vision prior to implementation.
Compatibility with existing systems.
IT staff capabilities.
Technical challenges.
Social and political factors.
Teradata Corporation Data Warehouse Architecture Overview
Outlines the structure for processing transactional data related to operational data stores and enterprise data warehouses leading to data marts for decision users (executives, analysts).
ETL Process Details
ETL Definition: Extraction, Transformation, Load framework for data integration.
Three Processes: Data access, federation, and change capture.
Enterprise Application Integration (EAI): Technology facilitating movement of data to the warehouse.
Enterprise Information Integration (EII): Tools for real-time data integration from various sources.
ETL Process Overview
Flow Diagram Representation:
Showing data sources, transformation processes, and pathways to data storage (warehouse/mart).
ETL Tool Selection Considerations
High cost of data transformation tools and associated skill requirements.
Criteria for selecting a tool include:
Ability to manage various data sources.
Metadata capture features.
Compliance with open standards.
User-friendly interfaces.
Application Case 2.5: Starwood Hotels & Resorts
Discussion Questions:
Scale and complexity of Starwood’s operations.
Utilization of data warehousing for profitability.
Challenges faced, proposed solutions, and results obtained.
Additional Data Warehouse Considerations
Hosted Data Warehouses: Benefits include minimal infrastructure investment, improved cash flow, and access to advanced technology.
Representation of Data in Data Warehousing
Dimensional Modeling: Used for retrieval-based systems.
Star Schema: Simplest model with a fact table connected to dimension tables.
Snowflake Schema: An extension of the star schema, structured like a snowflake.
Multidimensionality in Data Presentation
Definition: Capacity to analyze data across various dimensions such as region, product, salesperson, and time.
Presentation Dimensions: Products, sales channels, geographical data, market segments, etc.
Measures: Money, sales volume, headcount, etc., against time frames.
Comparative Analysis of Star versus Snowflake Schema
Visual depiction contrasting the two schemas with respect to fact and dimensions.
Analysis of Data in Data Warehousing
OLTP vs. OLAP analysis:
OLTP (Online Transaction Processing): Focused on efficiency of data handling in transaction databases.
OLAP (Online Analytical Processing): Aimed at providing decision-support information.
OLAP Operations and Techniques
Definitions of operations including:
Slice: Subset selection from a multi-dimensional array.
Dice: Slicing across multiple dimensions.
Drill Down/Up: Navigational operations for detail summarization.
Roll Up: Aggregation of data relationships.
Pivot: Changing data perspective within reports or queries.
OLAP Variations and Implementations
Different methods include:
MOLAP: Special data stores for pre-summarized multidimensional views.
ROLAP: OLAP operation performed on relational DBs.
DOLAP/WOLAP: Desktop and web implementations of OLAP.
Technology Insights in Data Warehousing
Introduction to resources available through platforms like Teradata University Network for teaching and learning technologies related to data warehousing.
DW Implementation Issues and Challenges
Identification of data sources and resolution of governance issues.
Key considerations include data quality, tool selection, end-user support, and political dynamics.
Guidelines for Successful DW Implementation
Barriers to avoid include the wrong sponsorship, unrealistic expectations, loading irrelevant data, and selecting technology-oriented managers over user-oriented ones.
Common Failure Factors in DW Projects
Issues include:
Lack of executive support, unclear objectives, cultural resistance, poor change management, and inadequate architecture leading to low data quality.
Scalability in Data Warehousing
Scalability concerns include:
Data volume growth, user concurrency, and complexity of queries expected over time.
Real-Time/Active Data Warehousing and BI
Trends towards enabling real-time decision-making.
Exploration of push vs pull data mechanisms and management of consistency and cost feasibility challenges.
Comparative Analysis: Traditional vs. Active DW
Traditional DW: Mainly suitable for strategic decisions at a slower dataset refresh rate.
Active DW: Designed for both strategic and tactical decisions with an emphasis on live data access.
Data Warehouse Administration and Security
Role of Data Warehouse Administrator (DWA): Key responsibilities including management of technology, understanding of decision-making processes, and strong communication skills.
Security and Privacy: Emphasis on safeguarding data and compliance with regulations (e.g., HIPAA).
Future Trends in Data Warehousing
Emerging trends include:
Reliance on web and social media data, adoption of open-source software, cloud computing integration, and advances in analytics technology.
Free Resources for Learning Data Warehousing
Portal for Teaching & Learning: Available at www.TeradataUniversityNetwork.com with password access through instructors.