Lecture 01

Data Warehousing Overview

  • Lecturer: Dr. Syed Aun Irtaza

  • Lecture Number: 01

Course Structure

  • Topics Covered:

    • Data Warehousing

    • Big Data Analytics

Data is Everywhere

  • Rapid Growth of Data:

    • Total internet data: 120 zettabytes (ZB) in 2023, projected to reach 181 ZB by 2025.

    • Daily data creation: Approximately 2.5 quintillion bytes (2.5 exabytes), from sources including social media, emails, and transactions.

  • Sources of Internet Data:

    • Social Media: Platforms such as Facebook and Twitter generate petabytes of data daily.

    • Videos: YouTube receives over 500 hours of video uploads per minute.

    • Emails: Over 300 billion emails sent daily.

    • Search Engines: Google processes more than 8.5 billion searches a day.

  • Cloud Storage and Data Centers:

    • Major providers: AWS, Google Cloud, Microsoft Azure.

    • The largest data centers can store over 100 petabytes of data.

  • Data Growth Trends:

    • Influenced by IoT devices, 5G networks, and AI advancements.
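
The two totals quoted above (120 ZB in 2023, 181 ZB projected for 2025) imply a yearly growth rate. A minimal sketch of that arithmetic, assuming steady compound growth between the two years; the function name is illustrative only:

```python
# Figures quoted in the notes, in zettabytes (ZB).
DATA_2023_ZB = 120
DATA_2025_ZB = 181
YEARS = 2025 - 2023


def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values (assumes steady growth)."""
    return (end / start) ** (1 / years) - 1


rate = annual_growth_rate(DATA_2023_ZB, DATA_2025_ZB, YEARS)
print(f"Implied annual growth: {rate:.1%}")
```

Under that assumption the quoted figures correspond to growth of roughly 23% per year.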

Understanding Big Data

  • Definition:

    • Massive volumes of data, often generated by sensors; e.g., the LSST (Large Synoptic Survey Telescope) generates 40 TB/day.

    • At that rate, the LSST is expected to accumulate over 100 PB within a decade.
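
The decade-long total follows directly from the daily rate. A quick check of the arithmetic, assuming data is collected every day of the year and decimal units (1 PB = 1000 TB):

```python
TB_PER_DAY = 40        # rate quoted in the notes
DAYS_PER_YEAR = 365    # assumption: data is collected every day
YEARS = 10
TB_PER_PB = 1000       # decimal units

total_pb = TB_PER_DAY * DAYS_PER_YEAR * YEARS / TB_PER_PB
print(f"{total_pb:.0f} PB over {YEARS} years")  # prints "146 PB over 10 years"
```

The result, about 146 PB, is consistent with the "over 100 PB in a decade" figure.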

Future Directions

  • Knowledge-Driven Economy:

    • Importance of harnessing data effectively.

    • Consideration of first-mover advantages in the tech industry.

  • Industry Change:

    • Businesses must adapt or risk being left behind.

    • Missed opportunities can hinder growth.

Airline Business Case Study

  • Business Overview:

    • Airlines fundamentally sell seats but face complexities in operations and profitability.

  • Revenue Generation:

    • Major income from passenger services and ticket sales.

    • Pricing strategies include dynamic pricing based on demand.

    • Ancillary revenues such as baggage fees and loyalty programs.

    • Cargo transport as an additional revenue source.

  • Expenses Breakdown:

    • Fixed Costs: Aircraft leasing, crew salaries, and maintenance.

    • Variable Costs: Fuel, airport fees, and services.

    • Uncontrollable Costs: Impacted by fuel prices, regulations, and exchange rates.

  • Challenges in the Industry:

    • High operational costs and thin profit margins.

    • Market volatility and intense competition.

    • Regulatory and environmental challenges.
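
The revenue and cost categories above can be tied together in a small per-flight calculation. The sketch below is illustrative only: the linear pricing rule, the `sensitivity` parameter, and every number are hypothetical, not taken from any airline.

```python
def dynamic_fare(base_fare: float, load_factor: float, sensitivity: float = 0.5) -> float:
    """Raise the fare linearly as the flight fills up (hypothetical pricing rule)."""
    if not 0.0 <= load_factor <= 1.0:
        raise ValueError("load_factor must be between 0 and 1")
    return base_fare * (1 + sensitivity * load_factor)


def flight_margin(ticket_revenue: float, ancillary_revenue: float,
                  cargo_revenue: float, fixed_costs: float,
                  variable_costs: float) -> float:
    """Profit margin of one flight: profit divided by total revenue."""
    revenue = ticket_revenue + ancillary_revenue + cargo_revenue
    profit = revenue - (fixed_costs + variable_costs)
    return profit / revenue


# Hypothetical flight: half the seats are sold, so the fare rises by 25%.
fare = dynamic_fare(base_fare=200.0, load_factor=0.5)
print(f"Current fare: {fare:.2f}")

# Hypothetical totals for one flight, chosen to show a thin margin.
margin = flight_margin(
    ticket_revenue=50_000.0,
    ancillary_revenue=6_000.0,
    cargo_revenue=4_000.0,
    fixed_costs=30_000.0,
    variable_costs=27_000.0,
)
print(f"Profit margin: {margin:.1%}")
```

With these made-up figures the flight earns 60,000 against 57,000 in costs, so a modest rise in fuel price or airport fees would erase the profit.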

Operational and Strategic Challenges

  • Disruptions:

    • Weather, cybersecurity issues, and pandemics.

    • Changing customer expectations and workforce management.

  • Strategies for Improvement:

    • Focus on cost optimization and revenue diversification.

Profitability Analysis

  • Banking Sector Insights:

    • Many individual customers turn out to be unprofitable, even when the bank is profitable overall.

    • Analyzing transactional behavior is key to identifying them.

    • Products may need restructuring so that profitability can be analyzed effectively over time.
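
A per-customer breakdown shows how an overall profit can hide unprofitable customers. A minimal sketch with hypothetical customers and made-up annual figures:

```python
# Hypothetical annual figures per customer: (revenue, cost to serve).
customers = {
    "A": (1200.0, 300.0),
    "B": (150.0, 220.0),
    "C": (90.0, 180.0),
    "D": (60.0, 140.0),
    "E": (2000.0, 500.0),
}

# Profit per customer is revenue minus the cost of serving that customer.
profit = {name: revenue - cost for name, (revenue, cost) in customers.items()}

total_profit = sum(profit.values())
unprofitable = sorted(name for name, value in profit.items() if value < 0)

print(f"Total profit: {total_profit:.0f}")
print(f"Unprofitable customers: {unprofitable} ({len(unprofitable)} of {len(customers)})")
```

In this toy data set the bank is profitable overall, yet three of its five customers cost more to serve than they bring in.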

Data and Fraud Detection

  • Hacked Credit Card Patterns:

    • Deviations from typical purchasing habits signal potential fraud.

    • Other signals include transactions in cities notorious for fraud and purchases of unusual items commonly associated with stolen cards.
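
One simple way to express "deviation from typical purchasing habits" is a z-score on the transaction amount. The sketch below is an assumption-laden illustration, not a production fraud model: the threshold of 3 standard deviations and the purchase history are hypothetical, and it looks only at amounts, not at cities or item types.

```python
from statistics import mean, stdev


def is_suspicious(history: list[float], amount: float, threshold: float = 3.0) -> bool:
    """Flag an amount that lies more than `threshold` standard deviations
    from the cardholder's average past purchase."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs at least 2 values
    if sigma == 0:
        # All past purchases were identical, so any different amount deviates.
        return amount != mu
    return abs(amount - mu) / sigma > threshold


# Hypothetical purchase history for one cardholder.
history = [42.0, 55.0, 38.0, 61.0, 47.0, 52.0]

print(is_suspicious(history, 50.0))   # a typical amount
print(is_suspicious(history, 900.0))  # far outside the usual range
```

Real systems combine many such signals; a single-feature rule like this mainly shows the idea of comparing a new transaction against a learned baseline.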

Course Aims

  • Develop an understanding of RDBMS concepts and their application in decision support systems.

  • Analyze the differences between an RDBMS and a data warehouse.

  • Learn big data analysis and emerging technologies in the field.

Course Summary

  • Topics include:

    • Introduction to Data Warehousing

    • RDBMS Basics and SQL

    • Python and ETL processes

    • Data Mining and Machine Learning topics

    • Information Visualization Techniques

Inspirational Quote

  • "I cannot teach anything to anyone, I can just make them think." - Socrates
