Business Intelligence and Big Data

Learning Objectives

  • Understand the concept of "big data" and its origins.
  • Define big data by its characteristics – the 4Vs.
  • Explain the challenges posed by big data.
  • Understand how data integration addresses the issue of being data-rich yet information-poor.
  • Differentiate between data warehouses, data marts, and data lakes.

Definition of Big Data

  • Big data is a collection of data sets so large and complex that they are difficult to process using traditional database management tools or applications.
  • Key Challenges:
    • Capture
    • Storage
    • Search
    • Sharing
    • Transfer
    • Analysis
    • Visualization
  • The volume of data created is enormous: an estimated 3.3 quintillion bytes per day, with projections of 181 zettabytes by 2025.
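
To put the units above in perspective, the sketch below converts raw byte counts into decimal (SI) units, where each step is a factor of 1,000. The function name humanize and the unit list are illustrative choices, not part of the original notes; one quintillion is taken as 10**18 (short scale), which is one exabyte.

```python
# Decimal (SI) byte units: each step up is a factor of 1,000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def humanize(num_bytes):
    """Express a byte count in the largest SI unit that keeps the value >= 1."""
    exponent = 0
    while num_bytes >= 1000 ** (exponent + 1) and exponent < len(UNITS) - 1:
        exponent += 1
    value = num_bytes / 1000 ** exponent
    return f"{value:g} {UNITS[exponent]}"

daily_bytes = 33 * 10**17        # 3.3 quintillion bytes (1 quintillion = 10**18)
projected_bytes = 181 * 10**21   # 181 zettabytes (1 ZB = 10**21 bytes)

print(humanize(daily_bytes))      # 3.3 EB
print(humanize(projected_bytes))  # 181 ZB
```

In other words, 3.3 quintillion bytes is 3.3 exabytes, and a zettabyte is a thousand times larger than an exabyte.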

Perspectives on Big Data

  • The definition varies with the capabilities of the organization and its tools.
  • Example: For some organizations, hundreds of gigabytes may already call for new data management options, while others may not consider data "big" until it reaches hundreds of terabytes.

Sources of Big Data

  • Archives: Historical records of communications and transactions.
  • Documents: Emails, presentations, spreadsheets, etc.
  • Business Apps: Data from ERP, CRM, and HR systems.
  • Public Data: Government websites providing local, state, and federal data.
  • Social Media: Data from platforms like Twitter, Facebook, and LinkedIn.
  • Machine Logs: Call detail records and logs from business processes.
  • Media: Images, audio, and video content.
  • Sensor Data: From IoT devices and process control devices.

Big Data Characteristics – The 4Vs

  1. Volume: Refers to the amount of data – can be measured in terabytes, petabytes, and exabytes.
  2. Velocity: The speed at which data is generated and must be stored and processed, which can overwhelm traditional systems.
  3. Variety: Refers to the different forms data takes (structured, semi-structured, and unstructured) – roughly 80% of big data is unstructured.
  4. Veracity: The quality and trustworthiness of data, determining reliability for insights.
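
Veracity is the characteristic that lends itself most directly to automated checks. The following is a minimal sketch, assuming records arrive as Python dictionaries; the function name quality_report, the field names, and the sample sensor readings are all hypothetical. It counts two simple quality problems: incomplete records and exact duplicates.

```python
def quality_report(records, required_fields):
    """Count simple data-quality problems in a list of dict records."""
    seen = set()
    report = {"total": len(records), "missing": 0, "duplicates": 0}
    for record in records:
        # A record is incomplete if any required field is absent or empty.
        if any(record.get(field) in (None, "") for field in required_fields):
            report["missing"] += 1
        # A record is a duplicate if an identical one was already seen.
        key = tuple(sorted(record.items()))
        if key in seen:
            report["duplicates"] += 1
        seen.add(key)
    return report

# Hypothetical sensor readings with one gap, one blank ID, and one repeat.
readings = [
    {"sensor_id": "s1", "temp_c": 21.5},
    {"sensor_id": "s2", "temp_c": None},
    {"sensor_id": "s1", "temp_c": 21.5},
    {"sensor_id": "", "temp_c": 19.0},
]

report = quality_report(readings, ["sensor_id", "temp_c"])
print(report)  # {'total': 4, 'missing': 2, 'duplicates': 1}
```

Real veracity checks usually go further (range checks, cross-source validation), but the idea is the same: measure how much of the data can be trusted before drawing insights from it.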

Challenges of Big Data

  • Determining which data subsets to store.
  • Deciding where and how to store data.
  • Identifying relevant data for decision-making.
  • Extracting value from very large datasets.
  • Protecting sensitive data from unauthorized access.

Data Integration

  • Key Problem: Organizations may have abundant data but lack the processes to turn it into meaningful information.
  • Solution: Data integration improves the quality of business decisions, and in turn costs and revenue, by making data reliable, consistent, and understandable.
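
As an illustration of what integration involves, the sketch below combines two sources that describe the same customers in inconsistent ways. Everything here is an assumption made for the example: the source names (a CRM and an ERP system), the field names, and the sample values. The point is only that keys and formats must be reconciled before the data becomes usable information.

```python
# Hypothetical CRM export: inconsistent ID casing and stray whitespace.
crm_customers = [
    {"customer_id": "C-001", "name": "  Acme Corp "},
    {"customer_id": "c-002", "name": "Globex"},
]

# Hypothetical ERP export: a different key name, amounts stored as text.
erp_orders = [
    {"cust": "C-001", "amount_usd": "1200.50"},
    {"cust": "C-002", "amount_usd": "300"},
    {"cust": "C-001", "amount_usd": "99.50"},
]

def integrate(customers, orders):
    """Combine both sources into one consistent view keyed by customer ID."""
    view = {}
    for customer in customers:
        cid = customer["customer_id"].strip().upper()  # normalize the key
        view[cid] = {"name": customer["name"].strip(), "total_usd": 0.0, "orders": 0}
    for order in orders:
        cid = order["cust"].strip().upper()
        if cid in view:  # skip orders with no matching customer
            view[cid]["total_usd"] += float(order["amount_usd"])
            view[cid]["orders"] += 1
    return view

view = integrate(crm_customers, erp_orders)
print(view["C-001"])  # {'name': 'Acme Corp', 'total_usd': 1300.0, 'orders': 2}
```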

Data Warehousing

  • Definition: A data warehouse is a large database that collates business information from various sources.
  • Function: Supports management decision-making; it is populated through the extraction, transformation, and loading (ETL) of source data.
  • Data Sources: Internal operations, external data, social networks, and clickstream data.
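
The ETL steps can be sketched with Python's built-in sqlite3 module standing in for the warehouse. This is a toy illustration under stated assumptions, not how a production warehouse is built: the sales table, its columns, and the source rows are all hypothetical.

```python
import sqlite3

# Extract: rows as they might arrive from hypothetical source systems.
source_rows = [
    {"region": "east ", "sale_date": "2024-01-05", "amount": "100.00"},
    {"region": "WEST", "sale_date": "2024-01-05", "amount": "250.50"},
    {"region": "East", "sale_date": "2024-01-06", "amount": "49.50"},
]

def transform(row):
    """Normalize region names and convert amounts to integer cents."""
    cents = round(float(row["amount"]) * 100)
    return (row["region"].strip().title(), row["sale_date"], cents)

def load(conn, rows):
    """Load transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(region TEXT, sale_date TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory database for the example
load(conn, [transform(row) for row in source_rows])

# Decision support: a query that is only meaningful after the cleanup.
totals = conn.execute(
    "SELECT region, SUM(amount_cents) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('East', 14950), ('West', 25050)]
```

Without the transform step, "east " and "East" would be counted as different regions, which is the kind of inconsistency ETL is meant to remove.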

Data Marts and Data Lakes

  • Data Mart: A subset of data from a warehouse tailored for small- to medium-sized businesses or specific departments.
  • Data Lake: A vast repository holding all types of data in raw format, allowing users to extract and transform data as needed when conducting analyses.
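
The "extract and transform as needed" idea behind a data lake is often called schema-on-read. A minimal sketch follows, assuming the lake is simply a list of raw payloads tagged with their format; the payloads, the format tags, and the function name read_clicks are hypothetical.

```python
import csv
import io
import json

# The "lake": raw payloads stored exactly as received, tagged only with a format.
lake = [
    {"format": "json", "raw": '{"user": "ana", "clicks": 3}'},
    {"format": "csv", "raw": "user,clicks\nben,5\nana,2\n"},
    {"format": "text", "raw": "free-form support note, not used in this analysis"},
]

def read_clicks(objects):
    """Apply a schema at read time: yield (user, clicks) pairs from raw objects."""
    for obj in objects:
        if obj["format"] == "json":
            record = json.loads(obj["raw"])
            yield record["user"], int(record["clicks"])
        elif obj["format"] == "csv":
            for row in csv.DictReader(io.StringIO(obj["raw"])):
                yield row["user"], int(row["clicks"])
        # Other formats stay in the lake untouched for future analyses.

clicks_per_user = {}
for user, clicks in read_clicks(lake):
    clicks_per_user[user] = clicks_per_user.get(user, 0) + clicks

print(clicks_per_user)  # {'ana': 5, 'ben': 5}
```

Unlike the warehouse, nothing is cleaned or structured when the data is stored; each analysis decides what to extract and how to interpret it.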

Data Warehouses vs. Data Marts

  • Data warehouses contain comprehensive data suitable for large-scale decision support, while data marts offer specialized data for specific departments or functions.
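
One way to picture the relationship is a data mart as a department-specific slice of the warehouse. The sketch below models that slice as a SQL view in an in-memory SQLite database; this is an assumption made for illustration, since real data marts are often separate physical databases. The table, view, and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The warehouse: comprehensive data covering every department.
conn.execute("CREATE TABLE warehouse_facts (department TEXT, metric TEXT, value REAL)")
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("sales", "revenue", 5000.0),
        ("sales", "orders", 42.0),
        ("hr", "headcount", 120.0),
        ("finance", "expenses", 3100.0),
    ],
)

# The data mart: only the rows the sales department needs.
conn.execute(
    "CREATE VIEW sales_mart AS "
    "SELECT metric, value FROM warehouse_facts WHERE department = 'sales'"
)

warehouse_rows = conn.execute("SELECT COUNT(*) FROM warehouse_facts").fetchone()[0]
mart_rows = conn.execute(
    "SELECT metric, value FROM sales_mart ORDER BY metric"
).fetchall()

print(warehouse_rows)  # 4
print(mart_rows)       # [('orders', 42.0), ('revenue', 5000.0)]
```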