Big Data
Big data is a collection of large, complex data sets that are difficult to process using traditional tools.
Challenges include: capture, storage, search, sharing, transfer, analysis, and visualization.
Every day over bytes of data are created.
Forecasts predict that, in 2025, 181 zettabytes of data will be created, captured, copied, and consumed globally.
What is considered "big data" varies depending on the organization's capabilities.
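To give a sense of that scale, the arithmetic below converts the 181-zettabyte forecast into bytes and into average daily and per-second rates. It assumes decimal (SI) units, where 1 ZB = 10^21 bytes, and a uniform rate across a 365-day year; actual data creation is far from uniform, so treat the results as rough averages only.

```python
# Rough scale of the 181 ZB forecast for 2025.
# Assumptions: decimal SI units (1 ZB = 10**21 bytes) and a uniform
# rate over a 365-day year; real data creation is not uniform.
ZB = 10**21
EB = 10**18
PB = 10**15

total_bytes = 181 * ZB
seconds_per_year = 365 * 24 * 60 * 60

per_day_eb = total_bytes / 365 / EB                 # exabytes per day
per_second_pb = total_bytes / seconds_per_year / PB  # petabytes per second

print(f"Total: {total_bytes:.2e} bytes")
print(f"Per day: ~{per_day_eb:.0f} EB")
print(f"Per second: ~{per_second_pb:.2f} PB")
```

Under these assumptions the forecast works out to roughly 496 exabytes per day, or about 5.7 petabytes every second.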
Sources of Big Data
Archives (Historical records)
Documents (e.g., Email, Word, PDF)
Data from business apps (ERP, CRM, HR)
Public data (Government websites)
Social media (Twitter, Facebook)
Machine log data (Call details, event logs)
Media (Images, audio, video)
Sensor data (Process control devices, smart meters)
Characteristics of Big Data (4Vs)
Volume: Amount of data.
Velocity: Speed at which data is created and stored.
Variety: Different forms of data (structured, semi-structured, unstructured).
Veracity: Quality and trustworthiness of data.
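A toy profiler can make the four Vs concrete. The sketch below measures each one over a small batch of records. The Record type, the classify heuristic for variety, and the non-empty check used as a stand-in veracity rule are hypothetical choices for illustration, not standard definitions; real veracity checks would look at accuracy, completeness, and provenance.

```python
import json
from collections import Counter
from dataclasses import dataclass


@dataclass
class Record:
    timestamp: float  # seconds since the start of the observation window
    payload: str      # raw content exactly as received


def classify(payload: str) -> str:
    """Hypothetical variety heuristic: JSON objects are semi-structured,
    single-line rows with 2+ commas are treated as structured (CSV-like),
    and everything else counts as unstructured."""
    try:
        if isinstance(json.loads(payload), dict):
            return "semi-structured"
    except json.JSONDecodeError:
        pass
    if payload.count(",") >= 2 and "\n" not in payload:
        return "structured"
    return "unstructured"


def profile(records: list[Record]) -> dict:
    # Volume: total size of the payloads in bytes (UTF-8 encoded).
    volume = sum(len(r.payload.encode("utf-8")) for r in records)
    # Velocity: records per second over the observed time window.
    times = [r.timestamp for r in records]
    span = max(times) - min(times)
    velocity = len(records) / span if span > 0 else float("inf")
    # Variety: how many records fall into each structural category.
    variety = dict(Counter(classify(r.payload) for r in records))
    # Veracity: share of records passing a stand-in quality rule;
    # here the only rule is that the payload is not blank.
    veracity = sum(1 for r in records if r.payload.strip()) / len(records)
    return {"volume_bytes": volume, "velocity_rps": velocity,
            "variety": variety, "veracity": veracity}


batch = [
    Record(0.0, "1001,Alice,2024-01-05"),
    Record(0.5, '{"user": "bob", "action": "click"}'),
    Record(1.0, "Great product, would buy again!"),
    Record(2.0, ""),
]
print(profile(batch))
```

For this batch the profile reports 86 bytes, 2 records per second, one structured, one semi-structured, and two unstructured records, and a veracity score of 0.75 because the last record is empty.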
Challenges of Big Data
Choosing what data to store.
Where and how to store the data.
Finding relevant data for decision-making.
Deriving value from the data.
Protecting data from unauthorized access.
Data Integration and Data Warehousing
Data Rich, Information Poor: Organizations hold large amounts of data but lack the processes to turn it into meaningful information.
Solution: Data Integration (combining data from different sources into a unified, consistent view)
Improves the quality of business decisions.
Provides reliable, consistent, understandable, and easily manipulated data for analysis.
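As a minimal sketch of what integration involves, the Python below merges customer records from two hypothetical systems (crm_rows and erp_rows) into one consistent view keyed by a normalized email address. The field names, the date formats, and the rule that the CRM name wins on conflict are illustrative assumptions, not a prescribed method.

```python
from datetime import datetime

# Hypothetical extracts from two systems that describe the same customers
# with different field names, casing, and date formats.
crm_rows = [
    {"Email": "Alice@Example.com ", "FullName": "Alice Smith", "signup": "2024-03-01"},
    {"Email": "bob@example.com", "FullName": "Bob Jones", "signup": "2024-04-15"},
]
erp_rows = [
    {"email_addr": "alice@example.com", "name": "A. Smith",
     "last_order": "05/06/2024", "total_spend": "120.50"},
    {"email_addr": "carol@example.com", "name": "Carol White",
     "last_order": "07/01/2024", "total_spend": "80"},
]


def norm_email(raw: str) -> str:
    """Normalize the join key so the same customer matches across systems."""
    return raw.strip().lower()


def empty_record(email: str, name: str) -> dict:
    return {"email": email, "name": name, "signup_date": None,
            "last_order_date": None, "total_spend": 0.0}


def integrate(crm: list[dict], erp: list[dict]) -> list[dict]:
    unified: dict[str, dict] = {}
    for row in crm:
        key = norm_email(row["Email"])
        rec = unified.setdefault(key, empty_record(key, row["FullName"]))
        # CRM dates are assumed to be ISO formatted (YYYY-MM-DD).
        rec["signup_date"] = datetime.strptime(row["signup"], "%Y-%m-%d").date()
    for row in erp:
        key = norm_email(row["email_addr"])
        # If the customer already exists, the CRM name is kept (a chosen rule).
        rec = unified.setdefault(key, empty_record(key, row["name"]))
        # ERP dates are assumed to be US-style month/day/year.
        rec["last_order_date"] = datetime.strptime(row["last_order"], "%m/%d/%Y").date()
        rec["total_spend"] = float(row["total_spend"])
    return sorted(unified.values(), key=lambda r: r["email"])


for customer in integrate(crm_rows, erp_rows):
    print(customer)
```

Alice appears in both systems with different casing and spelling, yet ends up as a single record with fields from each source; Bob and Carol each appear in only one system, so the fields the other system would have supplied stay empty.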
Data Warehouse
A large database that collects business information from many sources to support management decision-making.
Data Sources:
Internal operations systems.
External data purchased from outside sources.
Data from social networking.
Clickstream data.
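Loading a warehouse from several sources is commonly described as extract, transform, load (ETL). The sketch below follows that pattern for three of the source types listed above (internal operations, social networking, clickstream), using an in-memory SQLite database as a stand-in for a real warehouse. The table name activity, its columns, and all sample records are hypothetical.

```python
import sqlite3

# Hypothetical raw extracts, each in its source system's own shape.
internal_orders = [("C1", "2024-06-01", 250.0), ("C2", "2024-06-02", 99.5)]
clickstream_lines = [
    "2024-06-01T10:00:00|C1|/pricing",
    "2024-06-01T10:02:10|C1|/checkout",
    "2024-06-02T09:15:00|C3|/home",
]
social_posts = [{"user_id": "C2", "posted_at": "2024-06-02T18:30:00",
                 "text": "Loving the new plan"}]


def extract_transform() -> list[tuple]:
    """Map every source onto one common row shape:
    (source, customer_id, activity_date, amount)."""
    rows = []
    for cid, day, amount in internal_orders:
        rows.append(("internal", cid, day, amount))
    for line in clickstream_lines:
        ts, cid, _page = line.split("|")
        rows.append(("clickstream", cid, ts[:10], 0.0))  # keep the date part only
    for post in social_posts:
        rows.append(("social", post["user_id"], post["posted_at"][:10], 0.0))
    return rows


def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS activity "
                 "(source TEXT, customer_id TEXT, activity_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO activity VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, extract_transform())

# A management-style question answered across all sources at once.
summary = conn.execute("""
    SELECT customer_id,
           SUM(amount) AS revenue,
           SUM(source = 'clickstream') AS clicks,
           SUM(source = 'social') AS mentions
    FROM activity
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()
print(summary)
```

Because every source is transformed into the same row shape before loading, a single query can combine revenue, browsing, and social activity per customer, which is the kind of cross-source view the warehouse is meant to support.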
Data Marts and Data Lakes
Data mart: A subset of a data warehouse for decision-making in small to medium-sized businesses or departments.
Data lake: Stores all data in its raw, unaltered form.
Raw data remains available whenever it is needed for analysis.
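The contrast between the two can be sketched in a few lines. The data lake part writes each incoming record to storage exactly as received and applies structure only when the data is read (often called schema-on-read); the data mart part exposes one department's slice of a warehouse table as a SQLite view. The directory layout, table, and view names are hypothetical, and a view is only one way to build a mart; marts are also commonly materialized as separate databases.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# --- Data lake: keep every record exactly as it arrived (raw, unaltered). ---
# A throwaway temp directory stands in for lake storage such as object storage.
lake = Path(tempfile.mkdtemp()) / "raw"
lake.mkdir()

raw_events = [
    '{"dept": "sales", "region": "EU", "amount": 120.0}',
    '{"dept": "marketing", "campaign": "spring", "clicks": 340}',
    "free-text support note: customer asked about refunds",
]
for i, payload in enumerate(raw_events):
    (lake / f"event_{i}.txt").write_text(payload, encoding="utf-8")


def read_json_events(lake_dir: Path) -> list[dict]:
    """Schema-on-read: structure is applied only at analysis time.
    Files that do not parse as JSON are skipped here but stay in the lake."""
    parsed = []
    for path in sorted(lake_dir.glob("*.txt")):
        try:
            parsed.append(json.loads(path.read_text(encoding="utf-8")))
        except json.JSONDecodeError:
            continue
    return parsed


# --- Data mart: a department-specific subset of the warehouse. ---
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_revenue (department TEXT, region TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO fact_revenue VALUES (?, ?, ?)",
    [("sales", "EU", 120.0), ("sales", "US", 300.0), ("support", "EU", 40.0)],
)
# The sales department sees only its own rows, pre-aggregated by region.
warehouse.execute("""
    CREATE VIEW sales_mart AS
    SELECT region, SUM(amount) AS revenue
    FROM fact_revenue
    WHERE department = 'sales'
    GROUP BY region
""")

print(read_json_events(lake))
print(warehouse.execute("SELECT * FROM sales_mart ORDER BY region").fetchall())
```

The lake keeps all three records, including the free-text note that no current analysis parses, while the mart deliberately narrows the warehouse to what one department needs.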