Topic 6: Data Management, Databases, and Big Data
Pyramid of Content: A structured approach to content strategy and management.
Strategy & Governance: Establishing guidelines, policies, and frameworks for content creation, distribution, and maintenance.
Applications: Utilizing software and tools to manage content workflows, automate tasks, and track performance metrics.
Architecture & Infrastructure: Designing the systems and infrastructure necessary to store, organize, and deliver content effectively.
Data & Analytics: Collecting, analyzing, and interpreting data to inform content strategy, optimize performance, and measure ROI.
Dynamic Pricing
Dynamic Pricing: The practice of adjusting price points (premium, standard, discount) at different times based on demand, competition, and other market factors.
It allows businesses to maximize revenue by charging higher prices during peak demand and offering discounts during off-peak periods.
Examples include airlines and hotels adjusting prices based on booking times and availability.
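As a rough illustration of the tiered pricing described above, the Python sketch below picks a premium, standard, or discount price from a single demand signal, a hotel's occupancy rate. The function name dynamic_price, the thresholds (85% and 40%), and the multipliers (1.25 and 0.80) are hypothetical. Real pricing systems also weigh competition, booking lead time, and other market factors.

```python
def dynamic_price(base_price: float, occupancy: float,
                  high: float = 0.85, low: float = 0.40) -> tuple[str, float]:
    """Return (tier, price) for an occupancy rate between 0 and 1.

    Thresholds and multipliers are hypothetical illustrations, not industry values.
    """
    if not 0.0 <= occupancy <= 1.0:
        raise ValueError("occupancy must be between 0 and 1")
    if occupancy >= high:   # peak demand: charge a premium
        return "premium", round(base_price * 1.25, 2)
    if occupancy <= low:    # off-peak: discount to fill capacity
        return "discount", round(base_price * 0.80, 2)
    return "standard", round(base_price, 2)

for occ in (0.92, 0.60, 0.25):
    print(occ, dynamic_price(200.0, occ))
# 0.92 ('premium', 250.0)
# 0.6 ('standard', 200.0)
# 0.25 ('discount', 160.0)
```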
Data
DIKW Pyramid: Represents the hierarchy of Data, Information, Knowledge, and Wisdom, illustrating how raw data evolves into actionable insights.
Wisdom: The application of knowledge to make informed decisions and increase effectiveness, representing the highest level of understanding.
Knowledge: Insights derived from experience and expertise, providing context and meaning to information.
Information: Data that has been processed, organized, and structured to provide context for decision-making.
Data: Raw facts and figures that serve as the foundation for information, knowledge, and wisdom.
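The DIKW hierarchy can be made concrete with a toy example. In this Python sketch, all numbers, the 140-unit threshold, and the reorder rule are hypothetical: raw daily sales figures are the data, a summary with context is the information, a rule learned from experience stands in for knowledge, and the resulting reorder decision plays the role of wisdom.

```python
from statistics import mean

# Data: raw facts with no context (hypothetical units sold per day, one store, one week)
data = [120, 135, 98, 150, 160, 170, 180]

# Information: the data organized and summarized so it has context
information = {
    "avg_daily_units": mean(data),
    "trend": "rising" if data[-1] > data[0] else "flat or falling",
}

# Knowledge: a rule learned from experience (hypothetical: shelves run empty when
# average demand exceeds 140 units a day and demand is still rising)
def stockout_risk(info: dict, threshold: float = 140.0) -> bool:
    return info["avg_daily_units"] > threshold and info["trend"] == "rising"

# Wisdom: applying that knowledge to decide what to do
decision = "increase next order" if stockout_risk(information) else "keep order size"
print(decision)  # increase next order
```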
Data Management
Data and databases are integral to various aspects of modern technology, including webpages, applications, devices, and machines.
Master Data Management (MDM): Involves ensuring that data is accessible, reliable, and timely, serving as a single source of truth for critical business information.
It requires establishing data governance policies, implementing data quality controls, and integrating data across different systems.
MDM Processes:
Acquiring: Gathering data from various sources, both internal and external.
Validating: Verifying the accuracy, completeness, and consistency of data.
Storing: Securely storing data in a centralized repository.
Protecting: Implementing security measures to safeguard data from unauthorized access or loss.
Processing: Transforming, cleaning, and enriching data to make it usable for analysis and decision-making.
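The five processes can be chained into a small pipeline. This Python sketch follows the order listed above. The sample records, the deliberately simple email check (not full email validation), the role names in AUTHORIZED_ROLES, and the dict used as a centralized repository are all hypothetical stand-ins for real MDM tooling.

```python
def acquire() -> list:
    """Acquiring: gather records from an internal and an external source (hypothetical data)."""
    internal = [{"id": "C1", "email": "Ana@Example.com ", "country": "us"}]
    external = [{"id": "C2", "email": "bo@example", "country": "US"},
                {"id": "C3", "email": "cy@example.org", "country": "ca"}]
    return internal + external

def validate(records: list) -> tuple:
    """Validating: keep complete, plausibly formatted records; set the rest aside."""
    good, bad = [], []
    for r in records:
        email = r.get("email", "").strip()
        ok = bool(r.get("id")) and "@" in email and "." in email.rsplit("@", 1)[-1]
        (good if ok else bad).append(r)
    return good, bad

def store(records: list, repository: dict) -> None:
    """Storing: one central repository keyed by id (a dict stands in for a database)."""
    for r in records:
        repository[r["id"]] = dict(r)

AUTHORIZED_ROLES = {"data_steward", "analyst"}  # hypothetical role names

def protect(repository: dict, role: str) -> list:
    """Protecting: only authorized roles may read master data."""
    if role not in AUTHORIZED_ROLES:
        raise PermissionError(f"role {role!r} may not read master data")
    return list(repository.values())

def process(records: list) -> list:
    """Processing: standardize formats and enrich each record with a derived field."""
    out = []
    for r in records:
        email = r["email"].strip().lower()
        out.append({**r, "email": email, "country": r["country"].upper(),
                    "domain": email.split("@", 1)[1]})
    return out

repository: dict = {}
valid, rejected = validate(acquire())
store(valid, repository)
clean = process(protect(repository, role="analyst"))
print([r["id"] for r in rejected])  # ['C2']: the email has no domain suffix
print(clean[0])
```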
Data Challenges: Poor data quality undermines the accuracy and reliability of any insights derived from the data, a principle known as Garbage In, Garbage Out (GIGO).
Poor data quality can lead to flawed decision-making, operational inefficiencies, and increased costs.
Data volume is often estimated to double roughly every two years, creating challenges for storage, processing, and analysis.
Data scientists often spend a significant amount of their time preparing data for analysis, highlighting the importance of data quality and efficient data management processes.
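A small example shows why GIGO matters. In this Python sketch the readings are hypothetical, 999.0 stands in for a "sensor offline" sentinel that slipped into the data, and the plausible range of -40 to 60 degrees Celsius is an assumption. One unvalidated value is enough to make the average meaningless.

```python
from statistics import mean

# Hypothetical temperature readings in degrees Celsius; 999.0 is a sentinel
# for "sensor offline" that slipped into the data, and None is a missing value
readings = [21.5, 22.0, 999.0, 21.8, None, 22.3]

# Garbage in: dropping only the None values still lets 999.0 through
naive_avg = mean(r for r in readings if r is not None)

def is_valid(r, low: float = -40.0, high: float = 60.0) -> bool:
    """Validation rule: value present and physically plausible (range is an assumption)."""
    return r is not None and low <= r <= high

clean_avg = mean(r for r in readings if is_valid(r))
print(f"naive={naive_avg:.1f} clean={clean_avg:.1f}")  # naive=217.3 clean=21.9
```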
Database Management
Objectives of DBMS:
Independence: Ensuring that data is independent of the applications that use it, allowing for greater flexibility and adaptability.
Integrity: Maintaining the accuracy, consistency, and reliability of data through constraints, validation rules, and transaction management.
Capable of Changes: Providing mechanisms for modifying the database schema, adding new data elements, and updating existing data without disrupting applications.
Sharing: Enabling multiple users and applications to access and share data concurrently, while ensuring data consistency and security.
Security: Protecting data from unauthorized access, modification, or deletion through authentication, authorization, and encryption mechanisms.
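The integrity objective (constraints, validation rules, and transaction management) can be seen with Python's built-in sqlite3 module. In this sketch the account table, its CHECK rule, and the transfer and balances functions are hypothetical. The point is that the DBMS, not the application, rejects invalid data and rolls back a half-finished transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        id      INTEGER PRIMARY KEY,
        owner   TEXT NOT NULL,
        balance REAL NOT NULL CHECK (balance >= 0)  -- integrity rule enforced by the DBMS
    )""")
with conn:
    conn.executemany("INSERT INTO account VALUES (?, ?, ?)",
                     [(1, "Ana", 100.0), (2, "Bo", 50.0)])

def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: float) -> bool:
    """Move money as one transaction: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failed debit must undo an already-applied change
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint rejected a negative balance
        return False

def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

ok_1 = transfer(conn, 1, 2, 30.0)   # valid: Ana 70.0, Bo 80.0
ok_2 = transfer(conn, 2, 1, 500.0)  # invalid: Bo would go negative, so it is rolled back
print(ok_1, ok_2, balances(conn))   # True False {1: 70.0, 2: 80.0}
```

In the failed transfer, account 1 has already been credited when the debit on account 2 violates the CHECK rule; the rollback undoes that credit, so the data stays consistent.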
Basic Operation of a DBMS: Applications and users interact with the DBMS (Database Management System), which manages the underlying data stores: relational or hierarchical databases, or flat files.
Human resources databases offer different views of data based on specific information requirements, such as employee profiles, salary information, and performance reviews.
SQL (Structured Query Language) is a widely used database language for querying, updating, and managing data in relational databases.
Organized data facilitates efficient creation, updating, accessing, and retrieval of information, improving decision-making and operational efficiency.
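The following sketch, using Python's built-in sqlite3 module, shows SQL querying and updating plus two HR-style views over a single table. The table, columns, view names, and sample rows are hypothetical. SQLite has no user accounts or GRANT statement, so these views only illustrate different presentations of the same data; a server DBMS would also restrict who may query each view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        dept   TEXT NOT NULL,
        salary REAL NOT NULL,
        rating INTEGER
    );
    INSERT INTO employee VALUES
        (1, 'Ana', 'Sales', 72000, 4),
        (2, 'Bo',  'IT',    85000, 5),
        (3, 'Cy',  'Sales', 65000, 3);

    -- Directory view: profile data only; salary and rating are hidden
    CREATE VIEW employee_directory AS
        SELECT emp_id, name, dept FROM employee;

    -- Payroll view: the salary information payroll staff need
    CREATE VIEW payroll AS
        SELECT emp_id, name, salary FROM employee;
""")

# Querying with SQL through a view
rows = conn.execute(
    "SELECT name FROM employee_directory WHERE dept = ? ORDER BY name", ("Sales",)
).fetchall()
print(rows)  # [('Ana',), ('Cy',)]

# Updating the base table; every view reflects the change automatically
conn.execute("UPDATE employee SET salary = salary * 1.05 WHERE emp_id = ?", (3,))
new_salary = conn.execute("SELECT salary FROM payroll WHERE emp_id = 3").fetchone()[0]
print(new_salary)
```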
Relational Database
Most Common Type: Relational databases are the prevalent choice for storing and managing structured data due to their flexibility, scalability, and ease of use.
Analogy: an MS Excel workbook is to a database as an MS Excel worksheet is to a table, illustrating how data is organized in a structured format.
Relational databases utilize two-dimensional tables consisting of records (rows), fields (columns), and attributes (cells) to store and organize data.
Table: Represents data held in a structured table format with rows and columns.
Record: Refers to a labeled element representing a row in the table, containing related data.
Field: Indicates a labeled element representing a column in the table, defining the type of data stored.
Attribute: Represents a labeled element within a table cell, containing a specific value for a field in a record.
Data Hierarchy: FIELD -> RECORD -> FILE -> DATABASE. Fields combine to form records, related records form a file (table), and related files form a database.
Table Sharing: Facilitated through primary and foreign keys:
Primary Key (PK): Uniquely identifies a record within a table and is used to establish relationships between separate tables.
Foreign Key (FK): A field in one table whose values match the primary key of another (related) table, linking each record to its counterpart in that table.
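Primary and foreign keys can be tried out with Python's built-in sqlite3 module. In this sketch the department and employee tables, their rows, and the helper try_insert are hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,   -- PK: uniquely identifies each department
        name    TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,   -- PK of the employee table
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)  -- FK to department
    );
    INSERT INTO department VALUES (10, 'Sales'), (20, 'IT');
    INSERT INTO employee VALUES (1, 'Ana', 10), (2, 'Bo', 20), (3, 'Cy', 10);
""")

# The FK lets a join pair each employee with the matching department record
pairs = conn.execute("""
    SELECT e.name, d.name
    FROM employee AS e JOIN department AS d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()
print(pairs)  # [('Ana', 'Sales'), ('Bo', 'IT'), ('Cy', 'Sales')]

def try_insert(sql: str) -> bool:
    """Return True if the statement succeeds, False if a key constraint rejects it."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("INSERT INTO employee VALUES (4, 'Dee', 99)"))     # False: no department 99
print(try_insert("INSERT INTO department VALUES (10, 'Finance')"))  # False: duplicate PK 10
```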
Data Relationships
Cardinality and Modality define business rules governing relationships between entities in a database.
Cardinality: Specifies the maximum number of times an instance of a data entity can be associated with instances in a related entity.
Modality: Defines the minimum number of times an instance of a data entity must be associated with instances in a related entity.
Relationships:
One-to-One: Each instance of one entity is associated with exactly one instance of another entity.
One-to-More: Each instance of one entity can be associated with one or more instances of another entity.
None-to-One: An instance of one entity may not be associated with any instance of another entity, or it may be associated with exactly one instance.
None-to-More: An instance of one entity may not be associated with any instance of another entity, or it may be associated with one or more instances.
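Data Relationship Symbols (crow's foot notation):
Zero or More: a circle (zero) combined with a crow's foot (many).
One or More: a bar (one) combined with a crow's foot (many).
One and Only One (Exactly 1): two bars.
Zero or One: a circle (zero) combined with a bar (one).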
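Each of these symbols amounts to a (minimum, maximum) pair: modality is the minimum and cardinality the maximum. The Python sketch below encodes each pair and checks observed counts against a business rule. The Participation class and the customer/order rule are hypothetical illustrations, not part of any modeling tool.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Participation:
    """One side of a relationship: modality = minimum, cardinality = maximum."""
    modality: int               # minimum number of associations (0 or 1)
    cardinality: Optional[int]  # maximum number of associations; None means "many"

    def allows(self, count: int) -> bool:
        if count < self.modality:
            return False
        return self.cardinality is None or count <= self.cardinality

# The four symbols expressed as (minimum, maximum) pairs
ZERO_OR_MORE = Participation(0, None)
ONE_OR_MORE = Participation(1, None)
EXACTLY_ONE = Participation(1, 1)
ZERO_OR_ONE = Participation(0, 1)

# Hypothetical business rule: a customer places zero or more orders,
# and every order belongs to exactly one customer
orders_per_customer = {"C1": 3, "C2": 0}
customers_per_order = {"O1": 1, "O2": 2, "O3": 0}

customer_ok = all(ZERO_OR_MORE.allows(n) for n in orders_per_customer.values())
violations = [o for o, n in customers_per_order.items() if not EXACTLY_ONE.allows(n)]
print(customer_ok, violations)  # True ['O2', 'O3']
```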
Big Data
Big data analytics can provide valuable insights that improve operations, enhance customer service, and optimize marketing strategies, which in turn can increase revenue and profits.
5 Vs of Big Data:
Volume: Refers to the massive amount of data generated and stored.
Velocity: Signifies the speed at which data is generated and processed.
Variety: Encompasses the different types and formats of data, including structured, semi-structured, and unstructured data.
Veracity: Highlights the importance of data accuracy, reliability, and trustworthiness.
Value: Focuses on extracting meaningful insights and business value from big data.
Big Data is considered a valuable asset and capital for businesses, enabling them to gain a competitive edge and drive innovation.
Big Data Challenges:
Information often resides outside the company, requiring integration of external data sources.
Large, complex data sets pose challenges for storage, processing, and analysis.
Traditional software systems may not be capable of handling the volume, velocity, and variety of big data.
Requires advanced analytic models and techniques, such as artificial intelligence (AI) and machine learning (ML), to extract meaningful insights.
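One common way to cope with volume and velocity is to process records in a single pass instead of loading the whole data set into memory. The Python sketch below uses Welford's online algorithm to keep a running mean and variance over a simulated sensor stream. The stream generator and its parameters are hypothetical, and production big data systems typically distribute such work across many machines with dedicated frameworks.

```python
import random
from typing import Iterator

def sensor_stream(n: int, seed: int = 42) -> Iterator[float]:
    """Hypothetical high-velocity source: yields one reading at a time, never a full list."""
    rng = random.Random(seed)
    for _ in range(n):
        yield rng.gauss(20.0, 2.0)  # simulated readings, mean 20, std dev 2

class RunningStats:
    """Welford's online algorithm: one pass, constant memory, numerically stable."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); 0.0 until two values are seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

stats = RunningStats()
for reading in sensor_stream(100_000):
    stats.update(reading)
print(round(stats.mean, 2), round(stats.variance, 2))  # values close to 20 and 4
```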
Big Data - Business Cases:
Agriculture: John Deere utilizes sensors and data analytics for precision farming, optimizing crop yields and resource utilization.
Aircraft: Predictive maintenance programs in the aviation industry leverage engine health-monitoring data to anticipate and prevent equipment failures.
Carbonated Soft Drinks: Coca-Cola employs data analytics for consumer insights and supply chain optimization, improving marketing effectiveness and operational efficiency.
Issues with Big Data:
Poor data quality can lead to inaccurate analysis, flawed decision-making, and wasted resources.
Key Take-Aways
Data management practices are essential for effectively managing data complexity and ensuring data quality.
Relational databases are a common, simple, fast, and versatile solution for handling vast amounts of structured data.
Big Data analytics can unlock valuable insights that drive revenue growth and improve profitability.
Key Terms:
Primary Key: A unique identifier for a record in a database table.
Foreign Key: A field in one table that refers to the primary key of another table, establishing a link between the tables.
Cardinality: The maximum number of instances of one entity that can be related to instances of another entity.
Modality: The minimum number of instances of one entity that must be related to instances of another entity.
Database Management System: Software used to manage and organize databases.
DIKW: Data, Information, Knowledge, Wisdom - a hierarchical model representing the progression from raw data to actionable insights.
Master Data Management: A comprehensive approach to managing, centralizing, organizing, categorizing, localizing, synchronizing, and sharing data across an organization.
Data Management: The set of processes involved in acquiring, validating, storing, protecting, and processing data.
Garbage In Garbage Out (GIGO): The principle that the quality of output is determined by the quality of the input data.
Database: A structured collection of data stored electronically.
Relational Database: A database that stores data in tables with predefined relationships between them.
Big Data: Extremely large and complex datasets characterized by volume, velocity, variety, veracity, and value.