Topic 6: Data Management

Topic Six: Data Management, Databases, and Big Data

  • Pyramid of Content: A structured approach to content strategy and management.

    • Strategy & Governance: Establishing guidelines, policies, and frameworks for content creation, distribution, and maintenance.

    • Applications: Utilizing software and tools to manage content workflows, automate tasks, and track performance metrics.

    • Architecture & Infrastructure: Designing the systems and infrastructure necessary to store, organize, and deliver content effectively.

    • Data & Analytics: Collecting, analyzing, and interpreting data to inform content strategy, optimize performance, and measure ROI.

Dynamic Pricing

  • Dynamic Pricing: The practice of adjusting price points (premium, standard, discount) at different times based on demand, competition, and other market factors.

    • It allows businesses to maximize revenue by charging higher prices during peak demand and offering discounts during off-peak periods.

    • Examples include airlines and hotels adjusting prices based on booking times and availability.
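The sketch below shows one way a tiered scheme might work. It assumes demand is measured as the share of capacity already booked, such as seats on a flight or rooms in a hotel. The thresholds (40% and 80%) and the tier multipliers are hypothetical values chosen for illustration. They are not figures from any real airline or hotel.

```python
# Hypothetical tier multipliers applied to a base price (illustration only).
TIER_MULTIPLIERS = {"discount": 0.8, "standard": 1.0, "premium": 1.3}


def pick_tier(utilization: float, low: float = 0.4, high: float = 0.8) -> str:
    """Map demand (fraction of capacity already booked, 0.0 to 1.0) to a price tier."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    if utilization >= high:
        return "premium"   # peak demand: charge more
    if utilization < low:
        return "discount"  # off-peak: lower the price to fill capacity
    return "standard"


def dynamic_price(base_price: float, booked: int, capacity: int) -> tuple[str, float]:
    """Return the tier and the adjusted price for the current booking level."""
    tier = pick_tier(booked / capacity)
    return tier, round(base_price * TIER_MULTIPLIERS[tier], 2)


if __name__ == "__main__":
    for booked in (50, 120, 180):
        print(booked, dynamic_price(200.0, booked, capacity=200))
```

Take a base price of 200 and 200 seats. Then 50 bookings give ('discount', 160.0), 120 give ('standard', 200.0), and 180 give ('premium', 260.0). Real systems also weigh competitor prices and time remaining before the booking date. This sketch leaves those factors out.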

Data

  • DIKW Pyramid: Represents the hierarchy of Data, Information, Knowledge, and Wisdom, illustrating how raw data evolves into actionable insights.

    • Wisdom: The application of knowledge to make informed decisions and increase effectiveness, representing the highest level of understanding.

    • Knowledge: Insights derived from experience and expertise, providing context and meaning to information.

    • Information: Data that has been processed, organized, and structured to provide context for decision-making.

    • Data: Raw facts and figures that serve as the foundation for information, knowledge, and wisdom.
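The sketch below is a rough illustration of the lower levels of the pyramid. It takes hypothetical raw sales records (data) and turns them into per-store totals (information). It then flags underperforming stores, a simplified stand-in for knowledge. The records and the 20-unit threshold are invented. Wisdom, meaning the decision about what to do with a weak store, remains a human judgment and is not modeled here.

```python
from collections import defaultdict

# Data: raw facts with no context (hypothetical sales records: store, units sold).
raw_sales = [("north", 12), ("south", 30), ("north", 8), ("south", 25), ("east", 5)]


def to_information(records):
    """Information: organize raw records so they answer a question (units per store)."""
    totals = defaultdict(int)
    for store, units in records:
        totals[store] += units
    return dict(totals)


def to_knowledge(totals, threshold=20):
    """Simplified knowledge: a pattern read from the information (stores below threshold)."""
    return sorted(store for store, units in totals.items() if units < threshold)


info = to_information(raw_sales)
print(info)                # {'north': 20, 'south': 55, 'east': 5}
print(to_knowledge(info))  # ['east']
```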

Data Management

  • Data and databases are integral to various aspects of modern technology, including webpages, applications, devices, and machines.

  • Master Data Management (MDM): Involves ensuring that data is accessible, reliable, and timely, serving as a single source of truth for critical business information.

    • It requires establishing data governance policies, implementing data quality controls, and integrating data across different systems.

  • MDM Processes:

    • Acquiring: Gathering data from various sources, both internal and external.

    • Validating: Verifying the accuracy, completeness, and consistency of data.

    • Storing: Securely storing data in a centralized repository.

    • Protecting: Implementing security measures to safeguard data from unauthorized access or loss.

    • Processing: Transforming, cleaning, and enriching data to make it usable for analysis and decision-making.
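The sketch below covers the acquire, validate, process, and store steps. It assumes two hypothetical source systems, a CRM and a billing system, that hold overlapping customer records. Records are normalized before they are stored. When two records share an id, the later one simply overwrites the earlier one. This is a deliberate simplification, since real MDM tools apply configurable match and survivorship rules. Protecting the data (access control, encryption, backups) is outside the scope of the sketch.

```python
import re

# Hypothetical customer records from two source systems.
crm = [{"id": "C1", "email": "Ann@Example.com", "country": "us"}]
billing = [{"id": "C1", "email": "ann@example.com", "country": "US"},
           {"id": "C2", "email": "not-an-email", "country": "CA"}]

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately loose format check


def acquire(*sources):
    """Acquiring: gather records from every source into one list."""
    return [rec for source in sources for rec in source]


def validate(rec):
    """Validating: completeness (required fields non-empty) plus a simple format check."""
    required = ("id", "email", "country")
    return all(rec.get(k) for k in required) and bool(EMAIL_RE.match(rec["email"]))


def process(rec):
    """Processing: normalize values so records from different systems agree."""
    return {"id": rec["id"], "email": rec["email"].lower(), "country": rec["country"].upper()}


def build_master(*sources):
    """Storing: keep one cleaned record per id in a central store (last write wins)."""
    master, rejected = {}, []
    for rec in acquire(*sources):
        if validate(rec):
            master[rec["id"]] = process(rec)
        else:
            rejected.append(rec)
    return master, rejected


master, rejected = build_master(crm, billing)
print(master)    # {'C1': {'id': 'C1', 'email': 'ann@example.com', 'country': 'US'}}
print(rejected)  # [{'id': 'C2', 'email': 'not-an-email', 'country': 'CA'}]
```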

  • Data Challenges: Poor data quality undermines the accuracy and reliability of insights derived from data. The principle of Garbage In, Garbage Out (GIGO) captures this: flawed input data produces flawed output.

    • Poor data quality can lead to flawed decision-making, operational inefficiencies, and increased costs.
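The example below is a small illustration of GIGO. It assumes a hypothetical sensor feed that writes -999.0 whenever a reading fails. If the raw feed is averaged, that one bad value swamps the result. Filtering it out during validation gives a sensible answer.

```python
from statistics import mean

# Hypothetical temperature readings (degrees C); -999.0 marks a failed reading.
SENTINEL = -999.0
readings = [21.5, 22.0, 21.8, SENTINEL, 22.3]

garbage_out = round(mean(readings), 2)          # bad input produces a bad average
clean = [r for r in readings if r != SENTINEL]  # validation step removes the placeholder
sensible = round(mean(clean), 2)

print(garbage_out)  # -182.28
print(sensible)     # 21.9
```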

  • Data volume doubles approximately every two years, creating challenges related to storage, processing, and analysis.
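Taking the two-year doubling figure at face value, the growth compounds quickly. Here V_0 is today's data volume and t is the number of elapsed years:

```latex
V(t) = V_0 \cdot 2^{t/2}
\qquad\Rightarrow\qquad
V(10) = V_0 \cdot 2^{10/2} = 2^{5}\,V_0 = 32\,V_0
```

A decade of doubling every two years therefore means roughly 32 times as much data to store, process, and analyze.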

  • Data scientists often spend a significant amount of their time preparing data for analysis, highlighting the importance of data quality and efficient data management processes.

Database Management

  • Objectives of a DBMS:

    1. Independence: Ensuring that data is independent of the applications that use it, allowing for greater flexibility and adaptability.

    2. Integrity: Maintaining the accuracy, consistency, and reliability of data through constraints, validation rules, and transaction management.

    3. Capable of Change: Providing mechanisms for modifying the database schema, adding new data elements, and updating existing data without disrupting applications.

    4. Sharing: Enabling multiple users and applications to access and share data concurrently, while ensuring data consistency and security.

    5. Security: Protecting data from unauthorized access, modification, or deletion through authentication, authorization, and encryption mechanisms.
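The sketch below shows integrity enforced by the DBMS rather than by application code, using Python's built-in sqlite3 module. The account table, its CHECK rule, and the transfer function are hypothetical. The point is that when a constraint is violated inside a transaction, every change made in that transaction is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        id      INTEGER PRIMARY KEY,
        owner   TEXT NOT NULL,
        balance REAL NOT NULL CHECK (balance >= 0)  -- validation rule held by the DBMS
    )
""")
conn.executemany("INSERT INTO account (id, owner, balance) VALUES (?, ?, ?)",
                 [(1, "Ana", 100.0), (2, "Ben", 50.0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates are kept, or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False


print(transfer(conn, 1, 2, 30.0))   # True
print(transfer(conn, 2, 1, 500.0))  # False: Ben's balance would go negative
print(conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall())
# [(1, 70.0), (2, 80.0)]
```

The second transfer would leave Ben with a negative balance, so SQLite raises IntegrityError. The with-block then rolls back the credit already applied to Ana, and both balances stay as they were after the first transfer.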

  • Basic Operation of a DBMS: Applications and users send requests to the DBMS (Database Management System). The DBMS in turn stores and retrieves data in different types of databases (relational, hierarchical) or in flat files.

  • Different users need different views of the same data. A human resources database, for example, can present employee profiles, salary information, and performance reviews as separate views, each suited to specific information requirements.
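The sketch below expresses the HR example in SQL, run through Python's built-in sqlite3 module. The employee table and both views are hypothetical, and each view exposes only the columns one group of users needs. SQLite itself has no user accounts or GRANT statement, so here the views only shape the data. In server DBMSs such as PostgreSQL or SQL Server, views are typically combined with permissions to control who sees what.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        dept   TEXT NOT NULL,
        salary REAL NOT NULL,
        rating INTEGER          -- latest performance-review score (1-5)
    );
    INSERT INTO employee VALUES
        (1, 'Ana', 'Sales', 62000, 4),
        (2, 'Ben', 'IT',    71000, 3),
        (3, 'Cy',  'Sales', 58000, 5);

    -- Directory view: profile data only, no pay or review information.
    CREATE VIEW employee_directory AS
        SELECT emp_id, name, dept FROM employee;

    -- Payroll view: pay data without performance reviews.
    CREATE VIEW payroll AS
        SELECT emp_id, name, salary FROM employee;
""")

print(conn.execute("SELECT * FROM employee_directory ORDER BY emp_id").fetchall())
# [(1, 'Ana', 'Sales'), (2, 'Ben', 'IT'), (3, 'Cy', 'Sales')]
print(conn.execute("SELECT SUM(salary) FROM payroll").fetchone())
# (191000.0,)
```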

  • SQL (Structured Query Language) is a widely used database language for querying, updating, and managing data in relational databases.

  • Organized data facilitates efficient creation, updating, accessing, and retrieval of information, improving decision-making and operational efficiency.
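The sketch below shows creating, updating, and retrieving data in SQL, again through Python's built-in sqlite3 module. The product table and its rows are hypothetical. The ? placeholders pass values as parameters instead of pasting them into the SQL string, which is the standard way to avoid SQL injection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create: define a table and add rows.
conn.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, name TEXT NOT NULL, price REAL NOT NULL)")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [("A100", "Stapler", 8.5), ("B200", "Desk lamp", 24.0), ("C300", "Notebook", 3.25)])

# Update: change a value in an existing row.
conn.execute("UPDATE product SET price = ? WHERE sku = ?", (22.0, "B200"))

# Retrieve: query with a filter and a sort order.
rows = conn.execute(
    "SELECT sku, name, price FROM product WHERE price > ? ORDER BY price DESC", (5,)
).fetchall()
print(rows)  # [('B200', 'Desk lamp', 22.0), ('A100', 'Stapler', 8.5)]
```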

Relational Database

  • Most Common Type: Relational databases are the prevalent choice for storing and managing structured data due to their flexibility, scalability, and ease of use.

  • Analogy: An MS Excel workbook is like a database, and each worksheet within it is like a table, with data organized in a structured row-and-column format.

  • Relational databases store data in two-dimensional tables made up of records (rows), fields (columns), and attributes (the values in individual cells).

    1. Table: Represents data held in a structured table format with rows and columns.

    2. Record: A single row in the table, holding related data about one item (for example, one employee).

    3. Field: A named column in the table, defining the type of data stored for every record.

    4. Attribute: The value held in a single cell, that is, the value of one field for one record. (In formal relational terminology, "attribute" usually names the column itself, and the cell holds an attribute value.)

  • Data hierarchy: FIELD -> RECORD -> FILE -> DATABASE. Related fields make up a record, related records make up a file (table), and related files make up a database.
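The sketch below maps these terms onto Python's built-in sqlite3 module. The connection stands for the database and the table for the file. cursor.description yields the field names, each fetched tuple is a record, and one element of a tuple is a single cell (what these notes call an attribute). The student table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the database
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, major TEXT)")  # a file/table
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Ana", "MIS"), (2, "Ben", "Finance")])

cur = conn.execute("SELECT student_id, name, major FROM student ORDER BY student_id")
fields = [col[0] for col in cur.description]  # fields: the column names
records = cur.fetchall()                      # records: one tuple per row

print(fields)         # ['student_id', 'name', 'major']
print(records[1])     # (2, 'Ben', 'Finance')
print(records[1][2])  # Finance  (one cell: the major field of Ben's record)
```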

  • Table Sharing: Facilitated through primary and foreign keys:

    1. Primary Key (PK): Uniquely identifies a record within a table and is used to establish relationships between separate tables.

    2. Foreign Key (FK): A field in one table whose values refer to the primary key of another table, linking each record to its related record there.
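The sketch below shows primary and foreign keys in SQLite, through Python's built-in sqlite3 module. The department and employee tables are hypothetical. SQLite enforces foreign keys only after PRAGMA foreign_keys is switched on for the connection, so the code enables it first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is enabled
conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,                   -- primary key
        name    TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,                   -- primary key
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES department(dept_id) -- foreign key
    );
    INSERT INTO department VALUES (10, 'Sales'), (20, 'IT');
    INSERT INTO employee VALUES (1, 'Ana', 10), (2, 'Ben', 20), (3, 'Cy', 10);
""")

# Matching FK values to PK values lets a join combine the two tables.
rows = conn.execute("""
    SELECT e.name, d.name
    FROM employee AS e
    JOIN department AS d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()
print(rows)  # [('Ana', 'Sales'), ('Ben', 'IT'), ('Cy', 'Sales')]

# A foreign key value with no matching primary key is refused.
try:
    conn.execute("INSERT INTO employee VALUES (4, 'Dee', 99)")  # there is no department 99
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```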

Data Relationships

  • Cardinality and Modality define business rules governing relationships between entities in a database.

    • Cardinality: Specifies the maximum number of times an instance of a data entity can be associated with instances in a related entity.

    • Modality: Defines the minimum number of times an instance of a data entity must be associated with instances in a related entity.

  • Relationships (each label reads minimum-to-maximum, i.e., modality-to-cardinality):

    1. One-to-One: Each instance of one entity is associated with exactly one instance of another entity.

    2. One-to-More: Each instance of one entity can be associated with one or more instances of another entity.

    3. None-to-One: An instance of one entity may not be associated with any instance of another entity, or it may be associated with exactly one instance.

    4. None-to-More: An instance of one entity may not be associated with any instance of another entity, or it may be associated with one or more instances.
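The sketch below shows one common way to enforce these rules in a relational schema. It uses a hypothetical employee, department, and parking-permit design in SQLite, via Python's sqlite3 module. Modality typically appears as NOT NULL (a minimum of one) versus an optional link (a minimum of zero). A maximum of one on the referencing side can be enforced with a UNIQUE constraint. This is one mapping among several, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce REFERENCES
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Employee to department: exactly one (min 1, max 1).
    --   NOT NULL gives the minimum of 1; a single FK column allows at most 1.
    -- Department to employee: zero or more, so nothing extra to enforce.
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)
    );

    -- Employee to parking permit: zero or one (min 0, max 1).
    --   UNIQUE on the FK caps each employee at one permit;
    --   employees are not required to have a permit row, so the minimum is 0.
    CREATE TABLE parking_permit (
        permit_id INTEGER PRIMARY KEY,
        emp_id    INTEGER NOT NULL UNIQUE REFERENCES employee(emp_id)
    );

    INSERT INTO department VALUES (10, 'Sales');
    INSERT INTO employee VALUES (1, 10), (2, 10);
    INSERT INTO parking_permit VALUES (500, 1);
""")


def try_insert(sql: str) -> str:
    """Run one INSERT and report whether the schema's rules accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


print(try_insert("INSERT INTO employee VALUES (3, NULL)"))       # rejected: minimum of 1 department
print(try_insert("INSERT INTO parking_permit VALUES (501, 1)"))  # rejected: employee 1 already has one
print(try_insert("INSERT INTO parking_permit VALUES (502, 2)"))  # ok
```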

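  • Data Relationship Symbols (crow's foot notation, shown attached to an entity on the left; the symbol next to the entity gives the maximum, the outer symbol the minimum):

    • Zero or More: crow's foot and circle (>O--)

    • One or More: crow's foot and bar (>|--)

    • One and Only One (Exactly 1): two bars (||--)

    • Zero or One: bar and circle (|O--)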

Big Data

  • Big data analytics can provide valuable insights that improve operations, enhance customer service, and optimize marketing strategies, which in turn can increase revenue and profits.

  • 5 Vs of Big Data:

    1. Volume: Refers to the massive amount of data generated and stored.

    2. Velocity: Signifies the speed at which data is generated and processed.

    3. Variety: Encompasses the different types and formats of data, including structured, semi-structured, and unstructured data.

    4. Veracity: Highlights the importance of data accuracy, reliability, and trustworthiness.

    5. Value: Focuses on extracting meaningful insights and business value from big data.

  • Big Data is considered a valuable asset and capital for businesses, enabling them to gain a competitive edge and drive innovation.

  • Big Data Challenges:

    • Information often resides outside the company, requiring integration of external data sources.

    • Large, complex data sets pose challenges for storage, processing, and analysis.

    • Traditional software systems may not be capable of handling the volume, velocity, and variety of big data.

    • Requires advanced analytic models and techniques, such as artificial intelligence (AI) and machine learning (ML), to extract meaningful insights.

  • Big Data - Business Cases:

    • Agriculture: John Deere utilizes sensors and data analytics for precision farming, optimizing crop yields and resource utilization.

    • Aircraft: Predictive maintenance programs in the aviation industry leverage engine health-monitoring data to anticipate and prevent equipment failures.

    • Carbonated Soft Drinks: Coca-Cola employs data analytics for consumer insights and supply chain optimization, improving marketing effectiveness and operational efficiency.

  • Issues with Big Data:

    • Poor data quality can lead to inaccurate analysis, flawed decision-making, and wasted resources.

Key Take-Aways

  • Data management practices are essential for effectively managing data complexity and ensuring data quality.

  • Relational databases are a common, simple, fast, and versatile solution for handling vast amounts of structured data.

  • Big Data analytics can unlock valuable insights that drive revenue growth and improve profitability.

  • Key Terms:

    • Primary Key: A unique identifier for a record in a database table.

    • Foreign Key: A field in one table that refers to the primary key of another table, establishing a link between the tables.

    • Cardinality: The maximum number of instances of one entity that can be related to instances of another entity.

    • Modality: The minimum number of instances of one entity that must be related to instances of another entity.

    • Database Management System: Software used to manage and organize databases.

    • DIKW: Data, Information, Knowledge, Wisdom - a hierarchical model representing the progression from raw data to actionable insights.

    • Master Data Management: A comprehensive approach to managing, centralizing, organizing, categorizing, localizing, synchronizing, and sharing data across an organization.

    • Data Management: The set of processes involved in acquiring, validating, storing, protecting, and processing data.

    • Garbage In Garbage Out (GIGO): The principle that the quality of output is determined by the quality of the input data.

    • Database: A structured collection of data stored electronically.

    • Relational Database: A database that stores data in tables with predefined relationships between them.

    • Big Data: Extremely large and complex datasets characterized by volume, velocity, variety, veracity, and value.