Data Management Notes

Managing Data Challenges

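  • Data Challenges: Organizations face rapid data growth, data scattered across multiple sources, data rot (deterioration of storage media over time, which makes stored data hard to access), data degradation (data becoming obsolete or irrelevant), and the need to ensure security and compliance.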

Data Governance

  • Master Data Management (MDM): A strategy to manage and maintain the accuracy and consistency of an organization's master data.

Relational Database Model

  • Reduces Issues: Aims to reduce data redundancy, isolation, and inconsistency.
  • Hierarchy: The structure of data organization:
    • Bit: The smallest unit of data, a binary digit (0 or 1).
    • Byte: A group of bits (typically 8) representing a character.
    • Field: A single piece of information (e.g., name, address).
    • Record: A group of related fields describing one item (e.g., one customer).
    • File/Table: A collection of related records.
    • Database: A collection of related tables.
  • Key Terms:
    • Entity: A person, place, thing, or event about which information is maintained.
    • Instance: A specific occurrence of an entity.
    • Attribute: A characteristic or property of an entity.
    • Primary Key: A field that uniquely identifies each record in a table.
    • Foreign Key: A field in one table that refers to the primary key of another table, establishing a link between them.
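
A minimal sketch of how primary and foreign keys behave, using Python's built-in sqlite3 module. The department and employee tables, their columns, and the sample rows are hypothetical and chosen only for illustration. Note that SQLite enforces foreign keys only after the PRAGMA shown below is turned on.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,   -- primary key: unique for each record
        name    TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)  -- foreign key
    )
""")

conn.execute("INSERT INTO department (dept_id, name) VALUES (1, 'Marketing')")
conn.execute("INSERT INTO employee (emp_id, name, dept_id) VALUES (100, 'Ana', 1)")


def try_insert(sql):
    """Run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A second department with the same primary key value is rejected.
duplicate_pk_ok = try_insert(
    "INSERT INTO department (dept_id, name) VALUES (1, 'HR')"
)

# An employee pointing at a department that does not exist is rejected.
dangling_fk_ok = try_insert(
    "INSERT INTO employee (emp_id, name, dept_id) VALUES (101, 'Ben', 99)"
)

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

Both rejected inserts leave the tables unchanged, which is how keys help limit the redundancy and inconsistency mentioned above.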

Big Data

  • 3 V's:
    • Volume: The sheer amount of data.
    • Velocity: The speed at which data is generated and processed.
    • Variety: The different types of data (structured, unstructured, semi-structured).
  • Issues: Big data can come from untrusted sources, can be "dirty" (inaccurate or incomplete), and can change rapidly.
  • Use: Big data is used for analytics in various areas, including HR, marketing, operations, and creating new business models.
  • Tech: Technologies used include relational databases, NoSQL databases, and open-source tools.
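
As a rough illustration of "dirty" data, the sketch below flags records that are incomplete (missing values) or inaccurate (an impossible value). The record layout, the field names, and the 0 to 120 age range are assumptions made for this example, not rules from the notes.

```python
# Hypothetical records; field names and validation rules are illustrative assumptions.
records = [
    {"customer_id": 1, "email": "ana@example.com", "age": 34},
    {"customer_id": 2, "email": "", "age": 41},                   # incomplete: no email
    {"customer_id": 3, "email": "ben@example.com", "age": -5},    # inaccurate: impossible age
    {"customer_id": 4, "email": "cy@example.com", "age": None},   # incomplete: no age
]


def find_problems(record):
    """Return a list of data-quality problems found in one record."""
    problems = []
    for field, value in record.items():
        if value is None or value == "":
            problems.append(f"missing {field}")
    age = record.get("age")
    if isinstance(age, int) and not (0 <= age <= 120):
        problems.append("age out of range")
    return problems


# Records with no problems are kept; the rest are reported by customer_id.
clean = [r for r in records if not find_problems(r)]
dirty = {r["customer_id"]: find_problems(r) for r in records if find_problems(r)}
```

Here only customer 1 passes; the other three records are reported with the reason each one was flagged.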

Data Warehouses and Data Marts

  • Warehouse: A central repository of integrated data that is:
    • Subject-oriented: Organized around major subjects (e.g., customer, product).
    • Time-variant: Data is stored with a time dimension, so historical values are kept and can be compared across periods.
    • Non-volatile: Once loaded, data is not altered or deleted; new data is appended.
    • Multidimensional: Data is structured in a way that allows for analysis from different perspectives.
  • Mart: A smaller, more focused data warehouse designed for a specific department or business unit.
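
The multidimensional and non-volatile properties can be sketched in a few lines of plain Python. The fact rows, dimension names, and amounts below are invented for illustration; a real warehouse would use dedicated storage and query tools rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, sales amount).
sales_facts = [
    (2023, "East", "Laptop", 1200),
    (2023, "West", "Laptop", 900),
    (2023, "East", "Phone", 500),
    (2024, "East", "Laptop", 1500),
    (2024, "West", "Phone", 700),
]

# Position of each dimension inside a fact row.
DIMENSIONS = {"year": 0, "region": 1, "product": 2}


def total_by(facts, dimension):
    """Sum the sales amount grouped by one dimension."""
    idx = DIMENSIONS[dimension]
    totals = defaultdict(int)
    for row in facts:
        totals[row[idx]] += row[3]
    return dict(totals)


# The same facts viewed from three different perspectives.
by_year = total_by(sales_facts, "year")
by_region = total_by(sales_facts, "region")
by_product = total_by(sales_facts, "product")

# Non-volatile: a new period is appended; earlier rows are left unchanged.
sales_facts.append((2025, "East", "Phone", 650))
by_year_after_load = total_by(sales_facts, "year")
```

Each grouping adds up to the same overall total, since only the perspective changes and the underlying facts do not.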

Knowledge Management (KM)

  • Tacit vs. Explicit Knowledge:
    • Tacit Knowledge: Knowledge that is difficult to articulate or write down; it is often based on experience and intuition.
    • Explicit Knowledge: Knowledge that can be easily documented and shared.
  • KMS Cycle: A cycle of activities for managing knowledge:
    • Create: Generating new knowledge.
    • Capture: Documenting existing knowledge.
    • Refine: Improving and updating knowledge.
    • Store: Organizing and storing knowledge.
    • Manage: Maintaining and controlling knowledge.
    • Disseminate: Sharing knowledge with others.

Relational DB Fundamentals

  • SQL:
    • SELECT: Used to choose the columns you wish to view in the result.
    • WHERE: Used to filter records.
  • ER Modeling:
    • Entities: Real-world objects or concepts that can be distinctly identified.
    • Relationships: Associations between entities.
    • Cardinality: Specifies how many instances of one entity can be associated with instances of another (one-to-one, one-to-many, many-to-many).
  • Normalization:
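    • 1NF (First Normal Form): Eliminate repeating groups, so each field holds a single value.
    • 2NF (Second Normal Form): Eliminate partial dependencies, so every non-key attribute depends on the whole primary key (must already be in 1NF).
    • 3NF (Third Normal Form): Eliminate transitive dependencies, so non-key attributes do not depend on other non-key attributes (must already be in 2NF).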
  • Joins: Combine tables based on related columns.
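
The SELECT, WHERE, and join ideas above can be tried out with Python's built-in sqlite3 module. This is only a sketch: the customer and orders tables, their columns, and the sample rows are hypothetical. The two tables are kept separate in the spirit of normalization, so customer details are stored once rather than repeated on every order.

```python
import sqlite3

# Hypothetical schema and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        total       REAL NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Ana', 'Lisbon'), (2, 'Ben', 'Oslo');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 40.0), (12, 2, 90.0);
""")

# SELECT chooses the columns to return; WHERE filters the rows.
lisbon_names = conn.execute(
    "SELECT name FROM customer WHERE city = ?", ("Lisbon",)
).fetchall()

# JOIN combines the two tables on the related column (customer_id),
# then WHERE keeps only orders above 50.
large_orders = conn.execute("""
    SELECT c.name, o.order_id, o.total
    FROM orders AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
    WHERE o.total > 50
    ORDER BY o.order_id
""").fetchall()
```

The join matches each order to its customer through the foreign key column, so the customer's name appears in the result without being stored in the orders table.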