Data Challenges: Organizations face challenges related to huge data growth, multiple data sources, data rot (data becoming obsolete or irrelevant), and ensuring security and compliance.
Data Governance
Master Data Management (MDM): A strategy to manage and maintain the accuracy and consistency of an organization's master data.
Relational Database Model
Reduces Issues: Aims to reduce data redundancy, isolation, and inconsistency.
Hierarchy: The structure of data organization:
Bit: The smallest unit of data, a binary digit (0 or 1).
Byte: A group of bits (typically 8) representing a character.
Field: A single piece of information (e.g., name, address).
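Record: A group of related fields describing one instance of an entity (e.g., one customer's name and address).
File/Table: A collection of related records.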
Database: A collection of related tables.
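The hierarchy above can be traced in a few lines of Python. This is only an illustrative sketch: the customer fields and table names are hypothetical, and plain dictionaries and lists stand in for real database storage.

```python
# Bit -> byte: eight bits encode one ASCII character.
bits = "01000001"
byte_value = int(bits, 2)      # the eight bits read as a binary number
character = chr(byte_value)    # the character that byte represents in ASCII

# Field -> record -> table -> database (all names here are hypothetical).
record = {"customer_id": 1, "name": "Ada", "city": "London"}   # one record made of three fields
customers = [                                                  # a table: related records
    record,
    {"customer_id": 2, "name": "Grace", "city": "New York"},
]
database = {"customers": customers, "orders": []}              # a database: related tables

print(character, len(customers), sorted(database))
```

Each level is built from the one below it, which is why a change to a field definition ripples up through every record in the table.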
Key Terms:
Entity: A person, place, thing, or event about which information is maintained.
Instance: A specific occurrence of an entity.
Attribute: A characteristic or property of an entity.
Primary Key: A field that uniquely identifies each record in a table.
Foreign Key: A field in one table that refers to the primary key of another table, establishing a link between them.
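As a minimal sketch of how primary and foreign keys behave, the following uses Python's built-in sqlite3 module with an in-memory database. The customer and orders tables and their columns are assumptions made up for illustration. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# customer_id is the primary key: it uniquely identifies each customer record.
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# orders.customer_id is a foreign key: it must match an existing customer.customer_id.
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        total       REAL
    )
""")

conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.0)")

# A second customer with the same primary key value is rejected.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Grace')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# An order pointing at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(duplicate_rejected, orphan_rejected)
```

In this sketch each customer row is an instance of the customer entity, and name is one of its attributes.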
Big Data
3 V's:
Volume: The sheer amount of data.
Velocity: The speed at which data is generated and processed.
Variety: The different types of data (structured, unstructured, semi-structured).
Issues: Challenges associated with big data include untrusted sources, "dirty" data (inaccurate or incomplete), and data that changes rapidly.
Use: Big data is used for analytics in various areas, including HR, marketing, operations, and creating new business models.
Tech: Technologies used include relational databases, NoSQL databases, and open-source tools.
Data Warehouses and Data Marts
Warehouse: A central repository of integrated data that is:
Subject-oriented: Organized around major subjects (e.g., customer, product).
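Time-variant: Data is stored with a time dimension (e.g., by day, month, or year) so that historical trends can be analyzed.
Non-volatile: Once loaded, data is not altered or deleted by users; new data is appended.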
Multidimensional: Data is structured in a way that allows for analysis from different perspectives.
Mart: A smaller, more focused data warehouse designed for a specific department or business unit.
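The warehouse properties above can be illustrated with a toy example. This is a sketch under stated assumptions, not a real warehouse design: the sales facts, the dimension names (year, region, product), and the helper `total_by` are all hypothetical, and a plain Python list stands in for warehouse storage.

```python
from collections import defaultdict

# Subject-oriented and time-variant: every fact is about sales and carries a year.
sales_facts = [
    {"year": 2023, "region": "East", "product": "Widget", "amount": 100.0},
    {"year": 2023, "region": "West", "product": "Widget", "amount": 150.0},
    {"year": 2024, "region": "East", "product": "Gadget", "amount": 200.0},
]

# Non-volatile: new data is appended; existing rows are left untouched.
sales_facts.append({"year": 2024, "region": "West", "product": "Widget", "amount": 50.0})


def total_by(facts, dimension):
    """Sum the sales amount along one dimension (hypothetical helper)."""
    totals = defaultdict(float)
    for row in facts:
        totals[row[dimension]] += row["amount"]
    return dict(totals)


# Multidimensional: the same facts viewed from three perspectives.
by_year = total_by(sales_facts, "year")
by_region = total_by(sales_facts, "region")
by_product = total_by(sales_facts, "product")

# A data mart: the slice of the warehouse that one unit (here, the East region) needs.
east_mart = [row for row in sales_facts if row["region"] == "East"]

print(by_year, by_region, by_product, len(east_mart))
```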
Knowledge Management (KM)
Tacit vs. Explicit Knowledge:
Tacit Knowledge: Knowledge that is difficult to articulate or write down; it is often based on experience and intuition.
Explicit Knowledge: Knowledge that can be easily documented and shared.
KMS Cycle: A cycle of activities for managing knowledge:
Create: Generating new knowledge.
Capture: Documenting existing knowledge.
Refine: Improving and updating knowledge.
Store: Organizing and storing knowledge.
Manage: Maintaining and controlling knowledge.
Disseminate: Sharing knowledge with others.
Relational DB Fundamentals
SQL:
SELECT: Specifies the columns to return in the result.
WHERE: Filters records, returning only the rows that satisfy a condition.
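A short sketch of SELECT and WHERE working together, run through Python's built-in sqlite3 module. The student table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, major TEXT, gpa REAL)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (1, "Ada", "MIS", 3.9),
        (2, "Grace", "CS", 3.7),
        (3, "Linus", "MIS", 3.1),
    ],
)

# SELECT picks the columns (name, gpa); WHERE keeps only the rows that match.
rows = conn.execute(
    "SELECT name, gpa FROM student WHERE major = ? AND gpa >= ? ORDER BY name",
    ("MIS", 3.5),
).fetchall()

print(rows)  # [('Ada', 3.9)]
```

The `?` placeholders pass the filter values separately from the SQL text, which is the usual way to avoid SQL injection when values come from user input.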
ER Modeling:
Entities: Real-world objects or concepts that can be distinctly identified.
Relationships: Associations between entities.
Cardinality: Defines how many instances of one entity can be associated with instances of another (one-to-one, one-to-many, many-to-many).
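In a relational database, a many-to-many relationship is usually implemented with a third (junction) table that holds one row per pairing. The sketch below assumes hypothetical student, course, and enrollment tables and uses Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT);

    -- Junction table: each row links one student to one course.
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(student_id),
        course_id  INTEGER REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );

    INSERT INTO student VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO course  VALUES (10, 'Databases'), (20, 'Analytics');
    INSERT INTO enrollment VALUES (1, 10), (1, 20), (2, 10);
""")

# One student can take many courses...
courses_for_ada = [row[0] for row in conn.execute(
    "SELECT c.title FROM course c "
    "JOIN enrollment e ON e.course_id = c.course_id "
    "WHERE e.student_id = 1 ORDER BY c.title"
)]

# ...and one course can have many students.
students_in_databases = [row[0] for row in conn.execute(
    "SELECT s.name FROM student s "
    "JOIN enrollment e ON e.student_id = s.student_id "
    "WHERE e.course_id = 10 ORDER BY s.name"
)]

print(courses_for_ada, students_in_databases)
```

Seen from either side, the junction table turns one many-to-many relationship into two one-to-many relationships.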
Normalization:
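1NF (First Normal Form): Eliminate repeating groups, so that every field holds a single value.
2NF (Second Normal Form): Eliminate partial dependencies, so that every non-key attribute depends on the whole primary key (must be 1NF).
3NF (Third Normal Form): Eliminate transitive dependencies, so that non-key attributes depend only on the primary key and not on other non-key attributes (must be 2NF).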
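The three normal forms can be walked through on a small order data set. This is a hedged sketch: the order, product, and customer data are invented, the `items` string encoding is an assumption chosen to create a repeating group, and dictionaries stand in for tables.

```python
# Unnormalized: the "items" field packs a repeating group into one value.
unnormalized = [
    {"order_id": 1, "customer_id": "C1", "customer_city": "London",
     "items": "P1:Widget:2;P2:Gadget:1"},
    {"order_id": 2, "customer_id": "C2", "customer_city": "Paris",
     "items": "P1:Widget:5"},
]

# 1NF: one row per order line, every field atomic. Key: (order_id, product_id).
rows_1nf = []
for order in unnormalized:
    for item in order["items"].split(";"):
        product_id, product_name, qty = item.split(":")
        rows_1nf.append({
            "order_id": order["order_id"], "product_id": product_id,
            "product_name": product_name, "qty": int(qty),
            "customer_id": order["customer_id"],
            "customer_city": order["customer_city"],
        })

# 2NF: move attributes that depend on only part of the key into their own tables.
order_items = {(r["order_id"], r["product_id"]): r["qty"] for r in rows_1nf}  # needs whole key
products = {r["product_id"]: r["product_name"] for r in rows_1nf}             # product_id only
orders_2nf = {r["order_id"]: (r["customer_id"], r["customer_city"]) for r in rows_1nf}

# 3NF: customer_city depends on customer_id, not directly on order_id, so split it out.
orders = {order_id: customer_id for order_id, (customer_id, _) in orders_2nf.items()}
customers = {customer_id: city for customer_id, city in orders_2nf.values()}

print(order_items, products, orders, customers, sep="\n")
```

After the last step, each fact (a product's name, a customer's city) is stored in exactly one place, which is what removes the update inconsistencies that redundancy causes.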