Data and Knowledge Management
Learning Objectives
Discuss ways that common challenges in managing data can be addressed using data governance.
Identify and assess the advantages and disadvantages of relational databases.
Define Big Data and explain its basic characteristics.
Explain the elements necessary to successfully implement and maintain data warehouses.
Describe the benefits and challenges of implementing knowledge management systems in organizations.
Understand the processes of querying a relational database, entity-relationship modeling, normalization, and joins.
Multiple Sources of Data
Internal Sources: Corporate databases, company documents.
Personal Sources: Personal thoughts, opinions, experiences.
External Sources: Commercial databases, government reports, corporate websites, clickstream data.
Database Management System (DBMS)
A set of programs that provide users with tools to create and manage databases.
Key Problems Addressed by DBMS
Data Redundancy: Same data stored in multiple locations.
Data Isolation: Applications cannot access data associated with other applications.
Data Inconsistency: Various copies of data do not agree.
DBMS Maximizes
Data Security: Strong safeguards protect data from mistakes and attacks.
Data Integrity: Ensuring data meets certain constraints (e.g., no alphabetic characters in a Social Security field).
Data Independence: Applications and data are not tied to each other, so many applications can access the same data.
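The data integrity point above can be illustrated with a constraint that the DBMS itself enforces. This is a minimal sketch using Python's built-in sqlite3 module; the employee table, its columns, and the nine-digit rule are assumptions made for illustration, not part of the lecture material.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        -- Integrity rule (assumed): exactly nine characters, digits only.
        ssn         TEXT NOT NULL
                    CHECK (length(ssn) = 9 AND ssn NOT GLOB '*[^0-9]*')
    )
    """
)

# A valid value is accepted.
conn.execute("INSERT INTO employee (name, ssn) VALUES (?, ?)", ("Ana", "123456789"))

# A value containing an alphabetic character is rejected by the DBMS.
try:
    conn.execute("INSERT INTO employee (name, ssn) VALUES (?, ?)", ("Bob", "12345678A"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(rejected, row_count)  # True 1
```

Because the rule lives in the database rather than in one application, every application that writes to the table is held to the same constraint.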
Difficulties of Managing Data
Exponential data growth over time.
Data siloing.
New data sources and outdated data.
Data rot, security, quality, integrity, and government regulations.
The challenge of handling unstructured data and Big Data.
Data Governance
An organization-wide approach to managing information.
Master Data Management: Strategy that involves processes for maintaining and synchronizing core data across the organization.
Master Data: Core data sets spanning enterprise systems (e.g., customer, product).
Data Hierarchy
Bit: Smallest unit of data (0 or 1).
Byte: Group of 8 bits (represents a character).
Field: Logical grouping of related characters (e.g., a name or an ID number); appears as a column in a table.
Record: Group of related fields in a row.
Data File: Logical grouping of related records (like a table).
Database: Grouping of related data files.
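The hierarchy above can be sketched with plain Python values, building from a single bit pattern up to a small database. The student and course names are hypothetical, and dictionaries and lists stand in for real files only for illustration.

```python
# Byte: one character stored as 8 bits.
char = "A"
byte_value = char.encode("ascii")[0]   # integer code of the character
bits = format(byte_value, "08b")       # the 8 bits as a string of 0s and 1s

# Field: a logical group of characters. Record: a group of related fields.
record = {"student_id": "S001", "last_name": "Lee", "major": "MIS"}

# Data file: a group of related records (like a table).
student_file = [
    record,
    {"student_id": "S002", "last_name": "Patel", "major": "FIN"},
]

# Database: a group of related data files.
database = {"students": student_file, "courses": []}

print(bits)                                      # 01000001
print(database["students"][1]["last_name"])      # Patel
```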
Relational Database Model
Based on two-dimensional tables with records (rows) and attributes (columns).
Entity: A person, place, thing, or event (e.g., a customer).
Instance of an Entity: Each row in a table representing a unique entity.
Attribute: Characteristics of entities.
Primary Key: Uniquely identifies each record.
Foreign Key: A field (or group of fields) in one table that matches the primary key of a row in another table, linking the two tables.
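A minimal sketch of primary and foreign keys, using Python's built-in sqlite3 module. The customer and customer_order tables and the try_insert helper are hypothetical names chosen for illustration. Note that SQLite enforces foreign keys only after the PRAGMA shown below is set on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,   -- primary key: unique per record
        name        TEXT NOT NULL
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        -- foreign key: must match an existing customer.customer_id
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
        total       REAL NOT NULL
    );
    """
)

conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Ana')")
conn.execute(
    "INSERT INTO customer_order (order_id, customer_id, total) VALUES (100, 1, 25.0)"
)


def try_insert(sql):
    """Return True if the insert succeeds, False if a key constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A second customer with the same primary key is rejected.
duplicate_pk_ok = try_insert("INSERT INTO customer (customer_id, name) VALUES (1, 'Bob')")

# An order pointing at a customer that does not exist is rejected.
orphan_fk_ok = try_insert(
    "INSERT INTO customer_order (order_id, customer_id, total) VALUES (101, 99, 10.0)"
)

print(duplicate_pk_ok, orphan_fk_ok)  # False False
```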
Big Data
Definition: Diverse, high-volume, high-velocity information requiring new processing forms to enhance decision-making.
Characteristics:
Volume: Large amounts of data.
Velocity: Rapid data flow into organizations.
Variety: Many formats of data (e.g., images, text).
Issues: Untrusted sources, data cleanliness, changing data streams.
Applications: Used across HR, product development, operations, marketing, and government.
Data Warehouses and Data Marts
Data Warehouse: Repository of historical data organized by subject for decision support.
Data Mart: Scaled-down version of a data warehouse created for specific departmental needs.
Characteristics of Data Warehouses and Marts
Organized by business dimension.
Utilize Online Analytical Processing (OLAP).
Integrated from multiple systems around subjects.
Maintain historical data, are accessible to users, and are nonvolatile (users cannot change or update the data).
Use a multidimensional data structure.
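To make the multidimensional structure concrete, the sketch below stores a few sales figures along three business dimensions (product, region, year) and then slices and rolls them up, which is the kind of operation OLAP tools perform. The data, the dimension names, and both helper functions are hypothetical; a real warehouse would use dedicated storage rather than a Python dictionary.

```python
from collections import defaultdict

# Hypothetical sales facts keyed by (product, region, year).
cube = {
    ("Nuts", "East", 2023): 50,
    ("Nuts", "West", 2023): 60,
    ("Bolts", "East", 2023): 40,
    ("Nuts", "East", 2024): 70,
    ("Bolts", "West", 2024): 90,
}


def slice_by_year(cube, year):
    """Keep one year and return sales keyed by (product, region)."""
    return {(p, r): amount for (p, r, y), amount in cube.items() if y == year}


def roll_up_by_region(cube):
    """Sum sales over products and years, leaving one total per region."""
    totals = defaultdict(int)
    for (_product, region, _year), amount in cube.items():
        totals[region] += amount
    return dict(totals)


sales_2023 = slice_by_year(cube, 2023)
region_totals = roll_up_by_region(cube)

print(sales_2023[("Nuts", "West")])  # 60
print(region_totals)                 # {'East': 160, 'West': 150}
```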
Knowledge Management (KM)
KM Process: Manipulating important knowledge within organizations to optimize actions.
Explicit Knowledge: Objective and documented knowledge (e.g., strategies, reports).
Tacit Knowledge: Subjective experiences and insights.
Knowledge Management Systems (KMS): Technologies to enhance knowledge processes.
The Knowledge Management System Cycle
Create Knowledge: Development of new ideas.
Capture Knowledge: Identifying and representing valuable knowledge.
Refine Knowledge: Contextualizing knowledge for action.
Store Knowledge: Placing knowledge in a repository.
Manage Knowledge: Keeping knowledge current and relevant.
Disseminate Knowledge: Making knowledge available to all in the organization.
Fundamentals of Relational Database Operations
Relational Database: Collection of interrelated tables (rows and columns).
Query Languages:
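SQL (Structured Query Language): The standard language for interacting with relational databases; supports complex searches using relatively simple statements.
QBE (Query by Example): The user fills out a grid or template to describe the data needed.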
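A short SQL query, run through Python's built-in sqlite3 module, shows what a search looks like in practice. The student table, its columns, and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "student_id INTEGER PRIMARY KEY, name TEXT, major TEXT, gpa REAL)"
)
conn.executemany(
    "INSERT INTO student (name, major, gpa) VALUES (?, ?, ?)",
    [("Lee", "MIS", 3.7), ("Patel", "FIN", 3.9), ("Gomez", "MIS", 3.2), ("Chen", "MIS", 3.5)],
)

# SELECT names the columns wanted, FROM names the table,
# WHERE states the conditions, ORDER BY sorts the result.
rows = conn.execute(
    "SELECT name, gpa FROM student "
    "WHERE major = ? AND gpa >= ? "
    "ORDER BY gpa DESC",
    ("MIS", 3.4),
).fetchall()

print(rows)  # [('Lee', 3.7), ('Chen', 3.5)]
```

The question marks are placeholders filled in from the tuple that follows the query, which keeps the search values separate from the SQL text.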
Entity-Relationship Modeling
Entity-Relationship (ER) Modeling: Planning and creating databases using diagrams that outline entities and relationships.
Business Rules: Policies guiding data use.
Data Dictionary: Describes attributes within the database.
Cardinality: The maximum number of times an instance of one entity can be associated with an instance of a related entity.
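The ideas above can be sketched in code as two entities joined by a one-to-many relationship, with a business rule setting the maximum cardinality. The Professor and Course entities, the limit of three courses, and the assign helper are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Course:
    course_id: str
    title: str


@dataclass
class Professor:
    professor_id: str
    name: str
    # One-to-many relationship: one professor teaches many courses.
    courses: List[Course] = field(default_factory=list)


# Hypothetical business rule that fixes the maximum cardinality.
MAX_COURSES_PER_PROFESSOR = 3


def assign(professor, course):
    """Link a course to a professor unless the business rule would be broken."""
    if len(professor.courses) >= MAX_COURSES_PER_PROFESSOR:
        raise ValueError("professor already teaches the maximum number of courses")
    professor.courses.append(course)


prof = Professor("P01", "Rivera")
for n in range(3):
    assign(prof, Course(f"C{n}", f"Course {n}"))

try:
    assign(prof, Course("C3", "Course 3"))
    limit_enforced = False
except ValueError:
    limit_enforced = True

print(len(prof.courses), limit_enforced)  # 3 True
```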
Normalization and Joins
Normalization: Streamlining a database to minimize redundancy and enhance integrity.
Functional Dependencies: Associations between attributes in which the value of one attribute determines the value of another.
Join Operation: Combines records from multiple tables to obtain relevant information.
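Normalization and joins can be seen together in one small sketch. It starts from a flat list in which each customer's city is repeated on every order, splits the data into two tables, and then uses a join to rebuild the original view. The table names and sample rows are hypothetical, and the sketch assumes that a customer's name determines the city (a functional dependency made up for this example).

```python
import sqlite3

# Unnormalized rows: (order_id, customer name, city, total).
# Ana's city is stored twice, which is redundant.
flat_orders = [
    (1, "Ana", "Toronto", 25.0),
    (2, "Ana", "Toronto", 40.0),
    (3, "Bob", "Ottawa", 15.0),
]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT UNIQUE,
        city        TEXT
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer (customer_id),
        total       REAL
    );
    """
)

# Normalize: store each customer once and point each order at that customer.
for order_id, name, city, total in flat_orders:
    conn.execute("INSERT OR IGNORE INTO customer (name, city) VALUES (?, ?)", (name, city))
    customer_id = conn.execute(
        "SELECT customer_id FROM customer WHERE name = ?", (name,)
    ).fetchone()[0]
    conn.execute("INSERT INTO customer_order VALUES (?, ?, ?)", (order_id, customer_id, total))

customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

# Join: combine records from both tables on the shared key.
joined = conn.execute(
    """
    SELECT o.order_id, c.name, c.city, o.total
    FROM customer_order AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
    """
).fetchall()

print(customer_count)         # 2
print(joined == flat_orders)  # True
```

After the split, a change to Ana's city is made in one place, and the join still returns the full picture on demand.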
The quiz consists of 12 multiple-choice questions covering material from the current lecture deck and readings.
Quiz timing: 8:10 AM - arrive before 8:15 AM.
Value: 3% of overall course grade; missed quizzes score 0 with no makeups allowed.