KM02: Data Management and Interpretation Study Notes
Module Overview and Purpose. - KM0204:
Definitions of Terms and Concepts Associated with Data Management
Big Data: Large data sets analysed computationally to reveal patterns, trends, or associations.
Cloud Computing: A network of servers (local or distributed) allowing for scalability and increased computing power on an as-needed basis.
Common Data Elements (CDE): Standardized, precisely defined questions and allowable responses used systematically across sites or studies to ensure consistent data collection.
Data: Recorded factual material required to validate research findings; interpreted differently based on the field of study.
Data Lifecycle: The stages data passes through from initial creation to distribution, reuse, and eventually deletion.
Data Management Plan (DMP): A formal document determining how data is collected, processed, preserved, and used over its lifetime, including metadata standards and access policies.
Data Repository: A logical organization of data for researchers, often subject to specific domain or file format requirements.
Data Security: Measures (e.g., firewalls, strong passwords) to protect data from harm or unauthorized access during gathering, storage, and transmission.
Data Sharing: Making research data available to other investigators to promote transparency and scientific discovery.
Electronic Lab Notebook: Software replicating paper notebooks for entering protocols, notes, and observations electronically.
Metadata: Structured information about a resource that describes its context, making it easier to retrieve or manage for reproducibility.
Data Management (General Definition): The practice of collecting, organizing, and accessing data to support productivity, efficiency, and decision-making.
Importance and Benefits of Effective Data Management
Foundational Business Role: Data underpins every critical aspect of modern business, from customer experience to operations.
Key Enablers:
- Informed Decision-Making: Ensures data is accurate and accessible so users can make confident, time-critical decisions.
- Business Intelligence (BI) and Analytics: Provides the complete, high-quality data required for tools to gain insights into market trends and internal operations.
- Operational Efficiency: Streamlines processes and increases employee productivity by allowing faster retrieval of information.
- Regulatory Compliance: Reduces the risk of penalties by adhering to complex data protection laws and industry standards.
- Competitive Advantage: Harnesses data to enhance innovation and customer satisfaction to outperform competitors.
Strategic Benefits:
- Risk Management: Includes security, backup, and recovery to ensure business continuity.
- Strategic Planning: Uses historical data to identify trends and forecast future market conditions.
- Cost Savings: Eliminates redundant data and optimizes storage to reduce infrastructure costs.
- Customer Satisfaction: Tailors offerings by gathering and analyzing customer preference data.
- Data Monetization: High-quality data can be offered as products or insights to external parties.
Types of Data Management
Database Management: Uses RDBMS (Relational Database Management Systems) for structured data or NoSQL for flexible semi-structured data.
Master Data Management (MDM): Harmonizes critical business data (customer/product) to create a single authoritative source.
Document Management: Organizing and tracking electronic documents via a DMS.
Metadata Management: Managing the "data about data" to help users understand context.
Data Quality Management: Uses profiling, cleansing, and validation to ensure accuracy and completeness.
Data Governance: Establishes the policies, procedures, and roles (e.g., Data Stewards) to manage data assets.
Data Security Management: Focuses on confidentiality, integrity, and availability (CIA).
Data Integration: Combines data from different sources via ETL (Extract, Transform, Load) processes.
Data Warehousing: Centralizes large volumes of data to support BI and reporting.
Big Data Management: Deals with massive volumes in distributed computing environments.
Data Lifecycle Management: Manages data from creation through storage and archiving to disposal.
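The ETL pattern named under Data Integration can be sketched in a few lines. This is a minimal illustration rather than a production pipeline: the two in-memory sources, the field names, and the helper functions `extract`, `transform`, and `load` are all hypothetical.

```python
import csv
import io

# Hypothetical source 1: an HR export in CSV form.
HR_CSV = "employee_id,name,department\n1,Ana,Sales\n2,Ben,IT\n"

# Hypothetical source 2: payroll records that use a different key name.
PAYROLL = [{"emp": "1", "salary": "52000"}, {"emp": "2", "salary": "61000"}]


def extract():
    """Extract: read raw rows from both sources."""
    hr_rows = list(csv.DictReader(io.StringIO(HR_CSV)))
    return hr_rows, PAYROLL


def transform(hr_rows, payroll_rows):
    """Transform: convert types and join the sources on employee ID."""
    salary_by_id = {int(p["emp"]): int(p["salary"]) for p in payroll_rows}
    combined = []
    for row in hr_rows:
        emp_id = int(row["employee_id"])
        combined.append({
            "employee_id": emp_id,
            "name": row["name"],
            "department": row["department"],
            "salary": salary_by_id.get(emp_id),  # None if payroll has no match
        })
    return combined


def load(rows, target):
    """Load: write the integrated rows into the target store (a list here)."""
    target.extend(rows)
    return target


warehouse = load(transform(*extract()), [])
print(warehouse)
```

In a real setting the target would be a database or data warehouse rather than a Python list, but the three stages keep the same shape.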
Data Management Strategy Development Steps
Define Business Objectives: Understand goals and the specific data required to support them.
Assess Current State: Evaluate existing infrastructure and capabilities.
Establish Governance Framework: Define policies, roles, and responsibilities.
Create Data Inventory: Categorize all assets based on usage and sensitivity.
Ensure Data Quality: Implement tools for profiling, cleansing, and validation.
Enhance Security: Implement encryption, access controls, and audits.
Implement Integration Architecture: Design for scalability and interoperability.
Foster Data-Driven Culture: Promote data literacy and collaboration between IT and business teams.
Invest in Scalable Technology: Select tools that align with business goals.
Establish Lifecycle Management: Define plans for archiving and disposal.
Monitor Performance: Set KPIs to measure strategy effectiveness.
Continuous Improvement: Update strategies based on new regulations or technologies.
Sources and Collection Processes for HR Data
Human Resources Information System (HRIS) Sources:
- Recruiting Data: Captured via ATS (Applicant Tracking Systems); includes funnel metrics and sources.
- Demographic Data: Found in employee records (ID, gender, position, department).
- Performance Management Data: Reviews and ratings within a PMS.
- Learning Management Data: Training progress and course offerings in an LMS.
- Job Architecture: Definitions of salary scales, bands, and grades.
- Compensation & Benefits: Details on salary, bonuses, and secondary benefits.
- Succession Planning: Leadership development data and bench strength metrics.
- Exit Interview Data: Reasons for turnover to analyze retention.
Secondary/Other Sources:
- Business Data: CRM data (customer satisfaction/NPS), sales data (per store/department), financial data (ROI for training, cost per person).
- Other HR Data: Mentoring challenges/outcomes, engagement surveys, wellness initiatives (work-life balance).
Validity Principles in HR Data Collection:
- Content Validity: Ensuring instruments cover all relevant aspects of the skill being assessed.
- Criterion Validity: Checking whether data correlates with performance outcomes (e.g., correlating evaluation scores with actual sales figures).
- Construct Validity: Measuring underlying theoretical attributes correctly.
- Expert Review: Feedback from HR subject matter experts.
- Pilot Testing: Identifying issues before full-scale implementation.
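The criterion validity check mentioned above (correlating evaluation scores with sales figures) can be sketched with a Pearson correlation. The helper `pearson_r` and both data lists are hypothetical, and Pearson's r is only one possible choice of correlation measure; the notes do not prescribe one.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance_sum / (spread_x * spread_y)


# Hypothetical data: evaluation scores (1-5) and sales in thousands.
evaluation_scores = [2, 3, 3, 4, 5]
sales_figures = [110, 150, 140, 190, 230]

r = pearson_r(evaluation_scores, sales_figures)
print(f"correlation between evaluation and sales: {r:.2f}")
```

A value close to 1 would suggest the evaluation instrument tracks the outcome it is meant to predict; a value near 0 would suggest it does not.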
Gartner's ABCD Framework for HR Data Quality:
- Accuracy: Is every detail correct? Can the data be trusted?
- Breadth: Is the data complete, or are there gaps?
- Consistency: Is the standard applied uniformly across formats and methodologies?
- Depth: Is the data granular enough to target specific business units or individuals?
Data Preparation and Cleaning
Statistics on Prep Time: Data scientists spend 45% to 60% of their time collecting and organizing data, with cleaning taking about a quarter of their day.
Key Steps in Preparation:
1. Define Objectives: Determine the specific business questions to be answered.
2. Collect Data: Use reliable sources (APIs, web scraping, databases).
3. Clean and Validate: Correct errors, handle outliers (extreme values) and missing values, and validate the data against assumptions.
4. Organize and Structure: Arrange data logically for tool compatibility.
5. Transform and Enrich: Aggregate, normalize, or add features (e.g., geolocation, sentiment analysis).
6. Explore and Visualize: Discover patterns via descriptive statistics (mean, median, mode).
7. Document and Share: Provide metadata using standards like Dublin Core or JSON-LD.
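Step 3 (Clean and Validate) can be sketched as follows. The salary values are hypothetical, missing values are simply dropped, and outliers are flagged with the 1.5 x IQR rule. That rule is an assumption chosen for illustration; the notes do not say which outlier rule to use.

```python
import statistics

# Hypothetical raw data: one missing value (None) and one extreme value.
raw_salaries = [42000, 45000, None, 47000, 44000, 46000, 430000, 43000]

# Handle missing values: here they are dropped rather than imputed.
present = [value for value in raw_salaries if value is not None]

# Flag outliers with the 1.5 x IQR rule (an illustrative choice).
q1, _median, q3 = statistics.quantiles(present, n=4)
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

outliers = [v for v in present if v < lower_bound or v > upper_bound]
cleaned = [v for v in present if lower_bound <= v <= upper_bound]

print("outliers:", outliers)
print("cleaned:", cleaned)
```

Whether an outlier is an error to correct or a genuine extreme value to keep is a judgment that depends on the business question defined in step 1.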
Basic Data Analysis and Statistical Methods
Descriptive Statistics: Tools used to organize and summarize info so results can be communicated.
Measures of Central Tendency:
- Mean: The average, calculated by summing all values and dividing by the total count.
- Median: The middle value in an ordered list.
- Mode: The most frequently occurring value.
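The three measures can be computed with Python's standard `statistics` module. The list of performance ratings is hypothetical.

```python
import statistics

# Hypothetical performance ratings on a 1-5 scale.
ratings = [3, 4, 4, 5, 2, 4, 3]

mean_rating = statistics.mean(ratings)      # sum of values / number of values
median_rating = statistics.median(ratings)  # middle value of the sorted list
mode_rating = statistics.mode(ratings)      # most frequently occurring value

print(f"mean={mean_rating:.2f} median={median_rating} mode={mode_rating}")
```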
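Measures of Dispersion:
- Range: Maximum value minus minimum value.
- Variance: The average of the squared differences between each value and the mean.
- Standard Deviation: The square root of the variance; a measure of how spread out the numbers are around the mean, expressed in the same units as the data.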
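A short sketch of the dispersion measures, again using the standard `statistics` module. The absence data is hypothetical, and the population formulas (`pvariance`, `pstdev`) are used here; for a sample drawn from a larger group, `variance` and `stdev` would be the usual choice.

```python
import statistics

# Hypothetical days absent per employee in one team.
days_absent = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(days_absent) - min(days_absent)  # maximum minus minimum
variance = statistics.pvariance(days_absent)       # mean squared deviation
std_dev = statistics.pstdev(days_absent)           # square root of variance

print(f"range={value_range} variance={variance} std_dev={std_dev}")
```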
Graphical Methods:
- Frequency Distributions: Summary of frequencies of different values; can use tables or histograms.
- Bar Charts: Used for qualitative (categorical) data. Avoid "3D" effects, which cause distortion.
- Histograms: Graphic versions of frequency distributions for quantitative data (contiguous bars).
- Scatter Plots: Show relationships between two variables.
- Box Plots: Portray differences between distributions.
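A frequency distribution for categorical data can be built with `collections.Counter` and printed as a simple text bar chart. The department list is hypothetical.

```python
from collections import Counter

# Hypothetical categorical data: department of each employee.
departments = ["Sales", "IT", "Sales", "HR", "IT", "Sales"]

# Frequency distribution: how often each category occurs.
frequencies = Counter(departments)

# Text bar chart, most frequent category first.
for label, count in frequencies.most_common():
    print(f"{label:<6} {'#' * count} ({count})")
```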
The Challenger Disaster Case Study (1986):
- Illustrates the critical importance of data visualization.
- Engineers had the data but presented it as hand-written slides of numbers.
- Edward Tufte argued that a plot of O-ring damage against temperature, extended to the cold temperature forecast for launch day, would have been more persuasive in stopping the launch.
The "Lie Factor":
- Defined by Edward Tufte as the ratio of the size of the effect shown in a graph to the size of the effect in the data.
- The acceptable range for the Lie Factor is between 0.95 and 1.05.
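The Lie Factor can be computed directly from its definition. In this sketch the size of an effect is taken as the relative change between two values, and the function names and sample numbers are illustrative assumptions.

```python
def effect_size(first, second):
    """Relative change from the first value to the second."""
    return abs(second - first) / abs(first)


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Size of effect shown in the graphic divided by size of effect in the data."""
    return effect_size(graphic_first, graphic_second) / effect_size(data_first, data_second)


def is_acceptable(factor, low=0.95, high=1.05):
    """True when the Lie Factor falls inside the acceptable range."""
    return low <= factor <= high


# Illustrative values: the data rises from 18 to 27.5, but the bars
# drawn for it grow from 0.6 to 5.3 units in length.
factor = lie_factor(0.6, 5.3, 18, 27.5)
print(f"lie factor: {factor:.1f}, acceptable: {is_acceptable(factor)}")
```

A factor well above 1 means the graphic exaggerates the change in the data; a factor well below 1 means it understates it.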
HR Information Systems (HRIS)
Components of an Information System:
- Hardware: Physical tech (computers, keyboards, iPads, storage from the cloud).
- Software: System software (operating systems like Windows) and application software (task-specific, like Excel).
- Data: Raw facts that become powerful when organized.
- Telecommunications: Connections via wired (fiber optics, coaxial) or wireless (radio waves, microwaves) modes.
System Workflow:
1. Input: Collecting raw data via typing, touch, or sensors.
2. Processing: The CPU converts data into a structured format (sorting, analyzing).
3. Storage: Temporary or permanent (hard disks, SSDs, databases).
4. Output: Reports and dashboards.
5. Feedback: Evaluating user experience and system efficiency.
Types of HRIS by Function:
- Operational: Automates day-to-day HR tasks (payroll, basic data management).
- Tactical: Focuses on recruitment, development, and training to assist middle management decision-making.
- Strategic: High-level analytics for personnel deployment, goal setting, and long-term planning.
- Employee/Manager Self-Service (ESS/MSS): Allows individuals to update personal info, request time off, or approve team requests autonomously.
HR Metrics and People Analytics
HR Metrics Definition: Quantitative measurements used to assess value and effectiveness of HR initiatives.
Metric Examples:
- Cost per Hire: $\text{Total Recruiting Costs} / \text{Total Hires}$.
- Time to Fill: Total days from job opening to offer acceptance.
- Revenue per Employee: $\text{Total Revenue} / \text{Number of Employees}$.
- Absenteeism Rate: $(\text{Number of absent days} / \text{Total working days}) \times 100$.
- Early Turnover Rate: Percentage of new recruits leaving in the first year (indicates hiring mismatch).
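The metric formulas above translate into small functions. The function names and all input figures are hypothetical and only illustrate the arithmetic.

```python
from datetime import date


def cost_per_hire(total_recruiting_costs, total_hires):
    """Total recruiting costs divided by total hires."""
    return total_recruiting_costs / total_hires


def time_to_fill(opened, accepted):
    """Days from job opening to offer acceptance."""
    return (accepted - opened).days


def revenue_per_employee(total_revenue, number_of_employees):
    """Total revenue divided by number of employees."""
    return total_revenue / number_of_employees


def absenteeism_rate(absent_days, total_working_days):
    """Absent days as a percentage of total working days."""
    return 100 * absent_days / total_working_days


def early_turnover_rate(left_in_first_year, new_recruits):
    """Percentage of new recruits who left within their first year."""
    return 100 * left_in_first_year / new_recruits


# Hypothetical figures for one reporting period.
print(cost_per_hire(120_000, 30))
print(time_to_fill(date(2024, 3, 1), date(2024, 4, 15)))
print(revenue_per_employee(5_000_000, 50))
print(absenteeism_rate(66, 2200))
print(early_turnover_rate(6, 40))
```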
People Analytics: Also called Talent or Workforce Analytics. A goal-focused method to solve work problems using statistics.
9 Box Grid: A tool that maps individual performance against potential, each rated on three levels, giving nine boxes. The boxes group employees into categories such as underperformers, reliable team players, high potentials, and exceptional talent.
eNPS (Employee Net Promoter Score): Measures how likely employees are to recommend the organization as a place to work, on a scale of 1 to 10.
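A minimal sketch of an eNPS calculation follows. It assumes the conventional Net Promoter cut-offs (scores of 9 to 10 count as promoters, 6 or below as detractors), which these notes do not state; the function name and survey responses are hypothetical.

```python
def enps(scores, promoter_min=9, detractor_max=6):
    """Percentage of promoters minus percentage of detractors.

    The default cut-offs are the conventional Net Promoter thresholds
    and are an assumption, not something defined in the notes.
    """
    if not scores:
        raise ValueError("at least one survey response is required")
    promoters = sum(1 for s in scores if s >= promoter_min)
    detractors = sum(1 for s in scores if s <= detractor_max)
    return 100 * (promoters - detractors) / len(scores)


# Hypothetical survey responses.
responses = [10, 9, 8, 7, 6, 3, 9, 10, 5, 8]
score = enps(responses)
print(f"eNPS: {score:.0f}")
```

The result ranges from -100 (all detractors) to +100 (all promoters); responses between the two cut-offs count toward the total but toward neither group.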
ROI for Training: The financial gain an organization realizes from a specific training program.
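Training ROI is commonly expressed as net benefit relative to cost. The sketch below uses that common formula, which the notes do not spell out, and the function name and figures are hypothetical.

```python
def training_roi_percent(monetary_benefit, training_cost):
    """Net benefit of a training program as a percentage of its cost."""
    return 100 * (monetary_benefit - training_cost) / training_cost


# Hypothetical program: costs 100,000 and yields 150,000 in measured benefit.
roi = training_roi_percent(150_000, 100_000)
print(f"training ROI: {roi:.0f}%")
```

The hard part in practice is estimating the monetary benefit, not the arithmetic; a positive percentage means the program returned more than it cost.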
Ethical Information Management
Business Ethics: Moral principles, policies, and values governing business conduct beyond legal requirements.
Legislative Frameworks (South Africa):
- POPIA (Protection of Personal Information Act): Regulates lawful processing of personal data and privacy rights.
- ECTA (Electronic Communications and Transactions Act): Legal framework for electronic transactions and data integrity.
- LRA (Labour Relations Act): Guidelines for handling employee info within employment contracts and disciplinary records.
Professional Code of Conduct (5 Pillars):
1. Integrity.
2. Objectivity.
3. Competence.
4. Confidentiality.
5. Professionalism.
10 Principles for Successful Info Management Projects:
1. Recognize and manage complexity.
2. Focus on adoption (change management).
3. Deliver tangible and visible benefits.
4. Prioritize based on business needs.
5. "Journey of a thousand steps" (many small updates rather than one massive project).
6. Provide strong leadership.
7. Mitigate risks (early pilots).
8. Communicate extensively.
9. Strive for a seamless DEX (Digital Employee Experience).
10. Choose the first project very carefully (must be a catalyst).
Effective Business Communication and Reporting
Communication Directions:
- Upward: Subordinate to manager (feedback, suggestions).
- Downward: Supervisor to subordinate (policy, goals, delegation).
- Lateral: Between peers of equal rank (collaboration, coordination).
- External: Between the organization and outside parties (clients, vendors).
Report Elements:
- ToR (Terms of Reference): Defines what the report is about, why it is necessary, and its purpose.
- Findings: Presentation of data interpretation.
- Recommendations: Suggested courses of action based on evidence.
Effective Writing Steps:
1. Decide on Terms of Reference.
2. Conduct Research.
3. Create an Outline (Title page, Intro, Findings, etc.).
4. Write First Draft (don't worry about perfection).
5. Analyze Data and Record Findings.
6. Recommend Course of Action.
7. Edit and Distribute.