AI Implementation Best Practices and Methodologies - CPMAI

Key Concepts

  • AI is not simply about application development; its core is enabling systems to learn and evolve from data.

  • Differences Between AI Projects and Traditional Software Development

    • Functionality vs. Data Focus: Traditional software emphasizes delivering specific functionality (e.g., a mobile or web app). AI projects hinge on leveraging data to create learning systems, requiring a data-centric project management approach.

    • Learning Mechanism: AI systems (especially ML-based) improve with data exposure; they start with basic capabilities and improve as they learn. Effectiveness is dynamic and requires ongoing adjustments.

  • Common Mistakes in AI Projects

    • Misplaced Focus on Functionality: Stakeholders focus on delivering features rather than addressing underlying data requirements, risking AI that technically works but misses business objectives or user needs.

    • Data-Centric Approach: Treat AI projects as data projects; success hinges on data management:

      • Data Collection: Gather relevant data from multiple sources, representative of the problem.

      • Data Cleansing: Ensure data is accurate, consistent, and free of errors.

      • Data Preparation: Transform data for analysis and model training (normalization, encoding categorical variables, creating training/test datasets).
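    • A minimal sketch of these preparation steps, using pandas and scikit-learn (the toy table and column names are illustrative, not from the source):

```python
# Minimal data-preparation sketch: categorical encoding, train/test split,
# and normalization. Data and column names are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "age": [25, 41, 33, 58, 29, 47],
    "plan": ["basic", "pro", "basic", "enterprise", "pro", "basic"],
    "churned": [0, 1, 0, 1, 0, 1],   # target label
})

# One-hot encode the categorical column.
X = pd.get_dummies(df.drop(columns="churned"), columns=["plan"])
y = df["churned"]

# Hold out a test set before fitting any transformation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Normalize numeric features; fit on training data only to avoid leakage.
scaler = StandardScaler()
X_train[["age"]] = scaler.fit_transform(X_train[["age"]])
X_test[["age"]] = scaler.transform(X_test[["age"]])
```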

  • Common Failure Reasons

    • AI projects differ from traditional software in approach and lifecycle.

    • ROI justification may be weak or unclear.

    • Data quantity and quality issues can derail projects.

    • PoC traps, i.e., failure to transition from proof of concept to real-world use cases.

    • Real-world conditions often differ from those the model was built under; AI project lifecycles are continuous, not finite.

    • Vendors may overhype and oversell product capabilities, leading to a mismatch between the product and actual needs.

Project Lifecycle & Continuous Improvement

  • Continuous Iteration

    • AI projects require ongoing iteration rather than a finite end; models must be updated and retrained as data and requirements evolve.

    • Mindset of continuous improvement: regular assessments of model performance and adjustments as new data arrives or requirements change.

  • Real-World Dynamics

    • Data and models drift and decay over time.

    • Failure can occur from: not budgeting for maintenance, not accounting for drift-related issues, or not preparing for data/environment changes.
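    • One illustrative way to watch for drift (an assumption of this summary, not a technique prescribed by the source) is the Population Stability Index, which compares a feature's live distribution against its training baseline:

```python
# Illustrative drift check using the Population Stability Index (PSI).
# The 0.2 threshold is a common rule of thumb, not a value from the source.
import numpy as np

def psi(baseline: np.ndarray, live: np.ndarray, bins: int = 10) -> float:
    edges = np.histogram_bin_edges(baseline, bins=bins)
    edges[0], edges[-1] = -np.inf, np.inf   # catch values outside the training range
    base_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    live_pct = np.histogram(live, bins=edges)[0] / len(live)
    base_pct = np.clip(base_pct, 1e-6, None)  # avoid log(0)
    live_pct = np.clip(live_pct, 1e-6, None)
    return float(np.sum((live_pct - base_pct) * np.log(live_pct / base_pct)))

rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 10_000)  # distribution at training time
prod_feature = rng.normal(0.5, 1.0, 10_000)   # shifted production distribution

if psi(train_feature, prod_feature) > 0.2:
    print("Feature drift detected: consider retraining.")
```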

  • Return on Investment (ROI)

    • ROI can be assessed via:

      • Cost Savings: reductions in operational costs through automation or efficiency gains.

      • Time Savings: higher efficiency enabling human focus on higher-value tasks.

      • Resource Efficiency: better use of compute and human expertise.
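    • A back-of-the-envelope sketch of an ROI estimate combining these categories (all figures are hypothetical):

```python
# Hypothetical first-year ROI estimate for an AI automation project.
# All figures are illustrative assumptions, not from the source.
cost_savings = 250_000        # reduced operational costs
time_savings_value = 120_000  # value of hours redirected to higher-value work
project_cost = 300_000        # build and run cost for the year

roi = (cost_savings + time_savings_value - project_cost) / project_cost
print(f"Estimated ROI: {roi:.0%}")  # -> Estimated ROI: 23%
```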

  • Critical Questions for AI Projects

    • What problem are we trying to solve? Define the business problem clearly to align with organizational goals.

    • Should this problem be solved with AI? Assess whether AI is the most effective solution; not every problem requires AI.

    • What skills and resources are necessary? Identify data scientists, engineers, domain experts, data and computing power needs.

    • What is the expected ROI? Establish success metrics and how they will be measured across the lifecycle.

  • Examples of AI Project Failures

    • Walmart Shelf-Scanning Robots: the robots failed to outperform human workers at restocking and inventory management, underscoring the need to evaluate real-world practicality.

    • Amazon AI Recruiting Tool: the tool was biased against women because it was trained on historical hiring data that underrepresented them; highlights the need for diverse, representative training data.

PoC vs Pilot Projects

  • Proof of Concept (PoC)

    • Demonstrates feasibility in a controlled environment but may fail to transition to real-world use due to idealized data.

    • Best practices: PoCs should be short (ideally a few weeks) and hypothesis-focused to quickly assess viability.

  • Pilot Projects

    • Real-world implementation testing under actual operating conditions with real data and users.

    • Example: Pilot of an AI-driven customer service chatbot in a limited geographic area to gather feedback before full rollout.

Data Quality & Quantity

  • Importance of Data

    • High-quality data is critical; the adage "garbage in, garbage out" applies: poor data yields poor models.

    • Sufficient data quantity and diversity are required to train robust models.

  • Data Quality Issues & Questions

    • Common data quality questions:

      • What is the overall quality of your data?

      • Do you have enough of the right kind of data?

      • Do you need augmentation or enhancement of data?

      • What are the ongoing data gathering and preparation requirements?

      • What technology is needed for data manipulation/transformation?
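    • A quick pandas audit can answer several of these questions up front; a minimal sketch, with a hypothetical file and columns:

```python
# Minimal data-quality audit: volume, missingness, duplicates, basic ranges.
# "customer_data.csv" and its columns are hypothetical.
import pandas as pd

df = pd.read_csv("customer_data.csv")

print(df.shape)                                       # do we have enough rows?
print(df.isna().mean().sort_values(ascending=False))  # fraction missing per column
print(f"duplicate rows: {df.duplicated().sum()}")
print(df.describe(include="all"))                     # ranges, cardinality, outliers
```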

  • Data Quantity Issues (common failures)

    • Not understanding how much data is needed for the AI project.

    • Not understanding which data types are required.

    • Not identifying internal and external data sources and data environments.

  • Common Data Issues

    • Lack of Understanding: Underestimating data needs leads to inadequate training data and poor generalization.

    • Bias in Data: Underrepresentation of groups leads to biased outcomes; diverse data sources are necessary to mitigate bias.
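    • A minimal sketch of a representation check before training (the file, grouping column, and label column are hypothetical):

```python
# Check whether groups are represented, and labeled, at comparable rates.
# "applicants.csv" and the gender/hired columns are hypothetical.
import pandas as pd

df = pd.read_csv("applicants.csv")

print(df["gender"].value_counts(normalize=True))  # group representation
print(df.groupby("gender")["hired"].mean())       # per-group positive rate
# Large gaps in either output suggest sourcing more diverse data
# or rebalancing before training.
```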

  • Real World vs Model

    • The model must meet real-world requirements (accuracy, precision, etc.) and fit the operational approach.

    • Ongoing monitoring, iteration, and versioning are required.

Vendor Hype, Overpromising, and User Trust

  • Vendor Hype: Common issues include product mismatch, overhype, and oversell.

    • Questions to ask: Does the product fit your needs? Have you done independent research?

    • Risk: Failing to ask the right questions; overreliance on vendor claims.

  • Overpromising and Underdelivering

    • Key questions:

      • What problem are you solving?

      • Why tackle the hardest problem first?

      • Why tackle many AI patterns at once?

    • Motto: Think big, start small, iterate often.

  • User Experience & Trust Management

    • Uncanny Valley: systems that reveal they know too much personal data can provoke negative user reactions (the "data uncanny valley", 52:39).

    • Privacy vs. convenience: balance personalization with user comfort levels.

    • Transparency vs. security: exposing how a system works can weaken security, yet users want explainable AI decisions.

    • Varying user acceptance of data sharing and behavior knowledge; surveillance concerns arise when data collection crosses into invasive monitoring.

    • Trust-building strategies and gradual introduction help maintain positive user relationships as AI capabilities evolve.

    • Case references: the Henn na Hotel robot failures that led to the project's cancellation (56:05); 61% of US consumers are uncomfortable with robots (Brookings Institution), indicating the need for careful UX design and gradual adoption (around 56:30–57:00).

    • Virtual assistant misinterpretations (e.g., snoring interpreted as help requests) disrupted users; some luggage-moving robots delivered ROI without interaction issues; in some contexts a human touch remains the preferred approach.

Agile Methodologies in AI Projects

  • Adapting Agile for Data Projects

    • Agile can improve responsiveness and efficiency by enabling iterative development and continuous feedback.

    • A data-centric, Agile approach combines data management with iterative delivery; traditional waterfall falls short because it does not account for continuously changing data.

  • Key Agile Principles for AI

    • Iterative development in small, repeated cycles enables ongoing refinement as data and requirements evolve.

    • About 80% of the effort is data-related and only 20% is application functionality, so the focus must be on data management rather than pure coding.

    • Fast iteration enables easier testing, debugging, parallel development, and rapid user feedback.

    • Data-centric approaches are necessary due to data's continuous changes, quality issues, inconsistencies, and governance/privacy concerns beyond traditional apps.

  • Agile Roles & Structure for Data Projects

    • Data Product Owner: understands the complete data lifecycle from collection and ingestion to preparation, transformation, and consumption (01:17:41).

    • Data Scrum Master: understands the data lifecycle's impact on project timelines and supports specialized data roles.

    • Development Team Expansion: includes data scientists, data engineers, ETL specialists, BI analysts, and data governance owners (rather than traditional developers).

    • Broader Stakeholder Inclusion: analysts, data scientists, end users, governance, legal, compliance, and privacy teams.

    • Time-Boxed Iterations: deliverables must be produced within short timeframes despite data complexity, which pushes teams to improve efficiency and resolve issues quickly.

  • Key Agile Practices

    • Sprints: short, time-boxed periods for focused work on project components.

    • Daily Standups: regular status checks to address blockers and maintain alignment.

  • Drawbacks of Agile for Data Projects

    • Limited Documentation: rapid iteration can lead to insufficient documentation critical for compliance and future reference.

    • Fragmented Output: teams may deliver disparate data products rather than a cohesive application.

    • No Defined End: new data and needs continuously emerge; governance and data systems evolve.

    • Difficulty in Measuring Results: unclear ROI or KPIs for analytics outputs can hinder evaluation.

  • CRISP-DM vs CPMAI Methodologies

    • CRISP-DM (Cross-Industry Standard Process for Data Mining): an established data mining framework guiding data-centric projects with six phases; emphasizes understanding business objectives before data work.

    • CPMAI (Cognitive Project Management for AI): an updated, vendor-neutral methodology designed for AI projects; combines Agile principles with AI-specific processes; data-first, AI-relevant, highly iterative; focuses on operational success and six phases.

CRISP-DM and CPMAI Methodologies

  • CRISP-DM Overview

    • A data-centric methodology with six phases:

      • Phase 1: Business Understanding – understand objectives, requirements, and goals.

      • Phase 2: Data Understanding – identify the data needed; collect it; assess its quality.

      • Phase 3: Data Preparation – prepare data for modeling (cleaning, transforming, organizing).

      • Phase 4: Modeling – select algorithms and modeling approaches; build models; iterate as needed.

      • Phase 5: Evaluation – test the model, ensure alignment with business objectives, and decide on deployment.

      • Phase 6: Deployment – deploy the model in the real world and monitor its behavior.

    • Note: CRISP-DM has not been updated since its initial release in 1999 and predates modern Agile practices and AI-specific needs.

  • CPMAI (Cognitive Project Management for AI)

    • Vendor-neutral, best-practice methodology for AI/ML/advanced analytics and cognitive projects of any size.

    • Characteristics:

      • Data-first

      • AI-relevant

      • Highly iterative

      • Focused on the right tasks for operational success

    • Six CPMAI Phases:

      • Phase I – Business Understanding: define the business problem and project goals; ensure alignment with organizational objectives.

      • Phase II – Data Understanding: gather and explore the data necessary for the project; assess data quality; deliverables include data requirements, data sources, data quality understanding, and the data environment.

      • Phase III – Data Preparation: clean, transform, and structure data for modeling; deliverables include data collection/ingestion, data cleaning, data preparation, and data labeling & annotation.

      • Phase IV – Modeling: develop and train the AI model; deliverables include algorithm selection, model training, model tuning, and ensemble creation.

      • Phase V – Evaluation: assess model performance against business goals, including model validation/testing, performance checks, and retraining as needed; deliverables include model validation results, performance checks, and retraining outcomes.

      • Phase VI – Operationalization: deploy the model in production; monitor and maintain performance; iterate and improve continuously; deliverables include deployment, model optimization, governance, and clarity on where the model is used.

CPMAI Phase Details (Phase I–VI)

  • Phase I – Business Understanding

    • Objective: Define the business problem and project goals to ensure alignment with organizational objectives.

    • Key Activities:

      • Identify Business Goals: Clarify problems to solve and expected benefits.

      • Define Success Criteria: Establish measurable outcomes and KPIs (e.g., increased efficiency, cost savings, customer satisfaction).

      • Conduct Feasibility Analysis: Assess AI viability (data availability, resources, potential ROI).

    • Deliverables:

      • AI business requirements

      • AI Go/No-Go decision

      • Ethical and responsible requirements

      • KPIs

  • Phase II – Data Understanding

    • Objective: Gather and explore data necessary for the project to ensure suitability for analysis.

    • Key Activities:

      • Data Collection: Identify and collect data from internal/external sources and real-time feeds.

      • Data Exploration: Assess structure, quality, relevance, patterns, anomalies.

      • Assess Data Quality: Check for missing values, duplicates, inconsistencies.

    • Deliverables:

      • Data requirements

      • Data sources

      • Data quality understanding

      • Data environment

  • Phase III – Data Preparation

    • Objective: Prepare data for modeling by cleaning, transforming, and structuring it.

    • Key Activities:

      • Data Cleaning: Remove duplicates, fill missing values, correct errors.

      • Data Transformation: Normalize, encode categoricals, create features.

      • Data Splitting: Create training, validation, and test sets (see the sketch after this phase's deliverables).

    • Deliverables:

      • Data collection and ingestion

      • Data cleaning

      • Data preparation

      • Data labeling & annotation
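    • A minimal sketch of the cleaning and splitting activities above, using pandas and scikit-learn (file name, columns, and the 60/20/20 split are assumptions):

```python
# Clean a raw extract, then carve out train/validation/test sets.
# File name, columns, and the 60/20/20 split are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("raw_data.csv")

df = df.drop_duplicates()
df["income"] = df["income"].fillna(df["income"].median())  # impute a numeric gap
df = df.dropna(subset=["label"])  # rows without a label are unusable for training

# 60/20/20 split: train vs. the rest, then split the rest in half.
train, temp = train_test_split(df, test_size=0.4, random_state=42)
valid, test = train_test_split(temp, test_size=0.5, random_state=42)
```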

  • Phase IV – Modeling

    • Objective: Develop and train the AI model with prepared data.

    • Key Activities:

      • Select Modeling Techniques: Choose algorithms based on problem type (classification, regression, clustering).

      • Model Training: Train the model on the training dataset.

      • Hyperparameter Tuning: Optimize hyperparameters to improve performance and reduce overfitting (see the sketch after this phase's deliverables).

    • Deliverables:

      • Algorithm selection

      • Model training

      • Model tuning

      • Ensemble creation
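    • A sketch of training plus hyperparameter tuning using scikit-learn's GridSearchCV (the algorithm, grid, and scoring choice are illustrative assumptions):

```python
# Train a classifier and tune hyperparameters with cross-validated grid search.
# X_train/y_train are assumed to come from Phase III data preparation.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [5, 10, None],  # limiting depth helps curb overfitting
}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,           # 5-fold cross-validation on the training set
    scoring="f1",   # pick the metric that matches the business goal
)
search.fit(X_train, y_train)
model = search.best_estimator_
```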

  • Phase V – Evaluation

    • Objective: Assess model performance against business goals and success criteria.

    • Key Activities:

      • Model Testing: Evaluate on held-out test data; measure accuracy, precision, recall, etc. (see the sketch after this phase's deliverables).

      • Validation Against Business Objectives: Ensure alignment; revisit earlier phases if needed.

      • Documentation: Record evaluation results and limitations.

    • Deliverables:

      • Model validation and testing

      • Model performance checks

      • Retraining until the desired accuracy is reached
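    • Evaluation on the held-out test set might look like this minimal sketch (metric choice should follow the Phase I success criteria):

```python
# Score the tuned model on held-out data it has never seen.
# model, X_test, and y_test are assumed to come from the earlier phases.
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_pred = model.predict(X_test)
print(f"accuracy:  {accuracy_score(y_test, y_pred):.3f}")
print(f"precision: {precision_score(y_test, y_pred):.3f}")
print(f"recall:    {recall_score(y_test, y_pred):.3f}")
# If these fall short of the Phase I success criteria, loop back to
# data preparation or modeling rather than deploying.
```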

  • Phase VI – Operationalization

    • Objective: Deploy model into production to deliver real-world value.

    • Key Activities:

      • Model Deployment: Integrate with existing systems/workflows.

      • Monitoring and Maintenance: Continuously track performance; detect degradation (see the sketch after this phase's deliverables).

      • Iteration and Improvement: Retrain/update models with new data as needed.

    • Deliverables:

      • Determining where the model will be used

      • Production use with ongoing monitoring

      • Model optimization

      • Model governance
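    • A minimal monitoring sketch: compare live performance against the evaluation-time baseline and flag degradation (both threshold values are assumptions):

```python
# Flag model degradation in production against the evaluation-time baseline.
# Both numbers below are hypothetical.
BASELINE_ACCURACY = 0.91  # recorded during Phase V evaluation
TOLERANCE = 0.05          # acceptable drop before retraining is triggered

def check_health(live_accuracy: float) -> None:
    if live_accuracy < BASELINE_ACCURACY - TOLERANCE:
        print("Performance degraded: schedule retraining with fresh data.")
    else:
        print("Model within tolerance.")

check_health(live_accuracy=0.84)  # -> schedules retraining
```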

Data Projects vs. Application Development Projects

  • Data Projects

    • Do not start with functionality; focus on insights or actions to derive from data.

    • Continuously change and evolve; face issues of data quality and representation.

    • Data-centric issues include control, security, governance, and privacy concerns.

  • Application Development Projects

    • Typically centered on delivering features with defined endpoints and requirements.

Challenges in Agile Data Projects

  • Planning Difficulties

    • Evolving data and requirements make cost/resource predictions challenging; maintain flexibility.

    • Establish clear milestones/deliverables while allowing adjustments.

  • Limited Documentation

    • Agile focus on rapid delivery can reduce essential documentation for compliance and future reference.

  • Fragmented Outputs

    • Individual data projects may produce disjointed outputs; ensure alignment with broader goals and avoid duplication.

  • No Defined End

    • Data systems and governance continuously evolve; cannot assume a fixed end state.

  • Difficult to Measure Results

    • Quantifying ROI and KPIs for analytics outputs can be challenging without clear metrics.

Conclusion

  • Understanding the unique challenges and methodologies for managing AI projects is crucial for success.

  • Emphasizing a data-centric approach and adapting Agile practices can improve outcomes and align AI solutions with business needs.

  • Continuous learning and iteration are essential for adapting to changing environments and maintaining competitive advantage.

  • A culture of collaboration, transparency, and ongoing improvement maximizes the potential of AI initiatives and drives meaningful results.