AI Implementation Best Practices and Methodologies - CPMAI
Key Concepts
AI is not simply about application development; its core is enabling systems to learn and evolve from data.
Differences Between AI Projects and Traditional Software Development
Functionality vs. Data Focus: Traditional software emphasizes delivering specific functionality (e.g., a mobile or web app). AI projects hinge on leveraging data to create learning systems, requiring a data-centric project management approach.
Learning Mechanism: AI systems (especially ML-based) improve with data exposure; they start with basic capabilities and improve as they learn. Effectiveness is dynamic and requires ongoing adjustments.
Common Mistakes in AI Projects
Misplaced Focus on Functionality: Stakeholders focus on delivering features rather than addressing underlying data requirements, risking AI that technically works but misses business objectives or user needs.
Data-Centric Approach: Treat AI projects as data projects; success hinges on data management:
Data Collection: Gather relevant data from multiple sources, representative of the problem.
Data Cleansing: Ensure data is accurate, consistent, and free of errors.
Data Preparation: Transform data for analysis and model training (normalization, encoding categorical variables, creating training/test datasets); a minimal sketch follows this list.
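A minimal sketch of these preparation steps with scikit-learn; the file and column names ("churn.csv", "plan", "usage", "churned") are hypothetical placeholders.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.read_csv("churn.csv")                 # hypothetical dataset
X, y = df[["plan", "usage"]], df["churned"]

# Encode the categorical column and normalize the numeric one.
prep = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan"]),
    ("num", StandardScaler(), ["usage"]),
])

# Hold out a test set before fitting so evaluation stays unbiased.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
X_train_prep = prep.fit_transform(X_train)    # fit on training data only
X_test_prep = prep.transform(X_test)          # reuse the fitted transformer
```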
Common Failure Reasons
AI projects differ from traditional software in approach and lifecycle.
ROI justification may be weak or unclear.
Data quantity and quality issues can derail projects.
Proof-of-concept (PoC) traps: failing to transition from a successful PoC to real-world use cases.
Real-world conditions often differ from the model; project lifecycles are continuous, not finite.
Vendors may overhype and oversell product capabilities, leading to a mismatch between the product and actual needs.
Project Lifecycle & Continuous Improvement
Continuous Iteration
AI projects require ongoing iteration rather than a finite end; models must be updated and retrained as data and requirements evolve.
Mindset of continuous improvement: regular assessments of model performance and adjustments as new data arrives or requirements change.
Real-World Dynamics
Data and models drift and decay over time.
Failure can occur from: not budgeting for maintenance, not accounting for drift-related issues, or not preparing for data/environment changes.
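One common drift check is the population stability index (PSI), which compares a feature's training-time distribution against production data. Below is a minimal sketch with synthetic data; the 0.2 alert threshold is a widely used rule of thumb, not a universal standard.

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population stability index between two samples of one feature."""
    edges = np.histogram_bin_edges(expected, bins=bins)  # bins from training data
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    e_pct, a_pct = np.clip(e_pct, 1e-6, None), np.clip(a_pct, 1e-6, None)  # avoid log(0)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 10_000)   # distribution at training time
live_feature = rng.normal(0.5, 1.2, 10_000)    # drifted production data
if psi(train_feature, live_feature) > 0.2:     # 0.2 is a common alert level
    print("Significant drift detected; consider retraining.")
```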
Return on Investment (ROI)
ROI can be assessed via the following dimensions (a worked example follows this list):
Cost Savings: reductions in operational costs through automation or efficiency gains.
Time Savings: higher efficiency enabling human focus on higher-value tasks.
Resource Efficiency: better use of compute and human expertise.
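A back-of-the-envelope first-year ROI calculation along these dimensions; every figure below is a hypothetical placeholder that would come from the project's feasibility analysis.

```python
# Hypothetical annual figures for an automation project.
cost_savings = 250_000        # reduced operational costs
time_savings_value = 120_000  # value of hours redirected to higher-value work
project_cost = 300_000        # build plus first-year run cost, incl. maintenance

roi = (cost_savings + time_savings_value - project_cost) / project_cost
print(f"First-year ROI: {roi:.0%}")   # -> First-year ROI: 23%
```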
Critical Questions for AI Projects
What problem are we trying to solve? Define the business problem clearly to align with organizational goals.
Should this problem be solved with AI? Assess whether AI is the most effective solution; not every problem requires AI.
What skills and resources are necessary? Identify data scientists, engineers, domain experts, data and computing power needs.
What is the expected ROI? Establish success metrics and how they will be measured across the lifecycle.
Examples of AI Project Failures
Walmart Shelf-Scanning Robots: the robots failed to outperform human workers at restocking and inventory management; underscores the need to evaluate real-world practicality.
Amazon AI Recruiting Tool: exhibited bias against women because it was trained on historical hiring data that underrepresented them; highlights the need for diverse, representative data.
PoC vs Pilot Projects
Proof of Concept (PoC)
Demonstrates feasibility in a controlled environment but may fail to transition to real-world use due to idealized data.
Best practices: PoCs should be short (ideally a few weeks) and hypothesis-focused to quickly assess viability.
Pilot Projects
Real-world implementation testing under actual operating conditions with real data and users.
Example: Pilot of an AI-driven customer service chatbot in a limited geographic area to gather feedback before full rollout.
Data Quality & Quantity
Importance of Data
High-quality data is critical; the adage "garbage in, garbage out" applies: poor data yields poor models.
Sufficient data quantity and diversity are required to train robust models.
Data Quality Issues & Questions
Common data quality questions (a profiling sketch follows this list):
What is the overall quality of your data?
Do you have enough of the right kind of data?
Do you need augmentation or enhancement of data?
What are the ongoing data gathering and preparation requirements?
What technology is needed for data manipulation/transformation?
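A minimal profiling sketch that answers the first of these questions with pandas; the file name ("customers.csv") is a hypothetical placeholder.

```python
import pandas as pd

df = pd.read_csv("customers.csv")   # candidate training data

# Quick audit: volume, duplicates, missingness, and types.
report = {
    "rows": len(df),
    "duplicate_rows": int(df.duplicated().sum()),
    "missing_pct_by_column": (df.isna().mean() * 100).round(1).to_dict(),
    "dtypes": df.dtypes.astype(str).to_dict(),
}
print(report)
```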
Data Quantity Issues (common failures)
Not understanding how much data is needed for the AI project.
Not understanding which data types are required.
Not identifying internal and external data sources and data environments.
Common Data Issues
Lack of Understanding: Underestimating data needs leads to inadequate training data and poor generalization.
Bias in Data: Underrepresentation of groups leads to biased outcomes; diverse data sources are necessary to mitigate bias.
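A quick representation check can surface bias risk early; the file name, column name, and 10% floor below are all hypothetical choices, not fixed rules.

```python
import pandas as pd

df = pd.read_csv("applicants.csv")   # hypothetical training data

# Flag groups that fall below a minimum share of the data.
shares = df["gender"].value_counts(normalize=True)
underrepresented = shares[shares < 0.10]   # 10% floor is a project choice
if not underrepresented.empty:
    print("Underrepresented groups:", underrepresented.to_dict())
    # Mitigations: gather more data, reweight, or resample before training.
```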
Real World vs Model
The model must meet real-world requirements (accuracy, precision, etc.) and fit the operational approach.
Ongoing monitoring, iteration, and versioning are required.
Vendor Hype, Overpromising, and User Trust
Vendor Hype: Common issues include product mismatch, overhype, and oversell.
Questions to ask: Does the product fit your needs? Have you done independent research?
Risk: Failing to ask the right questions; overreliance on vendor claims.
Overpromising and Underdelivering
Key questions:
What problem are you solving?
Why tackle the hardest problem first?
Why tackle many AI patterns at once?
Motto: Think big, start small, iterate often.
User Experience & Trust Management
Uncanny Valley: systems that know too much personal data can provoke negative user reactions (the "data uncanny valley", 52:39).
Privacy vs. convenience: balance personalization with user comfort levels.
Transparency vs. security: openness about how a system works can weaken security, yet users want explainable AI decisions.
User acceptance of data sharing and of systems knowing their behavior varies; surveillance concerns arise when data collection crosses into invasive monitoring.
Trust-building strategies and gradual introduction to maintain positive user relationships during AI evolution.
Case references: the Henn na Hotel robot failures led to project cancellation (56:05); a Brookings Institution survey found 61% of US consumers uncomfortable with robots, indicating the need for careful UX design and gradual adoption (around 56:30–57:00).
Virtual assistant misinterpretations (e.g., snoring interpreted as help requests) disrupted guests; some luggage-moving robots delivered ROI without interaction issues; a human touch remains the preferred approach in some contexts.
Agile Methodologies in AI Projects
Adapting Agile for Data Projects
Agile can improve responsiveness and efficiency by enabling iterative development and continuous feedback.
A data-centric, Agile approach combines data management with iterative delivery; traditional waterfall falls short because it does not account for continuously changing data.
Key Agile Principles for AI
Iterative development in small, repeated cycles enables ongoing refinement as data and requirements evolve.
Roughly 80% of the effort is data-related and only 20% is application/functionality, which demands a data-management focus rather than pure coding.
Fast iteration enables easier testing, debugging, parallel development, and rapid user feedback.
Data-centric approaches are necessary due to data's continuous changes, quality issues, inconsistencies, and governance/privacy concerns beyond traditional apps.
Agile Roles & Structure for Data Projects
Data Product Owner: understands the complete data lifecycle from collection and ingestion to preparation, transformation, and consumption (01:17:41).
Data Scrum Master: understands the data lifecycle's impact on project timelines and supports specialized data roles.
Development Team Expansion: includes data scientists, data engineers, ETL specialists, BI analysts, and data governance owners (rather than traditional developers).
Broader Stakeholder Inclusion: analysts, data scientists, end users, governance, legal, compliance, and privacy teams.
Time-Boxed Iterations: deliverables must be produced within short timeframes despite data complexity, forcing teams to improve efficiency and resolve data issues quickly.
Key Agile Practices
Sprints: short, time-boxed periods for focused work on project components.
Daily Standups: regular status checks to address blockers and maintain alignment.
Drawbacks of Agile for Data Projects
Limited Documentation: rapid iteration can lead to insufficient documentation critical for compliance and future reference.
Fragmented Output: teams may deliver disparate data products rather than a cohesive application.
No Defined End: new data and needs continuously emerge; governance and data systems evolve.
Difficulty in Measuring Results: unclear ROI or KPIs for analytics outputs can hinder evaluation.
CRISP-DM vs CPMAI Methodologies
CRISP-DM (Cross-Industry Standard Process for Data Mining): an established data mining framework guiding data-centric projects with six phases; emphasizes understanding business objectives before data work.
CPMAI (Cognitive Project Management for AI): an updated, vendor-neutral methodology designed for AI projects; combines Agile principles with AI-specific processes; data-first, AI-relevant, and highly iterative; focuses on operational success and is organized into six phases.
CRISP-DM and CPMAI Methodologies
CRISP-DM Overview
A data-centric methodology with six phases:
Phase 1: Business Understanding – understand objectives, requirements, and goals.
Phase 2: Data Understanding – identify data needed; data collection; data quality assessment.
Phase 3: Data Preparation – tasks to prepare data for modeling (cleaning, transforming, organizing).
Phase 4: Modeling – select algorithms and modeling approaches; build models; possibly iterate.
Phase 5: Evaluation – test model and ensure alignment with business objectives; decide on deployment.
Phase 6: Deployment – deploy the model in the real world and monitor behavior.
Note: CRISP-DM has not been updated since its initial release in 1999 and predates modern agile/AI extensions.
CPMAI (Cognitive Project Management for AI)
Vendor-neutral, best-practice methodology for AI/ML/advanced analytics and cognitive projects of any size.
Characteristics:
Data-first
AI-relevant
Highly iterative
Focused on the right tasks for operational success
Six CPMAI Phases:
Phase I – Business Understanding: define business problem and project goals; ensure alignment with organizational objectives.
Phase II – Data Understanding: gather and explore data necessary for the project; assess data quality; deliverables include data requirements, data sources, data quality understanding, data environment.
Phase III – Data Preparation: clean, transform, and structure data for modeling; deliverables include data collection/ingestion, data cleaning, data preparation, data labeling & annotation.
Phase IV – Modeling: develop and train the AI model; deliverables include algorithm selection, model training, model tuning, ensemble creation.
Phase V – Evaluation: assess model performance against business goals; include model validation/testing, performance checks, retraining as needed; deliverables include model validation, performance checks, retraining results.
Phase VI – Operationalization: deploy the model in production; monitor and maintain performance; continuous iteration and improvement; deliverables include deployment, model optimization, governance, and clarity on where the model is used.
CPMAI Phase Details (Phase I–VI)
Phase I – Business Understanding
Objective: Define the business problem and project goals to ensure alignment with organizational objectives.
Key Activities:
Identify Business Goals: Clarify problems to solve and expected benefits.
Define Success Criteria: Establish measurable outcomes and KPIs (e.g., increased efficiency, cost savings, customer satisfaction).
Conduct Feasibility Analysis: Assess AI viability—data availability, resources, potential ROI.
Deliverables:
AI business requirements
AI Go/No-Go decision
Ethical and responsible requirements
KPIs
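One way to make the KPI deliverable concrete is to record success criteria in machine-readable form so Phase V can check them automatically; the metric names and targets below are hypothetical.

```python
# Hypothetical Phase I success criteria for a support chatbot.
kpis = {
    "ticket_deflection_rate": {"target": 0.30, "at_least": True},
    "avg_handle_time_seconds": {"target": 240, "at_least": False},
}

def meets_kpi(name: str, observed: float) -> bool:
    """True if the observed value satisfies the KPI's target."""
    spec = kpis[name]
    return observed >= spec["target"] if spec["at_least"] else observed <= spec["target"]

print(meets_kpi("ticket_deflection_rate", 0.34))  # True
```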
Phase II – Data Understanding
Objective: Gather and explore data necessary for the project to ensure suitability for analysis.
Key Activities:
Data Collection: Identify and collect data from internal/external sources and real-time feeds.
Data Exploration: Assess structure, quality, relevance, patterns, anomalies.
Assess Data Quality: Check for missing values, duplicates, inconsistencies.
Deliverables:
Data requirements
Data sources
Data quality understanding
Data environment
Phase III – Data Preparation
Objective: Prepare data for modeling by cleaning, transforming, and structuring it.
Key Activities:
Data Cleaning: Remove duplicates, fill missing values, correct errors.
Data Transformation: Normalize, encode categoricals, create features.
Data Splitting: Create training, validation, and test sets (see the sketch after this phase's deliverables).
Deliverables:
Data collection and ingestion
Data cleaning
Data preparation
Data labeling & annotation
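A minimal Phase III sketch covering cleaning and splitting, assuming a hypothetical sensor dataset with a numeric temperature_c column; the valid range and 70/15/15 split are illustrative choices.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("sensor_readings.csv")   # hypothetical source

# Data cleaning: drop duplicates, fill numeric gaps, discard obvious errors.
df = df.drop_duplicates()
num_cols = df.select_dtypes("number").columns
df[num_cols] = df[num_cols].fillna(df[num_cols].median())
df = df[df["temperature_c"].between(-40, 60)]   # remove sensor glitches

# Data splitting: train / validation / test (70/15/15).
train, rest = train_test_split(df, test_size=0.30, random_state=42)
valid, test = train_test_split(rest, test_size=0.50, random_state=42)
```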
Phase IV – Modeling
Objective: Develop and train the AI model with prepared data.
Key Activities:
Select Modeling Techniques: Choose algorithms based on problem type (classification, regression, clustering).
Model Training: Train model on training dataset.
Hyperparameter Tuning: Optimize model performance and control overfitting by tuning hyperparameters (see the sketch after this phase's deliverables).
Deliverables:
Algorithm selection
Model training
Model tuning
Ensemble creation
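A minimal Phase IV sketch with scikit-learn, using synthetic data as a stand-in for the prepared Phase III output; the random-forest algorithm and parameter grid are illustrative choices, not a CPMAI prescription.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the prepared data.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Hyperparameter tuning via cross-validated grid search.
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [100, 300], "max_depth": [5, 10, None]},
    cv=5, scoring="f1",
)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)
```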
Phase V – Evaluation
Objective: Assess model performance against business goals and success criteria.
Key Activities:
Model Testing: Evaluate on test data; measure accuracy, precision, recall, etc. (see the sketch after this phase's deliverables).
Validation Against Business Objectives: Ensure alignment; revisit earlier phases if needed.
Documentation: Record evaluation results and limitations.
Deliverables:
Model validation and testing
Checking model performance
Retraining until desired accuracy
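A minimal Phase V sketch: compute standard metrics on held-out data, then check them against a business target. The model, data, and 0.85 recall threshold are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a trained model and held-out test data.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

y_pred = model.predict(X_test)
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
}
print(metrics)

# Validate against the business objective, not just raw scores: e.g., a
# hypothetical Phase I KPI might require recall of at least 0.85.
if metrics["recall"] < 0.85:
    print("Below target; revisit data preparation or modeling.")
```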
Phase VI – Operationalization
Objective: Deploy model into production to deliver real-world value.
Key Activities:
Model Deployment: Integrate with existing systems/workflows.
Monitoring and Maintenance: Continuously track performance and detect degradation (see the sketch after this phase's deliverables).
Iteration and Improvement: Retrain/update models with new data as needed.
Deliverables:
Determining where the model will be used
Production use with ongoing monitoring
Model optimization
Model governance
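A minimal monitoring sketch for this phase: track accuracy over a sliding window of labeled outcomes and flag degradation as a retraining trigger. The window size, threshold, and call pattern are hypothetical.

```python
from collections import deque

WINDOW, THRESHOLD = 500, 0.90      # hypothetical monitoring parameters
recent = deque(maxlen=WINDOW)      # sliding window of correctness flags

def record_outcome(prediction, actual) -> None:
    """Call as ground-truth labels arrive; alert when accuracy degrades."""
    recent.append(prediction == actual)
    if len(recent) == WINDOW:
        accuracy = sum(recent) / WINDOW
        if accuracy < THRESHOLD:
            print(f"Accuracy {accuracy:.2%} below {THRESHOLD:.0%}; trigger retraining.")

record_outcome("approved", "approved")   # example call with hypothetical labels
```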
Data Projects vs. Application Development Projects
Data Projects
Do not start with functionality; focus on insights or actions to derive from data.
Continuously change and evolve; face issues of data quality and representation.
Data-centric issues include control, security, governance, and privacy concerns.
Application Development Projects
Typically centered on delivering features with defined endpoints and requirements.
Challenges in Agile Data Projects
Planning Difficulties
Evolving data and requirements make cost/resource predictions challenging; maintain flexibility.
Establish clear milestones/deliverables while allowing adjustments.
Limited Documentation
Agile focus on rapid delivery can reduce essential documentation for compliance and future reference.
Fragmented Outputs
Individual data projects may produce disjointed outputs; ensure alignment with broader goals and avoid duplication.
No Defined End
Data systems and governance continuously evolve; cannot assume a fixed end state.
Difficult to Measure Results
Quantifying ROI and KPIs for analytics outputs can be challenging without clear metrics.
Conclusion
Understanding the unique challenges and methodologies for managing AI projects is crucial for success.
Emphasizing a data-centric approach and adapting Agile practices can improve outcomes and align AI solutions with business needs.
Continuous learning and iteration are essential for adapting to changing environments and maintaining competitive advantage.
A culture of collaboration, transparency, and ongoing improvement maximizes the potential of AI initiatives and drives meaningful results.