Chapter 8 MAIO

Last updated 5:08 PM on 6/4/26

35 Terms

1

What are the 4 main learning goals of the Managing AI lecture?

Building a framework for managing AI projects, organizing AI project teams, analyzing AI system deployment considerations, and executing AI system monitoring/lifecycle management.

2

What are the 3 major functional branches under a project team in a typical AI organization structure?

Product (Product Owner/Manager), Data Science (Data Scientist), and Engineering (Data, Software, and ML Engineers).

3

List the specific engineering roles within an AI project team.

Data Engineer, Software Engineer, and ML Engineer (supported directly by QA and DevOps functions).

4

What are the 4 primary challenges that distinguish AI projects from traditional software?

They are probabilistic rather than deterministic, carry high technical risk, require significant up-front data preparation, and necessitate intensive workflow change management to build user trust.

5

What does CRISP-DM stand for and when was it developed?

CRoss Industry Standard Process for Data Mining; developed in 1996 by a European consortium of companies as a flexible, industry-agnostic data methodology.

6

List the 6 sequential stages of the CRISP-DM process.

1. Business Understanding, 2. Data Understanding, 3. Data Preparation, 4. Modeling, 5. Evaluation, and 6. Deployment.

7

What 3 milestones define the Business Understanding stage of CRISP-DM?

1.1 Define the Problem, 1.2 Define Success (translating business impact into metrics and constraints), and 1.3 Identify Factors.

8

What steps are required to effectively "Define the Problem" under CRISP-DM?

Identify the target user, write a clear problem statement, clarify why it matters, explain how it is solved today, map gaps in the current state, and quantify the expected business impact.

9

What are the 3 milestones within the Data Understanding stage of CRISP-DM?

2.1 Gather Data (identifying sources, labeling, feature creation), 2.2 Validate Data (quality control, resolving missing values or outliers), and 2.3 Explore the Data.

10

Contrast the data inputs required for model training versus live model prediction.

Training requires historical data consisting of both Features and Labels; Prediction requires real-time data consisting only of Features to generate final Model Outputs.
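A minimal Python sketch of this contrast, assuming a toy one-feature dataset and a hypothetical threshold "model" (the midpoint between the two class means). The names `train` and `predict` are illustrative, not from the lecture: `train` consumes features and labels, while `predict` receives features only.

```python
from statistics import mean

def train(features, labels):
    """Fit a toy threshold classifier from historical (feature, label) pairs."""
    positives = [x for x, y in zip(features, labels) if y == 1]
    negatives = [x for x, y in zip(features, labels) if y == 0]
    # Decision threshold: midpoint between the two class means.
    return (mean(positives) + mean(negatives)) / 2

def predict(threshold, features):
    """Score live data: only features are available, no labels."""
    return [1 if x >= threshold else 0 for x in features]

# Historical data: features AND labels.
hist_x = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
hist_y = [0, 0, 0, 1, 1, 1]
threshold = train(hist_x, hist_y)

# Live data: features only, producing model outputs.
live_x = [2.5, 6.0, 8.5]
outputs = predict(threshold, live_x)
```

Here the learned threshold is 5.0, so the live inputs map to outputs `[0, 1, 1]`.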

11

What 4 factors influence the data volume and quality requirements of an AI model?

The number of unique features, the complexity of feature-target relationships, the presence of missing or noisy data, and the overall desired level of model performance.

12

What specific tasks are executed during the Data Preparation stage of CRISP-DM?

Splitting data into training and test sets, performing feature engineering and selection, encoding categorical features, scaling or standardizing data, and resolving class imbalances.
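A stdlib-only Python sketch of several of these tasks on hypothetical toy rows. It assumes one numeric and one categorical feature, uses naive random oversampling as one possible (swapped-in) way to resolve class imbalance, and fits the scaling statistics and category vocabulary on the training split only, so no test-set information leaks into preprocessing.

```python
import random
from statistics import mean, pstdev

# Hypothetical toy rows: (numeric feature, categorical feature, label)
rows = [
    (10.0, "red", 0), (12.0, "blue", 0), (11.0, "red", 0), (13.0, "green", 0),
    (30.0, "blue", 1), (9.0, "green", 0), (14.0, "red", 0), (28.0, "red", 1),
]

def train_test_split(data, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and hold out the first slice as a test set."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_preprocessor(train_rows):
    """Learn scaling statistics and the category vocabulary from training rows only."""
    nums = [r[0] for r in train_rows]
    sigma = pstdev(nums) or 1.0  # guard against a constant feature
    categories = sorted({r[1] for r in train_rows})
    return mean(nums), sigma, categories

def transform(row, mu, sigma, categories):
    """Standardize the numeric feature and one-hot encode the categorical one."""
    scaled = (row[0] - mu) / sigma
    one_hot = [1 if row[1] == c else 0 for c in categories]  # unseen category -> all zeros
    return [scaled] + one_hot

def oversample_minority(train_rows, seed=0):
    """Naive random oversampling: duplicate rows until every class matches the largest."""
    by_label = {}
    for r in train_rows:
        by_label.setdefault(r[2], []).append(r)
    target = max(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced += group + [rng.choice(group) for _ in range(target - len(group))]
    return balanced

train_rows, test_rows = train_test_split(rows)
mu, sigma, cats = fit_preprocessor(train_rows)
X_train = [transform(r, mu, sigma, cats) for r in train_rows]
X_test = [transform(r, mu, sigma, cats) for r in test_rows]
balanced = oversample_minority(train_rows)  # rebalance the training split only
```

The test split is transformed with the training statistics and is never oversampled, so it still reflects the real class mix.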

13

What are the 2 sub-stages of the Modeling phase in CRISP-DM?

4.1 Model Selection (evaluating algorithms via cross-validation and documenting versioning) and 4.2 Model Tuning (hyperparameter optimization and re-training).
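A minimal sketch of model selection by k-fold cross-validation, assuming two hypothetical candidates (a majority-class baseline and a threshold classifier), a toy dataset, and accuracy as the selection metric. Versioning and hyperparameter tuning are left out.

```python
from statistics import mean

# Hypothetical one-feature dataset, interleaved so every fold holds both classes.
X = [1.0, 8.0, 2.0, 9.0, 3.0, 7.0]
y = [0, 1, 0, 1, 0, 1]

def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, non-overlapping folds."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        val_idx = list(range(start, start + size))
        train_idx = [j for j in range(n) if j < start or j >= start + size]
        yield train_idx, val_idx
        start += size

def fit_majority(X_tr, y_tr):
    """Baseline: always predict the most common training label."""
    label = max(set(y_tr), key=y_tr.count)
    return lambda x: label

def fit_threshold(X_tr, y_tr):
    """Threshold at the midpoint between the two class means."""
    pos = [x for x, t in zip(X_tr, y_tr) if t == 1]
    neg = [x for x, t in zip(X_tr, y_tr) if t == 0]
    cut = (mean(pos) + mean(neg)) / 2
    return lambda x: 1 if x >= cut else 0

def cross_val_accuracy(fit, X, y, k=3):
    """Mean validation accuracy of a fitting function across k folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(model(X[i]) == y[i] for i in val_idx)
        scores.append(correct / len(val_idx))
    return mean(scores)

candidates = {"majority": fit_majority, "threshold": fit_threshold}
cv_scores = {name: cross_val_accuracy(fit, X, y) for name, fit in candidates.items()}
best = max(cv_scores, key=cv_scores.get)
```

Every candidate is scored on the same folds, so the comparison is like-for-like; the winner would then go on to tuning.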

14

What 3 decision models were built for the TBô community-led menswear case study?

DM1 (Analyzing how purchasing behavior affects co-creation), DM2 (Evaluating if co-creators differ from non-co-creators in purchasing), and DM3 (Identifying why customers do not place repeat orders).

15

What were the key behavioral findings from TBô's Decision Model 1 (DM1)?

The number of orders placed and limited-edition products bought have positive non-linear effects on co-creation probability, while time elapsed since the last purchase exerts a negative non-linear effect.

16

What did Decision Model 2 (DM2) reveal about co-creators versus non-co-creators at TBô?

Co-creators have a higher mean cumulative purchase value (in USD) than non-co-creators, and this gap widens over longer sales time windows.

17

What top 3 salient topics emerged from TBô's customer text modeling (DM3) regarding non-repeat buying?

Topic #1: Having no need to buy, Topic #2: Expensiveness, and Topic #3: Dissatisfaction with the service.

18

What dual steps are required during the Evaluation phase of CRISP-DM?

5.1 Evaluation Results (scoring models on test sets, interpreting performance, and running software unit/integration tests) and 5.2 Test Solution (model testing against directional expectations and executing user tests).

19

What is the definition of an Outcome Metric in AI project evaluation?

A metric that captures the desired business impact on the organization or customer (frequently stated in dollars), strictly excluding technical model performance metrics.

20

Name 4 testing and validation strategies applied to an AI solution during the evaluation phase.

Hindsight scenario testing, A/B testing, Beta testing, and tracking progress on core business metrics.

21

What dual operations compose the Deployment phase of CRISP-DM?

6.1 System Deploy (API frameworks, product integration, infrastructure scaling, and security) and 6.2 System Monitor (performance tracking and model retraining loops).

22

What is MLOps?

Machine Learning Operations: a collaborative engineering function (spanning data scientists, DevOps, and IT) focused on streamlining the deployment, monitoring, and maintenance of models in production.

23

Contrast the hidden and ongoing cost profiles of On-Premises vs. Cloud Computing for ML infrastructure.

On-Premises has lower upfront software-license costs (9%) but heavy hidden ongoing costs (downtime fixes, network/security upgrades, database maintenance); Cloud has higher ongoing subscription fees (68%), which also cover implementation and training.

24

What 3 interconnected pillars form the system deployment optimization trade-off?

Latency, Cost, and Throughput.

25

Distinguish between Optimizing Metrics and Satisficing Metrics in production models.

Optimizing metrics are the metrics you aim to improve as much as possible (e.g., Accuracy, Precision, Recall); Satisficing metrics are gating constraints that only need to meet a minimum acceptable standard (e.g., Latency, Model Size, GPU load).
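A minimal sketch of how the two kinds of metric combine when choosing a model: drop every candidate that fails a satisficing constraint, then maximize the optimizing metric among the survivors. The candidate figures and limits below are hypothetical.

```python
# Hypothetical offline results for three candidate models.
candidates = [
    {"name": "large", "accuracy": 0.94, "latency_ms": 180, "size_mb": 900},
    {"name": "medium", "accuracy": 0.91, "latency_ms": 60, "size_mb": 300},
    {"name": "small", "accuracy": 0.86, "latency_ms": 15, "size_mb": 40},
]
# Satisficing metrics: each only has to stay at or below its limit.
constraints = {"latency_ms": 100, "size_mb": 500}

def select_model(candidates, constraints, optimize="accuracy"):
    """Return the feasible candidate with the best optimizing metric, or None."""
    feasible = [
        c for c in candidates
        if all(c[metric] <= limit for metric, limit in constraints.items())
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c[optimize])

chosen = select_model(candidates, constraints)
```

The most accurate model is rejected because it breaks the latency and size limits, so `chosen` is the "medium" model.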

26

Contrast Cloud ML and Edge ML architectures across network dependencies and performance.

Cloud ML executes computation on the cloud, requiring constant network connectivity but offering high throughput (e.g., ChatGPT); Edge ML computes directly on-device, providing low latency, high privacy, and offline functionality (e.g., Tesla autopilot, face unlock).

27

What is Model Decay?

The inevitable downward drift of a model's operational performance over time when it transitions from a controlled testing environment into a live production ecosystem.

28

What is a Covariate Shift in production data?

A data shift where the distribution of independent input variables changes between training and production ($P_{\text{train}}(X) \neq P_{\text{prod}}(X)$), while the conditional relationship to the target remains identical ($P_{\text{train}}(Y|X) = P_{\text{prod}}(Y|X)$).
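One way to detect the $P(X)$ change in a single feature is the population stability index (PSI). PSI is a swapped-in technique, not named in the lecture. This sketch assumes fixed bin edges, toy data, and an illustrative alert threshold. PSI only shows that the input distribution moved. Checking that $P(Y|X)$ held steady requires labels.

```python
import math

def bin_fractions(values, edges):
    """Fraction of values falling in each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return [c / len(values) for c in counts]

def population_stability_index(train_values, prod_values, edges, eps=1e-6):
    """PSI = sum over bins of (q - p) * ln(q / p); eps avoids log(0) on empty bins."""
    p = bin_fractions(train_values, edges)
    q = bin_fractions(prod_values, edges)
    return sum((qi - pi) * math.log((qi + eps) / (pi + eps)) for pi, qi in zip(p, q))

edges = [-math.inf, 0, 10, 20, math.inf]
train_x = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]     # half below 10, half above
prod_x = [4, 11, 12, 13, 14, 15, 16, 17, 18, 19]  # mostly above 10
drift = population_stability_index(train_x, prod_x, edges)
ALERT_THRESHOLD = 0.2  # illustrative cut-off (assumption); tune per feature
shifted = drift > ALERT_THRESHOLD
```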

29

What is a Prior Probability Shift in production data?

A data shift where the distribution of the dependent target/output variable changes ($P_{\text{train}}(Y) \neq P_{\text{prod}}(Y)$), while the class-conditional distribution of the inputs remains stable ($P_{\text{train}}(X|Y) = P_{\text{prod}}(X|Y)$).
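A minimal sketch for spotting a change in $P(Y)$, assuming a labeled production sample is available (labels often arrive with a delay). It compares empirical class frequencies using total variation distance, which is a swapped-in measure not taken from the lecture. The fraud-detection labels are hypothetical.

```python
from collections import Counter

def class_priors(labels):
    """Empirical P(Y): fraction of each label in a sample."""
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}

def prior_shift_distance(train_labels, prod_labels):
    """Total variation distance between training and production label distributions."""
    p, q = class_priors(train_labels), class_priors(prod_labels)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in set(p) | set(q))

# Hypothetical labels: 10% fraud at training time, 30% fraud in production.
train_labels = ["legit"] * 90 + ["fraud"] * 10
prod_labels = ["legit"] * 70 + ["fraud"] * 30
shift = prior_shift_distance(train_labels, prod_labels)
```

A distance of 0 means identical class mixes; here it is about 0.2, meaning 20% of the probability mass has moved between classes.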

30

What is Concept Drift in live machine learning systems?

A shift over time in the actual real-world relationship (and hence the decision boundary) between the independent and dependent variables ($P_{\text{train}}(Y|X) \neq P_{\text{prod}}(Y|X)$), causing a model that is not updated to predict outcomes incorrectly.
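A toy illustration of concept drift, assuming hypothetical labeling rules. The inputs, and therefore $P(X)$, stay exactly the same. Only the rule mapping inputs to labels moves, so a model that is never updated loses accuracy even though nothing about its inputs looks different.

```python
# Hypothetical labeling rules: the real-world cut-off moves from 50 to 30.
def old_concept(x):
    return 1 if x >= 50 else 0

def new_concept(x):
    return 1 if x >= 30 else 0

def model(x):
    """A model trained under the old concept and never updated."""
    return 1 if x >= 50 else 0

inputs = list(range(100))  # identical input distribution in both periods

def accuracy(concept):
    return sum(model(x) == concept(x) for x in inputs) / len(inputs)

acc_before = accuracy(old_concept)
acc_after = accuracy(new_concept)  # inputs 30..49 are now misclassified
print(acc_before, acc_after)       # prints: 1.0 0.8
```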

31

What 4 elements must be logged and monitored to identify production model failure?

Input Data (quality, distributions, feature correlations), Data Pipelines (pre vs. post-processed distributions), Model Outputs (predicted vs. observed labels over time), and Model Auditing (fairness, explainable AI).

32

Why is Model Retraining necessary for deployed systems?

To update the model with new data to improve performance, counteract data and concept drift caused by a changing environment, reduce vulnerability to adversarial attacks, and keep the model focused on recent, relevant data.

33

Describe Triggered Retraining under model decay monitoring.

A continuous process in which a threshold ($\tau$) is set on an evaluation metric (such as the F1 score); whenever the monitored performance score drops below $\tau$, a retraining pipeline is automatically triggered to launch a new model version.
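A minimal sketch of a triggered-retraining monitor. It assumes each production batch eventually receives observed labels, a hypothetical threshold of 0.7, and a hypothetical `retrain` callback that stands in for launching the real pipeline.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def monitor(batches, tau, retrain):
    """Score each labeled production batch; call retrain() whenever F1 < tau."""
    scores = []
    for y_true, y_pred in batches:
        score = f1_score(y_true, y_pred)
        scores.append(score)
        if score < tau:
            retrain()  # hypothetical hook that launches the retraining pipeline
    return scores

# Hypothetical batches of (observed labels, model predictions).
batches = [
    ([1, 1, 0, 0], [1, 1, 0, 0]),  # F1 = 1.0
    ([1, 1, 1, 0], [1, 0, 0, 0]),  # F1 = 0.5, below tau
    ([1, 0, 1, 0], [1, 0, 1, 1]),  # F1 = 0.8
]
launched = []
scores = monitor(batches, tau=0.7, retrain=lambda: launched.append("new model version"))
```

Only the second batch falls below the threshold, so exactly one retraining run is launched.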

34

What is Shadow Releasing in model deployment?

A strategy in which live data is routed simultaneously to both the active Production Model and a new Retrained Model; performance is logged for both, but only the production model's predictions are served to end users until the retrained model proves superior.
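A minimal sketch of a shadow release. `ShadowRouter` is a hypothetical class, and both models are toy threshold rules. Every request goes to both models and both outputs are logged, but only the production output is returned. For simplicity the shadow call runs synchronously here; a real system would likely run it asynchronously so it cannot add user-facing latency.

```python
from typing import Callable, List, Tuple

Model = Callable[[float], int]

class ShadowRouter:
    """Send each request to both models; log both outputs; serve only production."""

    def __init__(self, production: Model, shadow: Model) -> None:
        self.production = production
        self.shadow = shadow
        self.log: List[Tuple[float, int, int]] = []

    def predict(self, x: float) -> int:
        prod_out = self.production(x)
        shadow_out = self.shadow(x)  # computed and logged, never returned to the user
        self.log.append((x, prod_out, shadow_out))
        return prod_out

    def compare(self, labels: List[int]) -> Tuple[float, float]:
        """Accuracy of (production, shadow) once true labels arrive, in log order."""
        n = len(labels)
        prod_acc = sum(p == y for (_, p, _), y in zip(self.log, labels)) / n
        shadow_acc = sum(s == y for (_, _, s), y in zip(self.log, labels)) / n
        return prod_acc, shadow_acc

# Hypothetical models: the retrained one uses an updated cut-off.
router = ShadowRouter(production=lambda x: int(x >= 50), shadow=lambda x: int(x >= 30))
served = [router.predict(x) for x in [10, 35, 45, 60]]
prod_acc, shadow_acc = router.compare([0, 1, 1, 1])
```

Users only ever see the production outputs, but the logged comparison shows the retrained model is ready to be promoted.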

35

What is Champion-Challenger Testing?

An infrastructure deployment pattern where a primary live data stream is fed simultaneously to one operational "Champion" model (which serves users) and multiple "Challenger" models to continuously log and compare competitive performance.
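A minimal sketch of champion-challenger testing. `ChampionChallenger` is a hypothetical class, and the models and labeled stream are toy examples. Every input goes to all models, only the champion's output is served, and every model is scored. To keep it short, the sketch assumes each label arrives before the next input, and it promotes a challenger only if its logged accuracy beats the champion's.

```python
class ChampionChallenger:
    """Feed each input to every model, serve only the champion, score all of them."""

    def __init__(self, models, champion):
        self.models = models        # name -> callable model
        self.champion = champion    # name of the model serving users
        self.correct = {name: 0 for name in models}
        self.total = 0
        self._last = {}

    def serve(self, x):
        self._last = {name: model(x) for name, model in self.models.items()}
        return self._last[self.champion]  # only the champion's output is returned

    def record_outcome(self, label):
        """Score every model's most recent prediction against the observed label."""
        for name, pred in self._last.items():
            self.correct[name] += pred == label
        self.total += 1

    def maybe_promote(self):
        """Promote the best challenger if it beats the current champion's accuracy."""
        accuracy = {name: c / self.total for name, c in self.correct.items()}
        best = max(accuracy, key=accuracy.get)
        if accuracy[best] > accuracy[self.champion]:
            self.champion = best
        return self.champion

# Hypothetical models and a labeled stream of (input, observed label).
cc = ChampionChallenger(
    {"v1": lambda x: int(x >= 50), "v2": lambda x: int(x >= 30), "v3": lambda x: int(x >= 70)},
    champion="v1",
)
served = []
for x, label in [(10, 0), (35, 1), (45, 1), (60, 1), (80, 1)]:
    served.append(cc.serve(x))
    cc.record_outcome(label)
new_champion = cc.maybe_promote()
```

Users only ever receive "v1" predictions during the run. Challenger "v2" scores best in the logs and becomes the new champion.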