Salas 2 (170)

Last updated 11:16 PM on 4/15/26

174 Terms

1
New cards

How is experimental design defined within the strategic framework of AI development? It is the strategic architecture of discovery, providing a systematic approach to purposely changing input variables to observe and identify reasons for changes in the output response.

2
New cards

Why does the text describe data as the "raw fuel" of every algorithm? Because the quality and structure of the input data determine the upper bound of an AI model's performance and its ability to generalize.

3
New cards

What is the fundamental distinction between a "data scraper" and a "data scientist"? A data scientist understands and applies the principles of experimental design to ensure data is controlled, relevant, and statistically valid, whereas a data scraper collects data without that control.

4
New cards

What is the specific risk of training an AI model on uncontrolled hardware data? The resulting model will likely learn the stochastic noise of the hardware rather than the underlying logic of the physical problem.

5
New cards

In the context of instrumentation engineering, what is the primary goal of experimental design? To maximize the information gained about a system while minimizing the expenditure of resources like time and energy.

6
New cards

In the general experimental model Y = f(X) + ϵ, what does Y represent for an AI engineer? Y represents the dependent variable or the response measured, such as the steady-state error of a controller.

7
New cards

What do the X variables represent in the model Y = f(X) + ϵ? They are the independent factors or input variables manipulated during the experiment, such as duty cycle or target setpoint.

8
New cards

How is ϵ characterized in the general model of physical systems? It represents experimental error or noise, which must be minimized and characterized to ensure model robustness.

9
New cards

What is the specific objective of an AI model regarding the function f in Y = f(X) + ϵ? The objective is to use machine learning to approximate the unknown function f that relates inputs to outputs.
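The model on the cards above can be illustrated with a minimal sketch. It assumes a hypothetical linear f(X) with slope 0.5 and Gaussian noise; the function names, the slope, and the duty-cycle levels are illustrative choices, not values from the source (only the 0.4 noise level echoes the SPA card).

```python
import random

def simulate_response(duty_cycle, rng, noise_sd=0.4):
    """Return Y = f(X) + epsilon for an assumed linear f(X) = 0.5 * X."""
    true_response = 0.5 * duty_cycle          # hypothetical f(X)
    return true_response + rng.gauss(0.0, noise_sd)  # add epsilon

def fit_line(xs, ys):
    """Ordinary least squares slope and intercept for a single input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

rng = random.Random(0)                 # fixed seed so the run is repeatable
levels = [10, 30, 50, 70, 90] * 20     # five duty-cycle levels, 20 runs each
responses = [simulate_response(x, rng) for x in levels]
slope, intercept = fit_line(levels, responses)
```

The fitted slope should land close to the assumed 0.5, which is the sense in which a learned model approximates f while ϵ only blurs the estimate.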

10
New cards

In the Soft Pneumatic Actuator (SPA) characterization, what is the observed standard deviation of noise at a 100% duty cycle? The noise exhibits an intrinsic standard deviation of approximately 0.4 kPa.

11
New cards

Why is characterizing the standard deviation of hardware noise essential for AI training? It defines the "noise floor," helping engineers determine if an AI model is overfitting to random fluctuations or learning real signals.

12
New cards

What are the independent and dependent variables in the Raspberry Pi Liquid Temperature Controller? Independent variables include the liquid volume or setpoint, while dependent variables include the time to reach stability or the RMSE of the temperature.

13
New cards

How does proper experimental design impact the variance observed in a dataset? It ensures that the observed variance is a result of the factors being investigated rather than uncontrolled environmental fluctuations.

14
New cards

What is the consequence of failing to characterize ϵ through design? The AI model may fail in real-world scenarios due to a training set that incorporates confounded environmental variables.

15
New cards

What does the "dashed line" represent in the characterization scripts for the Soft Pneumatic Actuator? It represents the "ideal" or characterized system behavior, contrasting with the raw, scattered observations.

16
New cards

Why is rigor in experimental design considered a "computational challenge" for systems like the Bubble Flowmeter? Because every experimental run consumes finite energy and time, requiring high information density per observation.

17
New cards

How does a designed experiment affect the histogram of measurements compared to an uncontrolled one? A designed experiment produces a narrow, centered peak with high signal-to-noise ratio, whereas an uncontrolled one creates a wide, shallow, and inconsistent distribution.

18
New cards

What is the "Data Engineering from Hardware" perspective on data quality? Data is not created equal; only controlled data generation prevents an AI from learning hardware-specific biases.

19
New cards

How does experimental design assist in constructing robust models against physical stochasticity? By systematically isolating factors, it allows the model to learn the "scientific truth" beneath the inherent noise of sensors.

20
New cards

What is a comparative experiment in the context of control strategies? An experiment designed to determine if a significant difference exists between two or more treatments, such as On-Off versus PID control.

21
New cards

Which visualization summarizes the distribution of steady-state error using five key statistics? The boxplot, which displays the minimum, Q1, median, Q3, and maximum.

22
New cards

How is the interquartile range (IQR) used to identify hardware anomalies? It measures the spread of the middle 50% of data (Q3 minus Q1); observations below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are flagged as outliers.
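A minimal sketch of the IQR rule, assuming made-up steady-state error readings. The helper name `iqr_outliers` is hypothetical, and the quartiles come from `statistics.quantiles` with its default method, so other quartile conventions can give slightly different fences.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Illustrative steady-state errors with one suspicious reading.
errors = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 1.02, 0.98, 3.5]
flagged = iqr_outliers(errors)
```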

23
New cards

What is the definition of a single-factor experiment? It is an investigation that manipulates one independent variable across multiple levels to observe its specific effect.

24
New cards

What advantage does a violin plot provide over a standard boxplot for AI engineers? It adds a Kernel Density Estimate (KDE) to show the probability density and identify bimodality in hardware responses.

25
New cards

What does bimodality in a violin plot suggest about a physical system? It suggests that a secondary, hidden process is affecting the hardware, requiring further investigation.

26
New cards

What is the defining characteristic of a factorial experiment? It allows for the simultaneous investigation of multiple factors and, crucially, their interactions.

27
New cards

How is a "synergistic effect" defined in a factorial design? It occurs when the combined effect of multiple factors differs significantly, for better or worse, from the sum of their individual effects.
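To make the idea of an interaction concrete, here is a small sketch for a two-factor, two-level design using the usual -1/+1 coding. The cell means and the helper name `effect` are invented for illustration.

```python
# Hypothetical mean responses for a 2x2 factorial, keyed by coded levels (A, B).
cell_means = {
    (-1, -1): 10.0,
    (+1, -1): 14.0,
    (-1, +1): 12.0,
    (+1, +1): 24.0,
}

def effect(cells, sign):
    """Contrast estimate: signed sum of the cell means divided by 2."""
    return sum(sign(a, b) * y for (a, b), y in cells.items()) / 2

effect_a = effect(cell_means, lambda a, b: a)       # main effect of A
effect_b = effect(cell_means, lambda a, b: b)       # main effect of B
effect_ab = effect(cell_means, lambda a, b: a * b)  # A-by-B interaction
```

With these numbers the interaction estimate is nonzero: raising A adds 4 units when B is low but 12 units when B is high, a pattern that one-factor-at-a-time data cannot reveal.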

28
New cards

Why are interactions critical for AI model generalization? If an AI is trained only on one-factor-at-a-time (OFAT) data, it will fail to predict the complex, non-linear behavior that occurs when factors combine.

29
New cards

How do laboratory experiments differ from field experiments in AI development? Laboratory experiments are highly controlled to minimize disturbances, while field experiments test models in the final, noisy environment like a hospital.

30
New cards

What role do computational experiments play in the AI development lifecycle? They allow for simulations where millions of runs can be executed to refine models before physical hardware testing.

31
New cards

What is the definition of an experimental unit? It is the smallest division of material or system components to which a treatment can be applied independently.

32
New cards

How is a sampling unit distinguished from an experimental unit? A sampling unit is a discrete observation or measurement taken from within an experimental unit.

33
New cards

In the thermal environment example, what is the experimental unit? The entire container of water assigned to a specific heater treatment.

34
New cards

Why are multiple sensors in a single water container considered sampling units? Because they share the same environment and their thermal trajectories are highly correlated, providing redundant information.

35
New cards

What is the statistical error of treating sampling units as independent experimental units? It leads to pseudoreplication, which artificially reduces the estimated experimental error and inflates confidence.
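One common way to avoid pseudoreplication is to collapse the sampling units into a single value per experimental unit before estimating error. The sketch below assumes three hypothetical water containers with four sensors each; all readings and names are invented.

```python
import statistics

# Hypothetical temperatures (degrees C): four sensors in each of three containers.
readings = {
    "container_A": [40.1, 40.2, 40.0, 40.1],
    "container_B": [41.0, 41.1, 40.9, 41.0],
    "container_C": [39.5, 39.6, 39.4, 39.5],
}

# One summary value per experimental unit (the container).
unit_means = {unit: statistics.mean(values) for unit, values in readings.items()}

n_sampling_units = sum(len(values) for values in readings.values())
n_experimental_units = len(unit_means)

# Experimental error is estimated from variation between containers,
# not from the correlated sensors inside one container.
between_unit_sd = statistics.stdev(list(unit_means.values()))
```

Here the twelve sensor readings yield only three independent observations, so the sample size used for inference is 3, not 12.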

36
New cards

What is a common example of pseudoreplication in computer vision for AI? Treating 1,000 frames from a single 1-minute video as 1,000 independent data points rather than observations from one experimental event.

37
New cards

What happens to an AI model trained on pseudoreplicated data? It will show high accuracy on the validation set but will degrade significantly when deployed on different physical hardware instances.

38
New cards

Why is recognizing the true experimental unit essential for model generalization? It ensures the model learns the underlying physics rather than the unique biases or characteristics of a single experimental setup.

39
New cards

What is natural variation in an instrumentation system? The intrinsic, unavoidable fluctuation present in any physical process, such as environmental drift or electronic jitter.

40
New cards

What is induced variation? The change in response that is purposely created by the engineer through the manipulation of experimental factors.

41
New cards

How is a strip plot used to visualize hardware noise? It is a 1D scatter plot where points are "jittered" horizontally to reveal the density and spread of observations at specific levels.

42
New cards

What is systematic variation, and how does it manifest in sensors? It is a consistent bias or shift in measurements, such as a calibration error in a flowmeter.

43
New cards

What is random variation? Unpredictable, inconsistent fluctuations that represent the inherent noise floor of the system.

44
New cards

How is systematic bias visually identified in a Kernel Density Estimate (KDE) plot? By a horizontal shift of the entire distribution peak relative to the theoretical or calibrated value.

45
New cards

What is the danger of training an AI model on data with systematic bias? The model will consistently over-predict or under-predict, leading to potentially critical failures in control logic.

46
New cards

What are the "three pillars" that ensure the statistical validity of an experiment? Replication, randomization, and local control (blocking).

47
New cards

How does replication differ from repeated measurements? Replication involves repeating the basic experiment on entirely new, independent experimental units.

48
New cards

What is the primary statistical benefit of replication? It allows for the estimation of experimental error and improves the precision of the estimated effects.

49
New cards

How does the standard error of the mean change as the number of replications increases? The standard error decreases in proportion to 1/√n, which narrows the confidence interval and provides a stable target for model convergence.
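The shrinking standard error can be checked with a short simulation. This is a sketch under stated assumptions: the readings are drawn from a Gaussian with a made-up mean of 50 and standard deviation of 0.4, and `standard_error` is a hypothetical helper.

```python
import math
import random
import statistics

def standard_error(samples):
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))

rng = random.Random(1)  # fixed seed so the run is repeatable
small = [rng.gauss(50.0, 0.4) for _ in range(25)]    # 25 replications
large = [rng.gauss(50.0, 0.4) for _ in range(400)]   # 400 replications

se_small = standard_error(small)
se_large = standard_error(large)
```

Going from 25 to 400 replications multiplies n by 16, so the standard error is expected to fall by roughly a factor of 4.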

50
New cards

What is randomization? The practice of assigning treatments to experimental units or determining the run order through a random process.

51
New cards

What does randomization protect against in hardware systems? It protects against confounding variables and temporal drifts, such as battery depletion or material fatigue.

52
New cards

How does randomization handle time-dependent effects statistically? It distributes these effects across all treatment levels, converting systematic bias into random noise.

53
New cards

What is the purpose of local control or blocking? To isolate and eliminate variability from known but irrelevant nuisance factors, such as differences between hardware boards.

54
New cards

How does blocking improve the precision of an AI training set? By mathematically isolating the variation of the block, it reduces the overall experimental error, making the signal clearer.

55
New cards

What does "i.i.d." stand for, and why is it important for AI? It stands for independent and identically distributed; randomization helps justify this assumption about the errors, which most statistical tests require.

56
New cards

What is confounding in an experiment? It occurs when the effect of an intended factor is indistinguishable from the effect of an uncontrolled variable, such as run order.

57
New cards

In the SPA, how can sequential testing lead to a biased model? Sequential testing may confound the power increment with the pump's internal temperature increase, leading to an incorrect slope.

58
New cards

How is randomization implemented in automated data acquisition? The control script (e.g., Python) generates a shuffled list of treatments so that the run sequence is random rather than sequential.
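A shuffled run list might look like the following sketch. The function name, the duty-cycle levels, and the replicate count are assumptions for illustration; a real acquisition script would pass each entry to its own hardware routine.

```python
import random

def randomized_run_order(levels, replicates, seed=None):
    """Return every level repeated `replicates` times, in random order."""
    runs = [level for level in levels for _ in range(replicates)]
    random.Random(seed).shuffle(runs)  # seed makes the order reproducible
    return runs

duty_cycles = [20, 40, 60, 80, 100]
run_order = randomized_run_order(duty_cycles, replicates=3, seed=42)
```

Recording the seed alongside the data keeps the run order auditable while still spreading drift, such as pump heating, across all treatment levels.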

59
New cards

What are the three main categories of experiments based on intent? Screening, characterization, and optimization.

60
New cards

What is the primary goal of a screening design? To identify which few factors have a dominant effect on the system among many potential candidates.

61
New cards

What is the Pareto principle in the context of factor screening? The concept that only a small number of factors are expected to have a dominant effect on the response.

62
New cards

How many levels are typically used in a screening experiment? Two widely spaced levels (e.g., Low and High) are used to capture the maximum potential effect.

63
New cards

What is the objective of characterization? To map the detailed functional relationship Y = f(X), often requiring more than two levels to detect non-linearity.

64
New cards

When are multi-level designs required in AI engineering? When the relationship between factors and response is expected to be non-linear or contain complex gradients.

65
New cards

What is the goal of optimization? To find the specific combination of factor levels that yields the peak performance, such as maximum efficiency or minimum error.

66
New cards

What is a "response surface"? A mapping used in optimization to visualize the peaks or valleys of hardware performance.

67
New cards

What is treatment design? The process of selecting the specific values, range, and granularity of independent variables to be tested.

68
New cards

What is meant by the "range" in treatment design? The mathematical difference between the minimum and maximum levels of a factor.

69
New cards

What is meant by "granularity" in treatment design? The number of discrete levels tested within a specific range.
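Range and granularity together determine the list of treatment levels. A minimal sketch, assuming evenly spaced levels (one choice among several) and a hypothetical helper name:

```python
def treatment_levels(minimum, maximum, n_levels):
    """Evenly spaced levels covering the full range, endpoints included."""
    if n_levels < 2:
        raise ValueError("at least two levels are needed to span a range")
    step = (maximum - minimum) / (n_levels - 1)
    return [minimum + i * step for i in range(n_levels)]

# Range of 0 to 100 (e.g. percent duty cycle) with a granularity of 5 levels.
levels = treatment_levels(0, 100, 5)
factor_range = max(levels) - min(levels)
```

Including both endpoints means the extremes of the range are always tested; even spacing is a default, and levels can be placed more densely where non-linearity is expected.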

70
New cards

Why is extrapolation considered dangerous in AI model deployment? Because a model's validity is only guaranteed within the range of conditions it has observed during training.

71
New cards

What happens if treatment levels are clustered only in the linear region of a saturation curve? The model will fail to predict behavior at higher levels where the hardware naturally saturates.

72
New cards

What is the "operational envelope" of a system? The entire range of conditions and inputs under which the hardware is expected to function reliably.

73
New cards

Why should experiments include failure limits? To build robust decision boundaries that allow the AI to detect and handle hardware failures safely.

74
New cards

What are "anchor points" in treatment design? Data points at the absolute extremes of the operational range that prevent poor model extrapolation.

75
New cards

What is the risk of selecting factor levels that are too close together? The variation in response may be too small to distinguish from the intrinsic hardware noise.

76
New cards

What is the risk of selecting levels that are too far apart? The experiment might miss critical non-linearities or "valleys" occurring in the middle of the range.

77
New cards

In the SPA example, above what duty cycle do saturation effects typically occur? Above a 70% duty cycle.

78
New cards

What defines "Data Engineering from Hardware" for an AI engineer? The realization that data is the raw fuel of algorithms, but its quality depends on controlled generation.

79
New cards

How does experimental design separate a "data scraper" from a "data scientist"? By the ability to ensure that observed variance is due to investigated factors rather than noise.

80
New cards

In the model Y = f(X) + ϵ, what is the goal of AI regarding the function f? To approximate f as accurately as possible while being robust to ϵ.

81
New cards

What is the thermal environment example used to illustrate? The critical distinction between experimental units and sampling units.

82
New cards

What is the consequence of high noise in hardware for data acquisition? It significantly increases the cost and time required because more replications are needed to reach precision.

83
New cards

How does the Central Limit Theorem (CLT) benefit AI engineers? Because the means of sufficiently many independent observations are approximately normal, it allows the use of Gaussian-based statistical tools even when the underlying hardware noise is non-normal.

84
New cards

What is the purpose of "interviewing the data" in EDA? To discover biases, limitations, and patterns without making prior assumptions about a statistical model.

85
New cards

What is the first step in any Exploratory Data Analysis? To visualize the distribution of the response variable to understand its shape and center.

86
New cards

What does a positive skew in a histogram indicate about data tailing? It indicates that the tail of the distribution extends toward higher values on the right.

87
New cards

How does skewness affect the relationship between the mean and median? In a skewed distribution, the mean is pulled toward the tail, away from the median.
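The pull of the tail on the mean is easy to see with a few numbers. The latency values below are invented for illustration.

```python
import statistics

# Hypothetical latencies in milliseconds: mostly fast, one slow run.
latencies_ms = [10, 11, 12, 11, 10, 12, 11, 60]

mean_latency = statistics.mean(latencies_ms)      # dragged toward the slow run
median_latency = statistics.median(latencies_ms)  # stays with the bulk of the data
gap = mean_latency - median_latency
```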

88
New cards

Why is the divergence between mean and median critical for AI loss functions? Large differences suggest non-normality, which can make Mean Squared Error (MSE) overly sensitive to outliers.

89
New cards

What assumption do many ML algorithms make about noise distribution? They assume the noise follows a Normal (Gaussian) distribution.

90
New cards

What is a Quantile-Quantile (Q-Q) plot? A diagnostic tool that compares experimental quantiles against a theoretical normal distribution.

91
New cards

What does a straight 45-degree line in a Q-Q plot signify? When the points fall along it, the experimental data closely follow a normal distribution.
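The points of a normal Q-Q plot can be computed without a plotting library. This sketch assumes standardized data and the plotting positions (i + 0.5)/n, which is one common convention among several; the pressure readings and the helper name are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def qq_points(samples):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    n = len(samples)
    center, spread = mean(samples), stdev(samples)
    # Standardize and sort the observations.
    observed = sorted((x - center) / spread for x in samples)
    # Matching quantiles of the standard normal distribution.
    standard_normal = NormalDist()
    theoretical = [standard_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, observed))

pressures = [99.6, 100.1, 99.9, 100.4, 100.0, 99.8, 100.2, 100.3, 99.7]
points = qq_points(pressures)
```

Plotting these pairs against the line y = x gives the diagnostic described above; points bending away from the line at either end suggest heavy tails or outliers.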

92
New cards

What do deviations at the extremes of a Q-Q plot indicate to an engineer? The presence of heavy tails or extreme outliers, potentially caused by faulty sensors.

93
New cards

What are statistical parameters? Numerical summaries, such as mean and variance, that describe the behavior of a population or sample.

94
New cards

What does the mean (μ) represent in a physical hardware system? It represents the central value or the "strength" of the system's response.

95
New cards

What do variance (σ²) and standard deviation (σ) quantify? They quantify the dispersion, providing a direct measure of hardware reliability and precision.

96
New cards

How is skewness defined as a shape parameter? It is a measure of the asymmetry of the probability distribution.

97
New cards

Where is positive skewness commonly observed in intelligent systems? In latency measurements, where most runs are fast but a few take much longer than average.

98
New cards

What does kurtosis measure in a distribution? It measures the "tailedness," or the frequency of extreme outliers relative to a normal distribution.

99
New cards

How can kurtosis help identify faulty sensors? High kurtosis indicates "heavy tails," suggesting that the sensor is producing anomalous, extreme values more often than expected.
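Both shape parameters can be estimated from central moments. The sketch below uses the simple moment-based (population) formulas, which is an assumption; statistical libraries often apply bias corrections and will report slightly different values. The data are invented.

```python
def central_moment(values, order):
    """Average of (x - mean) ** order over the data."""
    center = sum(values) / len(values)
    return sum((x - center) ** order for x in values) / len(values)

def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]
with_spike = [10, 11, 12, 11, 10, 12, 11, 60]  # one extreme sensor reading

spike_skew = skewness(with_spike)
spike_kurtosis = excess_kurtosis(with_spike)
```

The single extreme reading produces positive skewness and positive excess kurtosis, while the symmetric data have zero skewness.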

100
New cards

What is the difference between a population and a sample? A population is the entire set of possible observations; a sample is a subset used for inference.