ISYE 6501 - Intro to Data Analytics

Description and Tags

Midterm 2

Last updated 9:36 PM on 4/9/26

97 Terms

2
Overfitting
Model fits noise; low training error, high validation error
3
Underfitting
Model too simple; high training and validation error
4
Bias
Systematic error from overly simple model
5
Variance
Sensitivity of the fitted model to the particular training sample; high variance is associated with overfitting
6
Forward Selection
Start with no variables, add significant ones (p ≤ threshold)
7
Backward Elimination
Start with all variables, remove insignificant ones (p > threshold)
8
Stepwise Regression
Combination of forward and backward selection
9
Lasso Regression
Sets some coefficients exactly to 0 (feature selection)
10
Ridge Regression
Shrinks coefficients to reduce variance (no variable removal)
11
Elastic Net
Combination of Lasso and Ridge penalties
12
Regularization Effect
Decreasing regularization → lower bias, higher variance
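The Lasso, Ridge, and regularization cards above can be illustrated with a minimal sketch for a single predictor with no intercept, where both penalized fits have closed forms. The penalty scaling (0.5 * squared error plus the penalty), the function names, and the toy data are assumptions made for illustration, not something the card set specifies.

```python
def soft_threshold(z, lam):
    """Shrink z toward 0 by lam, returning exactly 0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def ridge_1d(x, y, lam):
    # Minimizes 0.5 * sum((y - b*x)^2) + 0.5 * lam * b^2
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)


def lasso_1d(x, y, lam):
    # Minimizes 0.5 * sum((y - b*x)^2) + lam * |b|
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return soft_threshold(sxy, lam) / sxx


# Hypothetical toy data
x = [-1.0, 0.0, 1.0]
y = [-0.4, 0.1, 0.6]

for lam in (0.0, 0.5, 2.0):
    print(lam, ridge_1d(x, y, lam), lasso_1d(x, y, lam))
```

With lam = 0 both fits equal the least-squares slope. As lam grows, the ridge coefficient shrinks but stays nonzero, while the lasso coefficient reaches exactly 0 once lam is at least |sum(x*y)|.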
13
A/B Testing
Compare two alternatives by randomly assigning users to each and comparing an outcome (often binary, e.g. click vs. no click)
14
Balanced Design
Each option tested equally often
15
Factorial Design
Test combinations of multiple factors
16
Fractional Factorial Design
Subset of combinations; faster but may confound effects
17
Exploration
Testing options to gain information
18
Exploitation
Using best-known option for immediate reward
19
Multi-Armed Bandit
Balances exploration and exploitation
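One common way to balance exploration and exploitation is an epsilon-greedy rule. The sketch below is only one possible bandit strategy, not necessarily the one the course uses; the success rates, epsilon, and function name are hypothetical.

```python
import random


def epsilon_greedy(true_rates, rounds=2000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit with Bernoulli rewards."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms
    wins = [0] * n_arms
    for _ in range(rounds):
        if rng.random() < epsilon or 0 in pulls:
            # Explore: pick a random arm (also used until every arm is tried once)
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best observed success rate
            arm = max(range(n_arms), key=lambda a: wins[a] / pulls[a])
        pulls[arm] += 1
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
    return pulls, wins


# Hypothetical success rates for three options
pulls, wins = epsilon_greedy([0.2, 0.5, 0.8])
print(pulls, wins)
```

A larger epsilon gathers more information about every option at the cost of immediate reward; a smaller epsilon does the opposite.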
20
Blocking
Control unwanted variation by grouping similar units
21
Bernoulli Distribution
Single trial with success probability p
22
Binomial Distribution
Number of successes in n independent trials
23
Geometric Distribution
Number of failures before first success
24
Poisson Distribution
Number of events in a fixed interval
25
Exponential Distribution
Time between events
26
Weibull Distribution
Time to failure (flexible hazard rate)
27
Poisson vs Exponential
Poisson = counts; Exponential = time between events
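The link between the two distributions can be checked by simulation: if the gaps between events are exponential with rate r, the number of events per unit interval should average about r. This is a rough sketch; the rate, interval count, and seed are arbitrary choices.

```python
import random


def counts_per_interval(rate, n_intervals, seed=0):
    """Generate exponential inter-arrival times and count events per unit interval."""
    rng = random.Random(seed)
    counts = [0] * n_intervals
    t = rng.expovariate(rate)          # time of the first event
    while t < n_intervals:
        counts[int(t)] += 1            # event falls in interval [int(t), int(t) + 1)
        t += rng.expovariate(rate)     # exponential gap to the next event
    return counts


counts = counts_per_interval(rate=3.0, n_intervals=10_000)
mean_count = sum(counts) / len(counts)
print(mean_count)  # should be close to the rate, 3.0
```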
28
Memoryless Property
Future independent of past (Exponential, Geometric only)
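The memoryless property says P(X > s + t | X > s) = P(X > t). A small numeric check, using the exponential survival function and the geometric distribution counted as failures before the first success (the convention on the Geometric card); the parameter values are arbitrary.

```python
import math


def exp_survival(t, rate):
    """P(X > t) for an Exponential(rate) variable."""
    return math.exp(-rate * t)


def geom_survival(k, p):
    """P(X >= k) when X counts failures before the first success."""
    return (1 - p) ** k


rate, p, s, t = 0.5, 0.3, 2, 3

# Conditional probability of lasting t more, given survival to s
exp_conditional = exp_survival(s + t, rate) / exp_survival(s, rate)
geom_conditional = geom_survival(s + t, p) / geom_survival(s, p)

print(exp_conditional, exp_survival(t, rate))
print(geom_conditional, geom_survival(t, p))
```

In both cases the conditional probability matches the unconditional one up to floating-point rounding, so time already elapsed carries no information.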
29
Q-Q Plot
Straight line = data matches distribution; curve = mismatch
30
Deterministic Simulation
No randomness; same input → same output
31
Stochastic Simulation
Includes randomness
32
Discrete-Event Simulation
Changes occur at discrete times
33
Simulation Replications
Run multiple times to capture variability
34
Simulation Quality
Depends on accuracy of input data
35
Markov Chain
Future depends only on current state (memoryless)
36
Transition Probability (Pij)
Probability of moving from state i to j
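A transition matrix P, with P[i][j] the probability of moving from state i to state j, can be applied repeatedly to a state distribution. The two-state matrix below is a made-up example.

```python
def step(dist, P):
    """One Markov chain step: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


# Hypothetical two-state chain; each row sums to 1
P = [[0.9, 0.1],
     [0.5, 0.5]]

first = step([1.0, 0.0], P)   # distribution after one step from state 0

dist = [1.0, 0.0]
for _ in range(50):
    dist = step(dist, P)      # approaches the steady state

print(first, dist)
```

For this matrix the long-run distribution approaches (5/6, 1/6), regardless of the starting state.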
37
Missing Data First Step
Check for patterns in missingness
38
Missing Data Options
Remove, indicator variable, or impute
39
Mean Imputation
Replace missing values with the variable's mean; simple, but can be biased when missingness is not random, and understates variability
40
Regression Imputation
Predict missing values; risk of overfitting
41
Perturbation
Add noise to imputed values to restore variability
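A minimal sketch of mean imputation and of perturbation, assuming missing entries are marked with None and that the added noise is normal with the observed standard deviation (both are illustrative assumptions, as are the function names and data).

```python
import random
import statistics


def mean_impute(values):
    """Fill each missing entry (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]


def perturbed_impute(values, seed=0):
    """Fill each missing entry with the observed mean plus normal noise."""
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)
    return [rng.gauss(mean, sd) if v is None else v for v in values]


# Hypothetical data with two missing entries
data = [2.0, None, 4.0, 6.0, None, 8.0]
observed = [v for v in data if v is not None]

filled = mean_impute(data)
noisy = perturbed_impute(data)

print(statistics.variance(observed), statistics.variance(filled))
```

Mean imputation keeps the mean at 5.0 but lowers the sample variance, which is the loss of variability that perturbation is meant to restore.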
42
Imputation Risk
Using the same data both to impute values and to fit the model can cause overfitting
43
Optimization Variables
Decisions to be made
44
Optimization Constraints
Restrictions on variables
45
Objective Function
Measure to maximize or minimize
46
Feasible Solution
Satisfies all constraints
47
Optimal Solution
Best feasible solution
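The five optimization terms above (variables, constraints, objective function, feasible solution, optimal solution) can be seen together in a tiny integer problem solved by brute-force enumeration. The problem itself is invented for illustration, and enumeration only works because the search space is so small.

```python
def objective(x, y):
    """Objective function to maximize."""
    return 3 * x + 2 * y


def feasible(x, y):
    """Constraints: x + y <= 4 and x + 3y <= 6 (x, y >= 0 via the search range)."""
    return x + y <= 4 and x + 3 * y <= 6


# Variables: non-negative integers x and y, searched over a small grid
candidates = [(x, y) for x in range(7) for y in range(7)]

# Feasible solutions satisfy every constraint
feasible_points = [p for p in candidates if feasible(*p)]

# The optimal solution is the feasible point with the best objective value
best = max(feasible_points, key=lambda p: objective(*p))
print(best, objective(*best))  # (4, 0) 12
```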
48
Convex Optimization
Any local optimum is also a global optimum, so the global optimum can be found reliably
49
Non-Convex Optimization
May have multiple local optima
50
Integer Programming
Variables restricted to integers; harder to solve
51
Optimization Difficulty Order
Linear < Convex < Integer < Non-convex
52
Heuristic Algorithm
Fast, good solution but not guaranteed optimal
53
Nonparametric Tests
Make no assumption about the underlying distribution; typically work with ranks or signs rather than raw values
54
McNemar Test
Paired categorical (Yes/No)
55
Wilcoxon Signed-Rank Test
Paired numeric data
56
Mann-Whitney Test
Independent numeric samples
57
Parametric vs Nonparametric
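Parametric tests assume a distribution (e.g. normal) and typically compare means; nonparametric tests make no such assumption and typically compare ranks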
58
Bayes Theorem
P(A|B)=P(B|A)P(A)/P(B)
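A worked instance of the formula, expanding P(B) with the law of total probability. The numbers (a 1% prior, 90% detection rate, 5% false-positive rate) are hypothetical.

```python
def posterior(prior, p_b_given_a, p_b_given_not_a):
    """P(A|B) = P(B|A) P(A) / P(B), with P(B) from the law of total probability."""
    p_b = p_b_given_a * prior + p_b_given_not_a * (1 - prior)
    return p_b_given_a * prior / p_b


# Hypothetical test: A = has the condition, B = tests positive
p = posterior(prior=0.01, p_b_given_a=0.9, p_b_given_not_a=0.05)
print(round(p, 4))  # 0.1538
```

Even with a fairly accurate test, the posterior stays low because the prior P(A) is small.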
59
Empirical Bayes
Combines individual and population data
60
Graph Nodes
Entities in network
61
Graph Edges
Connections between nodes
62
Clique
Fully connected group of nodes
63
Community
Tightly connected group
64
Louvain Algorithm
Detects communities by maximizing modularity
65
Modularity
Measures strength of network clustering
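The card does not give a formula, so the sketch below assumes the usual Newman modularity for an undirected, unweighted graph: for each community, the fraction of edges inside it minus the fraction expected from its total degree. The graph (two triangles joined by one edge) is a made-up example.

```python
def modularity(edges, community):
    """Q = sum over communities of (internal edges / m) - (total degree / 2m)^2."""
    m = len(edges)
    internal = {}   # edges with both endpoints in the community
    degree = {}     # sum of node degrees in the community
    for u, v in edges:
        for node in (u, v):
            c = community[node]
            degree[c] = degree.get(c, 0) + 1
        if community[u] == community[v]:
            c = community[u]
            internal[c] = internal.get(c, 0) + 1
    return sum(internal.get(c, 0) / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical graph: two triangles connected by the edge (2, 3)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
lumped = {node: "a" for node in range(6)}

print(modularity(edges, split))   # about 0.357 (5/14)
print(modularity(edges, lumped))  # 0.0
```

Splitting the graph into its two triangles scores higher than putting every node in one community, which is the kind of improvement the Louvain algorithm searches for.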
66
Neural Network Layers
Input, hidden, output layers
67
Deep Learning
Neural networks with many hidden layers
68
Descriptive Model
Describes data
69
Predictive Model
Predicts future outcomes
70
Prescriptive Model
Recommends decisions
71
Game Theory
Study of strategic decision-making
72
Equilibrium
No player can improve their payoff by unilaterally changing strategy
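For a small two-player game, equilibria in pure strategies can be found by checking every cell of the payoff table for a profitable one-sided deviation. The sketch below uses a hypothetical prisoner's dilemma payoff table and ignores mixed strategies.

```python
def pure_nash(payoffs):
    """payoffs[r][c] = (row player's payoff, column player's payoff)."""
    rows, cols = len(payoffs), len(payoffs[0])
    equilibria = []
    for r in range(rows):
        for c in range(cols):
            # Row player cannot gain by switching rows, holding the column fixed
            row_ok = all(payoffs[r][c][0] >= payoffs[r2][c][0] for r2 in range(rows))
            # Column player cannot gain by switching columns, holding the row fixed
            col_ok = all(payoffs[r][c][1] >= payoffs[r][c2][1] for c2 in range(cols))
            if row_ok and col_ok:
                equilibria.append((r, c))
    return equilibria


# Hypothetical prisoner's dilemma: strategy 0 = cooperate, 1 = defect
dilemma = [[(-1, -1), (-3, 0)],
           [(0, -3), (-2, -2)]]

print(pure_nash(dilemma))  # [(1, 1)]
```

Both players defecting is the only equilibrium here, even though mutual cooperation would leave both better off.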
73
Zero-Sum Game
One player's gain equals another's loss
74
Non-Zero-Sum Game
Total payoff can vary
75
Survival Model
Predicts time until event occurs
76
Cox Proportional Hazards Model
h(t)=h0(t)e^(βx)
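In the formula above, the baseline hazard h0(t) cancels when two covariate values are compared, so the hazard ratio exp(β(x1 - x2)) does not depend on t. A small sketch with a made-up baseline hazard and coefficient:

```python
import math


def cox_hazard(t, x, beta, baseline):
    """h(t) = h0(t) * exp(beta * x) for a single covariate x."""
    return baseline(t) * math.exp(beta * x)


def hazard_ratio(x1, x2, beta):
    """Ratio of hazards for covariate values x1 and x2; h0(t) cancels."""
    return math.exp(beta * (x1 - x2))


def baseline(t):
    # Hypothetical baseline hazard h0(t), chosen only for illustration
    return 0.1 * t


beta = 0.7  # hypothetical coefficient

ratio_early = cox_hazard(1.0, 1, beta, baseline) / cox_hazard(1.0, 0, beta, baseline)
ratio_late = cox_hazard(5.0, 1, beta, baseline) / cox_hazard(5.0, 0, beta, baseline)
print(ratio_early, ratio_late, hazard_ratio(1, 0, beta))
```

All three printed values agree up to rounding, which is the "proportional hazards" part of the model's name.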
77
Hazard Rate
Risk of event occurring at time t
78
Censored Data
Incomplete information about event timing
79
Gradient Boosting
Sequentially builds models to correct errors
80
Cross Validation
Repeatedly splits the data into training and validation folds to estimate model performance
81
Validation Set
Used to compare models
82
Test Set
Used to estimate final performance
83
Training Set
Used to fit model
84
Better Model Selection
Lower validation error + fewer variables preferred
85
SVM Margin
Wider margin → better generalization
86
Overfitting SVM
Narrow margin, complex boundary
87
k-Nearest Neighbors
Classifies based on nearby points
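A minimal k-nearest-neighbors sketch, assuming Euclidean distance and a simple majority vote among the k closest training points. The training points and labels are invented.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical labeled points: (coordinates, label)
train = [
    ((1.0, 1.0), "red"), ((1.5, 2.0), "red"), ((2.0, 1.0), "red"),
    ((6.0, 6.0), "blue"), ((7.0, 5.5), "blue"), ((6.5, 7.0), "blue"),
]

print(knn_predict(train, (1.2, 1.5)))  # red
print(knn_predict(train, (6.2, 6.1)))  # blue
```

In practice the features are usually scaled first, since distance is dominated by whichever feature has the largest range.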
88
k-Means Clustering
Groups data into clusters based on distance
89
Principal Component Analysis
Reduces dimensionality by projecting data onto the orthogonal directions of greatest variance
90
Logistic Regression
Used for classification (binary outcomes)
91
Linear Regression
Used for predicting numeric response
92
Time Series Models
Use temporal data (ARIMA, Exponential Smoothing)
93
Attribute Data Models
Use feature-based data
94
Confusion Matrix
Shows classification performance
95
Threshold (Logistic)
Cutoff for classification decision
96
Lower Threshold
More predicted positives: fewer false negatives, more false positives
97
Higher Threshold
Fewer predicted positives: more false negatives, fewer false positives
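The threshold cards can be checked by building a confusion matrix at several cutoffs. The predicted probabilities and true labels below are made up, and a prediction counts as positive when the probability is at or above the threshold.

```python
def confusion(probs, labels, threshold):
    """Count TP, FP, TN, FN when predicting positive for prob >= threshold."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


# Hypothetical model outputs and true classes
probs = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 1]

for threshold in (0.3, 0.5, 0.8):
    print(threshold, confusion(probs, labels, threshold))
# 0.3 {'TP': 3, 'FP': 2, 'TN': 1, 'FN': 0}
# 0.5 {'TP': 2, 'FP': 1, 'TN': 2, 'FN': 1}
# 0.8 {'TP': 1, 'FP': 0, 'TN': 3, 'FN': 2}
```

Raising the threshold trades false positives for false negatives, so the right cutoff depends on which error is more costly.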