ISYE 6501 - Intro to Data Analytics

Midterm 2

Last updated 9:36 PM on 4/9/26

97 Terms

Overfitting

Model fits noise; low training error, high validation error

Underfitting

Model too simple; high training and validation error

Bias

Systematic error from overly simple model

Variance

Sensitivity of the fitted model to the particular training sample; high variance goes with overfitting

Forward Selection

Start with no variables, add significant ones (p ≤ threshold)

Backward Elimination

Start with all variables, remove insignificant ones (p > threshold)

Stepwise Regression

Combination of forward and backward selection

Lasso Regression

Sets some coefficients exactly to 0 (feature selection)

Ridge Regression

Shrinks coefficients to reduce variance (no variable removal)
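
The contrast between the Lasso and Ridge cards can be seen in a one-variable case, where both penalized fits have a closed form. This is a minimal sketch, assuming a no-intercept model y ≈ beta * x and the penalty scalings written in the docstring; the function names and data are made up for illustration.

```python
def soft_threshold(z, t):
    """Shrink z toward 0 by t, returning exactly 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_one_variable(x, y, lam=0.0, penalty="none"):
    """Slope of y ~ beta * x (no intercept).

    ridge minimizes sum((y - beta*x)^2) + lam * beta^2
    lasso minimizes sum((y - beta*x)^2) + lam * |beta|
    """
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    if penalty == "ridge":
        return sxy / (sxx + lam)          # shrinks, but never reaches 0
    if penalty == "lasso":
        return soft_threshold(sxy, lam / 2) / sxx  # can be exactly 0
    return sxy / sxx                      # ordinary least squares


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
ols = fit_one_variable(x, y)
ridge = fit_one_variable(x, y, lam=14.0, penalty="ridge")
lasso = fit_one_variable(x, y, lam=56.0, penalty="lasso")
```

With these numbers the ridge slope is halved relative to least squares, while the lasso slope is driven all the way to zero.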

Elastic Net

Combination of Lasso and Ridge penalties

Regularization Effect

Decreasing regularization → lower bias, higher variance

A/B Testing

Compare two alternatives by randomly assigning units to each and testing whether the outcome (often binary, e.g. click / no click) differs

Balanced Design

Each option tested equally often

Factorial Design

Test combinations of multiple factors

Fractional Factorial Design

Subset of combinations; faster but may confound effects
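
As a rough illustration of the Factorial and Fractional Factorial cards, the sketch below enumerates a full two-level design for three hypothetical factors A, B, C and then keeps a half fraction. The choice of fraction (runs where A*B*C = +1) is one common convention, assumed here for the example.

```python
from itertools import product

factors = ["A", "B", "C"]  # hypothetical factor names, levels coded -1 / +1

# Full factorial: every combination of levels, 2^3 = 8 runs.
full = list(product([-1, 1], repeat=len(factors)))

# Half fraction: keep only runs whose level product is +1, 4 runs.
half = [run for run in full if run[0] * run[1] * run[2] == 1]

# Within this fraction, C always equals A*B, so the main effect of C
# cannot be separated from the A-by-B interaction (they are confounded).
confounded = all(c == a * b for a, b, c in half)

# Each factor still appears at each level equally often (balanced).
balanced = all(sum(run[i] for run in half) == 0 for i in range(len(factors)))
```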

Exploration

Testing options to gain information

Exploitation

Using best-known option for immediate reward

Multi-Armed Bandit

Balances exploration and exploitation
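
One simple way to balance the two is an epsilon-greedy rule: explore a random option with probability epsilon, otherwise exploit the best current estimate. The deck does not name a specific strategy, so this is only a sketch of one common choice; `choose_arm` and `update` are illustrative names.

```python
import random


def choose_arm(estimates, epsilon, rng=random):
    """Pick an arm index: random with probability epsilon, else the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))                          # exploration
    return max(range(len(estimates)), key=lambda i: estimates[i])     # exploitation


def update(estimates, counts, arm, reward):
    """Update the running average reward of the arm that was played."""
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]


estimates = [0.0, 0.0]
counts = [0, 0]
update(estimates, counts, arm=0, reward=1.0)
update(estimates, counts, arm=0, reward=0.0)
```

Setting epsilon to 0 gives pure exploitation and setting it to 1 gives pure exploration; values in between trade one off against the other.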

Blocking

Control unwanted variation by grouping similar units

Bernoulli Distribution

Single trial with success probability p

Binomial Distribution

Number of successes in n independent trials

Geometric Distribution

Number of failures before first success

Poisson Distribution

Number of events in a fixed interval

Exponential Distribution

Time between events

Weibull Distribution

Time to failure (flexible hazard rate)

Poisson vs Exponential

Poisson = counts; Exponential = time between events
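
The two views describe the same process: if events arrive at rate lam, "zero events in an interval of length t" (Poisson) is the same statement as "the wait for the first event exceeds t" (Exponential). A minimal numeric check, with the rate and interval chosen arbitrarily:

```python
import math


def poisson_pmf(k, lam, t=1.0):
    """P(exactly k events in an interval of length t) at rate lam."""
    mu = lam * t
    return math.exp(-mu) * mu ** k / math.factorial(k)


def exponential_survival(t, lam):
    """P(time until the next event exceeds t) at rate lam."""
    return math.exp(-lam * t)


lam, t = 2.0, 1.5                          # arbitrary example values
p_no_events = poisson_pmf(0, lam, t)       # count view
p_wait_longer = exponential_survival(t, lam)  # time view
```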

Memoryless Property

Remaining wait does not depend on time already elapsed: P(X > s + t | X > s) = P(X > t); Exponential (continuous) and Geometric (discrete) only
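
The property can be checked numerically. The sketch below assumes the geometric variable counts failures before the first success, so that P(X >= k) = (1 - p)^k; the parameter values are arbitrary.

```python
import math


def exp_survival(t, lam):
    """P(X > t) for an Exponential(lam) waiting time."""
    return math.exp(-lam * t)


def geom_at_least(k, p):
    """P(X >= k) when X counts failures before the first success."""
    return (1 - p) ** k


lam, s, t = 0.5, 2.0, 3.0
# Probability of waiting t more, given s has already elapsed.
exp_conditional = exp_survival(s + t, lam) / exp_survival(s, lam)
exp_fresh = exp_survival(t, lam)

p, m, n = 0.3, 4, 2
# Probability of n more failures, given m failures so far.
geom_conditional = geom_at_least(m + n, p) / geom_at_least(m, p)
geom_fresh = geom_at_least(n, p)
```

In both cases the conditional probability matches the unconditional one, which is what "memoryless" means.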

Q-Q Plot

Straight line = data matches distribution; curve = mismatch

Deterministic Simulation

No randomness; same input → same output

Stochastic Simulation

Includes randomness

Discrete-Event Simulation

Changes occur at discrete times

Simulation Replications

Run multiple times to capture variability

Simulation Quality

Depends on accuracy of input data

Markov Chain

Future depends only on current state (memoryless)

Transition Probability (Pij)

Probability of moving from state i to j
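
A small sketch of how transition probabilities are used: row i of the matrix holds the probabilities of moving from state i to each state j, and one step of the chain multiplies the current state distribution by the matrix. The two-state matrix below is made up for illustration.

```python
# Hypothetical two-state chain; P[i][j] = probability of moving from i to j.
P = [
    [0.9, 0.1],
    [0.5, 0.5],
]


def step(dist, P):
    """Advance a distribution over states by one transition."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


start = [1.0, 0.0]            # begin in state 0 with certainty
after_one = step(start, P)
after_two = step(after_one, P)
```

Each row of P sums to 1, and the next distribution depends only on the current one, not on how the chain got there.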

Missing Data First Step

Check for patterns in missingness

Missing Data Options

Remove, indicator variable, or impute

Mean Imputation

Replace missing with mean of observed values; biased if data are not missing at random, and understates variability
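
A minimal sketch of mean imputation on a made-up column with one missing entry. It also shows a side effect: filling gaps with the mean pulls the spread of the column down.

```python
from statistics import mean, pvariance

values = [2.0, 4.0, None, 6.0, 8.0]   # None marks the missing entry

observed = [v for v in values if v is not None]
fill = mean(observed)                 # mean of the observed values

imputed = [fill if v is None else v for v in values]

var_observed = pvariance(observed)    # spread before imputation
var_imputed = pvariance(imputed)      # spread after imputation (smaller)
```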

Regression Imputation

Predict missing values; risk of overfitting

Perturbation

Add noise to imputed values to restore variability

Imputation Risk

Using the same data twice (once to impute, once to fit the model) can cause overfitting

Optimization Variables

Decisions to be made

Optimization Constraints

Restrictions on variables

Objective Function

Measure to maximize or minimize

Feasible Solution

Satisfies all constraints

Optimal Solution

Best feasible solution

Convex Optimization

Any local optimum is also a global optimum

Non-Convex Optimization

May have multiple local optima

Integer Programming

Variables restricted to integers; harder to solve

Optimization Difficulty Order

Linear < Convex < Integer < Non-convex

Heuristic Algorithm

Fast, good solution but not guaranteed optimal

Nonparametric Tests

Do not assume a specific distribution; typically use ranks, signs, or counts

McNemar Test

Paired categorical (Yes/No)

Wilcoxon Signed-Rank Test

Paired numeric data

Mann-Whitney Test

Independent numeric samples

Parametric vs Nonparametric

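Parametric assumes a distribution (e.g. normal) and typically compares means; nonparametric makes no such assumption and typically compares ranks or medians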

Bayes Theorem

P(A|B) = P(B|A) · P(A) / P(B)
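
A short worked example of the formula, using made-up numbers for a screening test: A is "has the condition" and B is "tests positive". The denominator P(B) is expanded with the law of total probability.

```python
def posterior(prior, likelihood, likelihood_if_not):
    """P(A|B) from P(A), P(B|A) and P(B|not A)."""
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)  # P(B)
    return likelihood * prior / evidence


# Hypothetical inputs: 1% prevalence, 95% true positive rate,
# 5% false positive rate.
p = posterior(prior=0.01, likelihood=0.95, likelihood_if_not=0.05)
```

With these inputs the posterior comes out at roughly 16%, far below the 95% true positive rate, because the condition is rare to begin with.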

Empirical Bayes

Uses the overall population distribution as the prior, then updates it with an individual's own data

Graph Nodes

Entities in network

Graph Edges

Connections between nodes

Clique

Group of nodes in which every pair is connected by an edge

Community

Tightly connected group

Louvain Algorithm

Detects communities by maximizing modularity

Modularity

Measures how well a network divides into communities: many edges within communities, few between them
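
Modularity for a given split can be computed directly. The sketch below assumes an undirected, unweighted graph without self-loops and uses the standard form Q = sum over communities of (edges inside / m) - (total degree / 2m)^2, where m is the number of edges. The graph, two triangles joined by one edge, is made up for illustration.

```python
def modularity(edges, community):
    """Modularity of a partition; community maps node -> community label."""
    m = len(edges)
    inside = {}   # edges with both ends in the same community
    degree = {}   # total degree of the nodes in each community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

by_triangle = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
all_together = {node: "all" for node in range(6)}

q_split = modularity(edges, by_triangle)
q_single = modularity(edges, all_together)
```

Splitting by triangle scores higher than lumping every node together, which is the kind of improvement a modularity-maximizing method searches for.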

Neural Network Layers

Input, hidden, output layers

Deep Learning

Neural networks with many hidden layers

Descriptive Model

Describes data

Predictive Model

Predicts future outcomes

Prescriptive Model

Recommends decisions

Game Theory

Study of strategic decision-making

Equilibrium

No player can benefit by unilaterally changing strategy

Zero-Sum Game

One player’s gain equals the other players’ total loss; payoffs sum to zero

Non-Zero-Sum Game

Total payoff can vary

Survival Model

Predicts time until event occurs

Cox Proportional Hazards Model

h(t) = h0(t) · e^(βx), where h0(t) is the baseline hazard and x the covariates
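
The "proportional" part of the name follows from the formula: the ratio of hazards for two covariate values does not depend on t, because the baseline cancels. A minimal sketch with a single covariate; the baseline function and coefficient are hypothetical.

```python
import math


def hazard(t, x, beta, baseline):
    """Cox hazard h(t) = h0(t) * exp(beta * x) for one covariate x."""
    return baseline(t) * math.exp(beta * x)


def baseline(t):
    """Hypothetical baseline hazard h0(t); any positive function works here."""
    return 0.1 + 0.02 * t


beta = 0.7  # hypothetical coefficient

# Hazard ratio between x = 1 and x = 0 at several times.
ratios = [
    hazard(t, 1.0, beta, baseline) / hazard(t, 0.0, beta, baseline)
    for t in (1.0, 5.0, 20.0)
]
```

Every entry of `ratios` equals exp(beta), whatever the baseline looks like.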

Hazard Rate

Instantaneous rate of the event at time t, given it has not occurred before t

Censored Data

Incomplete information about event timing

Gradient Boosting

Sequentially builds models to correct errors

Cross Validation

Rotates which part of the data is held out for validation and averages the results to estimate model performance
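
A sketch of how k-fold cross validation divides the data, assuming the rows are already in random order (real use would normally shuffle first). `k_fold_indices` is an illustrative helper, not a library function.

```python
def k_fold_indices(n, k):
    """Return k (training, validation) index splits over n rows."""
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i, validation in enumerate(folds):
        # Train on every fold except the one held out for validation.
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((training, validation))
    return splits


splits = k_fold_indices(n=10, k=5)
```

Each row is used for validation exactly once and for training k - 1 times; the model's score is then averaged over the k splits.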

Validation Set

Used to compare models

Test Set

Used to estimate final performance

Training Set

Used to fit model

Better Model Selection

Lower validation error + fewer variables preferred

SVM Margin

Wider margin → better generalization

Overfitting SVM

Narrow margin, complex boundary

k-Nearest Neighbors

Classifies a point by the most common class among its k nearest points
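
A minimal sketch of the idea, assuming Euclidean distance and a simple majority vote (other distance measures and tie-breaking rules are possible). The points and labels are made up.

```python
import math
from collections import Counter


def knn_predict(points, labels, query, k=3):
    """Label a query point by majority vote among its k nearest points."""
    # Sort training indices by Euclidean distance to the query.
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["red", "red", "red", "blue", "blue", "blue"]

near_origin = knn_predict(points, labels, (0.5, 0.5))
far_corner = knn_predict(points, labels, (5.5, 5.5))
```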

k-Means Clustering

Groups data into clusters based on distance

Principal Component Analysis

Reduces dimensionality by keeping the uncorrelated directions (components) with the most variance

Logistic Regression

Used for classification (binary outcomes)

Linear Regression

Used for predicting numeric response

Time Series Models

Use temporal data (ARIMA, Exponential Smoothing)

Attribute Data Models

Use feature-based data

Confusion Matrix

Shows classification performance

Threshold (Logistic)

Cutoff for classification decision

Lower Threshold

More predicted positives: fewer false negatives, more false positives

Higher Threshold

Fewer predicted positives: more false negatives, fewer false positives
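
The effect of moving the threshold can be seen by counting confusion-matrix cells at two cutoffs. The predicted probabilities and true labels below are made up, and the rule "predict positive when probability >= threshold" is an assumption of this sketch.

```python
def confusion_counts(probabilities, actual, threshold):
    """Count TP, FP, TN, FN when predicting 1 for probability >= threshold."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for p, y in zip(probabilities, actual):
        predicted = 1 if p >= threshold else 0
        if predicted == 1:
            counts["TP" if y == 1 else "FP"] += 1
        else:
            counts["FN" if y == 1 else "TN"] += 1
    return counts


probabilities = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
actual = [1, 1, 0, 1, 0, 0]

low = confusion_counts(probabilities, actual, threshold=0.25)
high = confusion_counts(probabilities, actual, threshold=0.70)
```

At the lower cutoff no true positive is missed but two negatives are flagged; at the higher cutoff the false positives disappear and one true positive is missed.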