MIDTERM Machine Learning

Description and Tags

i hate this so much i hate this


87 Terms

1
New cards

Volume

The huge quantity of data being collected and stored.

2
New cards

Variety

The data comes in many different types and from many different sources.

3
New cards

Velocity

The incredible speed at which data is generated, often in real-time.

4
New cards

Veracity

How accurate and truthful the data is. Low-quality data leads to bad analysis.

5
New cards

Variability

The data is constantly changing, which can make it hard to manage.

6
New cards

Value

This is the most important part—what useful information can we get from the data to make decisions.

7
New cards

Data

Raw numbers and text collected from measurements.

8
New cards

Information

What you get after you analyze the data. It's the meaning you extract to make decisions.

9
New cards

Descriptive Analytics

"What happened?" This involves looking at past and current data to understand business performance. For example, looking at last quarter's sales report.

10
New cards

Predictive Analytics

"What will happen?" This uses historical data to find patterns and predict the future. For example, forecasting next month's sales based on past trends.

11
New cards

Prescriptive Analytics

"What should I do?” This is the most advanced type. It not only predicts what will happen but also suggests the best actions to take to achieve a goal, like minimizing costs or maximizing profits.

12
New cards

Reliability

When data are accurate and consistent (low variability).

13
New cards

Validity

When data correctly measure what they are supposed to measure. Data can be reliable (consistent) without being valid, so validity must be checked separately.

14
New cards

Uncertainty

The imperfect knowledge of what will happen in the future. As the variety and velocity of data increase, uncertainty also increases.

15
New cards

Risk

The consequences of an uncertain outcome: the potential loss or impact if events do not unfold as expected.

16
New cards

Flexible/Complex Models

Models like deep neural networks or random forests can capture complex, highly non-linear patterns in f(X) and minimize prediction error. However, they often act as black boxes because transparency is sacrificed for accuracy. These are typically used for Prediction.

17
New cards

Less Flexible/Simple Models

Models like linear regression are much easier to interpret and communicate. They may not predict as well when f(X) is highly non-linear, but their coefficients have clear meanings. These are typically used for Inference.

18
New cards

Mean Squared Error (MSE)

MSE quantifies the average squared difference between the true outcome values and the predicted values. A lower MSE indicates a better fit.
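
A minimal sketch of this computation in plain Python (the toy `y_true`/`y_pred` values are made up for illustration):

```python
import math

def mse(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

error = mse(y_true, y_pred)   # (1 + 0 + 4) / 3 = 5/3
rmse = math.sqrt(error)       # roughly the average deviation of predictions
```

Note how the single error of 2 contributes four times as much as the error of 1, illustrating MSE's sensitivity to large mistakes.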

19
New cards

MSE Sensitivity

MSE is especially sensitive to large prediction errors, as the errors are squared, giving them a disproportionate impact on the metric.

20
New cards

MSE Interpretation

The square root of the MSE indicates the approximate average deviation of predictions from actual values.

21
New cards

Training Data

Contains the data the model will use to build its prediction function. Measuring performance on this set gives the Training Error.

22
New cards

Test Data

Contains the unseen data. Measuring performance (e.g., MSE) on this set gives the Test Error, which is the unbiased evaluation of how the model performs in the real world.

23
New cards

Bias

  • Measures how far off, on average, the predictions are from the true value.

  • High Bias means the model is too simple and misses key patterns, resulting in underfitting.

  • Bias is typically high in simple models (e.g., Linear Regression) and low in complex models.

24
New cards

Variance

  • Measures how much predictions change if the model were trained on different datasets.

  • High Variance means the model is too complex and overly sensitive to noise, resulting in overfitting.

  • Variance is typically high in complex models (e.g., deep decision trees) and low in simple models.

25
New cards

Bias-Variance Trade-Off

Increasing model flexibility (complexity) tends to reduce bias but concurrently increase variance. The best models achieve a balance—low total error—by being neither too simple nor too complex. The goal is not to eliminate all error, but to reduce bias and variance to get as close as possible to the floor set by the irreducible error (ε).

26
New cards

Underfitting

Model is too simple, misses the pattern, and performs poorly on all data.

27
New cards

Overfitting

Model is too flexible, memorizes noise, performs well on training data, but poorly on new data (poor generalization).
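
A small demonstration of overfitting, assuming NumPy is available (the sine-plus-noise data and the degree-9 polynomial are made up for illustration). Fitting 10 noisy points with a degree-9 polynomial memorizes the training data while generalizing poorly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten noisy training points from a sine curve, and ten unseen test points
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.3, 10)
x_test = np.linspace(0.05, 0.95, 10)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.3, 10)

# A degree-9 polynomial is flexible enough to memorize all 10 training points
coefs = np.polyfit(x_train, y_train, 9)
train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
# train_mse is essentially zero; test_mse is far larger (poor generalization)
```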

28
New cards

Unsupervised learning

What you use when you only have input data but no specific output (Y) to predict. Involves unlabeled data that the algorithm tries to make sense of by extracting features and patterns on its own. A common example is the clustering problem.

29
New cards

Cluster

A group of similar observations.

30
New cards

Centroid

The “center” of a cluster (average of points in the cluster).

31
New cards

Distance

Usually Euclidean distance (straight-line distance between points).
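
The centroid and Euclidean distance ideas can be sketched in plain Python (the toy cluster points are made up for illustration):

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def centroid(points):
    """Coordinate-wise average of the points in a cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

cluster = [(1.0, 2.0), (3.0, 4.0)]
c = centroid(cluster)                   # (2.0, 3.0)
d = euclidean((0.0, 0.0), (3.0, 4.0))   # 5.0 (the classic 3-4-5 triangle)
```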

32
New cards

PCA (Principal Component Analysis)

A technique that reduces data with many variables into fewer dimensions, making it easier to visualize while keeping as much variation as possible.

33
New cards

Supervised learning

The algorithm learns on a labeled dataset.

34
New cards

Input Variables (X)

Variables used to make predictions. These are also known as:

  • Predictors

  • Independent variables

  • Features

  • Covariates

35
New cards

Output Variable (Y)

The variable we are trying to predict or understand. Also referred to as:

  • Response

  • Dependent variable

  • Target variable

36
New cards

Prediction

Use inputs (X) to predict outputs (Y) for new data. We care less about the form of the function (it can be a black box) and more about the accuracy of its outputs.

Ex. “Given your symptoms (X), I predict you have the flu (Y).”

37
New cards

Inference

Understand how inputs (X) are related to outputs (Y). The form of the function matters because it tells us which variables matter and how they affect Y.

Ex. “Fever is the strongest factor in diagnosing flu, more than cough or headache.”

38
New cards

Regression

(Predicting a Number): This is when you want to predict a continuous number, like a price or a temperature.

Example: Linear Regression Model

39
New cards

Classification

(Predicting a Category): This is when you want to predict which group or category something belongs to, like "yes/no" or "up/down".

Example: Logistic Regression Model

40
New cards

Simple Linear Regression (SLR)

Simple Linear Regression is a foundational technique in supervised learning used to model the relationship between two numerical variables. SLR aims to represent this relationship using a straight line.

41
New cards

Response Variable (Y)

The dependent variable whose value we wish to predict.

42
New cards

Predictor Variable (X)

The independent variable used to predict the response.

43
New cards

Beta Coefficients

β₀ (intercept) and β₁ (slope)

44
New cards

Intercept

The predicted value of the response (Y) when the predictor (X) is equal to zero.

45
New cards

Slope

The amount by which the response (Y) is expected to change, on average, for every one-unit increase in the predictor (X).

46
New cards

Irreducible error (ε)

Captures all the variation in the response variable (Y) that is not explained by the predictor (X). We assume that the average value of this error term is zero.

47
New cards

Least Squares Estimation

To select the slope and intercept that minimize the errors between the actual observed values and the values predicted by the line.

48
New cards

Residuals

These are the vertical distances between each observed data point and the fitted regression line, representing the error for that observation.

49
New cards

Residual Sum of Squares (RSS)

The goal of the least squares method is to minimize the RSS. This total squared error measures how far off the predictions are from the actual values.
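
The least-squares estimates that minimize the RSS have a closed form. A minimal sketch in plain Python (the toy x/y data, generated from the line y = 1 + 2x, are made up for illustration):

```python
def least_squares(x, y):
    """Closed-form simple-linear-regression estimates minimizing the RSS."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: covariance of x and y divided by variance of x
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    # Intercept: forces the line through the point of means
    b0 = y_bar - b1 * x_bar
    return b0, b1

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
b0, b1 = least_squares(x, y)   # recovers intercept 1.0 and slope 2.0
```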

50
New cards

Standard Errors (SE)

Shows how uncertain or variable the coefficient estimate is. Smaller standard errors mean more precise estimates.

51
New cards

Confidence Intervals

Gives a range of plausible values for the true population coefficient (commonly with 95% confidence); if this range does not include zero, it indicates the predictor (X) has a real, statistically significant effect on the response (Y).

52
New cards

Null Hypothesis

The slope is zero (meaning X has no relationship with Y).

53
New cards

T-statistic

Tells you how many standard errors the coefficient is away from zero. Large absolute values (like 24 or 55) → strong evidence the effect is real.

54
New cards

P-Value

The probability of getting this result if the true effect were actually zero. Small p-values (usually < 0.05) → reject the null hypothesis → the variable has a statistically significant effect.

55
New cards

Residual Standard Error (RSE)

The average size of prediction errors, i.e., how far the actual data points typically fall from the regression line. In regression output, it summarizes how spread out the residuals are.

56
New cards

R-Squared

The percentage of variation in Y explained by X.

Range: 0 → 1, 0 means the model explains none of the variation, 1 means the model explains all the variation perfectly.

Ex. 0.6059 means horsepower explains about 61% of the variation in mpg.
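
A minimal sketch of the R² computation in plain Python (the fitted values `y_hat` are hypothetical, for illustration only):

```python
def r_squared(y, y_hat):
    """R^2 = 1 - RSS/TSS: share of the variation in y explained by the model."""
    y_bar = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # unexplained error
    tss = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    return 1 - rss / tss

y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]   # hypothetical fitted values
r2 = r_squared(y, y_hat)       # 0.98: the model explains 98% of the variation
```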

57
New cards

Adjusted R-Squared

How well the model explains the data after adjusting for the number of predictors. If Adjusted R² is close to R², your predictors are actually meaningful. If Adjusted R² is much lower, it means some predictors may not be helping.

58
New cards

F-statistic

Measures how well the regression model explains variation compared to a model with no predictors.
Bigger = better model fit.

59
New cards

SLR Assumptions

For regression results (like confidence intervals and p-values) to be trustworthy, these 4 things need to be mostly true.

60
New cards

Linearity

The relationship between X and Y should look like a straight line — not curved.

61
New cards

Normality of Errors

The leftover differences between actual and predicted values (called errors or residuals) should follow a normal, bell-shaped pattern.

62
New cards

Independence of Errors

Each data point should be separate — one observation’s error shouldn’t affect another’s.

63
New cards

Constant Variance (Homoskedasticity)

The spread of errors should be about the same everywhere along the line.
If not: it’s called heteroskedasticity, and it can make your model’s results less reliable.

64
New cards

Multiple Linear Regression (MLR)

Using two or more factors (X₁, X₂, …, Xₚ, called "predictors," "inputs," or "explanatory variables") at once to predict an outcome (Y, the "response variable").

65
New cards

Choosing Predictors

You might not want to use every predictor available. Keeping your model simple (parsimonious) makes it easier to understand and can lead to better predictions on new data.

66
New cards

Forward Selection

Start with no predictors. Add them one by one, always picking the one that improves the model the most, until adding more doesn't help significantly.

67
New cards

Backward Elimination

Start with all predictors. Remove the least useful one (usually the one with the highest p-value) and repeat this process until all remaining predictors are significant.

68
New cards

Stepwise Selection

A mix of backward and forward selection. At each step, the model can add a new useful predictor or drop one that has become non-significant.

69
New cards

Multicollinearity

A major issue in multiple regression. Happens when two or more of your predictor variables are highly correlated with each other.

Ex. trying to predict a person's weight using both their height in inches and their height in centimeters.

70
New cards

Variance Inflation Factor (VIF)

A score that measures how much a predictor is correlated with the others. A common rule of thumb is that a VIF score greater than 5 or 10 indicates a problem.
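
A sketch of the VIF computation, assuming NumPy is available (the x1/x2 example, where x2 is nearly twice x1, is made up to show a problematic score): the VIF of a column is 1 / (1 − R²), where R² comes from regressing that column on all the other predictors.

```python
import numpy as np

def vif(X, j):
    """VIF of column j: 1 / (1 - R^2) from regressing X[:, j] on the rest."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])   # add an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1 / (1 - r2)

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = 2 * x1 + np.array([0.1, -0.1, 0.1, -0.1, 0.1])  # nearly redundant with x1
X = np.column_stack([x1, x2])
# vif(X, 0) is far above the 5-10 rule of thumb, flagging multicollinearity
```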

71
New cards

Qualitative Predictors

(Also known as categorical variables) represent discrete groups or categories rather than continuous numerical quantities.

72
New cards

Nominal

Categories that have no intrinsic ordering or rank.

73
New cards

Ordinal

Categories that have a clear rank ordering to them (e.g., 5-star ratings).

74
New cards

Dichotomous or Binary

Nominal variables with exactly two categories (Yes/No).

75
New cards

Ordinal Variable Issues

Standard dummy coding ignores their natural order (Loss of Information), while assigning arbitrary numbers (like 1–5) wrongly assumes equal spacing between categories (Incorrect Encoding Risk); both approaches risk losing information or producing misleading results.

76
New cards

Dummy Variables

A special numeric variable used in regression analysis to represent discrete groups or categories (qualitative predictors), e.g., 1 = Yes and 0 = No.

77
New cards

Baseline Category

The reference level is the category left out when creating dummy variables; it becomes the baseline the model compares all other coefficients against.
Example: If a variable “Color” has categories Red, Blue, and Green, and Red is the reference level, then the coefficients for Blue and Green show how their effects differ from Red.
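
A minimal sketch of k−1 dummy coding in plain Python (the `dummy_encode` helper and the Color example are hypothetical):

```python
def dummy_encode(values, baseline):
    """Encode a categorical variable as k-1 dummy columns, omitting baseline."""
    levels = sorted(set(values) - {baseline})   # e.g., ["Blue", "Green"]
    return [[1 if v == level else 0 for level in levels] for v in values]

colors = ["Red", "Blue", "Green", "Red"]
# Columns are [Blue, Green]; Red, the baseline, is the all-zeros row
encoded = dummy_encode(colors, baseline="Red")
# [[0, 0], [1, 0], [0, 1], [0, 0]]
```

Omitting the baseline column is also what avoids the dummy variable trap.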

78
New cards

Dummy Variable Trap

This is the condition where including all k dummy variables for a categorical predictor with k levels causes perfect multicollinearity, making it impossible to fit the model; therefore, one category (the baseline) must always be omitted.

79
New cards

Omitted Variable Bias

Happens when you leave out an important variable that affects both your predictor and your outcome, causing the model’s results to be misleading.

Example: If you study how studying time affects grades but forget to include sleep, your results might be wrong because sleep affects both how much someone studies and how well they do.

80
New cards

Dummy Intercept

Represents the estimated average outcome (dependent variable) for the baseline category.

Example: If "January" is the baseline month, the Intercept is the expected units sold in January.

81
New cards

Dummy Coefficient

The difference in the expected outcome between the category associated with that dummy variable and the chosen baseline category.

82
New cards

Baseline Sensitivity

The interpretation of coefficients is entirely relative to the chosen baseline. If the baseline is changed (e.g., from Texas to Kentucky), the Intercept and all other group coefficients will change because they are recalculated based on the new reference point.

83
New cards

Insignificant intercept

Means there isn’t enough evidence to show that the average outcome is different from zero when all predictors are zero.

84
New cards

Additive Assumption

The standard default setting in linear regression, which assumes that the effect of one predictor on the response is independent of the value of the other predictors; for example, the increase in sales from spending on TV is assumed not to depend on the amount spent on radio.

85
New cards

Interaction Effect

Occurs when the effect of one predictor on the response variable depends on the value of another predictor, effectively removing the standard additive assumption from the model.

86
New cards

Types of Interactions

  1. Two dummy variables (e.g., gender × treatment)

  2. One dummy, one numeric (e.g., gender × age)

  3. Two numeric variables (e.g., age × horsepower)
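
All three cases reduce to adding a product column to the design matrix. A minimal sketch in plain Python (the `with_interaction` helper and toy values are hypothetical):

```python
def with_interaction(x1, x2):
    """Design-matrix rows [1, x1, x2, x1*x2] for a model with an interaction."""
    return [[1.0, a, b, a * b] for a, b in zip(x1, x2)]

rows = with_interaction([1.0, 2.0], [3.0, 4.0])
# [[1.0, 1.0, 3.0, 3.0], [1.0, 2.0, 4.0, 8.0]]
```

With the x1*x2 column included, the effect of x1 on the response depends on the value of x2, which is exactly what the interaction term captures.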

87
New cards

Breusch–Pagan (BP) test

Checks whether the spread of a regression’s errors (residuals) is constant — an assumption called homoskedasticity. If the test’s p-value is small (usually < 0.05), it means the errors’ spread changes with the predictor — a problem called heteroskedasticity.