CSCI323 Midterm Review – Key Concepts & Formulas

Description and Tags

Fill-in-the-blank flashcards covering major algorithms, formulas, and concepts from CSCI323 lecture notes on data preprocessing, decision trees, ensemble methods, SVM, KNN, clustering, evaluation metrics, regularization, reinforcement learning, MDPs, and PCA.

72 Terms

1
Data preprocessing transforms ___ data into a clean, suitable format for analysis.

raw

2
The first step of data preprocessing that handles missing values, duplicates, and outliers is called ___.

Data Cleaning

3
Combining data from multiple sources into one coherent dataset is known as Data ___.

Integration

4
The feature scaling method that rescales data to a fixed range [0,1] is called ___ scaling.

Min-Max

5
StandardScaler in scikit-learn performs ___ scaling, giving data mean 0 and variance 1.

Standard
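
To make cards 4–5 concrete, here is a minimal scikit-learn sketch of both scalers; the small array X is a made-up example.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy feature matrix (hypothetical values, two features on very different scales)
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Min-Max scaling: each feature rescaled to the fixed range [0, 1]
X_minmax = MinMaxScaler().fit_transform(X)

# Standard scaling: each feature rescaled to mean 0 and variance 1
X_std = StandardScaler().fit_transform(X)

print(X_minmax)                                   # columns now span [0, 1]
print(X_std.mean(axis=0), X_std.std(axis=0))      # approximately [0, 0] and [1, 1]
```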

6
Removing rows or columns with missing values is known as ___ deletion.

listwise (or simply "deletion")

7
Filling missing values with mean, median, or mode is called ___.

Imputation
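
A minimal sketch of mean imputation with scikit-learn's SimpleImputer; the array with np.nan entries is hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data with missing values
X = np.array([[1.0, np.nan],
              [2.0, 4.0],
              [np.nan, 6.0]])

# Replace each missing value with the column mean
# (strategy can also be 'median' or 'most_frequent' for the mode)
imputer = SimpleImputer(strategy='mean')
print(imputer.fit_transform(X))
```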

8
Low ___ and low variance together produce a well-generalized model.

Bias

9
High bias can lead to ___, while high variance often leads to ___.

underfitting / overfitting

10
The Gini impurity formula is 1 − Σ( ___² ).

pi

11
Entropy for a split is calculated as −Σ( pi * log₂( ___ ) ).

pi

12
Information Gain equals Entropy(parent) minus the ___ average entropy of children.

weighted
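
To see cards 10–12 in code, here is a small NumPy sketch computing Gini impurity, entropy, and information gain; the parent/child label arrays are made up for illustration.

```python
import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum(p_i^2) over class proportions p_i
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # Entropy: -sum(p_i * log2(p_i))
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, children):
    # Entropy(parent) minus the weighted average entropy of the children
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

# Hypothetical split of 10 labels into two child nodes
parent = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
left, right = parent[:4], parent[4:]
print(gini(parent), entropy(parent), information_gain(parent, [left, right]))
```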

13
In a decision tree, the topmost node representing the entire dataset is the ___ node.

root

14
Pruning a decision tree helps prevent ___.

overfitting

15
Bagging stands for ___ Aggregating.

Bootstrap

16
Random Forest is an example of the ___ ensemble method.

Bagging

17
Boosting converts multiple ___ learners into a single strong learner.

weak

18
AdaBoost adjusts ___ of samples after each iteration to focus on misclassified items.

weights

19
Gradient Boosting optimizes an arbitrary differentiable ___ function.

loss

20
XGBoost improves gradient boosting mainly in terms of computational ___ and scalability.

speed

21
In soft voting, a VotingClassifier averages the predicted class ___.

probabilities
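
Cards 21 and 65 correspond to scikit-learn's VotingClassifier; a minimal sketch on a synthetic dataset follows, and the particular base estimators chosen here are only illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, random_state=0)

# voting='soft' averages the predicted class probabilities of the base models,
# so every estimator must implement predict_proba
clf = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('rf', RandomForestClassifier(n_estimators=50, random_state=0)),
                ('nb', GaussianNB())],
    voting='soft')
clf.fit(X, y)
print(clf.predict(X[:5]))
```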

22
Weak learners are prone to ___ and have low predictive accuracy.

overfitting

23
Core concepts of Random Forest include bootstrap sampling, feature randomization, and a ___ mechanism.

voting

24
SVM seeks the ___ hyperplane that maximizes the margin between classes.

optimal

25
The regularization parameter in SVMs is denoted by ___.

C

26
The kernel trick allows SVMs to compute inner products in a high-dimensional space without explicit ___.

transformation

27
The RBF kernel is mathematically expressed as exp( −γ ∥xi − xj∥² ), where γ controls the kernel’s ___.

width (or influence)
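
A tiny NumPy sketch of the RBF kernel formula from card 27; the two sample vectors and the gamma value are arbitrary.

```python
import numpy as np

def rbf_kernel(x_i, x_j, gamma):
    # K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2); larger gamma -> narrower kernel
    return np.exp(-gamma * np.sum((x_i - x_j) ** 2))

x_i = np.array([1.0, 2.0])
x_j = np.array([2.0, 0.0])
print(rbf_kernel(x_i, x_j, gamma=0.5))   # exp(-0.5 * 5) ≈ 0.082
```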

28
KNN is considered a ___ learning algorithm because it delays computation until prediction time.

lazy (instance-based)

29
A small K in KNN results in low bias but high ___.

variance

30
KNN requires ___ or standardization because it is sensitive to feature magnitudes.

normalization

31
Naive Bayes assumes ___ independence among features.

conditional (feature)

32
The posterior probability in Bayes’ theorem is denoted as ___(H|D).

P

33
Gaussian Naive Bayes is typically used for ___ features.

continuous

34
K-Means minimizes the ___-Cluster Sum of Squares (WCSS).

Within

35
The K-Means++ variant improves the selection of initial ___.

centroids

36
DBSCAN identifies a point as a core point if its ε-neighborhood contains at least ___ points.

minPts

37
Points not reachable from any core point in DBSCAN are labeled as ___.

noise (outliers)

38
DBSCAN does not require prior specification of the number of ___.

clusters
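
Cards 36–38 map directly onto scikit-learn's DBSCAN; a minimal sketch follows, where the data points and the eps / min_samples values are invented for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two tight blobs plus one far-away point (hypothetical data)
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10],
              [50, 50]])

# eps is the neighborhood radius; min_samples plays the role of minPts.
# Note that no number of clusters is specified anywhere.
labels = DBSCAN(eps=2.0, min_samples=3).fit_predict(X)
print(labels)   # cluster ids 0 and 1; the isolated point gets label -1 (noise)
```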

39
A confusion matrix compares a model’s predicted labels with the ___ labels.

actual (true)

40
Precision measures the proportion of true ___ out of all positive predictions.

positives

41
Recall is also known as ___.

sensitivity

42
The harmonic mean of precision and recall is the ___ score.

F1
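
A short worked sketch for cards 39–42, using made-up confusion-matrix counts.

```python
# Hypothetical binary confusion-matrix counts
tp, fp, fn, tn = 40, 10, 20, 30

precision = tp / (tp + fp)          # true positives out of all positive predictions
recall = tp / (tp + fn)             # a.k.a. sensitivity
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two

print(precision, recall, f1)        # 0.8, ~0.667, ~0.727
```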

43
L1 regularization adds the sum of absolute weights (λ Σ |wi|) and is popularly called ___.

Lasso

44
L2 regularization adds the sum of squared weights (λ Σ wi²) and is known as ___.

Ridge
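
Cards 43–44 correspond to scikit-learn's Lasso and Ridge estimators; in the sketch below, alpha plays the role of λ, and the synthetic regression data and alpha values are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty: lambda * sum(|w_i|)
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: lambda * sum(w_i^2)

# L1 tends to drive irrelevant weights to exactly zero (sparse model);
# L2 only shrinks weights smoothly toward zero
print(np.round(lasso.coef_, 2))
print(np.round(ridge.coef_, 2))
```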

45
Reinforcement Learning focuses on learning a policy to maximize a numerical ___ signal.

reward

46
In RL, the strategy an agent follows to take actions is called the ___.

policy

47
The expected long-term reward from state s following π is the ___ function Vπ(s).

value

48
Q-Learning updates Q(s,a) using the term R + γ * ___ Q(s′,a′).

max

49
The exploration–exploitation trade-off is commonly managed with an ___-greedy strategy.

epsilon
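
A minimal sketch of the Q-learning update and an epsilon-greedy action choice from cards 48–49; the table sizes and the alpha, gamma, and epsilon values are arbitrary.

```python
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

def epsilon_greedy(state):
    # Explore with probability epsilon, otherwise exploit the best known action
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def q_update(s, a, r, s_next):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

# One hypothetical transition
a = epsilon_greedy(0)
q_update(s=0, a=a, r=1.0, s_next=1)
print(Q[0])
```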

50
SARSA is an ___-policy method that updates using the action actually taken.

on

51
Deep Q-Networks use ___ replay to stabilize learning.

experience

52
In DDPG, the actor suggests actions while the ___ evaluates them.

critic

53
An MDP is defined by states, actions, transition probabilities, rewards, and a ___ factor γ.

discount

54
The Bellman optimality equation expresses V*(s) as the max over actions of R(s,a) + γ * Σ T * V*( ___ ).

s′ (next state)
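
To ground cards 53–54, here is a tiny value-iteration sketch for a made-up 2-state, 2-action MDP; the transition probabilities and rewards are invented for illustration.

```python
import numpy as np

# Hypothetical MDP: T[s, a, s'] = transition probability, R[s, a] = reward
T = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

# Value iteration: repeatedly apply the Bellman optimality backup
# V*(s) = max_a [ R(s,a) + gamma * sum_s' T(s,a,s') * V*(s') ]
V = np.zeros(2)
for _ in range(100):
    V = np.max(R + gamma * (T @ V), axis=1)

print(V)
```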

55
Policy iteration alternates between policy evaluation and policy ___.

improvement

56
Dimensionality reduction combats the ___ of dimensionality.

curse

57
PCA projects data onto orthogonal axes called principal ___.

components

58
The first principal component captures the greatest ___ in the data.

variance

59
To perform PCA, you first center the data by subtracting the ___.

mean

60
Eigenvectors corresponding to the largest eigenvalues are kept because they minimize ___ error.

reconstruction

61
SVD factorizes X into U Σ Vᵀ, where V contains the principal ___.

components (eigenvectors)
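
A minimal NumPy sketch tying cards 57–61 together: center the data, take the SVD, and keep the top components. The random data matrix and the choice of k = 2 are just for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # hypothetical data: 100 samples, 5 features

# 1. Center the data by subtracting the mean of each feature
Xc = X - X.mean(axis=0)

# 2. SVD of the centered data: Xc = U * S * Vt; rows of Vt are the principal components
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# 3. Keep the top-k components (largest singular values <-> largest eigenvalues)
k = 2
X_reduced = Xc @ Vt[:k].T              # project onto the first k principal components

# Variance captured by each component (eigenvalues of the covariance matrix)
explained = S**2 / (len(X) - 1)
print(X_reduced.shape, explained[:k] / explained.sum())
```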

62
Kernel PCA extends PCA to capture ___ relationships using the kernel trick.

non-linear

63
The feature engineering technique that converts continuous variables into categorical bins is called ___.

discretization

64
Bootstrap sampling in Random Forests draws samples ___ replacement.

with

65
VotingClassifier with voting='soft' averages predicted class ___ before deciding.

probabilities

66
Gradient Boosting learns from the ___ errors of previous models.

residual

67
In AdaBoost, training continues until the ___ error falls below a threshold.

reweighted (or residual)

68
A weak learner in boosting often refers to a model slightly better than ___ guessing.

random

69
The parameter γ in SVM’s RBF kernel: high γ → more complex model; low γ → ___ model.

simpler

70
DBSCAN’s rule of thumb sets minPts ≥ D + 1, where D is the number of ___.

dimensions

71
The elbow method chooses the optimal K where the decrease in ___ begins to slow down.

WCSS (inertia)
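
For cards 34, 35, and 71, a minimal scikit-learn sketch of the elbow method: fit K-Means for several K values and watch where the drop in WCSS (exposed as inertia_) levels off. The blob data is synthetic.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# K-Means (with k-means++ initialization) minimizes WCSS, reported as inertia_
for k in range(1, 8):
    km = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1))
# The "elbow" is the K where inertia stops dropping sharply (for these blobs, around K = 4)
```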

72
In reinforcement learning, choosing the best known action is ___, while trying new actions is ___.

exploitation / exploration