CSCI323 Midterm Review – Key Concepts & Formulas

Description and Tags

Fill-in-the-blank flashcards covering major algorithms, formulas, and concepts from CSCI323 lecture notes on data preprocessing, decision trees, ensemble methods, SVM, KNN, clustering, evaluation metrics, regularization, reinforcement learning, MDPs, and PCA.

72 Terms

1
Data preprocessing transforms ___ data into a clean, suitable format for analysis.

raw

2
The first step of data preprocessing that handles missing values, duplicates, and outliers is called ___.

Data Cleaning

3
Combining data from multiple sources into one coherent dataset is known as Data ___.

Integration

4
The feature scaling method that rescales data to a fixed range [0,1] is called ___ scaling.

Min-Max

5
StandardScaler in scikit-learn performs ___ scaling, giving data mean 0 and variance 1.

Standard
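
To make cards 4–5 concrete, here is a minimal scikit-learn sketch of both scalers; the small array X is a made-up example.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy feature matrix (hypothetical values, two features on very different scales)
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Min-Max scaling: each feature rescaled to the fixed range [0, 1]
X_minmax = MinMaxScaler().fit_transform(X)

# Standard scaling: each feature rescaled to mean 0 and variance 1
X_std = StandardScaler().fit_transform(X)

print(X_minmax)                                   # columns now span [0, 1]
print(X_std.mean(axis=0), X_std.std(axis=0))      # approximately [0, 0] and [1, 1]
```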

6
Removing rows or columns with missing values is known as ___ deletion.

listwise (or simply "deletion")

7
Filling missing values with mean, median, or mode is called ___.

Imputation
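
A minimal sketch of mean imputation with scikit-learn's SimpleImputer; the array with np.nan entries is hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data with missing values
X = np.array([[1.0, np.nan],
              [2.0, 4.0],
              [np.nan, 6.0]])

# Replace each missing value with the column mean
# (strategy can also be 'median' or 'most_frequent' for the mode)
imputer = SimpleImputer(strategy='mean')
print(imputer.fit_transform(X))
```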

8
Low ___ and low variance together produce a well-generalized model.

Bias

9
High bias can lead to ___, while high variance often leads to ___.

underfitting / overfitting

10
The Gini impurity formula is 1 − Σ( ___² ).

pi

11
Entropy for a split is calculated as −Σ( pi * log₂( ___ ) ).

pi

12
Information Gain equals Entropy(parent) minus the ___ average entropy of children.

weighted
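
To see cards 10–12 in code, here is a small NumPy sketch computing Gini impurity, entropy, and information gain; the parent/child label arrays are made up for illustration.

```python
import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum(p_i^2) over class proportions p_i
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # Entropy: -sum(p_i * log2(p_i))
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, children):
    # Entropy(parent) minus the weighted average entropy of the children
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

# Hypothetical split of 10 labels into two child nodes
parent = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
left, right = parent[:4], parent[4:]
print(gini(parent), entropy(parent), information_gain(parent, [left, right]))
```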

13
In a decision tree, the topmost node representing the entire dataset is the ___ node.

root

14
Pruning a decision tree helps prevent ___.

overfitting

15
Bagging stands for ___ Aggregating.

Bootstrap

16
Random Forest is an example of the ___ ensemble method.

Bagging

17
Boosting converts multiple ___ learners into a single strong learner.

weak

18
AdaBoost adjusts ___ of samples after each iteration to focus on misclassified items.

weights

19
Gradient Boosting optimizes an arbitrary differentiable ___ function.

loss

20
XGBoost improves gradient boosting mainly in terms of computational ___ and scalability.

speed

21
In soft voting, a VotingClassifier averages the predicted class ___.

probabilities
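
Cards 21 and 65 correspond to scikit-learn's VotingClassifier; a minimal sketch on a synthetic dataset follows, and the particular base estimators chosen here are only illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, random_state=0)

# voting='soft' averages the predicted class probabilities of the base models,
# so every estimator must implement predict_proba
clf = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('rf', RandomForestClassifier(n_estimators=50, random_state=0)),
                ('nb', GaussianNB())],
    voting='soft')
clf.fit(X, y)
print(clf.predict(X[:5]))
```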

22
Weak learners are prone to ___ and have low predictive accuracy.

overfitting

23
Core concepts of Random Forest include bootstrap sampling, feature randomization, and a ___ mechanism.

voting

24
SVM seeks the ___ hyperplane that maximizes the margin between classes.

optimal

25
The regularization parameter in SVMs is denoted by ___.

C

26
The kernel trick allows SVMs to compute inner products in a high-dimensional space without explicit ___.

transformation

27
The RBF kernel is mathematically expressed as exp( −γ ∥xi − xj∥² ), where γ controls the kernel’s ___.

width (or influence)
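
A tiny NumPy sketch of the RBF kernel formula from card 27; the two sample vectors and the gamma value are arbitrary.

```python
import numpy as np

def rbf_kernel(x_i, x_j, gamma):
    # K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2); larger gamma -> narrower kernel
    return np.exp(-gamma * np.sum((x_i - x_j) ** 2))

x_i = np.array([1.0, 2.0])
x_j = np.array([2.0, 0.0])
print(rbf_kernel(x_i, x_j, gamma=0.5))   # exp(-0.5 * 5) ≈ 0.082
```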

28
KNN is considered a ___ learning algorithm because it delays computation until prediction time.

lazy (instance-based)

29
A small K in KNN results in low bias but high ___.

variance

30
KNN requires ___ or standardization because it is sensitive to feature magnitudes.

normalization

31
Naive Bayes assumes ___ independence among features.

conditional (feature)

32
The posterior probability in Bayes’ theorem is denoted as ___(H|D).

P

33
Gaussian Naive Bayes is typically used for ___ features.

continuous

34
K-Means minimizes the ___-Cluster Sum of Squares (WCSS).

Within

35
The K-Means++ variant improves the selection of initial ___.

centroids

36
DBSCAN identifies a point as a core point if its ε-neighborhood contains at least ___ points.

minPts

37
Points not reachable from any core point in DBSCAN are labeled as ___.

noise (outliers)

38
DBSCAN does not require prior specification of the number of ___.

clusters
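
Cards 36–38 map directly onto scikit-learn's DBSCAN; a minimal sketch follows, where the data points and the eps / min_samples values are invented for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two tight blobs plus one far-away point (hypothetical data)
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10],
              [50, 50]])

# eps is the neighborhood radius; min_samples plays the role of minPts.
# Note that no number of clusters is specified anywhere.
labels = DBSCAN(eps=2.0, min_samples=3).fit_predict(X)
print(labels)   # cluster ids 0 and 1; the isolated point gets label -1 (noise)
```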

39
A confusion matrix compares a model’s predicted labels with the ___ labels.

actual (true)

40
Precision measures the proportion of true ___ out of all positive predictions.

positives

41
Recall is also known as ___.

sensitivity

42
The harmonic mean of precision and recall is the ___ score.

F1
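
A short worked sketch for cards 39–42, using made-up confusion-matrix counts.

```python
# Hypothetical binary confusion-matrix counts
tp, fp, fn, tn = 40, 10, 20, 30

precision = tp / (tp + fp)          # true positives out of all positive predictions
recall = tp / (tp + fn)             # a.k.a. sensitivity
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two

print(precision, recall, f1)        # 0.8, ~0.667, ~0.727
```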

43
L1 regularization adds the sum of absolute weights (λ Σ |wi|) and is popularly called ___.

Lasso

44
L2 regularization adds the sum of squared weights (λ Σ wi²) and is known as ___.

Ridge
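
Cards 43–44 correspond to scikit-learn's Lasso and Ridge estimators; in the sketch below, alpha plays the role of λ, and the synthetic regression data and alpha values are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty: lambda * sum(|w_i|)
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: lambda * sum(w_i^2)

# L1 tends to drive irrelevant weights to exactly zero (sparse model);
# L2 only shrinks weights smoothly toward zero
print(np.round(lasso.coef_, 2))
print(np.round(ridge.coef_, 2))
```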

45
Reinforcement Learning focuses on learning a policy to maximize a numerical ___ signal.

reward

46
In RL, the strategy an agent follows to take actions is called the ___.

policy

47
The expected long-term reward from state s following π is the ___ function Vπ(s).

value

48
Q-Learning updates Q(s,a) using the term R + γ * ___ Q(s′,a′).

max

49
The exploration–exploitation trade-off is commonly managed with an ___-greedy strategy.

epsilon
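
A minimal sketch of the Q-learning update and an epsilon-greedy action choice from cards 48–49; the table sizes and the alpha, gamma, and epsilon values are arbitrary.

```python
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

def epsilon_greedy(state):
    # Explore with probability epsilon, otherwise exploit the best known action
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def q_update(s, a, r, s_next):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

# One hypothetical transition
a = epsilon_greedy(0)
q_update(s=0, a=a, r=1.0, s_next=1)
print(Q[0])
```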

50
SARSA is an ___-policy method that updates using the action actually taken.

on

51
Deep Q-Networks use ___ replay to stabilize learning.

experience

52
In DDPG, the actor suggests actions while the ___ evaluates them.

critic

53
An MDP is defined by states, actions, transition probabilities, rewards, and a ___ factor γ.

discount

54
The Bellman optimality equation expresses V*(s) as the max over actions of R(s,a) + γ * Σ T * V*( ___ ).

s′ (next state)
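
To ground cards 53–54, here is a tiny value-iteration sketch for a made-up 2-state, 2-action MDP; the transition probabilities and rewards are invented for illustration.

```python
import numpy as np

# Hypothetical MDP: T[s, a, s'] = transition probability, R[s, a] = reward
T = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

# Value iteration: repeatedly apply the Bellman optimality backup
# V*(s) = max_a [ R(s,a) + gamma * sum_s' T(s,a,s') * V*(s') ]
V = np.zeros(2)
for _ in range(100):
    V = np.max(R + gamma * (T @ V), axis=1)

print(V)
```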

55
Policy iteration alternates between policy evaluation and policy ___.

improvement

56
Dimensionality reduction combats the ___ of dimensionality.

curse

57
PCA projects data onto orthogonal axes called principal ___.

components

58
The first principal component captures the greatest ___ in the data.

variance

59
To perform PCA, you first center the data by subtracting the ___.

mean

60
Eigenvectors corresponding to the largest eigenvalues are kept because they minimize ___ error.

reconstruction

61
SVD factorizes X into U Σ Vᵀ, where V contains the principal ___.

components (eigenvectors)
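
A minimal NumPy sketch tying cards 57–61 together: center the data, take the SVD, and keep the top components. The random data matrix and the choice of k = 2 are just for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # hypothetical data: 100 samples, 5 features

# 1. Center the data by subtracting the mean of each feature
Xc = X - X.mean(axis=0)

# 2. SVD of the centered data: Xc = U * S * Vt; rows of Vt are the principal components
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# 3. Keep the top-k components (largest singular values <-> largest eigenvalues)
k = 2
X_reduced = Xc @ Vt[:k].T              # project onto the first k principal components

# Variance captured by each component (eigenvalues of the covariance matrix)
explained = S**2 / (len(X) - 1)
print(X_reduced.shape, explained[:k] / explained.sum())
```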

62
Kernel PCA extends PCA to capture ___ relationships using the kernel trick.

non-linear

63
The feature engineering technique that converts continuous variables into categorical bins is called ___.

discretization

64
Bootstrap sampling in Random Forests draws samples ___ replacement.

with

65
VotingClassifier with voting='soft' averages predicted class ___ before deciding.

probabilities

66
Gradient Boosting learns from the ___ errors of previous models.

residual

67
In AdaBoost, training continues until the ___ error falls below a threshold.

reweighted (or residual)

68
A weak learner in boosting often refers to a model slightly better than ___ guessing.

random

69
The parameter γ in SVM’s RBF kernel: high γ → more complex model; low γ → ___ model.

simpler

70
DBSCAN’s rule of thumb sets minPts ≥ D + 1, where D is the number of ___.

dimensions

71
The elbow method chooses the optimal K where the decrease in ___ begins to slow down.

WCSS (inertia)
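
For cards 34, 35, and 71, a minimal scikit-learn sketch of the elbow method: fit K-Means for several K values and watch where the drop in WCSS (exposed as inertia_) levels off. The blob data is synthetic.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# K-Means (with k-means++ initialization) minimizes WCSS, reported as inertia_
for k in range(1, 8):
    km = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1))
# The "elbow" is the K where inertia stops dropping sharply (for these blobs, around K = 4)
```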

72
In reinforcement learning, choosing the best known action is ___, while trying new actions is ___.

exploitation / exploration