Fill-in-the-blank flashcards covering major algorithms, formulas, and concepts from CSCI323 lecture notes on data preprocessing, decision trees, ensemble methods, SVM, KNN, clustering, evaluation metrics, regularization, reinforcement learning, MDPs, and PCA.
Data preprocessing transforms _____ data into a clean, suitable format for analysis.
raw
The first step of data preprocessing that handles missing values, duplicates, and outliers is called _____.
Data Cleaning
Combining data from multiple sources into one coherent dataset is known as Data _____.
Integration
Feature scaling method that rescales data to a fixed range [0,1] is called _____ scaling.
Min-Max
StandardScaler in scikit-learn performs _____ scaling, giving data mean 0 and variance 1.
Standard
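As a quick illustration of the two scalers above, here is a minimal scikit-learn sketch; the tiny toy array is an assumed example, not from the notes.

```python
# Contrast of Min-Max scaling (range [0, 1]) and standard scaling (mean 0, variance 1).
# The toy data below is an illustrative assumption.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

print(MinMaxScaler().fit_transform(X))    # each column rescaled into [0, 1]
print(StandardScaler().fit_transform(X))  # each column centered to mean 0, variance 1
```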
Removing rows or columns with missing values is known as _____ deletion.
listwise (or simply "deletion")
Filling missing values with mean, median, or mode is called _____.
Imputation
Low _____ and low variance together produce a generalized model.
Bias
High bias can lead to _____, while high variance often leads to _____.
underfitting / overfitting
The Gini impurity formula is 1 − Σ( _____² ).
pi
Entropy for a split is calculated as −Σ( pi * log₂( _____ ) ).
pi
Information Gain equals Entropy(parent) minus the _____ average entropy of children.
weighted
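The impurity formulas above can be checked with a short worked example; the parent and child class counts below are hypothetical.

```python
import numpy as np

def gini(counts):
    """Gini impurity: 1 - sum(p_i^2)."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return 1.0 - np.sum(p ** 2)

def entropy(counts):
    """Entropy: -sum(p_i * log2(p_i)), treating 0 * log2(0) as 0."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical split: parent has 5 positives and 5 negatives,
# children hold class counts [4, 1] and [1, 4].
parent, left, right = [5, 5], [4, 1], [1, 4]
n = sum(parent)
weighted_child_entropy = (sum(left) / n) * entropy(left) + (sum(right) / n) * entropy(right)
info_gain = entropy(parent) - weighted_child_entropy  # Information Gain

print(gini(parent), entropy(parent), info_gain)
```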
In a decision tree, the topmost node representing the entire dataset is the _____ node.
root
Pruning a decision tree helps prevent _____.
overfitting
Bagging stands for _____ Aggregating.
Bootstrap
Random Forest is an example of the _____ ensemble method.
Bagging
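A minimal scikit-learn sketch of Random Forest as a bagging ensemble; the synthetic dataset and parameter values are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data stands in for a real dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# bootstrap=True: each tree trains on a bootstrap sample (bagging);
# max_features limits the features considered at each split (feature randomization).
clf = RandomForestClassifier(n_estimators=100, bootstrap=True, max_features="sqrt", random_state=0)
clf.fit(X, y)
print(clf.score(X, y))
```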
Boosting converts multiple _____ learners into a single strong learner.
weak
AdaBoost adjusts _____ of samples after each iteration to focus on misclassified items.
weights
Gradient Boosting optimizes an arbitrary differentiable _____ function.
loss
XGBoost improves gradient boosting mainly in terms of computational _____ and scalability.
speed
In soft voting, a VotingClassifier averages the predicted class _____.
probabilities
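A small sketch of soft voting with scikit-learn's VotingClassifier; the base estimators and toy data are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=6, random_state=0)  # toy data

# voting='soft' averages predict_proba outputs across the base estimators,
# then predicts the class with the highest mean probability.
vote = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("dt", DecisionTreeClassifier(max_depth=3)),
                ("nb", GaussianNB())],
    voting="soft",
)
vote.fit(X, y)
print(vote.predict_proba(X[:3]))
```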
Weak learners are prone to _____ and have low predictive accuracy.
overfitting
Core concepts of Random Forest include bootstrap sampling, feature randomization, and a _____ mechanism.
voting
SVM seeks the _____ hyperplane that maximizes the margin between classes.
optimal
The regularization parameter in SVMs is denoted by _____.
C
The kernel trick allows SVMs to compute inner products in a high-dimensional space without explicit _____.
transformation
The RBF kernel is mathematically expressed as exp( −γ ∥xi − xj∥² ), where γ controls the kernel’s _____.
width (or influence)
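The RBF kernel formula above can be verified directly against scikit-learn's rbf_kernel; the vectors and the γ value below are made-up examples.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

xi = np.array([[1.0, 2.0]])
xj = np.array([[2.0, 0.0]])
gamma = 0.5  # larger gamma -> narrower kernel / more complex decision boundary

manual = np.exp(-gamma * np.sum((xi - xj) ** 2))  # exp(-gamma * ||xi - xj||^2)
library = rbf_kernel(xi, xj, gamma=gamma)[0, 0]
print(manual, library)  # the two values agree
```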
KNN is considered a _____ learning algorithm because it delays computation until prediction time.
lazy (instance-based)
A small K in KNN results in low bias but high _____.
variance
KNN requires _____ or standardization because it is sensitive to feature magnitudes.
normalization
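A minimal sketch showing why scaling pairs naturally with KNN, using a scikit-learn pipeline; the Iris data and K = 5 are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Standardizing first keeps large-magnitude features from dominating the distance metric.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
print(cross_val_score(knn, X, y, cv=5).mean())
```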
Naive Bayes assumes _____ independence among features.
conditional (feature)
The posterior probability in Bayes’ theorem is denoted as _____(H|D).
P
Gaussian Naive Bayes is typically used for _____ features.
continuous
K-Means minimizes the _____-Cluster Sum of Squares (WCSS).
Within
The K-Means++ variant improves the selection of initial _____.
centroids
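A short sketch of K-Means with k-means++ initialization; the attribute inertia_ reports the WCSS that K-Means minimizes, and can be plotted against K for the elbow method. The blob data and the range of K are assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)  # toy blobs

# init='k-means++' spreads out the initial centroids; inertia_ is the WCSS.
for k in range(2, 7):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(X)
    print(k, km.inertia_)
```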
DBSCAN identifies a point as a core point if its ε-neighborhood contains at least _____ points.
minPts
Points not reachable from any core point in DBSCAN are labeled as _____.
noise (outliers)
DBSCAN does not require prior specification of the number of _____.
clusters
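A minimal DBSCAN sketch; eps and min_samples (minPts) below are illustrative values, and points labeled -1 are noise.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.08, random_state=0)  # toy non-convex shapes

# eps is the neighborhood radius and min_samples the minPts threshold for core points;
# the number of clusters is discovered rather than specified in advance.
db = DBSCAN(eps=0.2, min_samples=5).fit(X)
labels = db.labels_
print("clusters:", len(set(labels) - {-1}), "noise points:", list(labels).count(-1))
```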
A confusion matrix compares a model’s predicted labels with the _____ labels.
actual (true)
Precision measures the proportion of true _____ out of all positive predictions.
positives
Recall is also known as _____.
sensitivity
The harmonic mean of precision and recall is the _____ score.
F1
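The metrics above in one short scikit-learn sketch; the label vectors are hypothetical.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # hypothetical actual labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # hypothetical predicted labels

print(confusion_matrix(y_true, y_pred))  # rows: actual, columns: predicted
print(precision_score(y_true, y_pred))   # TP / (TP + FP)
print(recall_score(y_true, y_pred))      # TP / (TP + FN), i.e. sensitivity
print(f1_score(y_true, y_pred))          # harmonic mean of precision and recall
```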
L1 regularization adds the sum of absolute weights (λ Σ |wi|) and is popularly called _____.
Lasso
L2 regularization adds the sum of squared weights (λ Σ wi²) and is known as _____.
Ridge
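A minimal sketch of L1 (Lasso) and L2 (Ridge) regularization in scikit-learn; alpha stands in for λ, and the regression data is synthetic.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=10, noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)  # L1 penalty: λ Σ |wi|, drives some weights to exactly 0
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: λ Σ wi², shrinks weights smoothly
print(lasso.coef_)
print(ridge.coef_)
```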
Reinforcement Learning focuses on learning a policy to maximize a numerical _____ signal.
reward
In RL, the strategy an agent follows to take actions is called the _____.
policy
The expected long-term reward from state s following π is the _____ function Vπ(s).
value
Q-Learning updates Q(s,a) using the term R + γ * _____ Q(s′,a′).
max
The exploration–exploitation trade-off is commonly managed with an _____-greedy strategy.
epsilon
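A tabular sketch of the Q-Learning update and an ε-greedy action choice; the two-state environment, learning rate, and single transition below are entirely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount factor, exploration rate

def epsilon_greedy(state):
    # Explore with probability epsilon, otherwise exploit the best known action.
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def q_update(s, a, r, s_next):
    # Q(s,a) <- Q(s,a) + alpha * [R + gamma * max_a' Q(s',a') - Q(s,a)]
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

# One hypothetical transition: from state 0, the chosen action yields reward 1 and lands in state 1.
a = epsilon_greedy(0)
q_update(0, a, 1.0, 1)
print(Q)
```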
SARSA is an _____-policy method that updates using the action actually taken.
on
Deep Q-Networks use _____ replay to stabilize learning.
experience
In DDPG, the actor suggests actions while the _____ evaluates them.
critic
An MDP is defined by states, actions, transition probabilities, rewards, and a _____ factor γ.
discount
The Bellman optimality equation expresses V*(s) as the max over actions of R(s,a) + γ * Σ T * V*( _____ ).
s′ (next state)
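A value-iteration sketch of the Bellman optimality backup above; the two-state MDP (T, R, γ) is a made-up example.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP: T[s, a, s'] are transition probabilities, R[s, a] rewards.
T = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
gamma = 0.9  # discount factor

V = np.zeros(2)
for _ in range(200):
    # Bellman optimality backup: V*(s) = max_a [ R(s,a) + gamma * sum_s' T(s,a,s') * V*(s') ]
    V = np.max(R + gamma * (T @ V), axis=1)
print(V)
```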
Policy iteration alternates between policy evaluation and policy _____.
improvement
Dimensionality reduction combats the _____ of dimensionality.
curse
PCA projects data onto orthogonal axes called principal _____.
components
The first principal component captures the greatest _____ in the data.
variance
To perform PCA, you first center the data by subtracting the _____.
mean
Eigenvectors corresponding to the largest eigenvalues are kept because they minimize _____ error.
reconstruction
SVD factorizes X into W Σ Vᵀ, where V contains the principal _____.
components (eigenvectors)
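A minimal sketch of PCA done by hand (mean-centering plus SVD), checked against scikit-learn's PCA; the random data matrix is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Random data with deliberately unequal variance per direction.
X = rng.normal(size=(100, 3)) @ np.diag([2.0, 1.0, 0.2])

# Manual PCA: center by subtracting the mean, then SVD; rows of Vt are the principal components.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_variance = S ** 2 / (len(X) - 1)  # variance captured along each component

print(explained_variance)
print(PCA(n_components=3).fit(X).explained_variance_)  # matches the manual computation
```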
Kernel PCA extends PCA to capture _____ relationships using the kernel trick.
non-linear
Feature engineering technique that converts continuous variables into categorical bins is called _____.
discretization
Bootstrap sampling in Random Forests draws samples _____ replacement.
with
VotingClassifier with voting='soft' averages predicted class _____ before deciding.
probabilities
Gradient Boosting learns from the _____ errors of previous models.
residual
In AdaBoost, training continues until the _____ error falls below a threshold.
reweighted (or residual)
A weak learner in boosting often refers to a model slightly better than _____ guessing.
random
The parameter γ in SVM’s RBF kernel: high γ → more complex model; low γ → _____ model.
simpler
DBSCAN’s rule of thumb sets minPts ≥ D + 1, where D is the number of _____.
dimensions
The elbow method chooses the optimal K where the decrease in _____ begins to slow down.
WCSS (inertia)
In reinforcement learning, choosing the best known action is _____, while trying new actions is _____.
exploitation / exploration