Data Preprocessing
Prepare raw data: clean, normalize, standardize, and encode features
Cleaning
Handle missing values, remove outliers, and remove duplicates
Normalization
Scale features to a fixed range, typically 0 to 1
Standardization
Center features at zero mean with unit variance
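A minimal sketch of both transforms using scikit-learn; the sample matrix is made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy feature matrix (values made up for illustration)
X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

# Normalization: rescale each column to the range [0, 1]
X_norm = MinMaxScaler().fit_transform(X)

# Standardization: zero mean, unit variance per column
X_std = StandardScaler().fit_transform(X)

print(X_norm)
print(X_std)
```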
Feature Engineering
Transform raw data into informative features
Dimensionality Reduction
Reduce the number of features to lower complexity, reduce overfitting, and speed up training
PCA
Project data onto directions of maximum variance
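A minimal PCA sketch with scikit-learn; the random data and the choice of 2 components are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy 3-feature data with one redundant feature (made up for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = 2.0 * X[:, 0]  # feature 2 duplicates feature 0

# Keep the 2 directions of maximum variance
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                  # (100, 2)
print(pca.explained_variance_ratio_)    # variance captured per component
```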
Imputation
Fill missing values to make dataset usable
Mean Imputation
Replace missing values with mean
Median Imputation
Replace missing values with median
Mode Imputation
Replace missing values with mode
KNN Imputation
Use nearest neighbors to fill missing values
Regression Imputation
Train a regression model on the other features to predict missing values
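A minimal sketch of mean and KNN imputation with scikit-learn; the matrix and n_neighbors=2 are illustrative assumptions:

```python
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

# Toy matrix with missing entries (values made up for illustration)
X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan], [4.0, 5.0]])

# Mean imputation (strategy can also be "median" or "most_frequent")
X_mean = SimpleImputer(strategy="mean").fit_transform(X)

# KNN imputation: fill each gap from the 2 most similar rows
X_knn = KNNImputer(n_neighbors=2).fit_transform(X)

print(X_mean)
print(X_knn)
```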
Supervised Learning
Learn mapping from inputs to labeled outputs
Classification
Predict class label
Regression
Predict continuous values
KNN
Predict from the k closest points: majority vote for classification, average for regression
KNN Sensitivity
Sensitive to the choice of k (number of neighbors) and to feature scaling
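A minimal KNN sketch with scikit-learn; the Iris dataset and k=5 are illustrative choices, and scaling is applied first because of the sensitivity noted above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale first: KNN distances are sensitive to feature magnitude
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))  # accuracy on held-out data
```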
SVM
Find the hyperplane that maximizes the margin between classes
SVM Pros
Effective with high-dimensional data; robust to overfitting
SVM Cons
Slow training; requires careful parameter tuning
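A minimal SVM sketch with scikit-learn; the synthetic dataset, RBF kernel, and C=1.0 are illustrative assumptions (C and the kernel are the parameters that usually need tuning):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 2-class data (parameters made up for illustration)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale, then fit a maximum-margin classifier with an RBF kernel
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm.fit(X_train, y_train)
print(svm.score(X_test, y_test))
```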
Naive Bayes
Assume features are conditionally independent given the class
Bayes Prior
Probability of class before observing features
Bayes Likelihood
Probability of features given class
Bayes Posterior
Probability of class given features
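A minimal worked example of combining prior and likelihood into a posterior; the classes and probabilities are toy numbers made up for illustration:

```python
# Posterior is proportional to prior * likelihood (toy numbers)
priors = {"spam": 0.4, "ham": 0.6}       # P(class)
likelihoods = {"spam": 0.8, "ham": 0.1}  # P(feature | class)

# Unnormalized posteriors, then normalize by the total evidence
unnorm = {c: priors[c] * likelihoods[c] for c in priors}
evidence = sum(unnorm.values())
posteriors = {c: p / evidence for c, p in unnorm.items()}

print(posteriors)  # {'spam': ~0.842, 'ham': ~0.158}
```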
Neural Network MLP
Input, hidden, and output layers for classification
Activation Functions
Sigmoid, ReLU, Tanh
Backpropagation
Compute gradients by propagating errors backward through the network to train it
Gradient Descent
Optimization method to minimize error
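A minimal gradient descent sketch in NumPy, fitting a 1-D linear model by minimizing mean squared error; the data, learning rate, and step count are illustrative assumptions:

```python
import numpy as np

# Toy data from y = 3x + 1 plus noise (made up for illustration)
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=50)

w, b, lr = 0.0, 0.0, 0.1
for _ in range(200):
    y_hat = w * x + b
    grad_w = 2.0 * np.mean((y_hat - y) * x)  # dMSE/dw
    grad_b = 2.0 * np.mean(y_hat - y)        # dMSE/db
    w -= lr * grad_w                         # step against the gradient
    b -= lr * grad_b

print(w, b)  # should approach 3.0 and 1.0
```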
Ensemble Learning
Combine models to improve performance
Bagging
Train multiple models on different bootstrap subsets to reduce variance
Boosting
Sequentially train models that focus on previous errors, reducing bias
Random Forest
Collection of decision trees with bagging
AdaBoost
Boost weak learners sequentially to improve accuracy
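A minimal sketch contrasting both ensemble styles with scikit-learn; the dataset and n_estimators=100 are illustrative choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging of decision trees: Random Forest
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Boosting of weak learners: AdaBoost (decision stumps by default)
ada = AdaBoostClassifier(n_estimators=100, random_state=0)
ada.fit(X_train, y_train)

print(rf.score(X_test, y_test), ada.score(X_test, y_test))
```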
Decision Tree
Split data to reduce uncertainty and predict classes
Information Gain
Reduction in entropy after splitting on attribute
Entropy
Measure of uncertainty or disorder in data
Parent Entropy
Entropy of dataset before split
Subset Entropy
Entropy of subset after split
Info Gain Calculation
Parent entropy minus the weighted average entropy of the subsets
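A minimal sketch of the calculation in NumPy; the split below is a toy example made up for illustration:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, subsets):
    """Parent entropy minus the size-weighted entropy of the subsets."""
    n = len(parent)
    weighted = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent) - weighted

# Toy split: 3 positives and 1 negative go left, 2 negatives go right
parent = np.array([1, 1, 1, 0, 0, 0])
left, right = np.array([1, 1, 1, 0]), np.array([0, 0])
print(information_gain(parent, [left, right]))  # ~0.459 bits
```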
KNN Centroid
Assign a class by comparing distance to the mean (centroid) of each class's points
Centroid
Mean position of points in class
Euclidean Distance
Straight-line distance between points in multidimensional space
KNN Prediction
Assign to the class with the nearest centroid, or by majority vote of the k nearest neighbors
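A minimal nearest-centroid sketch in NumPy (scikit-learn also offers a NearestCentroid class); the 2-D points are made up for illustration:

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, x):
    """Assign x to the class whose centroid is closest (Euclidean)."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(centroids - x, axis=1)
    return classes[np.argmin(dists)]

# Toy 2-D points: class 0 near the origin, class 1 near (5.5, 5)
X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]])
y = np.array([0, 0, 1, 1])
print(nearest_centroid_predict(X, y, np.array([0.5, 0.2])))  # -> 0
```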
Cross Validation
Split data into folds, training and testing on different folds to estimate performance on unseen data
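A minimal cross-validation sketch with scikit-learn; the Iris dataset, KNN model, and 5 folds are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 5-fold CV: train on 4 folds, score on the held-out fold, repeat
scores = cross_val_score(KNeighborsClassifier(), X, y, cv=5)
print(scores.mean(), scores.std())
```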
Overfitting
Model fits training data too closely and performs poorly on new data
Underfitting
Model is too simple and fails to capture patterns
Feature Scaling
Adjust range of features for algorithms sensitive to magnitude
Hyperplane
Decision boundary in SVM separating classes
Margin
Distance between hyperplane and closest data points
Use Case Selection
Choose an algorithm based on data size, feature type, and goal