Machine Learning Exam One

51 Terms

1

Data Preprocessing

Prepare raw data for modeling: clean, normalize, standardize, and encode features

2

Cleaning

Handle missing values, remove outliers, remove duplicates

3

Normalization

Scale features to the range 0 to 1 (min-max scaling)

4

Standardization

Rescale features to zero mean and unit variance
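
The two scaling cards (Normalization and Standardization) can be compared side by side. The snippet below is a minimal sketch using only the Python standard library; the function names and the `heights` sample are made up for illustration, and the population standard deviation is an assumption (some courses use the sample version).

```python
import statistics

def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to rescale, so return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

# Hypothetical sample feature.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
normalized = min_max_normalize(heights)    # [0.0, 0.25, 0.5, 0.75, 1.0]
standardized = standardize(heights)        # mean 0, standard deviation 1
```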

5

Feature Engineering

Transform raw data into informative features

6

Dimensionality Reduction

Reduce the number of features to lower complexity, reduce overfitting, and speed up training

7

PCA

Project data onto directions of maximum variance
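
As a rough illustration of "direction of maximum variance", the sketch below finds the first principal component with power iteration on the covariance matrix. This is one simple way to do it, not necessarily the method used in the course; the function name and sample points are hypothetical, and power iteration can stall if the starting vector happens to be orthogonal to the answer.

```python
import math

def first_principal_component(points, iterations=100):
    """Approximate the unit vector along which the data varies most."""
    n = len(points)
    dim = len(points[0])
    # Center the data so the covariance is measured around the mean.
    means = [sum(p[j] for p in points) / n for j in range(dim)]
    centered = [[p[j] - means[j] for j in range(dim)] for p in points]
    # Population covariance matrix (dim x dim).
    cov = [[sum(row[i] * row[j] for row in centered) / n for j in range(dim)]
           for i in range(dim)]
    # Power iteration: repeatedly multiply by cov and renormalize.
    v = [1.0] * dim
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v

# Hypothetical data spread more along the x-axis than the y-axis.
points = [(2.0, 0.0), (-2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
pc = first_principal_component(points)  # close to the x-axis direction
```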

8

Imputation

Fill missing values to make dataset usable

9

Mean Imputation

Replace missing values with mean

10

Median Imputation

Replace missing values with median

11

Mode Imputation

Replace missing values with mode
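
Mean, median, and mode imputation differ only in which summary statistic fills the gaps. A minimal sketch, assuming missing entries are represented as `None`; the `impute` helper and the `ages` sample are hypothetical.

```python
import statistics

def impute(values, strategy="mean"):
    """Fill None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    summarize = {
        "mean": statistics.fmean,
        "median": statistics.median,
        "mode": statistics.mode,
    }[strategy]
    fill = summarize(observed)
    return [fill if v is None else v for v in values]

# Hypothetical feature with two missing entries and one outlier (100).
ages = [20, None, 30, 30, None, 100]
mean_filled = impute(ages, "mean")      # gaps become 45.0
median_filled = impute(ages, "median")  # gaps become 30.0
mode_filled = impute(ages, "mode")      # gaps become 30
```

The outlier pulls the mean up to 45 while the median and mode stay at 30, which is one reason the median is often preferred for skewed features.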

12

KNN Imputation

Use nearest neighbors to fill missing values

13

Regression Imputation

Use a regression model trained on the other features to predict missing values

14

Supervised Learning

Learn mapping from inputs to labeled outputs

15

Classification

Predict a discrete class label

16

Regression

Predict continuous values

17

KNN

Predict from the k closest training points: majority vote for classification, average for regression
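
The classification case of KNN can be sketched in a few lines. This assumes Euclidean distance and a plain majority vote; the function name and the toy points are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-cluster training set.
train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
train_labels = ["a", "a", "a", "b", "b", "b"]

label_near_a = knn_predict(train_points, train_labels, (2, 2))  # "a"
label_near_b = knn_predict(train_points, train_labels, (8, 7))  # "b"
```

For regression, the vote would be replaced by the average of the k neighbors' target values.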

18

KNN Sensitivity

Sensitive to the number of neighbors (k) and to feature scaling

19

SVM

Find the hyperplane that maximizes the margin between classes

20

SVM Pros

Effective with high-dimensional data; relatively robust to overfitting

21

SVM Cons

Slow training on large datasets; requires careful parameter tuning

22

Naive Bayes

Assume features are conditionally independent given the class

23

Bayes Prior

Probability of class before observing features

24

Bayes Likelihood

Probability of the observed features given the class

25

Bayes Posterior

Probability of class given features
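
Prior, likelihood, and posterior fit together through Bayes' rule: the posterior is proportional to the prior times the likelihood, and under the naive assumption the likelihood is a product over features. The sketch below shows the arithmetic; the function name and every probability in it are invented for illustration.

```python
import math

def naive_bayes_posterior(priors, likelihoods, observed):
    """Return P(class | observed features) under the naive independence assumption."""
    scores = {}
    for label, prior in priors.items():
        # Prior times the product of per-feature likelihoods for this class.
        scores[label] = prior * math.prod(likelihoods[label][f] for f in observed)
    # Normalize so the posteriors sum to 1.
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}

# Hypothetical numbers: P(class) and P(word appears | class).
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.5, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.4},
}

posterior = naive_bayes_posterior(priors, likelihoods, ["free"])
# spam: 0.4 * 0.5 = 0.20, ham: 0.6 * 0.1 = 0.06, so P(spam | "free") = 0.20 / 0.26
```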

26

Neural Network MLP

Feedforward network with input, hidden, and output layers, used for classification

27

Activation Functions

Nonlinear functions applied to each layer's output, e.g. sigmoid, ReLU, tanh
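
For reference, the three activation functions named on this card can be written as below. This is a plain scalar sketch with the standard library; real networks apply them elementwise to arrays.

```python
import math

def sigmoid(x):
    """Squash x into (0, 1); branches avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def relu(x):
    """Pass positive values through unchanged and clamp negatives to 0."""
    return max(0.0, x)

def tanh(x):
    """Squash x into (-1, 1), centered at 0."""
    return math.tanh(x)

mid_sigmoid = sigmoid(0.0)   # 0.5
clamped = relu(-2.0)         # 0.0
passed = relu(3.0)           # 3.0
mid_tanh = tanh(0.0)         # 0.0
```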

28

Backpropagation

Algorithm that trains neural networks by propagating the error gradient backward through the layers via the chain rule

29

Gradient Descent

Optimization method that minimizes error by repeatedly stepping in the direction of the negative gradient
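
A one-variable example makes the update rule concrete. The sketch below minimizes the hypothetical error function f(w) = (w - 3)^2, whose gradient is 2(w - 3); the learning rate and step count are arbitrary choices for illustration.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly move against the gradient: w <- w - learning_rate * grad(w)."""
    w = start
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w

# Gradient of the hypothetical error f(w) = (w - 3)^2.
w_min = gradient_descent(lambda w: 2 * (w - 3), start=0.0)  # approaches 3
```

A learning rate that is too large would overshoot and diverge on the same function, while one that is too small would need many more steps.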

30

Ensemble Learning

Combine models to improve performance

31

Bagging

Train multiple models on different bootstrap subsets and aggregate their predictions; reduces variance

32

Boosting

Train models sequentially, each focusing on the previous models' errors; reduces bias

33

Random Forest

Collection of decision trees trained with bagging and random feature subsets

34

AdaBoost

Boost weak learners sequentially, reweighting misclassified examples, to improve accuracy

35

Decision Tree

Split data on feature values to reduce uncertainty and predict classes

36

Information Gain

Reduction in entropy after splitting on an attribute

37

Entropy

Measure of uncertainty or disorder in data

38

Parent Entropy

Entropy of dataset before split

39

Subset Entropy

Entropy of subset after split

40

Info Gain Calculation

Parent entropy minus the weighted average of subset entropies
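
The entropy and information gain cards translate directly into code. A minimal sketch, assuming base-2 entropy and subsets weighted by their share of the parent; the function names and the yes/no labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over the class proportions."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(parent, subsets):
    """Parent entropy minus the size-weighted average of subset entropies."""
    total = len(parent)
    weighted = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(parent) - weighted

# Hypothetical parent node: 4 "yes" and 4 "no" gives 1 bit of entropy.
parent = ["yes"] * 4 + ["no"] * 4

# A split that separates the classes perfectly recovers the full 1 bit.
perfect_gain = information_gain(parent, [["yes"] * 4, ["no"] * 4])

# A split that leaves both subsets evenly mixed gains nothing.
useless_gain = information_gain(parent, [["yes", "yes", "no", "no"],
                                         ["yes", "yes", "no", "no"]])
```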

41

KNN Centroid

Nearest-centroid method: assign the class whose mean point (centroid) is closest to the query

42

Centroid

Mean position of points in class

43

Euclidean Distance

Straight-line distance between two points in multidimensional space: square root of the sum of squared coordinate differences

44

KNN Prediction

Assign the class of the nearest centroid (centroid method) or the majority vote of the k nearest neighbors (standard KNN)
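
The centroid cards combine into a short procedure: compute each class's mean point, then pick the class whose centroid is closest in Euclidean distance. A minimal sketch with hypothetical names and data:

```python
import math

def centroid(points):
    """Mean position of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[j] for p in points) / n for j in range(len(points[0])))

def nearest_centroid_predict(points_by_class, query):
    """Return the class label whose centroid is closest to the query point."""
    centroids = {label: centroid(pts) for label, pts in points_by_class.items()}
    return min(centroids, key=lambda label: math.dist(centroids[label], query))

# Hypothetical training points grouped by class.
points_by_class = {
    "a": [(0, 0), (2, 0), (1, 3)],     # centroid (1.0, 1.0)
    "b": [(8, 8), (10, 8), (9, 11)],   # centroid (9.0, 9.0)
}

centroid_a = centroid(points_by_class["a"])
predicted = nearest_centroid_predict(points_by_class, (3, 2))  # "a"
```

Unlike majority-vote KNN, this compares the query with one point per class rather than with every training point.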

45

Cross Validation

Split data into folds, training on some and testing on the rest, to estimate model performance
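
One common form is k-fold cross validation, where every sample is used for testing exactly once. The sketch below only produces the train/test index splits; the function name is hypothetical, and it assumes the data has already been shuffled if order matters.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs covering every sample once."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    indices = list(range(n_samples))
    folds, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds

folds = k_fold_indices(10, 3)  # test folds of sizes 4, 3, 3
```

A model would be trained and scored once per fold, and the scores averaged to estimate performance.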

46

Overfitting

Model fits training data too closely; performs poorly on new data

47

Underfitting

Model too simple; fails to capture patterns

48

Feature Scaling

Adjust range of features for algorithms sensitive to magnitude

49

Hyperplane

Decision boundary in SVM separating classes

50

Margin

Distance between the hyperplane and the closest data points (the support vectors)
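
For a hyperplane written as w·x + b = 0, the distance from a point x is |w·x + b| / ||w||, and the margin is the smallest such distance over the data. The sketch below only measures the margin of a given hyperplane; it does not train an SVM, and the weights, bias, and points are hypothetical.

```python
import math

def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from point x to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return abs(dot + b) / math.sqrt(sum(wi * wi for wi in w))

def margin(w, b, points):
    """Distance from the hyperplane to its closest data point."""
    return min(distance_to_hyperplane(w, b, p) for p in points)

# Hypothetical hyperplane 3x + 4y - 5 = 0 and three data points.
w, b = (3.0, 4.0), -5.0
points = [(3.0, 4.0), (-1.0, -3.0), (5.0, 5.0)]

m = margin(w, b, points)  # 4.0: the first two points are the closest
```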

51

Use Case Selection

Choose algorithm based on data size, feature type, and goal