Machine Learning Essentials (Lecture Notes)

Vocabulary flashcards covering ML applications, learning types, data splitting, cross-validation, and Scikit-Learn pipelines.

25 Terms

1

Applications of Machine Learning

Common applications include healthcare (disease prediction), finance (fraud detection), e-commerce (recommendations), autonomous vehicles, NLP (chatbots, translation), and computer vision (facial recognition).

2

Supervised Learning

Training with labeled data to predict outputs. Examples: Linear Regression, SVM, Decision Trees.
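
A minimal sketch of supervised learning with Scikit-Learn, assuming a made-up toy dataset where every input already has a known label (here the labels follow y = 2x + 1 exactly). The data values and variable names are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy labeled data (invented for illustration): y = 2x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0]])  # inputs (features)
y = np.array([1.0, 3.0, 5.0, 7.0])          # known outputs (labels)

model = LinearRegression()
model.fit(X, y)                              # learn from labeled examples

# Predict the output for an input the model has not seen
prediction = model.predict(np.array([[4.0]]))
print(prediction)                            # approximately [9.]
```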

3

Unsupervised Learning

Training with unlabeled data to find hidden patterns. Examples: K-Means (clustering), PCA (dimensionality reduction).
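
A small sketch of unsupervised learning, assuming six made-up 2-D points that form two obvious groups. No labels are given; K-Means has to discover the grouping on its own. The data and the choice of two clusters are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled toy data (invented): two well-separated groups of points
X = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
    [10.0, 10.0], [10.1, 10.2], [10.2, 10.1],
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)   # cluster index assigned to each point

# The cluster numbering (0 or 1) is arbitrary; only the grouping matters
print(labels)
```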

4

Reinforcement Learning

Learning through trial and error with rewards/punishments. Examples: Q-Learning, AlphaGo.
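
To make "trial and error with rewards" concrete, here is a sketch of the standard tabular Q-Learning update rule in plain Python. The two-state, two-action setting, the learning rate, and the discount factor are hypothetical values chosen only for illustration.

```python
# Hypothetical toy setting: 2 states, 2 actions
alpha = 0.5   # learning rate (assumed value)
gamma = 0.9   # discount factor (assumed value)

# Q-table: Q[state][action], initialised to zero
Q = [[0.0, 0.0], [0.0, 0.0]]

def q_update(Q, state, action, reward, next_state):
    """Apply one Q-Learning update for an observed transition."""
    best_next = max(Q[next_state])            # value of the best next action
    td_target = reward + gamma * best_next    # reward plus discounted future value
    Q[state][action] += alpha * (td_target - Q[state][action])

# The agent tries action 1 in state 0 twice and is rewarded each time
q_update(Q, state=0, action=1, reward=1.0, next_state=1)
print(Q[0][1])   # 0.5
q_update(Q, state=0, action=1, reward=1.0, next_state=1)
print(Q[0][1])   # 0.75
```

Each rewarded trial moves the estimate for that state/action pair closer to the reward, which is how the agent comes to prefer actions that paid off.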

5

Batch Learning

Learns from the entire dataset at once; incorporating new data requires retraining on the full dataset.

6

Online Learning

Learns incrementally from data streams; adapts continuously.
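
A sketch of online (incremental) learning, assuming a simulated data stream in which mini-batches arrive one at a time. It uses Scikit-Learn's `SGDClassifier.partial_fit`; the synthetic data and the batch sizes are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])   # all possible labels, required on the first call

# Simulated stream: 20 mini-batches; the label is 1 when the first feature is positive
for _ in range(20):
    X_batch = rng.normal(size=(50, 2))
    y_batch = (X_batch[:, 0] > 0).astype(int)
    clf.partial_fit(X_batch, y_batch, classes=classes)  # update, do not retrain

# Evaluate on a fresh batch drawn the same way
X_new = rng.normal(size=(200, 2))
y_new = (X_new[:, 0] > 0).astype(int)
print(clf.score(X_new, y_new))
```

A batch learner would instead call `fit` once on all the data and would need a full refit to absorb new examples.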

7

Overfitting

Model fits the training data too closely (including its noise) and generalizes poorly to new data.
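
One way to see overfitting, sketched under the assumption of a synthetic dataset in which about 20% of the labels are deliberately flipped (noise). An unconstrained decision tree memorizes the training set, noise included, and scores worse on held-out data. The dataset and model choice are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = (X[:, 0] > 0).astype(int)      # true rule: sign of the first feature
flip = rng.random(400) < 0.2       # flip roughly 20% of labels to add noise
y[flip] = 1 - y[flip]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

tree = DecisionTreeClassifier(random_state=0)   # no depth limit
tree.fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)   # perfect: the tree memorized the data
test_acc = tree.score(X_test, y_test)      # lower on unseen data
print(train_acc, test_acc)
```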

8

Regularization

Technique to prevent overfitting by penalizing model complexity.
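
A sketch of one common form of regularization, the L2 penalty used by Ridge regression, compared with plain least squares. The synthetic data and the penalty strength `alpha=10.0` are assumptions for illustration. The penalty shrinks the coefficients, which is the sense in which it limits model complexity.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))                     # few samples, many features
y = X[:, 0] + rng.normal(scale=0.5, size=30)      # only the first feature matters

ols = LinearRegression().fit(X, y)     # no penalty
ridge = Ridge(alpha=10.0).fit(X, y)    # L2 penalty on coefficient size

ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)
print(ols_norm, ridge_norm)            # the Ridge coefficients are smaller overall
```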

9

Underfitting

Model is too simple to capture the underlying patterns, so it performs poorly even on the training data.

10

Training Set

Data used to train the model.

11

Testing Set

Data used to evaluate model performance on unseen data.

12

Dataset Split (70–80% / 20–30%)

Typical split: 70–80% training data and 20–30% testing data.
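
A minimal sketch of an 80/20 split with Scikit-Learn's `train_test_split`, assuming 100 placeholder samples. The arrays are dummy data; only the split sizes are of interest.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(200).reshape(100, 2)   # 100 placeholder samples, 2 features
y = np.arange(100) % 2               # placeholder labels

# test_size=0.2 holds out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))     # 80 20
```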

13

K-Fold Cross Validation

Data is split into k folds; train on k−1 folds, test on the remaining fold; repeat k times so each fold is used for testing exactly once, then average the k scores.
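
The fold mechanics can be sketched with Scikit-Learn's `KFold`, assuming 10 placeholder samples and k = 5. The loop only records which indices land in each fold; no model is trained here.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)     # 10 placeholder samples
kf = KFold(n_splits=5, shuffle=True, random_state=0)

fold_sizes = []   # (train size, test size) per round
tested = []       # every index that served as test data
for train_idx, test_idx in kf.split(X):
    fold_sizes.append((len(train_idx), len(test_idx)))
    tested.extend(test_idx.tolist())

print(fold_sizes)       # five rounds of (8, 2)
print(sorted(tested))   # each sample is tested exactly once
```

In practice, `cross_val_score(model, X, y, cv=5)` runs this loop and returns the k scores.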

14

Stratified Sampling

Ensures class proportions are preserved in train/test splits (important for imbalanced datasets).
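
A sketch of a stratified split, assuming an invented imbalanced dataset with 90 samples of class 0 and 10 of class 1. Passing `stratify=y` to `train_test_split` keeps the 90/10 ratio in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Imbalanced toy labels: 90% class 0, 10% class 1
y = np.array([0] * 90 + [1] * 10)
X = np.arange(100).reshape(-1, 1)    # placeholder features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The minority class keeps its 10% share in both splits
print(int(y_train.sum()), len(y_train))   # 8 80
print(int(y_test.sum()), len(y_test))     # 2 20
```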

15

Scikit-Learn Design

Main features: Consistent API, Estimators (fit, predict, transform), Transformers, Pipelines, Cross-validation tools, Metrics.

16

Estimator (in Scikit-Learn)

Any object that learns from data through its fit method. Predictors also provide predict; transformers also provide transform.

17

Transformer

An estimator with a transform method that converts data into a new representation (for example, scaling); commonly used as a step inside Pipelines.
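
A short sketch of the fit/transform pattern using `StandardScaler`, assuming tiny made-up arrays. `fit` learns the statistics from the training data, and `transform` applies them to any data, including the test set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0]])   # toy training data
X_test = np.array([[4.0]])                  # toy test data

scaler = StandardScaler()
scaler.fit(X_train)                 # learns mean_ and scale_ from training data
print(scaler.mean_)                 # [2.]

X_train_scaled = scaler.transform(X_train)   # now has zero mean
X_test_scaled = scaler.transform(X_test)     # reuses the training statistics
```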

18

Pipeline (Scikit-Learn)

A sequence of preprocessing + model steps applied consistently.

19

Why Use Pipelines

Prevents data leakage, simplifies workflows, and ensures the same transformations, fitted on the training data only, are applied to both training and test data.
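
A sketch of how a pipeline helps avoid leakage during cross-validation, assuming a synthetic dataset from `make_classification`. Because the scaler is a pipeline step, `cross_val_score` refits it on the training folds of each round, so test-fold statistics never influence preprocessing. The step names "scaler" and "clf" are arbitrary labels chosen here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

pipe = Pipeline([
    ("scaler", StandardScaler()),     # fitted on training folds only
    ("clf", LogisticRegression()),
])

# Each of the 5 rounds fits scaler + model on 4 folds and scores on the 5th
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```

Scaling the full dataset before splitting would let information from the test folds leak into the training step.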

20

Pipeline Example

StandardScaler → Logistic Regression model (a typical pipeline).
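
The StandardScaler → Logistic Regression pipeline can be sketched as follows, assuming the Iris dataset bundled with Scikit-Learn as example data. The split ratio and `max_iter` value are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# make_pipeline names the steps automatically from the class names
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_train, y_train)             # scaler is fitted on training data only

accuracy = pipe.score(X_test, y_test)  # scaling is applied to test data automatically
print(accuracy)
```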

21

Generalization error

The error rate on new cases, usually estimated by evaluating the model on the test set.

22

Overfitting the training data

If the training error is low but the generalization error is high, it means that your model is…

23

Fit

The ______ method is used to build (train) models.

24

Association Rule Learning

Discovers patterns and relationships between attributes in large datasets.

25

Semisupervised Learning

Training data with a few labeled instances and plenty of unlabeled instances; most algorithms are combinations of unsupervised and supervised techniques.