SQL, Scikit, and Supervised Learning (ML)

42 Terms

1

ORDER BY

A SQL clause that allows you to sort query results, defaulting to ascending order (ASC).

2

ASC

Ascending order for sorting results in SQL.

3

DESC

Descending order for sorting results in SQL.
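
As a rough illustration of the three sorting cards above, the sketch below uses Python's built-in sqlite3 module. The `pets` table, its columns, and its rows are made up for this example only.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?)",
    [("Rex", 5), ("Milo", 2), ("Luna", 9)],
)

# ORDER BY with no keyword sorts ascending, the same as writing ASC.
ascending = [row[0] for row in conn.execute("SELECT name FROM pets ORDER BY age")]

# DESC reverses the sort order.
descending = [row[0] for row in conn.execute("SELECT name FROM pets ORDER BY age DESC")]

print(ascending)   # ['Milo', 'Rex', 'Luna']
print(descending)  # ['Luna', 'Rex', 'Milo']
conn.close()
```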

4

WHERE

SQL clause that filters individual rows before grouping.

5

HAVING

SQL clause that filters results after grouping, often used with aggregate functions.

6

GROUP BY

SQL clause used to group rows by unique values in one or more columns.

7

Logical execution order of SQL

The order in which SQL processes the clauses: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT.
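
One way to see the logical order in action is a single query that uses every clause. This is a sketch with Python's built-in sqlite3 module; the `orders` table and its rows are assumptions invented for the example.

```python
import sqlite3

# Hypothetical data: (customer, amount, status).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("ann", 80, "paid"), ("ann", 50, "paid"),
        ("bob", 200, "paid"), ("bob", 500, "refunded"),
        ("cat", 60, "paid"),
        ("dan", 150, "paid"), ("dan", 10, "paid"),
    ],
)

query = """
SELECT customer, SUM(amount) AS total   -- 5. pick output columns
FROM orders                             -- 1. choose the table
WHERE status = 'paid'                   -- 2. filter individual rows
GROUP BY customer                       -- 3. form one group per customer
HAVING SUM(amount) > 100                -- 4. filter whole groups
ORDER BY total DESC                     -- 6. sort the result
LIMIT 2                                 -- 7. keep the first two rows
"""
top_customers = conn.execute(query).fetchall()
print(top_customers)  # [('bob', 200), ('dan', 160)]
conn.close()
```

Bob's refunded order is dropped by WHERE before grouping, which is why his total is 200 rather than 700; Cat's group (60) is dropped later by HAVING.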

8

Aggregate functions

Functions like AVG() or COUNT() used to summarize data.
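
A small sketch of aggregate functions collapsing many rows into one summary row, again using sqlite3. The `scores` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (points INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO scores VALUES (?)", [(70,), (80,), (90,)])

# Without GROUP BY, each aggregate summarizes the whole table.
summary = conn.execute(
    "SELECT COUNT(*), AVG(points), MIN(points), MAX(points) FROM scores"
).fetchone()
print(summary)  # (3, 80.0, 70, 90)
conn.close()
```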

9

Machine Learning (ML)

A field of AI focused on teaching computers to recognize patterns and make predictions using data.

10

Artificial Intelligence (AI)

The broader field, of which Machine Learning is a part, concerned with building systems that perform tasks normally associated with human intelligence.

11

Data Science

The practice of analyzing data to extract insights, utilizing tools such as Machine Learning.

12

Supervised Learning

A type of Machine Learning that uses labeled data to train models.

13

Feature Matrix (X)

Input data consisting of all columns of features for every row.

14

Feature Vector

A single row from the feature matrix representing an individual data point.

15

Target Vector (y)

The column of known outputs (one label or value per row of X) that a supervised learning model is trained to predict.
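
To make the relationship between X, a feature vector, and y concrete, here is a minimal sketch using plain Python lists. The house data and column meanings are invented for illustration.

```python
# Hypothetical data: each row is one house, columns are [size_m2, bedrooms].
X = [
    [50, 1],
    [80, 2],
    [120, 3],
]

# Target vector: one known price per row of X (made-up values).
y = [150_000, 240_000, 360_000]

# A feature vector is a single row of the feature matrix.
feature_vector = X[1]

n_rows, n_features = len(X), len(X[0])
print(feature_vector)      # [80, 2]
print(n_rows, n_features)  # 3 2
print(len(y) == n_rows)    # True: X and y must have the same number of rows
```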

16

Classification

A supervised learning task that involves predicting discrete labels.

17

Regression

A supervised learning task that involves predicting continuous numeric values.

18

Quantitative data

Numeric data, which can be either discrete or continuous.

19

Qualitative data

Categorical data that can be ordinal or nominal.

20

Discrete data

Countable quantitative data, e.g., number of pets.

21

Continuous data

Measurable quantitative data that can take any value in a range.

22

Ordinal data

Qualitative data with a natural order, e.g., small/medium/large.

23

Nominal data

Qualitative data without a natural order, e.g., eye color.

24

One-Hot Encoding

A technique that converts each category into a binary vector with a single 1 in the position for that category and 0 everywhere else.
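
A minimal pure-Python sketch of one-hot encoding, to show the idea only. The helper name `one_hot` and the eye-color values are assumptions for this example; libraries such as Scikit-learn provide their own encoders.

```python
def one_hot(values):
    """Return (categories, encoded), where each value becomes a 0/1 vector."""
    categories = sorted(set(values))                   # fixed column order
    position = {c: i for i, c in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[position[value]] = 1                       # exactly one 1 per row
        encoded.append(row)
    return categories, encoded


categories, encoded = one_hot(["blue", "brown", "green", "brown"])
print(categories)  # ['blue', 'brown', 'green']
print(encoded)     # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
```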

25

Training set

The dataset used for training a model.

26

Validation set

A subset of data, held out from training, used to tune hyperparameters and compare candidate models.

27

Test set

The dataset used to evaluate the final performance of the model.

28

L1 Loss (MAE)

A loss function measuring error as the mean of the absolute differences between predicted and true values (L1 loss sums the absolute errors; MAE averages them).

29

L2 Loss (MSE)

A loss function measuring error as the mean of the squared differences between predicted and true values (L2 loss sums the squared errors; MSE averages them).
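
The two regression losses can be sketched in a few lines of plain Python. The function names and sample values below are made up for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """MSE: average of (true - predicted) squared."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]   # errors: 1, 0, -2

mae = mean_absolute_error(y_true, y_pred)  # (1 + 0 + 2) / 3 = 1.0
mse = mean_squared_error(y_true, y_pred)   # (1 + 0 + 4) / 3, about 1.67
print(mae, mse)
```

Because the error of 2 is squared, it contributes four times as much to MSE as the error of 1, which is why MSE is more sensitive to large errors than MAE.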

30

Binary Cross-Entropy

A loss function used for binary classification tasks.
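
A hedged sketch of binary cross-entropy using the standard formula, the mean of -[y·log(p) + (1 - y)·log(1 - p)], where y is the true label (0 or 1) and p is the predicted probability of class 1. The function name, the clipping constant `eps`, and the sample values are assumptions for this example.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean of -[y*log(p) + (1 - y)*log(1 - p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Confident, correct predictions give a small loss.
good = binary_cross_entropy([1, 0], [0.9, 0.1])
# Confident, wrong predictions give a much larger loss.
bad = binary_cross_entropy([1, 0], [0.1, 0.9])
print(good < bad)  # True
```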

31

Accuracy

The percentage of correct predictions made by a model.

32

Imbalanced datasets

Datasets where one class is significantly more frequent than others, complicating classification.

33

Unsupervised Learning

A type of Machine Learning that finds patterns without labeled data.

34

Reinforcement Learning

A type of Machine Learning where an agent learns by interacting with an environment.

35

train_test_split

A function in Scikit-learn used to divide data into training and test sets.
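
The sketch below imitates the idea of a shuffled train/test split using only the standard library, so it runs without Scikit-learn installed. `simple_split` is a hypothetical helper written for this example, not Scikit-learn's implementation; the corresponding real call is shown in the comment.

```python
import random

# With Scikit-learn installed, the corresponding call would be:
#   from sklearn.model_selection import train_test_split
#   X_train, X_test, y_train, y_test = train_test_split(
#       X, y, test_size=0.2, random_state=0)


def simple_split(X, y, test_size=0.2, seed=0):
    """Shuffle row indices, then cut them into train and test parts."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)           # reproducible shuffle
    n_test = max(1, int(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Made-up data: 10 rows with one feature each; the target is twice the feature.
X = [[i] for i in range(10)]
y = [2 * i for i in range(10)]

X_train, X_test, y_train, y_test = simple_split(X, y, test_size=0.2)
print(len(X_train), len(X_test))  # 8 2
```

Rows of X and y are selected with the same indices, so each feature vector stays paired with its own target after shuffling.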

36

NaNs

"Not a Number" values that mark missing entries in data; they usually need to be removed or filled in (imputed) before model training.
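
Two common ways of cleaning NaNs, dropping affected rows or filling them with a column mean, can be sketched in plain Python. The data below is invented for illustration.

```python
import math

# Hypothetical feature rows; one value is missing.
rows = [
    [1.0, 2.0],
    [float("nan"), 3.0],
    [4.0, 5.0],
]

# Option 1: drop every row that contains a NaN.
dropped = [r for r in rows if not any(math.isnan(v) for v in r)]

# Option 2: fill NaNs in column 0 with the mean of the known values.
known = [r[0] for r in rows if not math.isnan(r[0])]
col_mean = sum(known) / len(known)
imputed = [[col_mean if math.isnan(r[0]) else r[0], r[1]] for r in rows]

print(dropped)  # [[1.0, 2.0], [4.0, 5.0]]
print(imputed)  # [[1.0, 2.0], [2.5, 3.0], [4.0, 5.0]]
```

Note that NaN is not equal to itself, so `math.isnan(v)` is used rather than `v == float("nan")`.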

37

Overfitting

A modeling error that occurs when a model fits the training data too closely, including its noise, and therefore performs poorly on new data.

38

Critical Edge Cases

Special scenarios in ML that can lead to errors or suboptimal performance.

39

Common Mistakes in SQL and Scikit-learn

Errors such as misspelling 'train_test_split' (for example as 'tran_test_split') or using WHERE instead of HAVING to filter on an aggregate.
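
The WHERE-with-aggregate mistake can be reproduced with Python's built-in sqlite3 module. The `staff` table is hypothetical, and other database engines report this error with different wording.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (dept TEXT, salary INTEGER)")  # made-up table
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [("eng", 100), ("eng", 120), ("ops", 60)],
)

# Mistake: WHERE runs before grouping, so it cannot see AVG(salary).
try:
    conn.execute("SELECT dept FROM staff WHERE AVG(salary) > 90 GROUP BY dept")
    where_failed = False
except sqlite3.OperationalError:
    where_failed = True

# Fix: filter on the aggregate with HAVING, which runs after GROUP BY.
rows = conn.execute(
    "SELECT dept, AVG(salary) FROM staff GROUP BY dept HAVING AVG(salary) > 90"
).fetchall()

print(where_failed)  # True
print(rows)          # [('eng', 110.0)]
conn.close()
```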

40

Feature Vector real-world example

The features recorded for one patient (one row of X), used to predict whether that patient has a disease.

41

Regression real-world example

Predicting house prices based on features such as size, location, etc.

42

Metrics to use instead of accuracy

Precision, recall, or F1 score for evaluating model performance in imbalanced datasets.
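
To see why accuracy can mislead on imbalanced data, the sketch below computes all four metrics by hand on a made-up set of 20 labels where only 2 are positive. The function name and data are assumptions for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # By convention here, a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1


# Imbalanced, made-up labels: 18 negatives and 2 positives.
y_true = [0] * 18 + [1, 1]
# The model raises one false alarm and finds only one of the two positives.
y_pred = [0] * 17 + [1] + [1, 0]

accuracy, precision, recall, f1 = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall, f1)  # 0.9 0.5 0.5 0.5
```

Accuracy looks strong at 0.9 because the negatives dominate, while precision, recall, and F1 show that the model handles the rare positive class only half the time.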