Data Mining and Machine Learning Concepts

Description and Tags

These flashcards cover key concepts and terminology from the Data Mining and Machine Learning lecture notes.

33 Terms

1

The first phase of the Data Mining Process is __.

Business Understanding

2

In the Data Mining Process, __ is the phase where data is collected and described.

Data Understanding

3

The process of cleaning and transforming data into the required format is known as __.

Data Preparation

4

The variable to predict in data mining is referred to as a __.

Label

5

Input variables used to predict outcomes are called __.

Features

6

Age and income are examples of __ features.

Numerical

7

Features that contain non-numerical values and are grouped into categories are known as __ features.

Categorical

8

In __ learning, models learn from labeled data to make predictions.

Supervised

9

The type of learning where the model learns from unlabeled data to identify patterns is called __ learning.

Unsupervised

10

The function used in supervised learning to quantify prediction error is called the __ function.

Loss

11

The process of adjusting model parameters to minimize the loss function is termed __.

Optimization
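
The two cards above (loss and optimization) can be sketched together. This is a minimal illustration, assuming a one-parameter model y_hat = w * x, mean squared error as the loss, and plain gradient descent as the optimizer; the data, learning rate, and step count are hypothetical choices, not taken from the lecture notes.

```python
# Hypothetical toy data in which y is exactly 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

def mse_loss(w):
    """Mean squared error of the one-parameter model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_gradient(w):
    """Derivative of the MSE loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(w=0.0, learning_rate=0.01, steps=500):
    """Repeatedly step against the gradient so the loss shrinks."""
    for _ in range(steps):
        w -= learning_rate * mse_gradient(w)
    return w

w_fit = gradient_descent()
print(round(w_fit, 3), round(mse_loss(w_fit), 6))
```

The loss measures how wrong the current parameter is, and each optimization step moves the parameter in the direction that lowers it.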

12

Methods for handling outliers include __ them or keeping them, depending on context.

Removing

13

One-hot encoding is used to transform __ features into numerical values.

Categorical
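
As a rough illustration of one-hot encoding, the sketch below turns a list of category values into 0/1 vectors. The function name `one_hot_encode` and the colour values are hypothetical; sorting the categories is just one way to fix the column order.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 vector containing a single 1."""
    categories = sorted(set(values))  # fixed, reproducible column order
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column for this value's category
        encoded.append(row)
    return categories, encoded

categories, encoded = one_hot_encode(["red", "green", "blue", "green"])
for row in encoded:
    print(row)
```

Each category gets its own column, so no artificial ordering is imposed on the categories.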

14

Normalization is important for features with __ value ranges, so that they do not dominate features with smaller ranges.

Large
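
One common form of normalization is min-max scaling, which maps every feature onto the range 0 to 1. The sketch below assumes that variant (the card does not say which method the lecture used), and the age and income values are made up.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

ages = [20, 30, 40, 60]                    # small range
incomes = [20000, 50000, 80000, 120000]    # much larger range

scaled_ages = min_max_normalize(ages)
scaled_incomes = min_max_normalize(incomes)
print(scaled_ages)
print(scaled_incomes)
```

After scaling, both features lie between 0 and 1, so a distance-based model no longer treats income as thousands of times more important than age.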

15

The purpose of __ is to reduce redundancy and noise by removing irrelevant or redundant features.

Feature Selection

16

In imbalanced datasets, __ involves randomly removing samples from the majority class.

Downsampling
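
A minimal sketch of random downsampling follows, assuming the goal is to shrink every larger class to the size of the smallest one. The helper name `downsample_majority` and the fixed seed are illustrative choices.

```python
import random

def downsample_majority(samples, labels, seed=0):
    """Randomly drop samples from larger classes until all classes are the same size."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        # Only classes larger than the target are sampled; the rest are kept whole.
        kept = rng.sample(group, target) if len(group) > target else group
        balanced.extend((sample, label) for sample in kept)
    return balanced

samples = list(range(10))
labels = [0] * 8 + [1] * 2   # class 0 is the majority
balanced = downsample_majority(samples, labels)
print(balanced)
```

The minority class is left untouched; only majority-class samples are discarded, which trades some information for a balanced training set.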

17

The __ is a table that describes the performance of a classification model.

Confusion Matrix
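
For a binary classifier, the confusion matrix has four cells: true positives, false positives, false negatives, and true negatives. The sketch below counts them from hypothetical label lists.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count the four cells of a binary confusion matrix."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["TP" if actual == positive else "FP"] += 1
        else:
            counts["FN" if actual == positive else "TN"] += 1
    return counts

y_true = [1, 1, 1, 0, 0, 0, 0, 1]   # actual classes (made-up data)
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]   # model predictions (made-up data)
counts = confusion_counts(y_true, y_pred)
print(counts)
```

Metrics such as precision and recall are computed from these four counts.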

18

The ratio of true positive predictions to all predicted positives is called __.

Precision

19

The proportion of actual positive instances that the model correctly predicts as positive is called __.

Recall

20

A __ curve plots the True Positive Rate against the False Positive Rate.

ROC (Receiver Operating Characteristic)
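
The points of a ROC curve come from sweeping the decision threshold over the model's scores and recording the False Positive Rate and True Positive Rate at each setting. The sketch below assumes both classes are present; the labels and scores are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs obtained by lowering the decision threshold step by step."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
for fpr, tpr in points:
    print(round(fpr, 3), round(tpr, 3))
```

The curve always starts at (0, 0) and ends at (1, 1); a better classifier bends further toward the top-left corner.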

21

The harmonic mean of precision and recall is known as the __ score.

F1
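
Precision, recall, and the F1 score can all be computed from confusion-matrix counts. The sketch below uses made-up counts and returns 0.0 when a denominator would be zero, which is one common convention rather than the only one.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and their harmonic mean (F1) from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # TP / all predicted positives
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # TP / all actual positives
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Because F1 is a harmonic mean, it sits closer to the lower of the two values, so a model cannot score well by maximizing only precision or only recall.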

22

Decision trees are used for both classification and __ tasks.

Regression

23

Binary classification involves distinguishing between __ classes.

Two

24

The ultimate goal of supervised learning is to make __ that generalize well to unseen data.

Predictions

25

Supervised learning algorithms are trained on __ data, in which each example is tagged with the correct output.

Labeled

26

A technique that adds a penalty term to the loss function to prevent overfitting is called __.

Regularization
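
As an illustration of the penalty term, the sketch below adds an L2 (ridge-style) penalty to a mean squared error loss. The choice of L2, the strength `lam`, and the data are assumptions made for the example; other penalties such as L1 follow the same pattern.

```python
def mse(weights, rows, targets):
    """Mean squared error of a linear model without a bias term."""
    errors = [
        sum(w * x for w, x in zip(weights, row)) - target
        for row, target in zip(rows, targets)
    ]
    return sum(e * e for e in errors) / len(errors)

def l2_regularized_loss(weights, rows, targets, lam=0.1):
    """Data loss plus a penalty that grows with the squared size of the weights."""
    penalty = lam * sum(w * w for w in weights)
    return mse(weights, rows, targets) + penalty

rows = [[1.0, 2.0], [3.0, 4.0]]
targets = [5.0, 11.0]
weights = [1.0, 2.0]   # fits this toy data exactly
print(mse(weights, rows, targets), l2_regularized_loss(weights, rows, targets))
```

Even when the data loss is zero, the regularized loss stays positive, so the optimizer is pushed toward smaller weights that are less likely to overfit.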

27

In logistic regression, the function that maps the model's linear output to a probability between 0 and 1 is called the __.

Sigmoid Function
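
The sigmoid function is 1 / (1 + e^(-z)). A small sketch, with hypothetical weights and features, shows how logistic regression uses it to turn a weighted sum into a probability.

```python
import math

def sigmoid(z):
    """Map any real number to the open interval (0, 1), avoiding overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)

def predict_probability(features, weights, bias):
    """Logistic regression prediction: sigmoid of the weighted sum of the features."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical weights and input, chosen only for illustration.
probability = predict_probability([2.0, 1.0], weights=[0.5, -1.0], bias=0.0)
print(probability)
```

Here the weighted sum is exactly zero, which the sigmoid maps to 0.5, the usual decision boundary between the two classes.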

28

In a neural network, learning adjusts the weights assigned to connections, and each weight determines how much __ one neuron's output has on the next neuron.

Influence

29

Deep Neural Networks consist of multiple layers, including __, hidden, and output layers.

Input
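
A tiny forward pass makes the layer structure concrete: two input values, one hidden layer of two neurons, and one output neuron. All weights, biases, and activation choices below are hypothetical; a real network would learn the weights during training.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases, activation):
    """Fully connected layer: each neuron applies activation(weights . inputs + bias)."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]

# Hypothetical parameters: 2 inputs -> 2 hidden neurons -> 1 output neuron.
hidden_weights = [[0.5, -0.2], [0.1, 0.4]]
hidden_biases = [0.0, 0.1]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]

def forward(inputs):
    """Pass the input layer's values through the hidden layer to the output layer."""
    hidden = dense_layer(inputs, hidden_weights, hidden_biases, relu)
    return dense_layer(hidden, output_weights, output_biases, sigmoid)

output = forward([1.0, 2.0])
print(output)
```

Each weight scales how strongly one neuron's value feeds into the next layer; stacking more hidden layers in the same way gives a deeper network.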

30

A __ model uses Bayes' theorem and per-feature probabilities to predict class membership from the input features.

Naïve Bayes
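
A minimal sketch of a Naïve Bayes classifier for categorical features follows. It assumes add-one (Laplace) smoothing, which is an addition for the example and not something the cards mention, and it works in log space to avoid multiplying many small numbers. The weather-style data is invented.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(lambda: defaultdict(Counter))  # label -> feature index -> value -> count
    feature_values = defaultdict(set)                           # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
            feature_values[i].add(value)
    return class_counts, feature_counts, feature_values

def predict(model, row):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    class_counts, feature_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log P(class)
        for i, value in enumerate(row):
            # Add-one smoothing keeps unseen values from giving probability zero.
            numerator = feature_counts[label][i][value] + 1
            denominator = count + len(feature_values[i])
            score += math.log(numerator / denominator)  # log P(feature value | class)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("sunny", "hot")), predict(model, ("rainy", "cool")))
```

Summing the per-feature log probabilities is only valid because the model treats the features as conditionally independent given the class.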

31

Conditional independence states that events A and B are independent given a third event __: once that event is known, learning B provides no further information about A.

C
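
Formally, A and B are conditionally independent given C when P(A, B | C) = P(A | C) P(B | C). The sketch below builds a small joint distribution with hypothetical probabilities that satisfies this, then checks numerically that A and B are independent given C but not independent overall.

```python
from itertools import product

# Hypothetical probabilities, chosen only for illustration.
p_c = {0: 0.5, 1: 0.5}            # P(C = c)
p_a1_given_c = {0: 0.9, 1: 0.2}   # P(A = 1 | C = c)
p_b1_given_c = {0: 0.8, 1: 0.1}   # P(B = 1 | C = c)

def bernoulli(p_one, value):
    """Probability of a binary value when P(value = 1) is p_one."""
    return p_one if value == 1 else 1.0 - p_one

# Build the joint so that A and B are conditionally independent given C.
joint = {
    (a, b, c): p_c[c] * bernoulli(p_a1_given_c[c], a) * bernoulli(p_b1_given_c[c], b)
    for a, b, c in product((0, 1), repeat=3)
}

def prob(a=None, b=None, c=None):
    """Probability of the given values; None means that variable is summed out."""
    return sum(
        p for (ja, jb, jc), p in joint.items()
        if (a is None or ja == a) and (b is None or jb == b) and (c is None or jc == c)
    )

def conditional_gap(c):
    """|P(A=1, B=1 | C=c) - P(A=1 | C=c) * P(B=1 | C=c)|; zero under conditional independence."""
    p_ab = prob(a=1, b=1, c=c) / prob(c=c)
    p_a = prob(a=1, c=c) / prob(c=c)
    p_b = prob(b=1, c=c) / prob(c=c)
    return abs(p_ab - p_a * p_b)

marginal_gap = abs(prob(a=1, b=1) - prob(a=1) * prob(b=1))
print(conditional_gap(0), conditional_gap(1), marginal_gap)
```

The conditional gaps are zero up to rounding while the marginal gap is not, which shows that conditional independence does not imply ordinary independence.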

32

AUC stands for __, which summarizes the performance of a classification model across all decision thresholds.

Area Under the ROC Curve
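
The area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes AUC through that pairwise comparison rather than by integrating the curve; it assumes both classes are present, and the data is hypothetical.

```python
def auc_by_ranking(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked in the right order."""
    positive_scores = [s for y, s in zip(y_true, scores) if y == 1]
    negative_scores = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positive_scores) * len(negative_scores))

y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auc_by_ranking(y_true, scores)
print(round(auc, 3))
```

An AUC of 1.0 means every positive is ranked above every negative, while 0.5 is what random scoring would achieve.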

33

In Naïve Bayes, the features are assumed to be __ given the class label.

Conditionally independent