These flashcards cover key concepts and terminology from the Data Mining and Machine Learning lecture notes.
The first phase of the Data Mining Process is __.
Business Understanding
In the Data Mining Process, __ is the phase where data is collected and described.
Data Understanding
The process of cleaning and transforming data into the required format is known as __.
Data Preparation
The variable to predict in data mining is referred to as a __.
Label
Input variables used to predict outcomes are called __.
Features
An example of a __ feature is age or income.
Numerical
Features that contain non-numerical values and are grouped into categories are known as __ features.
Categorical
In __ learning, models learn from labeled data to make predictions.
Supervised
The type of learning where the model learns from unlabeled data to identify patterns is called __ learning.
Unsupervised
The function used in supervised learning to quantify prediction error is called the __ function.
Loss
The process of adjusting model parameters to minimize the loss function is termed __.
Optimization
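As a worked illustration of the loss and optimization cards, the sketch below fits a one-parameter model `y = w * x` by repeatedly stepping against the gradient of the mean squared error. Gradient descent, the function name `fit_slope`, the learning rate, and the toy data are all assumptions chosen for illustration; the lecture notes only say that parameters are adjusted to minimize the loss.

```python
def mean_squared_error(w, xs, ys):
    """Loss: average squared difference between predictions w*x and targets y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_slope(xs, ys, learning_rate=0.01, steps=500):
    """Optimization: adjust w step by step to reduce the loss (gradient descent)."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Derivative of the mean squared error with respect to w.
        gradient = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * gradient
    return w


# Toy data generated by y = 2x (hypothetical values).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = fit_slope(xs, ys)
print(round(w, 4), round(mean_squared_error(w, xs, ys), 6))
```

The loss starts high at `w = 0` and shrinks toward zero as `w` approaches the slope that generated the data.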
Methods for handling outliers include __ them or keeping them, depending on context.
Removing
One-hot encoding is used to transform __ features into numerical values.
Categorical
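To make the one-hot encoding card concrete, here is a minimal sketch in plain Python. The function name `one_hot_encode` and the colour values are illustrative assumptions, not taken from the lecture notes.

```python
def one_hot_encode(values):
    """Turn each categorical value into a 0/1 vector containing a single 1."""
    # Sorting gives a deterministic column order (an illustrative choice).
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        encoded.append(row)
    return categories, encoded


categories, encoded = one_hot_encode(["red", "green", "blue", "green"])
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```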
Normalization is important for features with __ ranges, so that they do not dominate features measured on smaller scales.
Large
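One common way to normalize is min-max scaling, which maps every feature onto the range 0 to 1. The sketch below assumes that technique (the notes do not name a specific one) and uses hypothetical age and income values.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0.0 and the largest 1.0."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant feature carries no spread; map it to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


ages = [20, 30, 40]                # small range
incomes = [20000, 50000, 80000]    # large range

print(min_max_scale(ages))     # [0.0, 0.5, 1.0]
print(min_max_scale(incomes))  # [0.0, 0.5, 1.0]
```

After scaling, both features occupy the same range, so neither dominates a distance or gradient computation purely because of its units.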
The purpose of __ is to reduce redundancy by removing irrelevant or redundant features.
Feature Selection
In imbalanced datasets, __ involves randomly removing samples from the majority class.
Downsampling
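Random downsampling can be sketched in a few lines. The helper name `downsample_majority`, the fixed seed, and the 8-to-2 class split are assumptions made for this example only.

```python
import random


def downsample_majority(samples, labels, seed=0):
    """Randomly drop samples from larger classes until every class matches the smallest."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        kept = group if len(group) == target else rng.sample(group, target)
        balanced.extend((sample, label) for sample in kept)
    return balanced


samples = list(range(10))
labels = [0] * 8 + [1] * 2   # class 0 is the majority
balanced = downsample_majority(samples, labels)
print(len(balanced))  # 4
```

The minority class is kept whole; only majority-class samples are discarded, which balances the classes at the cost of throwing data away.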
The __ is a table that describes the performance of a classification model.
Confusion Matrix
The ratio of true positive predictions to all predicted positives is called __.
Precision
The proportion of actual positive instances that the model correctly predicts as positive is called __.
Recall
A __ curve plots the True Positive Rate against the False Positive Rate.
ROC (Receiver Operating Characteristic)
The harmonic mean of precision and recall is known as the __ score.
F1
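The confusion matrix, precision, recall, and F1 cards all derive from the same four counts. The sketch below computes them for a small hypothetical set of labels, where 1 marks the positive class; the function names and data are illustrative assumptions. It also computes the false positive rate, which together with recall (the true positive rate) gives one point on a ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = ratio(tp, tp + fp)             # true positives / predicted positives
recall = ratio(tp, tp + fn)                # true positives / actual positives (TPR)
false_positive_rate = ratio(fp, fp + tn)   # x-axis of a ROC curve
f1 = ratio(2 * precision * recall, precision + recall)  # harmonic mean

print(tp, fp, fn, tn)         # 3 1 1 3
print(precision, recall, f1)  # 0.75 0.75 0.75
print(false_positive_rate)    # 0.25
```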
Decision trees are used for both classification and __ tasks.
Regression
Binary classification involves distinguishing between __ classes.
Two
The ultimate goal of supervised learning is to make __ that generalize well to unseen data.
Predictions
The algorithms in supervised learning use __ data tagged with correct outputs.
Labeled
A technique that adds a penalty term to the loss function to prevent overfitting is called __.
Regularization
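As one possible illustration of a penalty term, the sketch below adds an L2 (squared-weight) penalty to a mean squared error loss. The choice of L2, the strength parameter `lam`, and all values are assumptions; the notes do not say which penalty is meant.

```python
def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def l2_regularized_loss(y_true, y_pred, weights, lam):
    """Data loss plus a penalty that grows with the squared size of the weights."""
    penalty = lam * sum(w ** 2 for w in weights)
    return mse(y_true, y_pred) + penalty


y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
weights = [3.0, -4.0]   # hypothetical model weights

plain = mse(y_true, y_pred)
regularized = l2_regularized_loss(y_true, y_pred, weights, lam=0.1)
print(plain < regularized)  # True
```

Because large weights now cost more, the optimizer is nudged toward simpler models that are less likely to overfit.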
In logistic regression, the function that maps the model's output to a probability between 0 and 1 is termed the __.
Sigmoid Function
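The sigmoid is defined as 1 / (1 + e^(-z)). A minimal sketch follows; the helper `predict_probability` and its weights, bias, and feature values are hypothetical and only show how a linear score is turned into a probability.

```python
import math


def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(features, weights, bias):
    """Logistic regression prediction: sigmoid of the weighted sum of features."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


print(sigmoid(0.0))  # 0.5
probability = predict_probability([2.0, 1.0], weights=[0.5, -1.0], bias=0.0)
print(probability)   # 0.5, because the weighted sum is 0
```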
A neural network learns by adjusting the weights assigned to its connections, which determine each input's __ on the output.
Influence
Deep Neural Networks consist of multiple layers, including __, hidden, and output layers.
Input
A __ model uses probabilities to predict class membership based on input features.
Naïve Bayes
Conditional independence states that events A and B are independent given a third event __.
C
AUC stands for __, which measures the performance of a classification model.
Area Under the ROC Curve
In Naïve Bayes, the features are assumed to be __ given the class.
Conditionally independent
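For reference, the conditional independence and Naïve Bayes cards can be written out as formulas. The symbols $y$ (the class) and $x_1, \dots, x_n$ (the features) are notation assumed here, not taken from the lecture notes.

```latex
% A and B are conditionally independent given C:
P(A \cap B \mid C) = P(A \mid C)\, P(B \mid C)

% Naive Bayes applies this assumption to the features given the class y:
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The predicted class is the $y$ that maximizes the right-hand side.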