These flashcards cover key concepts related to machine learning imputation techniques and data pre-processing tasks.
Data Cleaning
The process of addressing or fixing missing values, duplicate data, and incorrectly formatted data.
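As a rough illustration of the three problems named above, here is a minimal pandas sketch. The toy table, its column names, and the choice of the median as the fill value are assumptions made for this example only.

```python
import pandas as pd

# Hypothetical toy data: one duplicated row, one missing age,
# and inconsistently formatted city names.
raw = pd.DataFrame({
    "name": ["Ann", "Bob", "Bob", "Cy"],
    "age": [34.0, 29.0, 29.0, None],
    "city": [" boston", "Denver ", "Denver ", "AUSTIN"],
})

# Duplicate data: drop the repeated "Bob" row.
cleaned = raw.drop_duplicates().reset_index(drop=True)

# Incorrectly formatted data: trim whitespace and normalize capitalization.
cleaned["city"] = cleaned["city"].str.strip().str.title()

# Missing values: fill the missing age with the median of the observed ages.
cleaned["age"] = cleaned["age"].fillna(cleaned["age"].median())

print(cleaned)
```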
Data Integration
The process of combining data from different sources into a unified view.
Data Reduction
The process of reducing the dimensionality of the dataset, simplifying the data.
Data Transformation
The process of converting features into a format suitable for specific models or algorithms.
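One common transformation is standardization, which rescales each feature to zero mean and unit variance. The sketch below uses scikit-learn's `StandardScaler` on a made-up array; other transformations (encoding, log scaling, and so on) would fit the definition equally well.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two features on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

# Rescale each column to mean 0 and standard deviation 1.
X_scaled = StandardScaler().fit_transform(X)

print(X_scaled)
```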
Imputation
A group of techniques used to replace missing values in a dataset with a reasonable estimate.
Univariate Imputation
Replaces missing values for a feature using only non-missing values for that same feature.
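A minimal sketch of univariate imputation, assuming scikit-learn's `SimpleImputer` with the mean strategy. The array is made up for illustration. Each column is filled using only its own observed values.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data with one missing value in each column.
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, np.nan]])

# Each missing entry is replaced by the mean of the observed
# values in the same column; other columns are never consulted.
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)

print(X_filled)
```

Here the first column's gap becomes 2.0 (the mean of 1.0 and 3.0) and the second column's gap becomes 15.0 (the mean of 10.0 and 20.0).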
Multivariate Imputation
Replaces missing values for a feature using information from the other features in the dataset. Often provides more accurate imputations than univariate methods, especially when complex dependencies between features exist.
K-nearest Neighbors Imputation (KNN)
Imputes the missing values of a data point from the K most similar instances. With a suitable distance measure, it can handle both numeric and categorical data.
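A minimal sketch of the idea using scikit-learn's `KNNImputer`, with a made-up array and `n_neighbors=2` chosen for illustration. Note that this particular class works on numeric arrays, so categorical features would need to be encoded first.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical data: the second row is missing its second feature.
X = np.array([[1.0, 2.0],
              [1.1, np.nan],
              [0.9, 4.0],
              [8.0, 9.0]])

# The two rows closest to row 1 on the observed feature are rows 0 and 2,
# so the missing value is filled with the average of their second feature.
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)

print(X_filled[1])
```

The distant fourth row does not influence the result, so the imputed value is the average of 2.0 and 4.0.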
Iterative Imputation
Uses regression to predict the missing values of each feature from the other features in the data, repeating the process over several rounds to refine the estimates.
SimpleImputer()
The scikit-learn univariate imputer, used as the initial imputation method in iterative imputation to obtain starting estimates for missing values.
Stopping Criterion
Conditions that dictate when the iterative imputation process should stop, such as reaching a maximum number of iterations or the change between rounds falling below a specified tolerance.
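The pieces above (an initial simple imputation, regression on the other features, and a stopping criterion) can be sketched with scikit-learn's `IterativeImputer`, which is still marked experimental and needs an explicit enabling import. The synthetic data and the parameter values below are assumptions for illustration.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables the import below)
from sklearn.impute import IterativeImputer

# Hypothetical data: the second feature is roughly 2 * first + 1.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
X = np.column_stack([x, 2.0 * x + 1.0 + 0.05 * rng.normal(size=50)])

# Hide the second feature for the first five rows.
X_missing = X.copy()
X_missing[:5, 1] = np.nan

imputer = IterativeImputer(
    initial_strategy="mean",  # starting estimates come from a simple mean imputation
    max_iter=10,              # stopping criterion: at most 10 rounds
    tol=1e-3,                 # stopping criterion: stop early once updates are small
    random_state=0,
)
X_filled = imputer.fit_transform(X_missing)

print("rounds used:", imputer.n_iter_)
print("imputed:", X_filled[:5, 1])
print("true:   ", X[:5, 1])
```

Because the two features are strongly related, the regression-based estimates should land close to the hidden values, which a mean imputation on the second feature alone could not do.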