A set of vocabulary flashcards based on key concepts in data preparation as discussed in the provided lecture notes.
Data Profiling
The process of examining a dataset to assess its quality and structure, for example its column types, missing values, and distinct values.
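As an illustration, a basic profile can be computed column by column. This is a minimal sketch, assuming the table is a list of Python dicts and that None or an empty string counts as missing; the function name `profile` and the sample rows are hypothetical.

```python
from collections import Counter


def profile(rows):
    """Summarize each column of a list-of-dicts table: missing, distinct, and value types."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Assumption: None and "" are both treated as missing.
        present = [v for v in values if v not in (None, "")]
        report[col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            # Mixed types in one column (e.g. floats plus the string "n/a") hint at quality issues.
            "types": dict(Counter(type(v).__name__ for v in present)),
        }
    return report


rows = [
    {"id": 1, "region": "East", "sales": 120.0},
    {"id": 2, "region": "", "sales": 95.5},
    {"id": 3, "region": "West", "sales": None},
    {"id": 4, "region": "East", "sales": "n/a"},
]
for col, stats in profile(rows).items():
    print(col, stats)
```

Here the `sales` column would be reported with both `float` and `str` values, which is the kind of structural problem profiling is meant to surface.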
Extract-Transform-Load (ETL)
A process that extracts data from source systems, transforms it into a form suitable for analysis, and loads it into a target data processing system.
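The three ETL stages can be sketched as three small functions. This is a minimal sketch, assuming the source is CSV text and the target is an in-memory SQLite database; the table name `orders`, its columns, and the rule "drop rows with no amount" are hypothetical choices for illustration.

```python
import csv
import io
import sqlite3

RAW_CSV = """order_id,amount,order_date
1, 19.99 ,2024-01-05
2,5.00,2024-01-06
3,,2024-01-07
"""


def extract(text):
    """Extract: parse CSV text into a list of dicts (stands in for reading a source system)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: trim whitespace, convert types, and drop rows with no amount."""
    cleaned = []
    for r in records:
        amount = r["amount"].strip()
        if not amount:
            continue  # hypothetical rule: an order without an amount is unusable
        cleaned.append((int(r["order_id"]), float(amount), r["order_date"].strip()))
    return cleaned


def load(rows, conn):
    """Load: insert the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone())
```

Keeping each stage as a separate function makes it easy to test the transformation rules without touching the source or the target system.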
Data Quality
Refers to the suitability of data for decision-making.
Data Structure
The organization of data to improve analytics and facilitate easy access.
Data Cleansing
The process of correcting or removing inaccuracies from data.
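A few common cleansing steps (standardizing text, mapping inconsistent spellings, removing implausible values, and dropping duplicates) can be combined in one pass. This is a minimal sketch; the spelling map, the 0 to 120 age range, and the field names are hypothetical rules chosen for illustration.

```python
def cleanse(records):
    """Correct or remove inaccurate entries: normalize text, fix known variants, drop bad and duplicate rows."""
    # Hypothetical mapping of inconsistent spellings to one canonical value.
    state_fixes = {"calif.": "CA", "california": "CA", "ca": "CA"}
    seen = set()
    cleaned = []
    for rec in records:
        name = " ".join(rec["name"].split()).title()  # collapse extra spaces, standardize case
        raw_state = rec["state"].strip()
        state = state_fixes.get(raw_state.lower(), raw_state.upper())
        age = rec["age"]
        if age is None or not (0 <= age <= 120):  # remove implausible values
            continue
        key = (name, state, age)
        if key in seen:  # remove rows that are duplicates once normalized
            continue
        seen.add(key)
        cleaned.append({"name": name, "state": state, "age": age})
    return cleaned


raw = [
    {"name": "  ada   lovelace", "state": "Calif.", "age": 36},
    {"name": "Ada Lovelace", "state": "CA", "age": 36},
    {"name": "alan turing", "state": "ny", "age": -1},
    {"name": "grace hopper", "state": "ny", "age": 85},
]
print(cleanse(raw))
```

Normalizing before de-duplicating matters: the first two rows only become recognizable as duplicates after the name and state are standardized.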
Data Completeness
A characteristic of data indicating that all necessary information is present for analysis.
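One simple way to quantify completeness is the share of rows that have a value in each required column. This is a minimal sketch, assuming rows are dicts and that None, an empty string, or an absent key all count as missing; the column names are hypothetical.

```python
def completeness(rows, required):
    """Return the share of rows with a non-empty value for each required column."""
    total = len(rows)
    ratios = {}
    for col in required:
        # r.get(col) returns None for absent keys, so those count as missing too.
        filled = sum(1 for r in rows if r.get(col) not in (None, ""))
        ratios[col] = filled / total if total else 0.0
    return ratios


rows = [
    {"customer": "A", "email": "a@example.com", "phone": ""},
    {"customer": "B", "email": None, "phone": "555-0100"},
    {"customer": "C", "email": "c@example.com", "phone": "555-0101"},
    {"customer": "D", "email": "d@example.com"},
]
print(completeness(rows, ["customer", "email", "phone"]))
```

Which columns count as "required" depends on the analysis; a column that is only half filled may still be acceptable if the analysis does not use it.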
Composite Column
A column that combines two or more characteristics in a single cell (for example, "Austin, TX" holding both city and state), making the data harder to analyze.
Single-Valued Column
A column where each cell contains one value describing one characteristic.
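Splitting a composite column produces single-valued columns. This is a minimal sketch, assuming a hypothetical `location` column formatted as "City, ST" with a comma as the separator.

```python
def split_city_state(rows, source="location"):
    """Split a composite 'City, ST' column into single-valued 'city' and 'state' columns."""
    result = []
    for row in rows:
        city, _, state = row[source].partition(",")
        new_row = {k: v for k, v in row.items() if k != source}  # drop the composite column
        new_row["city"] = city.strip()
        new_row["state"] = state.strip() or None  # None when no comma was present
        result.append(new_row)
    return result


rows = [
    {"store": 1, "location": "Austin, TX"},
    {"store": 2, "location": "Portland, OR"},
]
print(split_city_state(rows))
```

After the split, filtering or grouping by state no longer requires parsing text inside every cell.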
Flat Table
A table structure without subtotals or hierarchies, preferred for data analysis.
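A report laid out for people (group labels shown once, subtotal rows inserted) can be turned into a flat table by filling down the group label and dropping the subtotals. This is a minimal sketch, assuming subtotal rows are recognizable because their label ends in "Total"; that convention and the sample data are hypothetical.

```python
def flatten_report(report_rows):
    """Turn a report with blank (merged) region cells and subtotal rows into a flat table."""
    flat = []
    current_region = None
    for region, product, amount in report_rows:
        if region.endswith("Total"):  # skip subtotal rows; totals can be recomputed from the flat data
            continue
        if region:  # a new group starts; remember its label for the rows below it
            current_region = region
        flat.append({"region": current_region, "product": product, "amount": amount})
    return flat


report = [
    ("East", "Widgets", 100),
    ("", "Gadgets", 50),
    ("East Total", "", 150),
    ("West", "Widgets", 80),
    ("West Total", "", 80),
]
for row in flatten_report(report):
    print(row)
```

In the flat result every row is self-describing, so sums and filters work without special handling for subtotal lines.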
Star Schema
A data model that organizes data into fact and dimension tables for efficient analysis.
Fact Table
A table that stores quantitative data for analysis, typically containing business transaction data.
Dimension Table
A table that provides context to the data in a fact table, describing attributes related to the facts.
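A star schema puts measures in a central fact table and descriptive attributes in dimension tables linked by keys. This is a minimal sketch using Python's built-in `sqlite3` module; the table names (`fact_sales`, `dim_product`, `dim_store`), columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes that give context to the facts.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);

-- Fact table: one row per sales transaction, with measures and foreign keys to the dimensions.
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id INTEGER REFERENCES dim_store(store_id),
    quantity INTEGER,
    revenue REAL
);

INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
INSERT INTO dim_store VALUES (10, 'Austin'), (20, 'Portland');
INSERT INTO fact_sales VALUES
    (100, 1, 10, 3, 30.0),
    (101, 2, 10, 1, 15.0),
    (102, 1, 20, 2, 20.0);
""")

# Typical star-schema query: aggregate fact measures, grouped by dimension attributes.
query = """
SELECT p.category, s.city, SUM(f.revenue) AS total_revenue
FROM fact_sales AS f
JOIN dim_product AS p ON f.product_id = p.product_id
JOIN dim_store AS s ON f.store_id = s.store_id
GROUP BY p.category, s.city
ORDER BY p.category, s.city
"""
for row in conn.execute(query):
    print(row)
```

Each dimension joins directly to the fact table, so any combination of dimension attributes can be used to slice the same measures.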
Data Integration
The process of connecting related data from various sources to provide a unified view.
Data Anomaly
A deviation from the expected pattern, indicating potential data quality issues.
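One common rule of thumb for spotting numeric anomalies is the interquartile range (IQR) test: values far outside the middle 50% of the data are flagged. This is a minimal sketch of that swapped-in technique, not the only way to define an anomaly; the multiplier `k=1.5` and the sample data are assumptions.

```python
import statistics


def iqr_anomalies(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points (Python 3.8+)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


daily_orders = [102, 98, 101, 99, 100, 97, 103, 250]
print(iqr_anomalies(daily_orders))
```

A flagged value is only a candidate problem: it might be a data entry error, or a genuine but unusual event that should stay in the data.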
Data Loading
The process of transferring cleaned and transformed data into the software or system used for analysis.
Data Validation
The confirmation that data meets the required standards and is accurate.
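Validation is often expressed as a set of rules checked against each record. This is a minimal sketch; the field names, the simple email pattern, and the 1 to 1000 quantity range are hypothetical rules, not a standard.

```python
import re

# Hypothetical validation rules: each field maps to (check function, error message).
RULES = {
    "order_id": (lambda v: isinstance(v, int) and v > 0, "must be a positive integer"),
    "email": (
        lambda v: isinstance(v, str) and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
        "must look like an email address",
    ),
    "quantity": (lambda v: isinstance(v, int) and 1 <= v <= 1000, "must be between 1 and 1000"),
}


def validate(record):
    """Return a list of rule violations; an empty list means the record passes."""
    errors = []
    for field, (check, message) in RULES.items():
        if field not in record:
            errors.append(f"{field}: missing")
        elif not check(record[field]):
            errors.append(f"{field}: {message}")
    return errors


print(validate({"order_id": 7, "email": "kim@example.com", "quantity": 2}))
print(validate({"order_id": -1, "email": "not-an-email", "quantity": 2}))
```

Returning every violation, rather than stopping at the first one, makes it easier to report all the problems in a record at once.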
Data Relationships
Connections between tables or data elements, typically through shared key fields, that define how records in one table relate to records in another.
Data Imputation
The method of substituting estimated values for missing data.
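Two simple imputation strategies replace missing entries with the mean or the median of the observed values. This is a minimal sketch, assuming missing values are represented as None; the function name and the sample data are hypothetical.

```python
from statistics import mean, median


def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


readings = [10.0, None, 12.0, 30.0]
print(impute(readings))            # fill value pulled upward by the large 30.0 reading
print(impute(readings, "median"))  # fill value 12.0, less sensitive to the extreme reading
```

The median is often preferred when the observed data contain extreme values, because a single outlier can shift the mean substantially.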
Data Merging
The process of combining data elements or columns from different tables.
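Merging columns from two tables usually means matching rows on a shared key, much like a SQL LEFT JOIN. This is a minimal sketch, assuming both tables are lists of dicts and that the key is unique in the right-hand table; the table and column names are hypothetical.

```python
def left_merge(left, right, key):
    """Combine columns from two tables on a shared key (like a SQL LEFT JOIN).

    Every row of `left` is kept; matching `right` columns are added, or None if no match.
    Assumes `key` is unique in `right`; on a column-name clash, the right-hand value wins.
    """
    index = {row[key]: row for row in right}
    # Unique, order-preserving list of the right table's non-key columns.
    right_cols = list(dict.fromkeys(c for row in right for c in row if c != key))
    merged = []
    for row in left:
        match = index.get(row[key], {})
        merged.append({**row, **{c: match.get(c) for c in right_cols}})
    return merged


orders = [
    {"customer_id": 1, "amount": 50},
    {"customer_id": 2, "amount": 20},
    {"customer_id": 3, "amount": 75},
]
customers = [{"customer_id": 1, "name": "Ada"}, {"customer_id": 2, "name": "Grace"}]
for row in left_merge(orders, customers, "customer_id"):
    print(row)
```

Rows with no match (customer 3 here) keep their original data with None in the added columns, which makes unmatched keys easy to find afterward.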