These flashcards cover key vocabulary and concepts related to Data Lakes, Data Warehouses, and the principles of managing large datasets.
Big Data
Data characterized by large volume, variety, and velocity, which has driven major changes in how data is processed.
Schema-on-read
An approach in which a schema is applied when data is read rather than when it is written, allowing data to be stored flexibly without a strict structure upfront.
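To make the idea concrete, here is a minimal sketch in Python. The sample records, the `SCHEMA` mapping, and the `read_with_schema` helper are all hypothetical names made up for illustration; real data lake engines apply schemas in their own ways.

```python
import json

# Hypothetical raw records, stored exactly as they arrived (no enforced structure).
RAW_LINES = [
    '{"user": "ana", "amount": "12.50", "country": "PT"}',
    '{"user": "ben", "amount": "7"}',
    '{"user": "cy", "amount": "3.25", "note": "refund"}',
]

# Assumed schema for this example: it belongs to the reader, not to the storage.
SCHEMA = {"user": str, "amount": float, "country": str}

def read_with_schema(lines, schema):
    """Parse each raw line and coerce its fields to the schema at read time."""
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            # Missing fields become None; fields outside the schema are ignored.
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows

rows = read_with_schema(RAW_LINES, SCHEMA)
```

The stored lines never change. A different reader could apply a different schema to the same raw data, which is the flexibility the definition refers to.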
Data Lake
A storage repository that holds vast amounts of raw data in its native format until it is needed for analysis.
Data Warehouse
A centralized repository of structured data that has been cleaned and processed for strategic analysis.
ETL
Stands for Extract, Transform, Load; a process used to collect data from various sources, transform it, and load it into a data warehouse.
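The three steps can be sketched with Python's standard library alone. This is an illustration under stated assumptions, not a production pipeline: the inline CSV text stands in for a source system, an in-memory SQLite database stands in for the warehouse, and the `sales` table and function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would come from files, APIs, or databases.
SOURCE_CSV = "id,name,revenue\n1, Acme ,100.5\n2,Globex,\n3, Initech ,42\n"

def extract(text):
    """Extract: read raw rows from the source as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace, convert types, and drop rows with no revenue."""
    cleaned = []
    for row in rows:
        if not row["revenue"].strip():
            continue  # skip incomplete rows
        cleaned.append((int(row["id"]), row["name"].strip(), float(row["revenue"])))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, revenue REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(transform(extract(SOURCE_CSV)), conn)
result = conn.execute("SELECT id, name, revenue FROM sales ORDER BY id").fetchall()
```

Note that the data is cleaned before it is loaded, so the target table only ever holds structured, validated rows.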
Data Silos
Isolated pockets of data that are not accessible or shared across different departments or systems.
Raw Data
Unprocessed data that has not been subjected to any transformation, cleaning, or structuring.
Data Quality
The measure of data's fitness for its intended purpose, which encompasses accuracy, completeness, consistency, and timeliness.
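Some of these dimensions can be checked with simple rules. The sketch below is illustrative only: the sample records, the age range, and the function names are assumptions, and timeliness is left out because it would need timestamps that this sample does not have.

```python
# Hypothetical records with deliberate quality problems.
RECORDS = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": None, "age": 29},
    {"id": 2, "email": "b@example.com", "age": -5},
]

def completeness(records, field):
    """Completeness: fraction of records where the field is present."""
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)

def duplicate_ids(records):
    """Consistency: ids that appear more than once."""
    seen, dupes = set(), set()
    for r in records:
        if r["id"] in seen:
            dupes.add(r["id"])
        seen.add(r["id"])
    return dupes

def invalid_ages(records, low=0, high=130):
    """Accuracy (rough proxy): ids of records whose age is outside an assumed plausible range."""
    return [r["id"] for r in records if r["age"] is not None and not low <= r["age"] <= high]

email_completeness = completeness(RECORDS, "email")
dupes = duplicate_ids(RECORDS)
bad_ages = invalid_ages(RECORDS)
```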
Data Governance
The overall management of data availability, usability, integrity, and security in an organization.
Machine Learning
A type of artificial intelligence that uses statistical techniques to give computer systems the ability to learn from data without being explicitly programmed.