This flashcard set covers key vocabulary and concepts related to data science and engineering, including definitions of important terms and methodologies.
Data Science
A field that utilizes techniques and algorithms to uncover hidden patterns and trends in large amounts of data.
5Vs of Big Data
Volume, Variety, Veracity, Validity, and Velocity; key characteristics that describe the challenges of handling big data. (Many sources list Value in place of Validity.)
Structured Data
Data that is well-defined and typically stored in tabular formats with a clear relationship between rows and columns.
Unstructured Data
Raw data in various formats such as images, audio, and text that lacks a pre-defined model.
Semi-Structured Data
Data that contains both structured and unstructured components, such as emails (structured header fields plus a free-text body).
Data Quality
A measure of the condition of a data set, defined by aspects like accuracy, completeness, and consistency.
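One of the aspects named above, completeness, can be turned into a simple score. The sketch below is only an illustration: the function name `completeness`, the sample records, and the choice to treat `None` and the empty string as missing are all assumptions, not part of the flashcard set.

```python
def completeness(records, fields):
    """Return the fraction of expected field values that are actually filled in.

    Assumption: a value counts as missing if it is None or an empty string.
    """
    total = len(records) * len(fields)
    if total == 0:
        return 1.0  # nothing expected, so nothing is missing
    filled = sum(
        1
        for rec in records
        for field in fields
        if rec.get(field) not in (None, "")
    )
    return filled / total


# Hypothetical sample data: two of the four email values are missing.
records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": ""},
    {"id": 4, "email": "d@example.com"},
]
score = completeness(records, ["id", "email"])
```

Accuracy and consistency would need their own checks (for example, comparing against a trusted reference or validating formats), which this sketch does not attempt.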
CRISP-DM Model
A data science process model consisting of six iterative steps: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
Quantitative Data
Data that can be expressed numerically and is divided into discrete and continuous types.
Qualitative Data
Data that describes characteristics or qualities, often represented through categories rather than numbers.
Data Cleaning
The process of identifying and correcting errors or inconsistencies in data to improve its quality.
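A minimal sketch of what cleaning can look like in practice, assuming records are plain dictionaries with hypothetical `name` and `age` fields. The specific rules (trim whitespace, normalize capitalization, drop incomplete rows, drop duplicates) are illustrative choices, not a prescribed procedure.

```python
def clean_records(records):
    """Normalize names, drop incomplete rows, and remove duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        # Fix inconsistent formatting: stray whitespace and mixed case.
        name = (rec.get("name") or "").strip().title()
        age = rec.get("age")
        if not name or age is None:
            continue  # skip rows with a missing name or age
        key = (name, age)
        if key in seen:
            continue  # skip rows that duplicate an earlier one
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned


# Hypothetical raw input with a duplicate and two incomplete rows.
raw = [
    {"name": " alice ", "age": 30},
    {"name": "Alice", "age": 30},
    {"name": "bob", "age": None},
    {"name": "", "age": 22},
]
cleaned = clean_records(raw)
```

Dropping incomplete rows is only one option; depending on the data set, filling in missing values may be more appropriate.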
Outlier
A data point that deviates significantly from other observations; it may indicate a measurement error or a genuinely novel finding.
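One common way to flag outliers is the interquartile range (IQR) rule: anything more than 1.5 × IQR below the first quartile or above the third quartile is flagged. The sketch below uses Python's standard `statistics.quantiles`; the function name `iqr_outliers`, the 1.5 multiplier default, and the sample readings are assumptions for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sensor readings with one suspicious value.
readings = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(readings)
```

Whether a flagged value is an error or a real finding still has to be decided by looking at how the data was collected.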
Data Engineering
The field focused on the development and maintenance of systems that gather and process data for analysis.
Dimensionality Reduction
Techniques used to reduce the number of variables in a data set while maintaining its essential properties.
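As a small illustration, the sketch below reduces two variables to one by projecting 2-D points onto their direction of greatest variance, which is what principal component analysis (PCA) does in the two-variable case. PCA is chosen here as one example technique; the function name and sample points are hypothetical, and real work would normally use a numerical library rather than this closed-form version.

```python
import math


def project_to_principal_axis(points):
    """Project 2-D points onto their first principal axis (2 variables -> 1)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 covariance matrix.
    var_x = sum(x * x for x, _ in centered) / n
    var_y = sum(y * y for _, y in centered) / n
    cov_xy = sum(x * y for x, y in centered) / n

    # Angle of the direction that maximizes the projected variance.
    theta = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    ux, uy = math.cos(theta), math.sin(theta)
    return [x * ux + y * uy for x, y in centered]


# Hypothetical points lying close to the line y = 2x.
points = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0)]
reduced = project_to_principal_axis(points)
```

Because the two variables here are almost perfectly correlated, the single projected coordinate retains nearly all of the original variance.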
Data Transformation
The process of converting data into a different format or structure to meet specified requirements.
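A simple structural transformation is turning row-oriented records into column-oriented lists, which is often what charting or analysis code expects. This is a sketch under the assumption that every row has the same keys; the function name and the sample data are hypothetical.

```python
def rows_to_columns(rows):
    """Convert a list of row dicts into a dict of column lists.

    Assumption: every row contains the same keys.
    """
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


# Hypothetical row-oriented input.
rows = [
    {"city": "Oslo", "temp_c": 4},
    {"city": "Lima", "temp_c": 19},
]
columns = rows_to_columns(rows)
```

Other transformations, such as rescaling values or converting units, change the content of the fields rather than the layout.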
Machine Learning
A technique within data science that enables computers to learn from data and improve their performance over time.
Data Visualization Techniques
Methods for presenting data in graphical formats to help convey insights and findings effectively.