This set of vocabulary flashcards covers the various tools used in a data journey—ranging from Excel and PowerBI to SQL, Python, and AI—detailing their specific strengths, weaknesses, and functional roles in professional data analytics.
Excel
The fastest way to work with small data and answer quick questions; however, it involves manual repetition and has performance limitations with large files.
Data Exploration
The step of looking at data for the first time, understanding its content, and noting key numbers and any data quality issues.
Data Cleanup (Excel)
The process of removing bad rows and unnecessary columns, replacing nulls, and reshaping data to prepare it for analytics.
PowerBI
A tool used for automation and as a home for interactive data visualizations, reporting, and dashboarding.
Power Query
A component within PowerBI used to build data transformation and cleanup steps.
Star Schema
A data model, organized as a central fact table surrounded by dimension tables, that can be built in PowerBI and is optimized for fast reporting and analytics.
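The shape of such a model can be sketched outside PowerBI. The example below is only an illustration using Python's built-in sqlite3 module; the table names (fact_sales, dim_product, dim_date) and the sample rows are hypothetical and do not come from the flashcards.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# One fact table (the measurable events) and two dimension tables
# (the descriptive attributes). All names and rows are made up.
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales (product_key INTEGER, date_key INTEGER, amount REAL);

    INSERT INTO dim_product VALUES (1, 'Bikes'), (2, 'Helmets');
    INSERT INTO dim_date VALUES (20240101, 2024), (20250101, 2025);
    INSERT INTO fact_sales VALUES
        (1, 20240101, 500.0),
        (1, 20250101, 700.0),
        (2, 20250101, 40.0);
""")

# A typical report query: join the fact table to each dimension,
# then aggregate by the dimension attributes.
report = conn.execute("""
    SELECT p.category, d.year, SUM(f.amount) AS total_amount
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()

for row in report:
    print(row)
```

Every report query follows the same pattern, one join per dimension, which is a large part of why this layout tends to be fast and easy to reason about.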
DAX (Data Analysis Expressions)
The language used in PowerBI to create calculations and formulas, similar to Excel formulas but for professional data modeling.
PowerBI Service
A cloud-based service where PowerBI projects are published, allowing users to automate data refreshes and share reports via links.
SQL
Described as the 'king of working directly with data,' it is the most efficient tool for data transformations, preparations, and reshaping tables into professional models.
Data Warehouse
A centralized data platform organized into multiple layers, such as Bronze, Silver, and Gold, to manage complex data for a company.
Stored Procedures
Logic written in SQL and stored in the database that defines how to load data through the warehouse layers in sequence.
Data Pipeline
The automated process of moving data from source systems to the database and through various processing layers.
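As a rough sketch of the warehouse, stored procedure, and pipeline cards together, the example below moves a few rows through Bronze, Silver, and Gold tables in order. It makes several assumptions: SQLite (via Python's built-in sqlite3 module) stands in for a real warehouse, plain Python functions stand in for stored procedures because SQLite has none, and every table name, function name, and row is hypothetical.

```python
import sqlite3

def load_bronze(conn, source_rows):
    """Bronze: land the source rows as-is (everything stays text)."""
    conn.execute("CREATE TABLE IF NOT EXISTS bronze_orders (order_id TEXT, amount TEXT)")
    conn.execute("DELETE FROM bronze_orders")
    conn.executemany("INSERT INTO bronze_orders VALUES (?, ?)", source_rows)

def load_silver(conn):
    """Silver: drop rows without an amount and cast columns to proper types."""
    conn.execute("DROP TABLE IF EXISTS silver_orders")
    conn.execute("""
        CREATE TABLE silver_orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               CAST(amount AS REAL) AS amount
        FROM bronze_orders
        WHERE amount IS NOT NULL AND TRIM(amount) <> ''
    """)

def load_gold(conn):
    """Gold: build the business-ready summary that reports read from."""
    conn.execute("DROP TABLE IF EXISTS gold_sales_summary")
    conn.execute("""
        CREATE TABLE gold_sales_summary AS
        SELECT COUNT(*) AS order_count, SUM(amount) AS total_amount
        FROM silver_orders
    """)

def run_pipeline(conn, source_rows):
    """Run the layers in sequence, the way a scheduled job would."""
    load_bronze(conn, source_rows)
    load_silver(conn)
    load_gold(conn)
    conn.commit()
    return conn.execute(
        "SELECT order_count, total_amount FROM gold_sales_summary"
    ).fetchone()

# Hypothetical source extract: two usable rows, one NULL and one empty amount.
source_rows = [("1", "10.5"), ("2", None), ("3", "4.5"), ("4", "")]
conn = sqlite3.connect(":memory:")
summary = run_pipeline(conn, source_rows)
print(summary)
```

Keeping one function per layer mirrors the idea of one procedure per load step, so a failure can be traced to a specific layer.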
Single Point of Truth
A centralized data product within a warehouse that ensures all projects deliver the same numbers, reducing the risk of confusion and loss of trust.
Python
A programming language used in data projects for advanced automation, connecting to modern endpoints like APIs and streams, and wrapping around SQL for better logging and quality checks.
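One way to picture "wrapping around SQL" is a small Python helper that runs a SQL step, logs it, and then runs a quality check. This is a minimal sketch under assumptions: the helper run_step, the table names, and the check query are all hypothetical, and SQLite stands in for whatever database a real project would use.

```python
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline")

def run_step(conn, name, sql, check_sql):
    """Run one SQL step, log it, and raise if the check query counts bad rows.

    check_sql is expected to return a single number: the count of rows
    that violate the rule (0 means the step passed).
    """
    logger.info("starting step %s", name)
    conn.execute(sql)
    bad_rows = conn.execute(check_sql).fetchone()[0]
    if bad_rows:
        logger.error("step %s failed its check: %d bad rows", name, bad_rows)
        raise ValueError(f"quality check failed in step {name}")
    logger.info("finished step %s", name)

# Hypothetical raw table with one missing email.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_customers (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO raw_customers VALUES (?, ?)",
    [(1, "a@example.com"), (2, None)],
)

# The SQL does the transformation; Python adds logging and the check.
run_step(
    conn,
    "clean_customers",
    "CREATE TABLE clean_customers AS "
    "SELECT id, email FROM raw_customers WHERE email IS NOT NULL",
    "SELECT COUNT(*) FROM clean_customers WHERE email IS NULL",
)
clean_count = conn.execute("SELECT COUNT(*) FROM clean_customers").fetchone()[0]
print(clean_count)
```

If the transformation forgot the filter, the check would count the missing email and the step would stop with an error instead of passing bad data downstream.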
Pandas
A Python library used for table work, data cleaning, and exploration; for deep dives into specific files it is often quicker to work with than SQL.
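The cleanup steps named in these cards (removing bad rows and unnecessary columns, replacing nulls, reshaping) could look roughly like this in Pandas. The column names and values are invented for illustration, and replacing missing sales with 0 is an assumption made here, not a rule from the cards.

```python
import pandas as pd

# Hypothetical raw extract with a missing region, missing sales, and a junk column.
raw = pd.DataFrame({
    "region": ["North", "South", None, "West"],
    "q1_sales": [100.0, None, 50.0, 80.0],
    "q2_sales": [120.0, 90.0, 60.0, None],
    "notes": ["", "check", "", ""],
})

cleaned = (
    raw.dropna(subset=["region"])                    # remove bad rows (no region)
       .drop(columns=["notes"])                      # remove an unnecessary column
       .fillna({"q1_sales": 0.0, "q2_sales": 0.0})   # replace nulls (assumed rule)
)

# Reshape from one column per quarter to one row per region and quarter.
long_format = cleaned.melt(
    id_vars="region", var_name="quarter", value_name="sales"
)

print(long_format)
```

The long layout, with one row per region and quarter, is usually easier to filter, group, and chart than one column per quarter.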
Great Expectations
A Python library used for implementing data quality checks within data pipelines.
Scikit-learn
A Python library used for advanced analytics, machine learning, and building models that predict future outcomes from historical data.
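A very small sketch of the prediction idea, assuming a made-up series of monthly sales that grows in a straight line: the model is fitted on past months and then asked for the next one. The data and variable names are hypothetical, and linear regression is just one simple model chosen for illustration.

```python
from sklearn.linear_model import LinearRegression

# Hypothetical history: month number -> sales, growing by 10 each month.
months = [[1], [2], [3], [4], [5]]
sales = [110.0, 120.0, 130.0, 140.0, 150.0]

# Fit a straight line to the history.
model = LinearRegression()
model.fit(months, sales)

# Ask the fitted model for month 6.
forecast = model.predict([[6]])[0]
print(round(forecast, 2))
```

Real data is rarely this tidy, so a real project would also hold back some data to test how well the model predicts.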
Silent Errors
Errors in data systems, such as a wrong inner join in a SQL query, where the job runs successfully without a crash but results in incorrect numbers in reports.
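The inner join case can be reproduced in a few lines. In this sketch (hypothetical tables and rows, with SQLite standing in for the real database), one order points at a customer that is missing from the customers table, so the inner join quietly drops it and the total comes out too low without any error being raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE customers (customer_id INTEGER, name TEXT);

    -- Order 3 references customer 99, which does not exist in customers.
    INSERT INTO orders VALUES (1, 10, 100.0), (2, 11, 50.0), (3, 99, 25.0);
    INSERT INTO customers VALUES (10, 'Ada'), (11, 'Grace');
""")

# The number the report should show.
true_total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Runs without any error, but order 3 is silently dropped.
inner_total = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders AS o
    INNER JOIN customers AS c ON c.customer_id = o.customer_id
""").fetchone()[0]

# A left join keeps every order, matched or not.
left_total = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders AS o
    LEFT JOIN customers AS c ON c.customer_id = o.customer_id
""").fetchone()[0]

# A simple reconciliation check turns the silent error into a visible one.
reconciles = inner_total == true_total
print(true_total, inner_total, left_total, reconciles)
```

Comparing a total before and after a join is one cheap check that can catch this kind of error before it reaches a report.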