Flashcards based on the Databricks Certified Data Analyst Associate exam transcript, covering SQL, data architecture, and Databricks-specific functionalities.
Gold Layer
The layer of the medallion architecture most commonly used by data analysts, containing de-normalized data models optimized for analytics, reporting, and machine learning.
SQL Editor
The specific page within Databricks SQL where an analyst can write and execute SQL queries.
Complementary Tool for quick in-platform BI work
The recommended way Databricks SQL should be used in relation to other business intelligence (BI) tools like Tableau, Power BI, and Looker.
Partner Connect
A feature that provides an automated workflow to establish a SQL warehouse for third-party tools like Fivetran, Power BI, or Tableau to interact with Databricks SQL.
Markdown-based text boxes
The tool used to designate specific sections within a dashboard using text.
ANSI SQL
The standard SQL dialect used by Databricks SQL, which facilitates the migration of existing SQL queries.
Sankey
A visualization type used specifically to show the flow of users through a website.
PII Data Considerations
A set of considerations for data analysts including organization-specific best practices, legal requirements for the collection area, and legal requirements for the analysis area.
Data Explorer (Catalog Explorer)
A tool in Databricks SQL used to view table metadata and data, as well as to view or change permissions and determine table ownership.
Managed Table
A table type where dropping the table removes the entry from the metastore and deletes all underlying data files.
External Table
A table type where dropping the table removes the metadata from the metastore but leaves the underlying data files untouched.
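The difference between the two table types can be sketched in Databricks SQL as follows. The table names and the storage path are hypothetical placeholders, and the comments restate the drop behavior from the two definitions above.

```sql
-- Managed table: Databricks controls both the metadata and the data files.
CREATE TABLE orders_managed (order_id INT, amount DOUBLE);

-- External table: the LOCATION clause points at storage you manage yourself.
-- The path below is a placeholder, not a real location.
CREATE TABLE orders_external (order_id INT, amount DOUBLE)
LOCATION 's3://example-bucket/path/to/orders';

DROP TABLE orders_managed;   -- removes the metastore entry and the data files
DROP TABLE orders_external;  -- removes the metastore entry only; files remain
```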
Query Parameters
Dynamic values used to filter results in queries; however, queries using these cannot currently be used with Alerts.
Data Enhancement
A term used to describe the process of augmenting gold-layer tables with additional datasets for ad-hoc projects.
Last-mile ETL
A term used to describe additional processing of gold-layer tables prior to performing analyst work.
ACID Transactions
A key advantage of using a Delta Lake-based data lakehouse over common data lake solutions, providing reliability and consistency.
Serverless SQL Endpoint (Serverless SQL Warehouse)
A compute resource that reduces start-up time compared to standard endpoints while managing costs.
Descriptive Statistics
A branch of statistics that uses summary statistics to quantitatively describe and summarize data.
Higher-order functions
Functions used when custom logic needs to be applied at scale specifically to array data objects.
Bar Chart
The default visualization type selected by Databricks SQL when a query result contains categorical strings and integer counts.
Query History
A feature used to troubleshoot slow queries, view query plans, and debug, but it cannot be used to automate execution on multiple warehouses.
Medallion Architecture
A data design pattern that logically organizes data in a lakehouse to incrementally improve the structure and quality as it flows through layers.
DESCRIBE HISTORY
The SQL command used to audit and view the history of operations performed on a Delta Lake table.
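A minimal usage sketch, assuming a Delta table named `orders_managed` (a hypothetical name):

```sql
-- Full operation history for the table, newest first.
DESCRIBE HISTORY orders_managed;

-- Only the five most recent operations.
DESCRIBE HISTORY orders_managed LIMIT 5;
```

Each returned row describes one table version, including the operation that produced it and when it ran.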
Delta Sharing
A tool used to share datasets securely with external institutions that do not have access to the Databricks workspace.
Auto Loader
An efficient, scalable solution for incrementally ingesting large volumes of semi-structured log data while handling schema changes automatically.
Photon
A columnar, vectorized execution engine that uses a caching layer to transcode data into a CPU-efficient format to accelerate scan performance and aggregations.
Liquid Clustering
An optimization feature that allows changing clustering columns without rewriting existing data, providing flexibility for evolving query patterns.
Databricks Marketplace
A platform that enables direct, governed access to live external data, models, and dashboards via Delta Sharing without data replication.
Lakehouse Federation
A feature that allows creating foreign catalogs for external databases, such as MySQL, to join them with Delta tables directly in Databricks.
APPROX_COUNT_DISTINCT
A function that uses the HyperLogLog++ algorithm to provide fast approximate counts of unique values with a default 5% relative standard deviation.
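As a rough illustration, the query below compares the approximate count with an exact one. The table `web_events` and the column `user_id` are hypothetical; the optional second argument sets the maximum relative standard deviation.

```sql
SELECT
  approx_count_distinct(user_id)       AS approx_users,    -- default relativeSD of 0.05
  approx_count_distinct(user_id, 0.01) AS tighter_approx,  -- tighter bound, more work
  count(DISTINCT user_id)              AS exact_users      -- exact but slower on large data
FROM web_events;
```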
Dynamic Views
A technique used to secure PII data while allowing reporting access by using functions like IS_MEMBER() to restrict columns based on user roles.
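One possible shape for such a view is sketched below. The table `customers`, its columns, and the group name `pii_readers` are assumptions made for illustration.

```sql
CREATE OR REPLACE VIEW customers_reporting AS
SELECT
  customer_id,
  -- Members of the hypothetical pii_readers group see the real value;
  -- everyone else sees a redacted placeholder.
  CASE WHEN is_member('pii_readers') THEN email ELSE 'REDACTED' END AS email,
  country
FROM customers;
```

Analysts can then be granted access to the view rather than the underlying table, so reports keep working without exposing the PII column.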
Genie Space
A specialized space in Databricks that allows users to ask natural language questions and receive accurate, context-aware responses based on Unity Catalog metadata.
Materialized View
An object used to improve performance of reports by precomputing aggregations that only change on a daily basis.
Continuous Variable
A quantitative variable that can take on an uncountable set of values.
TRANSFORM
A higher-order function used to apply a transformation (like division) to every element in an array column.
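A small sketch of the division case mentioned above, using a literal array so that no table is assumed:

```sql
-- The lambda x -> x / 10 is applied to each element of the array.
SELECT transform(array(10, 20, 30), x -> x / 10) AS scaled;
```

On a real table, the first argument would typically be an array column instead of the `array(...)` literal.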
EXPLODE
A SQL function used to expand a nested array column so each item has its own row.
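A minimal sketch with inline data; the alias `orders` and its column names are hypothetical:

```sql
-- Each element of items becomes its own output row,
-- repeated alongside the order_id it came from.
SELECT order_id, explode(items) AS item
FROM VALUES
  (1, array('pen', 'ink')),
  (2, array('pad'))
AS orders(order_id, items);
```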