Databricks Certified Data Analyst Associate Practice Flashcards

Flashcards based on the Databricks Certified Data Analyst Associate exam transcript, covering SQL, data architecture, and Databricks-specific functionalities.

Last updated 5:10 PM on 6/29/26

35 Terms

1

Gold Layer

The layer of the medallion architecture most commonly used by data analysts, containing de-normalized data models optimized for analytics, reporting, and machine learning.

2

SQL Editor

The specific page within Databricks SQL where an analyst can write and execute SQL queries.

3

Complementary Tool for quick in-platform BI work

The recommended way Databricks SQL should be used in relation to other business intelligence (BI) tools like Tableau, Power BI, and Looker.

4

Partner Connect

A feature that provides an automated workflow to establish a SQL warehouse for third-party tools like Fivetran, Power BI, or Tableau to interact with Databricks SQL.

5

Markdown-based text boxes

The dashboard element used to label or introduce specific sections of a dashboard with text.

6

ANSI SQL

The standard SQL dialect used by Databricks SQL, which facilitates the migration of existing SQL queries.

7

Sankey

A visualization type used specifically to show the flow of users through a website.

8

PII Data Considerations

A set of considerations for data analysts including organization-specific best practices, legal requirements for the collection area, and legal requirements for the analysis area.

9

Data Explorer (Catalog Explorer)

A tool in Databricks SQL used to view table metadata and data, as well as to view or change permissions and determine table ownership.

10

Managed Table

A table type where dropping the table removes the entry from the metastore and deletes all underlying data files.

11

External Table

A table type where dropping the table removes the metadata from the metastore but leaves the underlying data files untouched.

12

Query Parameters

Dynamic values used to filter results in queries; however, queries using these cannot currently be used with Alerts.

13

Data Enhancement

A term used to describe the process of augmenting gold-layer tables with additional datasets for ad-hoc projects.

14

Last-mile ETL

A term used to describe additional processing of gold-layer tables prior to performing analyst work.

15

ACID Transactions

A key advantage of using a Delta Lake-based data lakehouse over common data lake solutions, providing reliability and consistency.

16

Serverless SQL Endpoint

A compute resource (now called a serverless SQL warehouse) that reduces start-up time compared to standard endpoints while managing costs.

17

Descriptive Statistics

A branch of statistics that uses summary statistics to quantitatively describe and summarize data.
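
As a small illustration of the summary statistics this card refers to, the sketch below computes a few of them with Python's standard `statistics` module. The `daily_orders` values are made up for the example and are not from the exam material.

```python
import statistics

# Hypothetical sample: daily order counts for one week (made-up values).
daily_orders = [12, 15, 11, 19, 15, 22, 18]

# Common descriptive statistics: size, range, center, and spread.
summary = {
    "count": len(daily_orders),
    "min": min(daily_orders),
    "max": max(daily_orders),
    "mean": statistics.mean(daily_orders),
    "median": statistics.median(daily_orders),
    "stdev": statistics.stdev(daily_orders),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

In Databricks SQL the same quantities would typically come from aggregate functions such as `COUNT`, `MIN`, `MAX`, `AVG`, and `STDDEV`.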

18

Higher-order functions

Functions used when custom logic needs to be applied at scale specifically to array data objects.

19

Bar Chart

The default visualization type selected by Databricks SQL when a query result contains categorical strings and integer counts.

20

Query History

A feature used to troubleshoot slow queries, view query plans, and debug queries; it cannot be used to automate query execution across multiple warehouses.

21

Medallion Architecture

A data design pattern that logically organizes data in a lakehouse to incrementally improve the structure and quality as it flows through layers.

22

DESCRIBE HISTORY

The SQL command used to audit and view the history of operations performed on a Delta Lake table.

23

Delta Sharing

A tool used to share datasets securely with external institutions that do not have access to the Databricks workspace.

24

Auto Loader

An efficient, scalable solution for incrementally ingesting large volumes of semi-structured log data while handling schema changes automatically.

25

Photon

A columnar, vectorized execution engine that uses a caching layer to transcode data into a CPU-efficient format to accelerate scan performance and aggregations.

26

Liquid Clustering

An optimization feature that allows changing clustering columns without rewriting existing data, providing flexibility for evolving query patterns.

27

Databricks Marketplace

A platform that enables direct, governed access to live external data, models, and dashboards via Delta Sharing without data replication.

28

Lakehouse Federation

A feature that allows creating foreign catalogs for external databases, such as MySQL, to join them with Delta tables directly in Databricks.

29

APPROX_COUNT_DISTINCT

A function that uses the HyperLogLog++ algorithm to provide fast approximate counts of unique values with a default 5% relative standard deviation.
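
To give a feel for how this kind of estimator works, here is a minimal sketch of plain HyperLogLog in Python. It is not the HyperLogLog++ variant the card mentions (it omits that variant's bias correction and sparse representation), and the function name `hll_estimate` and the register setting `p=12` are choices made for this example only.

```python
import hashlib
import math


def hll_estimate(values, p=12):
    """Estimate the number of distinct values with plain HyperLogLog.

    Assumption: p=12 (4096 registers) is an illustrative setting, giving a
    standard error of roughly 1.04 / sqrt(4096), or about 1.6%.
    """
    m = 1 << p
    registers = [0] * m
    for value in values:
        # Derive a 64-bit hash from the value's string form.
        digest = hashlib.sha256(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - p)                # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)     # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - p) - rest.bit_length() + 1
        registers[index] = max(registers[index], rank)

    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    # Small-range correction: fall back to linear counting.
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)
    return raw


estimate = hll_estimate(range(10_000))
print(f"true distinct count: 10000, estimate: {estimate:.0f}")
```

The point of the sketch is that memory use depends on the number of registers, not on the number of distinct values, which is why the approximate count is cheaper than an exact `COUNT(DISTINCT ...)` on large tables.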

30

Dynamic Views

A technique used to secure PII data while allowing reporting access by using functions like IS_MEMBER() to restrict columns based on user roles.

31

Genie Space

A specialized space in Databricks that allows users to ask natural language questions and receive accurate, context-aware responses based on Unity Catalog metadata.

32

Materialized View

An object that improves report performance by precomputing and storing query results, such as aggregations over data that changes only on a daily basis.

33

Continuous Variable

A quantitative variable that can take on an uncountable set of values.

34

TRANSFORM

A higher-order function used to apply a transformation (like division) to every element in an array column.
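
The behavior described on this card can be mimicked in plain Python. In Databricks SQL the call would look like `TRANSFORM(prices_cents, p -> p / 100)`; the sketch below is only a model of those semantics, and the `orders` rows and column names are hypothetical.

```python
# Hypothetical rows: each order carries an array of item prices in cents.
orders = [
    {"order_id": 1, "prices_cents": [1250, 399]},
    {"order_id": 2, "prices_cents": [5000]},
    {"order_id": 3, "prices_cents": []},
]


def transform(array, func):
    """Apply func to every element and return a new array of the same length."""
    return [func(element) for element in array]


# One output row per input row; only the array contents change.
orders_in_dollars = [
    {
        "order_id": row["order_id"],
        "prices_dollars": transform(row["prices_cents"], lambda p: p / 100),
    }
    for row in orders
]

for row in orders_in_dollars:
    print(row)
```

Note that the row count stays the same: the function reshapes values inside each array rather than producing new rows.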

35

EXPLODE

A SQL function used to expand a nested array column so each item has its own row.
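
As a rough model of what this card describes, the Python sketch below expands an array column into one row per element. In Databricks SQL this would be written along the lines of `SELECT cart_id, EXPLODE(items) AS item FROM carts`; the `carts` data and the helper name `explode` here are hypothetical.

```python
# Hypothetical rows: each cart holds an array of item names.
carts = [
    {"cart_id": 1, "items": ["apple", "pear"]},
    {"cart_id": 2, "items": []},
    {"cart_id": 3, "items": ["kiwi"]},
]


def explode(rows, array_column, output_column):
    """Return one row per array element, copying the other columns."""
    exploded = []
    for row in rows:
        # A row with an empty array contributes no output rows.
        for item in row[array_column]:
            new_row = {k: v for k, v in row.items() if k != array_column}
            new_row[output_column] = item
            exploded.append(new_row)
    return exploded


exploded_carts = explode(carts, "items", "item")
for row in exploded_carts:
    print(row)
```

Unlike TRANSFORM, this changes the row count: a cart with two items becomes two rows, and a cart with an empty array disappears from the result.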