Data_Engineering_Part5


Last updated 8:40 PM on 12/19/24

22 Terms

1
Livy API

A REST API that allows users to submit and manage Apache Spark jobs remotely.
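
As a rough illustration of what "submitting a job" means here, the sketch below builds the URL and JSON body for a batch submission in the style of Apache Livy's `POST /batches` call. The base URL and file path are hypothetical placeholders, and the function only constructs the request rather than sending it; a real Fabric call would also need an authentication token, which is outside this sketch.

```python
import json


def build_batch_request(base_url, app_file, args=None, conf=None, name=None):
    """Return (url, json_body) for a Livy-style 'create batch' request.

    The field names (file, args, conf, name) follow the Apache Livy
    batch request body; base_url is a hypothetical endpoint.
    """
    payload = {"file": app_file}  # path to the application to run
    if args:
        payload["args"] = list(args)  # command-line arguments for the app
    if conf:
        payload["conf"] = dict(conf)  # Spark configuration properties
    if name:
        payload["name"] = name  # display name of the batch session
    url = base_url.rstrip("/") + "/batches"
    return url, json.dumps(payload)


# Hypothetical endpoint and script path, for illustration only.
url, body = build_batch_request(
    "https://livy.example.com/",
    "abfss://jobs@examplestore.dfs.core.windows.net/etl.py",
    args=["2024-12-19"],
    name="nightly-etl",
)
print(url)
print(body)
```

The returned pair can be handed to any HTTP client; keeping request construction separate from sending makes the payload easy to inspect.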

2
Lakehouse

A data architecture platform in Microsoft Fabric for storing, managing, and analyzing structured and unstructured data.

3
Delta Lake

An open-source table format and storage layer that stores data as Parquet files plus a transaction log, providing reliable storage and features such as ACID transactions.

4
Microsoft Entra

A comprehensive identity and access management service for secure application access.

5
Jupyter Notebooks

A web-based interactive computing environment that enables the creation of notebook documents containing live code, equations, visualizations, and narrative text.

6
Spark Batch Jobs

Jobs submitted to Apache Spark for processing large datasets in batch mode.

7
ABFS

Azure Blob File System, a Hadoop-compatible file system driver (not a standalone service) that lets Hadoop and Spark workloads read and write Azure Data Lake Storage Gen2 data through abfs:// and abfss:// URIs.
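
To make the URI scheme concrete, here is a minimal sketch that assembles a secure ABFS path for an ADLS Gen2 account. The container, account, and file names are hypothetical; the overall shape (`abfss://<container>@<account>.dfs.core.windows.net/<path>`) is the standard ADLS Gen2 form.

```python
def abfss_uri(container, account, path=""):
    """Build an abfss:// URI for a path in an ADLS Gen2 container."""
    # Strip leading slashes so the result has exactly one separator.
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


# Hypothetical container and storage account names.
print(abfss_uri("raw", "examplestore", "/sales/2024.parquet"))
```

A string like this is what a Spark reader would typically receive as its load path.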

8
Azure Data Lake Storage (ADLS) Gen2

A scalable and secure data lake service built on Azure Blob Storage for big data analytics.

9
V-Order

A write-time optimization for Parquet files that enhances read performance in Microsoft Fabric compute engines, including Apache Spark.

10
Optimize Write

A Delta Lake feature that reduces the number of small files written during data ingestion to improve performance.

11
Apache Livy REST API documentation

Official documentation for the Livy API, providing details on how to submit and manage Spark jobs.

12
MLflow

An open-source platform for managing the machine learning lifecycle.

13
Data Wrangler

A notebook-based, low-code tool in Microsoft Fabric for exploring, cleaning, and transforming data, which generates the code for the steps applied.

14
NotebookUtils

A built-in package to help you perform common tasks in Fabric notebooks, including file management and running other notebooks.

15
Secret Redaction

The process of automatically hiding secret values from notebook outputs for security purposes.
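
The idea can be sketched in a few lines. This is only an illustration of the concept, not how Fabric implements it: the function name and the `[REDACTED]` mask string are assumptions chosen for the example.

```python
def redact(text, secrets, mask="[REDACTED]"):
    """Replace every occurrence of each known secret value with a mask."""
    # Longest secrets first, so a secret that contains a shorter one
    # is masked as a whole rather than partially.
    for secret in sorted((s for s in secrets if s), key=len, reverse=True):
        text = text.replace(secret, mask)
    return text


# Hypothetical secret value, for illustration only.
print(redact("connecting with token=abc123", ["abc123"]))
```

Redaction of this kind only protects displayed output; the secret itself is still available to the running code.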

16
Monitoring Hub

A feature in Microsoft Fabric that allows users to view and track the status of activities and jobs across various workspaces.

17
Execution Context

Information regarding the current state and environment in which a notebook is running.

18
Job Status

The current state of a submitted job in a computing environment, such as 'Succeeded' or 'Failed'.
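
In practice a client usually polls the status until it reaches a terminal value. The sketch below shows that pattern under stated assumptions: `get_status` is a hypothetical callable standing in for whatever API call returns the state, and the set of terminal states is assumed for illustration (only 'Succeeded' and 'Failed' come from the definition above).

```python
import time

# Assumed terminal states; 'Cancelled' is an illustrative addition.
TERMINAL_STATES = {"Succeeded", "Failed", "Cancelled"}


def wait_for_job(get_status, poll_seconds=5.0, max_polls=60, sleep=time.sleep):
    """Poll get_status() until it returns a terminal state, then return it."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)  # wait before asking again
    raise TimeoutError(f"job still not finished after {max_polls} polls")


# Simulated status sequence in place of a real API call.
statuses = iter(["NotStarted", "InProgress", "Succeeded"])
print(wait_for_job(lambda: next(statuses), sleep=lambda seconds: None))
```

Passing the sleep function as a parameter keeps the loop easy to exercise without real delays.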

19
Batch Job Connection String

A configured string used to establish a connection between a client and the Livy API for batch job submissions.

20
Touchpoint

A point of interaction or communication within Microsoft Fabric used by users to engage with tools like notebooks and APIs.

21
Apache Spark configuration

Settings that determine how Spark applications run, including memory allocation and the level of parallelism.
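
As an example of such settings, the sketch below holds a few standard Spark properties in a dictionary and converts them into `--conf key=value` arguments of the kind `spark-submit` accepts. The property names are real Spark settings; the values and the helper name `to_submit_args` are illustrative assumptions.

```python
def to_submit_args(conf):
    """Turn a dict of Spark properties into spark-submit '--conf' arguments."""
    args = []
    for key in sorted(conf):  # sorted for a stable, readable order
        args += ["--conf", f"{key}={conf[key]}"]
    return args


# Illustrative values only; suitable sizes depend on the workload.
spark_conf = {
    "spark.executor.memory": "4g",          # memory per executor
    "spark.executor.cores": "2",            # cores per executor
    "spark.sql.shuffle.partitions": "200",  # parallelism of SQL shuffles
}
print(to_submit_args(spark_conf))
```

The same dictionary could also be passed as the `conf` field of a Livy batch request.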

22
SQL Analytics Endpoint

An endpoint that allows for SQL querying of data stored in Microsoft Fabric's Lakehouse.