Flashcards on Architectural Patterns in Data Engineering
ETL (Extract, Transform, Load)
A traditional data processing pattern where data is extracted from source systems, transformed into a suitable format, and then loaded into a data warehouse or data mart.
Batch Processing (ETL)
Runs on a schedule (e.g., nightly), processing large volumes of data in batches.
Data Integration (ETL)
Consolidates data from multiple sources into a centralized repository.
Data Quality (ETL)
Enforces quality through transformation and cleansing steps applied before the data is loaded.
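A minimal sketch of the ETL pattern described in the cards above, using only the Python standard library. The CSV source, cleansing rules, and table name are illustrative assumptions, with SQLite standing in for the warehouse.

```python
# Minimal ETL sketch: extract from a source, transform (cleanse), load into a warehouse.
import csv
import io
import sqlite3

RAW_CSV = "id,name,amount\n1, Alice ,10.5\n2,Bob,\n3, Carol ,7.25\n"

def extract(raw: str) -> list[dict]:
    """Extract: read rows from the source system (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[tuple]:
    """Transform: cleanse and normalize before loading (trim names, drop
    rows with missing amounts, cast types)."""
    out = []
    for r in rows:
        if not r["amount"]:
            continue  # data-quality rule: skip incomplete records
        out.append((int(r["id"]), r["name"].strip(), float(r["amount"])))
    return out

def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), conn)
    print(conn.execute("SELECT * FROM sales").fetchall())
```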
ELT (Extract, Load, Transform)
A modern variation of ETL where data is first loaded into a data lake or data warehouse, and then transformed as needed.
Scalability (ELT)
Leverages the scalability of modern cloud data warehouses and data lakes.
Flexibility (ELT)
Supports ad-hoc transformations and on-demand processing.
Real-Time Processing (ELT)
Can be adapted for near real-time data processing.
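For contrast, a minimal ELT sketch: raw records are loaded untouched, then transformed on demand with SQL inside the target store. SQLite stands in for a cloud warehouse; the table and column names are assumptions.

```python
# Minimal ELT sketch: load raw data first, transform inside the warehouse.
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: land the raw data as-is.
conn.execute("CREATE TABLE raw_events (user_id TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [("u1", "10.5"), ("u2", ""), ("u1", "4.0")],
)

# Transform: run on demand inside the warehouse, leveraging its compute.
conn.execute("""
    CREATE TABLE spend_by_user AS
    SELECT user_id, SUM(CAST(amount AS REAL)) AS total_spend
    FROM raw_events
    WHERE amount <> ''
    GROUP BY user_id
""")
print(conn.execute("SELECT * FROM spend_by_user ORDER BY user_id").fetchall())
```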
Lambda Architecture
Designed to handle both batch and stream processing of data; combines a batch layer for historical data processing with a speed layer for real-time data processing.
Batch Layer (Lambda Architecture)
Processes large volumes of historical data and generates batch views.
Speed Layer (Lambda Architecture)
Processes real-time data and generates real-time views.
Serving Layer (Lambda Architecture)
Merges batch and real-time views for querying.
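A toy sketch of the three Lambda layers listed above: the batch layer recomputes a view over historical events, the speed layer counts recent events, and the serving layer merges the two at query time. Event shapes are illustrative assumptions.

```python
# Toy Lambda Architecture: batch view + real-time view, merged at query time.
from collections import Counter

historical_events = [{"page": "/home"}, {"page": "/home"}, {"page": "/docs"}]
recent_events = [{"page": "/home"}, {"page": "/pricing"}]

def batch_layer(events):
    """Recompute the batch view over all historical data."""
    return Counter(e["page"] for e in events)

def speed_layer(events):
    """Maintain a real-time view over events not yet covered by a batch run."""
    return Counter(e["page"] for e in events)

def serving_layer(batch_view, realtime_view):
    """Merge batch and real-time views to answer queries."""
    return batch_view + realtime_view

views = serving_layer(batch_layer(historical_events), speed_layer(recent_events))
print(views.most_common())
```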
Kappa Architecture
A simplification of the Lambda Architecture that eliminates the batch layer and processes data as streams only; all data is treated as a real-time stream.
Stream Processing (Kappa Architecture)
All data is ingested and processed as a stream.
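A toy Kappa sketch for comparison: every record, historical or new, flows through the same stream processor, and reprocessing is simply replaying the event log. The event shapes are illustrative assumptions.

```python
# Toy Kappa Architecture: one processing path, no separate batch layer.
from collections import Counter

event_log = [{"page": "/home"}, {"page": "/docs"}, {"page": "/home"}]

def stream_processor(events):
    """Single processing path: fold each event into the running view."""
    view = Counter()
    for e in events:  # in production this would be an unbounded stream
        view[e["page"]] += 1
    return view

print(stream_processor(event_log))  # current view
print(stream_processor(event_log))  # reprocessing = replay the same log
```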
Data Lakehouse Architecture
Combines the scalability and cost-efficiency of data lakes with the ACID transactions and data management capabilities of data warehouses.
Unified Storage (Data Lakehouse)
Stores all types of data (structured, semi-structured, and unstructured) in a single repository.
ACID Transactions (Data Lakehouse)
Supports ACID transactions for data integrity and consistency.
Scalability (Data Lakehouse)
Provides scalable storage and compute resources.
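A hedged sketch of lakehouse-style usage, assuming the open-source `deltalake` (delta-rs) and `pandas` packages are installed; the path and column names are illustrative. Data lands as open files on cheap storage while the table format layers versioned, ACID writes on top.

```python
# Lakehouse sketch (assumed dependencies: deltalake, pandas).
import pandas as pd
from deltalake import DeltaTable, write_deltalake

path = "/tmp/lakehouse/events"  # illustrative location on lake storage

# Append structured data; each write is an atomic, versioned transaction.
write_deltalake(path, pd.DataFrame({"user": ["u1", "u2"], "amount": [10.5, 4.0]}), mode="append")
write_deltalake(path, pd.DataFrame({"user": ["u1"], "amount": [7.25]}), mode="append")

table = DeltaTable(path)
print(table.version())    # transaction log tracks table versions
print(table.to_pandas())  # query the unified store like a warehouse table
```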
Microservices Architecture
Breaks down data processing into small, independent services that communicate over APIs. Each microservice handles a specific piece of functionality.
Modularity (Microservices)
Each service is developed, deployed, and scaled independently.
Flexibility (Microservices)
Facilitates the use of different technologies and frameworks for different services.
Resilience (Microservices)
Isolates failures to individual services, improving overall system resilience.
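A minimal sketch of a single microservice using only the Python standard library: one small process exposing one piece of functionality over an HTTP API. The endpoint and port are assumptions; real deployments would add packaging, service discovery, and health checks.

```python
# One microservice, one responsibility: quoting a price over HTTP.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class PricingService(BaseHTTPRequestHandler):
    """Handles only pricing requests; other concerns live in other services."""

    def do_GET(self):
        if self.path == "/price":
            body = json.dumps({"sku": "ABC-123", "price": 19.99}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    # Other services (orders, inventory, ...) would run as separate processes
    # and call this API over the network.
    HTTPServer(("127.0.0.1", 8001), PricingService).serve_forever()
```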
Medallion Architecture
A data engineering pattern for handling large-scale data processing and transformation efficiently by organizing data into successive layers that progressively refine it as it flows through the system.
Bronze Layer (Medallion Architecture)
Ingests raw data from various sources, containing raw, unprocessed data in its original format.
Silver Layer (Medallion Architecture)
Cleans, transforms, and enriches the raw data, handling data validation, deduplication, and normalization.
Gold Layer (Medallion Architecture)
Aggregates and optimizes data for analytics and reporting, containing highly processed, aggregated, and optimized data structured for specific business needs.
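A toy medallion flow in plain Python, with illustrative record shapes: raw records land in bronze, are validated and deduplicated into silver, and are aggregated into a business-ready gold view.

```python
# Bronze -> Silver -> Gold: progressively refined copies of the same data.
from collections import defaultdict

# Bronze: raw, unprocessed records exactly as ingested (duplicates, bad rows and all).
bronze = [
    {"order_id": "1", "region": "EU", "amount": "10.5"},
    {"order_id": "1", "region": "EU", "amount": "10.5"},  # duplicate
    {"order_id": "2", "region": "US", "amount": ""},       # invalid amount
    {"order_id": "3", "region": "EU", "amount": "7.25"},
]

# Silver: validated, deduplicated, and typed.
seen, silver = set(), []
for r in bronze:
    if not r["amount"] or r["order_id"] in seen:
        continue
    seen.add(r["order_id"])
    silver.append({"order_id": r["order_id"], "region": r["region"], "amount": float(r["amount"])})

# Gold: aggregated and shaped for a specific business question.
gold = defaultdict(float)
for r in silver:
    gold[r["region"]] += r["amount"]

print(dict(gold))  # e.g. {'EU': 17.75}
```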
Layered (N-Tier) Architecture
Divides the system into layers, each with a specific role, such as presentation, business logic, and data access.
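A minimal three-tier sketch with illustrative names: a presentation layer calls a business-logic layer, which calls a data-access layer, and each layer talks only to the one directly below it.

```python
# Layered (N-Tier) sketch: presentation -> business logic -> data access.
class DataAccessLayer:
    """Data tier: knows how to fetch and store records."""
    _db = {"u1": {"name": "Alice", "active": True}}

    def get_user(self, user_id):
        return self._db.get(user_id)

class BusinessLogicLayer:
    """Business tier: applies rules, knows nothing about presentation."""
    def __init__(self, dal: DataAccessLayer):
        self._dal = dal

    def greeting_for(self, user_id):
        user = self._dal.get_user(user_id)
        if not user or not user["active"]:
            raise ValueError("unknown or inactive user")
        return f"Welcome back, {user['name']}!"

class PresentationLayer:
    """Presentation tier: formats output for the user."""
    def __init__(self, logic: BusinessLogicLayer):
        self._logic = logic

    def render(self, user_id):
        print(self._logic.greeting_for(user_id))

PresentationLayer(BusinessLogicLayer(DataAccessLayer())).render("u1")
```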
Event-Driven Architecture
Emphasizes the production, detection, consumption, and reaction to events.
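A minimal in-process sketch of the idea: producers publish events and any number of subscribers react to them through a broker. The event names and handlers are assumptions; a real system would use an external message broker.

```python
# Event-driven sketch: publishers and subscribers decoupled by an event bus.
from collections import defaultdict

class EventBus:
    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        """Register a consumer that reacts to a given event type."""
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        """Deliver the event to every subscribed consumer."""
        for handler in self._handlers[event_type]:
            handler(payload)

bus = EventBus()
bus.subscribe("order_placed", lambda e: print("billing:", e))
bus.subscribe("order_placed", lambda e: print("shipping:", e))
bus.publish("order_placed", {"order_id": 42})
```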
Service-Oriented Architecture (SOA)
Focuses on designing software systems that provide services to other applications via a network.
Serverless Architecture
Applications run on infrastructure fully managed by a cloud provider, removing the need for developers to provision or manage servers.
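A hedged sketch of a serverless-style function, assuming an AWS-Lambda-like runtime that invokes `handler(event, context)` on demand; the event shape is illustrative. The developer ships only this function, while the provider owns the servers, scaling, and runtime.

```python
# Serverless sketch: a single function invoked per event by the platform.
import json

def handler(event, context):
    """Invoked on demand by the platform; no server code lives here."""
    name = event.get("name", "world")
    return {"statusCode": 200, "body": json.dumps({"message": f"hello, {name}"})}

if __name__ == "__main__":
    # Local smoke test standing in for a platform invocation.
    print(handler({"name": "data engineer"}, context=None))
```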
CQRS (Command Query Responsibility Segregation)
Separates read and write operations into different models to optimize performance and scalability.
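A toy CQRS sketch with assumed model shapes: commands pass through a write model that enforces rules, while queries hit a separate, denormalized read model kept up to date from the writes.

```python
# CQRS sketch: commands and queries served by different models.
class ReadModel:
    """Handles queries; a denormalized view that can scale independently."""
    def __init__(self):
        self._view = {}

    def apply(self, account, balance):
        self._view[account] = balance

    def balance_of(self, account):
        return self._view.get(account, 0)

class WriteModel:
    """Handles commands; optimized for validation and consistency."""
    def __init__(self, read_model: ReadModel):
        self._balances = {}
        self._read_model = read_model

    def deposit(self, account, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balances[account] = self._balances.get(account, 0) + amount
        self._read_model.apply(account, self._balances[account])  # sync the read side

queries = ReadModel()
commands = WriteModel(queries)
commands.deposit("acct-1", 100)
print(queries.balance_of("acct-1"))  # 100
```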
Data Mesh
A decentralized data architecture that places ownership and management of data with the business domains that produce it.
Onion Architecture
Emphasizes a clear separation between the domain model and other aspects of the system, such as user interface and infrastructure.
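A minimal onion-style sketch with assumed names: the domain model sits at the core and depends on nothing, the inner layers define the repository interface they need, and infrastructure implements that interface at the outer edge.

```python
# Onion Architecture sketch: dependencies point inward toward the domain.
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Order:  # core: the domain model, free of infrastructure concerns
    order_id: str
    amount: float

class OrderRepository(ABC):  # inner layer defines the contract it needs
    @abstractmethod
    def save(self, order: Order) -> None: ...
    @abstractmethod
    def get(self, order_id: str) -> Order: ...

class PlaceOrder:  # application layer: depends only on the abstraction
    def __init__(self, repo: OrderRepository):
        self._repo = repo

    def __call__(self, order_id: str, amount: float) -> Order:
        order = Order(order_id, amount)
        self._repo.save(order)
        return order

class InMemoryOrderRepository(OrderRepository):  # outer layer: infrastructure detail
    def __init__(self):
        self._orders = {}

    def save(self, order: Order) -> None:
        self._orders[order.order_id] = order

    def get(self, order_id: str) -> Order:
        return self._orders[order_id]

place_order = PlaceOrder(InMemoryOrderRepository())
print(place_order("o-1", 19.99))
```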