Architectural Patterns in Data Engineering

Flashcards on Architectural Patterns in Data Engineering

33 Terms

1

ETL (Extract, Transform, Load)

A traditional data processing pattern where data is extracted from source systems, transformed into a suitable format, and then loaded into a data warehouse or data mart.

2

Batch Processing

Running on a schedule (e.g., nightly) and processing large volumes of data in batches rather than record by record.

3

Data Integration

Consolidating data from multiple sources into a centralized repository.

4

Data Quality

Ensuring that data is accurate, consistent, and complete; in ETL this is enforced through the transformation and cleansing steps.
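
The ETL cards above can be illustrated with a minimal sketch. It assumes an in-memory SQLite database standing in for the warehouse, and the function names, table name, and sample rows are hypothetical. The key point is that the data is cleaned before it is loaded.

```python
import sqlite3

def extract():
    # Hypothetical source rows; a real pipeline would read from source systems.
    return [{"id": 1, "amount": "10.50"}, {"id": 2, "amount": " 3.25 "}]

def transform(rows):
    # Cleanse and convert before loading: strip whitespace, cast to float.
    return [(r["id"], float(r["amount"].strip())) for r in rows]

def load(conn, rows):
    # Load the already-transformed rows into the target table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(conn, transform(extract()))
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```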

5

ELT (Extract, Load, Transform)

A modern variation of ETL where data is first loaded into a data lake or data warehouse, and then transformed as needed.

6

Scalability (ELT)

Leveraging the scalability of modern cloud data warehouses and data lakes.

7

Flexibility (ELT)

Supports ad-hoc transformations and on-demand processing.

8

Real-Time Processing (ELT)

Can be adapted for near real-time data processing.
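
For contrast with ETL, here is a minimal ELT sketch. It again assumes an in-memory SQLite database as a stand-in for a cloud warehouse; the table names `raw_sales` and `sales` are hypothetical. The raw data is loaded untouched, and the transformation runs afterwards as SQL inside the storage engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a warehouse or lake engine

# Load: raw values land exactly as received, all as text.
conn.execute("CREATE TABLE raw_sales (id TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_sales VALUES (?, ?)",
    [("1", "10.50"), ("2", " 3.25 ")],
)

# Transform: performed later, on demand, inside the engine itself.
conn.execute(
    """
    CREATE TABLE sales AS
    SELECT CAST(id AS INTEGER) AS id,
           CAST(TRIM(amount) AS REAL) AS amount
    FROM raw_sales
    """
)
rows = conn.execute("SELECT id, amount FROM sales ORDER BY id").fetchall()
```

Because `raw_sales` is kept, a different transformation can be run over the same raw data later without extracting it again.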

9

Lambda Architecture

Designed to handle both batch and stream processing of data; combines a batch layer for historical data processing with a speed layer for real-time data processing.

10

Batch Layer (Lambda Architecture)

Processes large volumes of historical data and generates batch views.

11

Speed Layer (Lambda Architecture)

Processes real-time data and generates real-time views.

12

Serving Layer (Lambda Architecture)

Merges batch and real-time views for querying.
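
The three Lambda layers can be sketched with plain Python counters. This is a toy illustration under stated assumptions: the event lists and function names are hypothetical, and real systems would use a batch engine and a stream processor rather than in-memory lists.

```python
from collections import Counter

# Hypothetical page-view events.
historical_events = ["page_a", "page_b", "page_a"]  # covered by the last batch run
recent_events = ["page_a", "page_c"]                # arrived since that run

def build_batch_view(events):
    # Batch layer: recompute counts over all historical data.
    return Counter(events)

def build_realtime_view(events):
    # Speed layer: count only the events the batch layer has not seen yet.
    return Counter(events)

def serve(batch_view, realtime_view, key):
    # Serving layer: merge both views to answer a query.
    return batch_view.get(key, 0) + realtime_view.get(key, 0)

batch_view = build_batch_view(historical_events)
realtime_view = build_realtime_view(recent_events)
page_a_views = serve(batch_view, realtime_view, "page_a")
```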

13

Kappa Architecture

Simplification of Lambda Architecture that processes data streams only, eliminating the batch layer. All data is treated as a real-time stream.

14

Stream Processing

All data is ingested and processed as a stream.
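
A minimal Kappa sketch, assuming a Python list as a stand-in for an append-only event log; the names `process` and `replay` are hypothetical. There is one processing path, and historical data is handled by replaying the same log through it instead of running a separate batch job.

```python
def process(state, event):
    # The single stream-processing step: update a running total per user.
    state[event["user"]] = state.get(event["user"], 0) + event["amount"]
    return state

def replay(log, from_offset=0):
    # Reprocessing means replaying the log through the same code path.
    state = {}
    for event in log[from_offset:]:
        state = process(state, event)
    return state

# Hypothetical append-only log of events.
log = [
    {"user": "ann", "amount": 5},
    {"user": "bob", "amount": 2},
    {"user": "ann", "amount": 1},
]
view = replay(log)
```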

15

Data Lakehouse Architecture

Combines the scalability and cost-efficiency of data lakes with the ACID transactions and data management capabilities of data warehouses.

16

Unified Storage (Data Lakehouse)

Stores all types of data (structured, semi-structured, and unstructured) in a single repository.

17

ACID Transactions (Data Lakehouse)

Supports ACID transactions for data integrity and consistency.

18

Scalability (Data Lakehouse)

Provides scalable storage and compute resources.
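
The atomicity part of ACID can be shown with a small sketch. Note the assumption: SQLite is used here only as a convenient stand-in, since it is not a lakehouse; lakehouse table formats provide comparable guarantees over files in object storage. The table name and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in only, not a lakehouse engine
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)")
conn.commit()

try:
    # "with conn" commits on success and rolls back if an exception escapes.
    with conn:
        conn.execute("INSERT INTO events VALUES (1, 'ok')")
        conn.execute("INSERT INTO events VALUES (1, 'duplicate id')")  # violates the key
except sqlite3.IntegrityError:
    pass  # the whole transaction was rolled back, including the first insert

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Either every write in the transaction becomes visible or none does, so readers never observe a half-written batch.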

19

Microservices Architecture

Breaks down data processing into small, independent services that communicate over APIs. Each microservice handles a specific piece of functionality.

20

Modularity (Microservices)

Each service is developed, deployed, and scaled independently.

21

Flexibility (Microservices)

Facilitates the use of different technologies and frameworks for different services.

22

Resilience (Microservices)

Isolates failures to individual services, improving overall system resilience.

23

Medallion Architecture

A data engineering pattern that organizes data into layers (bronze, silver, gold) and refines it progressively as it flows through the system, so that large-scale processing and transformation stay manageable.

24

Bronze Layer (Medallion Architecture)

Ingests raw data from various sources, containing raw, unprocessed data in its original format.

25

Silver Layer (Medallion Architecture)

Cleans, transforms, and enriches the raw data, handling data validation, deduplication, and normalization.

26

Gold Layer (Medallion Architecture)

Aggregates and optimizes data for analytics and reporting, containing highly processed, aggregated, and optimized data structured for specific business needs.
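
The bronze, silver, and gold layers can be sketched as successive refinements of the same records. This is a toy example under stated assumptions: the sample orders and the helper names `to_silver` and `to_gold` are hypothetical, and real layers would be persisted tables rather than Python lists.

```python
# Bronze: raw records exactly as ingested, including a duplicate and a bad value.
bronze = [
    {"order_id": "1", "region": " EU ", "amount": "10.0"},
    {"order_id": "1", "region": " EU ", "amount": "10.0"},  # duplicate
    {"order_id": "2", "region": "us", "amount": "bad"},     # fails validation
    {"order_id": "3", "region": "US", "amount": "4.5"},
]

def to_silver(rows):
    # Silver: validate, deduplicate, and normalize.
    seen, cleaned = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows that fail validation
        if row["order_id"] in seen:
            continue  # drop duplicates
        seen.add(row["order_id"])
        cleaned.append({
            "order_id": int(row["order_id"]),
            "region": row["region"].strip().upper(),
            "amount": amount,
        })
    return cleaned

def to_gold(rows):
    # Gold: aggregate for reporting, here revenue per region.
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals

silver = to_silver(bronze)
gold = to_gold(silver)
```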

27

Layered (N-Tier) Architecture

Divides the system into layers with each layer having a specific role, such as presentation, business logic, and data access.

28

Event-Driven Architecture

Emphasizes the production, detection, and consumption of events, and the reaction to them.
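
A minimal in-process sketch of the event-driven idea. The `EventBus` class, event names, and file path are hypothetical; production systems would typically use a message broker instead. Producers publish events without knowing which consumers, if any, react to them.

```python
from collections import defaultdict

class EventBus:
    """Hypothetical in-process bus mapping event types to handlers."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        # A consumer registers interest in one event type.
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # A producer emits an event; every subscribed handler reacts.
        for handler in self._handlers[event_type]:
            handler(payload)

received = []
bus = EventBus()
bus.subscribe("file_arrived", lambda payload: received.append(payload["path"]))

bus.publish("file_arrived", {"path": "landing/orders.csv"})
bus.publish("file_deleted", {"path": "landing/old.csv"})  # no subscriber, ignored
```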

29

Service-Oriented Architecture (SOA)

Focuses on designing software systems that provide services to other applications via a network.

30

Serverless Architecture

Applications run on infrastructure that a third-party provider provisions, scales, and manages, so developers do not manage servers themselves.

31

CQRS (Command Query Responsibility Segregation)

Separates read and write operations into different models to optimize performance and scalability.
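
A minimal CQRS sketch, assuming hypothetical `OrderWriteModel` and `OrderReadModel` classes. Commands go through the write model, which validates and records changes; queries go to a separate read model that is updated from those changes and shaped for fast lookups.

```python
class OrderWriteModel:
    """Command side: validates and records changes."""

    def __init__(self):
        self.events = []

    def place_order(self, customer, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        event = {"customer": customer, "amount": amount}
        self.events.append(event)
        return event

class OrderReadModel:
    """Query side: a denormalized view optimized for reads."""

    def __init__(self):
        self.total_by_customer = {}

    def apply(self, event):
        # Keep the view in sync with changes from the write side.
        customer = event["customer"]
        self.total_by_customer[customer] = (
            self.total_by_customer.get(customer, 0) + event["amount"]
        )

    def total_for(self, customer):
        return self.total_by_customer.get(customer, 0)

writes, reads = OrderWriteModel(), OrderReadModel()
for customer, amount in [("ann", 5), ("bob", 2), ("ann", 1)]:
    reads.apply(writes.place_order(customer, amount))
ann_total = reads.total_for("ann")
```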

32

Data Mesh

A decentralized data architecture in which each business domain owns and manages its own data, rather than a central team owning all of it.

33

Onion Architecture

Emphasizes a clear separation between the domain model and other aspects of the system, such as user interface and infrastructure.
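
One way to picture the separation, as a sketch with hypothetical names: the domain defines the `Customer` entity and a `CustomerRepository` interface, and the infrastructure supplies an implementation. Dependencies point inward, so the domain code never imports storage or UI code.

```python
from dataclasses import dataclass
from typing import Protocol

# Domain core: entities and interfaces, with no infrastructure imports.
@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str

class CustomerRepository(Protocol):
    def get(self, customer_id: int) -> Customer: ...

# Domain service: depends only on the interface defined above.
def greeting_for(repo: CustomerRepository, customer_id: int) -> str:
    return f"Hello, {repo.get(customer_id).name}"

# Outer ring: an infrastructure implementation, swappable for a database.
class InMemoryCustomerRepository:
    def __init__(self, customers):
        self._by_id = {c.customer_id: c for c in customers}

    def get(self, customer_id: int) -> Customer:
        return self._by_id[customer_id]

repo = InMemoryCustomerRepository([Customer(1, "Ada")])
greeting = greeting_for(repo, 1)
```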