Flashcards on Architectural Patterns in Data Engineering
ETL (Extract, Transform, Load)
A traditional data processing pattern where data is extracted from source systems, transformed into a suitable format, and then loaded into a data warehouse or data mart.
Batch Processing (ETL)
Runs on a schedule (e.g., nightly), processing large volumes of data in batches.
Data Integration (ETL)
Consolidates data from multiple sources into a centralized repository.
Data Quality (ETL)
Enforces quality through transformation and cleansing steps applied before the data is loaded.
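A minimal sketch of the ETL pattern described in the cards above, using only the Python standard library. The CSV source, cleansing rules, and table name are illustrative assumptions, with SQLite standing in for the warehouse.

```python
# Minimal ETL sketch: extract from a source, transform (cleanse), load into a warehouse.
import csv
import io
import sqlite3

RAW_CSV = "id,name,amount\n1, Alice ,10.5\n2,Bob,\n3, Carol ,7.25\n"

def extract(raw: str) -> list[dict]:
    """Extract: read rows from the source system (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[tuple]:
    """Transform: cleanse and normalize before loading (trim names, drop
    rows with missing amounts, cast types)."""
    out = []
    for r in rows:
        if not r["amount"]:
            continue  # data-quality rule: skip incomplete records
        out.append((int(r["id"]), r["name"].strip(), float(r["amount"])))
    return out

def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), conn)
    print(conn.execute("SELECT * FROM sales").fetchall())
```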
ELT (Extract, Load, Transform)
A modern variation of ETL where data is first loaded into a data lake or data warehouse, and then transformed as needed.
Scalability (ELT)
Leverages the scalability of modern cloud data warehouses and data lakes.
Flexibility (ELT)
Supports ad-hoc transformations and on-demand processing.
Real-Time Processing (ELT)
Can be adapted for near real-time data processing.
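For contrast, a minimal ELT sketch: raw records are loaded untouched, then transformed on demand with SQL inside the target store. SQLite stands in for a cloud warehouse; the table and column names are assumptions.

```python
# Minimal ELT sketch: load raw data first, transform inside the warehouse.
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: land the raw data as-is.
conn.execute("CREATE TABLE raw_events (user_id TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [("u1", "10.5"), ("u2", ""), ("u1", "4.0")],
)

# Transform: run on demand inside the warehouse, leveraging its compute.
conn.execute("""
    CREATE TABLE spend_by_user AS
    SELECT user_id, SUM(CAST(amount AS REAL)) AS total_spend
    FROM raw_events
    WHERE amount <> ''
    GROUP BY user_id
""")
print(conn.execute("SELECT * FROM spend_by_user ORDER BY user_id").fetchall())
```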
Lambda Architecture
Designed to handle both batch and stream processing of data; combines a batch layer for historical data processing with a speed layer for real-time data processing.
Batch Layer (Lambda Architecture)
Processes large volumes of historical data and generates batch views.
Speed Layer (Lambda Architecture)
Processes real-time data and generates real-time views.
Serving Layer (Lambda Architecture)
Merges batch and real-time views for querying.
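A toy sketch of the three Lambda layers listed above: the batch layer recomputes a view over historical events, the speed layer counts recent events, and the serving layer merges the two at query time. Event shapes are illustrative assumptions.

```python
# Toy Lambda Architecture: batch view + real-time view, merged at query time.
from collections import Counter

historical_events = [{"page": "/home"}, {"page": "/home"}, {"page": "/docs"}]
recent_events = [{"page": "/home"}, {"page": "/pricing"}]

def batch_layer(events):
    """Recompute the batch view over all historical data."""
    return Counter(e["page"] for e in events)

def speed_layer(events):
    """Maintain a real-time view over events not yet covered by a batch run."""
    return Counter(e["page"] for e in events)

def serving_layer(batch_view, realtime_view):
    """Merge batch and real-time views to answer queries."""
    return batch_view + realtime_view

views = serving_layer(batch_layer(historical_events), speed_layer(recent_events))
print(views.most_common())
```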
Kappa Architecture
A simplification of the Lambda Architecture that eliminates the batch layer and processes data as streams only; all data is treated as a real-time stream.
Stream Processing (Kappa Architecture)
All data is ingested and processed as a stream.
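A toy Kappa sketch for comparison: every record, historical or new, flows through the same stream processor, and reprocessing is simply replaying the event log. The event shapes are illustrative assumptions.

```python
# Toy Kappa Architecture: one processing path, no separate batch layer.
from collections import Counter

event_log = [{"page": "/home"}, {"page": "/docs"}, {"page": "/home"}]

def stream_processor(events):
    """Single processing path: fold each event into the running view."""
    view = Counter()
    for e in events:  # in production this would be an unbounded stream
        view[e["page"]] += 1
    return view

print(stream_processor(event_log))  # current view
print(stream_processor(event_log))  # reprocessing = replay the same log
```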
Data Lakehouse Architecture
Combines the scalability and cost-efficiency of data lakes with the ACID transactions and data management capabilities of data warehouses.
Unified Storage (Data Lakehouse)
Stores all types of data (structured, semi-structured, and unstructured) in a single repository.
ACID Transactions (Data Lakehouse)
Supports ACID transactions for data integrity and consistency.
Scalability (Data Lakehouse)
Provides scalable storage and compute resources.
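A hedged sketch of lakehouse-style usage, assuming the open-source `deltalake` (delta-rs) and `pandas` packages are installed; the path and column names are illustrative. Data lands as open files on cheap storage while the table format layers versioned, ACID writes on top.

```python
# Lakehouse sketch (assumed dependencies: deltalake, pandas).
import pandas as pd
from deltalake import DeltaTable, write_deltalake

path = "/tmp/lakehouse/events"  # illustrative location on lake storage

# Append structured data; each write is an atomic, versioned transaction.
write_deltalake(path, pd.DataFrame({"user": ["u1", "u2"], "amount": [10.5, 4.0]}), mode="append")
write_deltalake(path, pd.DataFrame({"user": ["u1"], "amount": [7.25]}), mode="append")

table = DeltaTable(path)
print(table.version())    # transaction log tracks table versions
print(table.to_pandas())  # query the unified store like a warehouse table
```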
Microservices Architecture
Breaks down data processing into small, independent services that communicate over APIs. Each microservice handles a specific piece of functionality.
Modularity (Microservices)
Each service is developed, deployed, and scaled independently.
Flexibility (Microservices)
Facilitates the use of different technologies and frameworks for different services.
Resilience (Microservices)
Isolates failures to individual services, improving overall system resilience.
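A minimal sketch of a single microservice using only the Python standard library: one small process exposing one piece of functionality over an HTTP API. The endpoint and port are assumptions; real deployments would add packaging, service discovery, and health checks.

```python
# One microservice, one responsibility: quoting a price over HTTP.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class PricingService(BaseHTTPRequestHandler):
    """Handles only pricing requests; other concerns live in other services."""

    def do_GET(self):
        if self.path == "/price":
            body = json.dumps({"sku": "ABC-123", "price": 19.99}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    # Other services (orders, inventory, ...) would run as separate processes
    # and call this API over the network.
    HTTPServer(("127.0.0.1", 8001), PricingService).serve_forever()
```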
Medallion Architecture
A data engineering pattern for handling large-scale data processing and transformation efficiently by organizing data into successive layers that progressively refine it as it flows through the system.
Bronze Layer (Medallion Architecture)
Ingests raw data from various sources, containing raw, unprocessed data in its original format.
Silver Layer (Medallion Architecture)
Cleans, transforms, and enriches the raw data, handling data validation, deduplication, and normalization.
Gold Layer (Medallion Architecture)
Aggregates and optimizes data for analytics and reporting, containing highly processed, aggregated, and optimized data structured for specific business needs.
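A toy medallion flow in plain Python, with illustrative record shapes: raw records land in bronze, are validated and deduplicated into silver, and are aggregated into a business-ready gold view.

```python
# Bronze -> Silver -> Gold: progressively refined copies of the same data.
from collections import defaultdict

# Bronze: raw, unprocessed records exactly as ingested (duplicates, bad rows and all).
bronze = [
    {"order_id": "1", "region": "EU", "amount": "10.5"},
    {"order_id": "1", "region": "EU", "amount": "10.5"},  # duplicate
    {"order_id": "2", "region": "US", "amount": ""},       # invalid amount
    {"order_id": "3", "region": "EU", "amount": "7.25"},
]

# Silver: validated, deduplicated, and typed.
seen, silver = set(), []
for r in bronze:
    if not r["amount"] or r["order_id"] in seen:
        continue
    seen.add(r["order_id"])
    silver.append({"order_id": r["order_id"], "region": r["region"], "amount": float(r["amount"])})

# Gold: aggregated and shaped for a specific business question.
gold = defaultdict(float)
for r in silver:
    gold[r["region"]] += r["amount"]

print(dict(gold))  # e.g. {'EU': 17.75}
```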
Layered (N-Tier) Architecture
Divides the system into layers, each with a specific role, such as presentation, business logic, and data access.
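A minimal three-tier sketch with illustrative names: a presentation layer calls a business-logic layer, which calls a data-access layer, and each layer talks only to the one directly below it.

```python
# Layered (N-Tier) sketch: presentation -> business logic -> data access.
class DataAccessLayer:
    """Data tier: knows how to fetch and store records."""
    _db = {"u1": {"name": "Alice", "active": True}}

    def get_user(self, user_id):
        return self._db.get(user_id)

class BusinessLogicLayer:
    """Business tier: applies rules, knows nothing about presentation."""
    def __init__(self, dal: DataAccessLayer):
        self._dal = dal

    def greeting_for(self, user_id):
        user = self._dal.get_user(user_id)
        if not user or not user["active"]:
            raise ValueError("unknown or inactive user")
        return f"Welcome back, {user['name']}!"

class PresentationLayer:
    """Presentation tier: formats output for the user."""
    def __init__(self, logic: BusinessLogicLayer):
        self._logic = logic

    def render(self, user_id):
        print(self._logic.greeting_for(user_id))

PresentationLayer(BusinessLogicLayer(DataAccessLayer())).render("u1")
```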
Event-Driven Architecture
Emphasizes the production, detection, consumption, and reaction to events.
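A minimal in-process sketch of the idea: producers publish events and any number of subscribers react to them through a broker. The event names and handlers are assumptions; a real system would use an external message broker.

```python
# Event-driven sketch: publishers and subscribers decoupled by an event bus.
from collections import defaultdict

class EventBus:
    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        """Register a consumer that reacts to a given event type."""
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        """Deliver the event to every subscribed consumer."""
        for handler in self._handlers[event_type]:
            handler(payload)

bus = EventBus()
bus.subscribe("order_placed", lambda e: print("billing:", e))
bus.subscribe("order_placed", lambda e: print("shipping:", e))
bus.publish("order_placed", {"order_id": 42})
```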
Service-Oriented Architecture (SOA)
Focuses on designing software systems that provide services to other applications via a network.
Serverless Architecture
Applications run on infrastructure fully managed by a cloud provider, removing the need for developers to provision or manage servers.
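A hedged sketch of a serverless-style function, assuming an AWS-Lambda-like runtime that invokes `handler(event, context)` on demand; the event shape is illustrative. The developer ships only this function, while the provider owns the servers, scaling, and runtime.

```python
# Serverless sketch: a single function invoked per event by the platform.
import json

def handler(event, context):
    """Invoked on demand by the platform; no server code lives here."""
    name = event.get("name", "world")
    return {"statusCode": 200, "body": json.dumps({"message": f"hello, {name}"})}

if __name__ == "__main__":
    # Local smoke test standing in for a platform invocation.
    print(handler({"name": "data engineer"}, context=None))
```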
CQRS (Command Query Responsibility Segregation)
Separates read and write operations into different models to optimize performance and scalability.
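A toy CQRS sketch with assumed model shapes: commands pass through a write model that enforces rules, while queries hit a separate, denormalized read model kept up to date from the writes.

```python
# CQRS sketch: commands and queries served by different models.
class ReadModel:
    """Handles queries; a denormalized view that can scale independently."""
    def __init__(self):
        self._view = {}

    def apply(self, account, balance):
        self._view[account] = balance

    def balance_of(self, account):
        return self._view.get(account, 0)

class WriteModel:
    """Handles commands; optimized for validation and consistency."""
    def __init__(self, read_model: ReadModel):
        self._balances = {}
        self._read_model = read_model

    def deposit(self, account, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balances[account] = self._balances.get(account, 0) + amount
        self._read_model.apply(account, self._balances[account])  # sync the read side

queries = ReadModel()
commands = WriteModel(queries)
commands.deposit("acct-1", 100)
print(queries.balance_of("acct-1"))  # 100
```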
Data Mesh
A decentralized data architecture that places ownership and management of data with the business domains that produce it.
Onion Architecture
Emphasizes a clear separation between the domain model and other aspects of the system, such as user interface and infrastructure.
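A minimal onion-style sketch with assumed names: the domain model sits at the core and depends on nothing, the inner layers define the repository interface they need, and infrastructure implements that interface at the outer edge.

```python
# Onion Architecture sketch: dependencies point inward toward the domain.
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Order:  # core: the domain model, free of infrastructure concerns
    order_id: str
    amount: float

class OrderRepository(ABC):  # inner layer defines the contract it needs
    @abstractmethod
    def save(self, order: Order) -> None: ...
    @abstractmethod
    def get(self, order_id: str) -> Order: ...

class PlaceOrder:  # application layer: depends only on the abstraction
    def __init__(self, repo: OrderRepository):
        self._repo = repo

    def __call__(self, order_id: str, amount: float) -> Order:
        order = Order(order_id, amount)
        self._repo.save(order)
        return order

class InMemoryOrderRepository(OrderRepository):  # outer layer: infrastructure detail
    def __init__(self):
        self._orders = {}

    def save(self, order: Order) -> None:
        self._orders[order.order_id] = order

    def get(self, order_id: str) -> Order:
        return self._orders[order_id]

place_order = PlaceOrder(InMemoryOrderRepository())
print(place_order("o-1", 19.99))
```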