Section 3: Data Pipeline Orchestration - DEEPSEEK
30 Terms
1
Data Transformation Tools
Google Cloud services for transforming data, including Dataflow, Dataproc, Dataform, and Cloud Data Fusion.
2
ELT (Extract-Load-Transform)
Transforming data after loading it into the target system (e.g., BigQuery).
3
ETL (Extract-Transform-Load)
Transforming data before loading it into storage or analytics systems.
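The difference between the ETL and ELT cards is only where the transformation runs. The sketch below illustrates this with an in-memory SQLite database standing in for the target system (it is not BigQuery); the table names and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical raw records; amounts arrive as strings with stray whitespace.
RAW_ROWS = [("a1", " 10.5 "), ("a2", "3"), ("a3", " 7.25")]


def run_etl(conn):
    """ETL: clean the rows in application code, then load the result."""
    cleaned = [(order_id, float(amount.strip())) for order_id, amount in RAW_ROWS]
    conn.execute("CREATE TABLE orders_etl (order_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?)", cleaned)


def run_elt(conn):
    """ELT: load the raw rows as-is, then transform with SQL inside the target."""
    conn.execute("CREATE TABLE orders_raw (order_id TEXT, amount TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?)", RAW_ROWS)
    conn.execute(
        "CREATE TABLE orders_elt AS "
        "SELECT order_id, CAST(TRIM(amount) AS REAL) AS amount FROM orders_raw"
    )


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
etl_rows = conn.execute("SELECT * FROM orders_etl ORDER BY order_id").fetchall()
elt_rows = conn.execute("SELECT * FROM orders_elt ORDER BY order_id").fetchall()
print(etl_rows == elt_rows)
```

Both paths end with the same cleaned table; ELT additionally keeps the raw copy in the target, which is one reason it pairs well with warehouses that can transform data in place.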
4
Cloud Composer
A managed Apache Airflow service for workflow orchestration in Google Cloud.
5
Dataproc
A managed service for running Apache Spark and Hadoop clusters for large-scale data processing.
6
Dataflow
A serverless service that runs Apache Beam pipelines for stream and batch data processing, with autoscaling.
7
Dataform
A SQL-centric tool for building transformation pipelines in BigQuery.
8
Cloud Data Fusion
A GUI-based ETL/ELT pipeline builder with preconfigured connectors.
9
Data Orchestration
Coordinating automated workflows across data pipelines and services.
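At its core, orchestration means running tasks in an order that respects their dependencies. The following is a minimal sketch of that idea using Python's standard `graphlib` module; it is not how Cloud Composer or Workflows is implemented, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "extract": set(),
    "load_raw": {"extract"},
    "transform": {"load_raw"},
    "quality_check": {"transform"},
    "publish_report": {"transform", "quality_check"},
}


def execution_order(dependencies):
    """Return one valid run order in which every task follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(DEPENDENCIES)
for step, task in enumerate(order, start=1):
    print(step, task)
```

A real orchestrator adds scheduling, retries, and state tracking on top of this ordering step.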
10
Cloud Scheduler
A managed cron job scheduler that triggers targets such as HTTP endpoints and Pub/Sub topics on a recurring schedule.
11
Scheduled Queries
Automated execution of SQL queries in BigQuery (e.g., recurring reports).
12
Dataflow Job UI
A dashboard for monitoring pipeline progress, logs, and resource usage.
13
Cloud Logging
A centralized service for storing and analyzing logs from Google Cloud resources.
14
Cloud Monitoring
A tool for tracking performance metrics and setting alerts for pipelines.
15
Event-Driven Ingestion
Triggering data processing based on events (e.g., Pub/Sub messages).
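As an illustration of event-driven ingestion, the sketch below decodes a Pub/Sub push request body, whose `message.data` field is base64-encoded, and decides whether to ingest a file. It uses only the standard library and a hand-built sample envelope; the handler, bucket, object, and subscription names are hypothetical, and the payload is assumed to be a Cloud Storage notification.

```python
import base64
import json


def decode_push_envelope(envelope):
    """Extract the JSON payload and attributes from a Pub/Sub push request body."""
    message = envelope["message"]
    payload = json.loads(base64.b64decode(message["data"]).decode("utf-8"))
    return payload, message.get("attributes", {})


def handle_event(envelope):
    """Hypothetical handler: decide what to do with one incoming event."""
    payload, attributes = decode_push_envelope(envelope)
    if attributes.get("eventType") == "OBJECT_FINALIZE":
        return f"ingest gs://{payload['bucket']}/{payload['name']}"
    return "ignore"


# Build a sample envelope shaped like a push subscription delivery.
body = json.dumps({"bucket": "example-bucket", "name": "sales/2024-01-01.csv"})
sample = {
    "message": {
        "data": base64.b64encode(body.encode("utf-8")).decode("ascii"),
        "attributes": {"eventType": "OBJECT_FINALIZE"},
        "messageId": "1",
    },
    "subscription": "projects/example-project/subscriptions/example-sub",
}
result = handle_event(sample)
print(result)
```

In a deployed pipeline, a function like `handle_event` would sit behind an HTTP endpoint (for example on Cloud Run) and start the actual load instead of returning a string.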
16
Pub/Sub to BigQuery
Streaming data ingestion from Pub/Sub topics into BigQuery tables.
17
Eventarc
A service that routes events from Google Cloud services to destinations such as Cloud Run, Cloud Functions, and Workflows.
18
Dataform Pipelines
Creating reusable SQL-based data transformation workflows.
19
Cloud Functions
Serverless functions triggered by events (e.g., file uploads to Cloud Storage).
20
Cloud Run
Serverless platform for running containerized applications in event-driven pipelines.
21
Dataproc Workflow Templates
Predefined workflows for recurring Spark or Hadoop jobs on Dataproc.
22
Workflows
A serverless orchestration service that executes a defined sequence of steps calling Google Cloud services and HTTP APIs.
23
Pipeline Design
Planning data flow, dependencies, and tooling for efficient processing.
24
Basic Transformation Pipelines
Simple workflows for tasks like filtering, aggregating, or joining data.
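The three operations named on this card can be shown in a few lines. This is a plain-Python sketch over hypothetical in-memory tables, meant only to make the filter, join, and aggregate steps concrete; a real pipeline would express them in SQL, Apache Beam, or Spark.

```python
from collections import defaultdict

# Hypothetical input tables.
ORDERS = [
    {"order_id": 1, "customer_id": "c1", "amount": 40.0, "status": "paid"},
    {"order_id": 2, "customer_id": "c2", "amount": 15.0, "status": "cancelled"},
    {"order_id": 3, "customer_id": "c1", "amount": 10.0, "status": "paid"},
    {"order_id": 4, "customer_id": "c2", "amount": 25.0, "status": "paid"},
]
CUSTOMER_REGION = {"c1": "EMEA", "c2": "APAC"}


def revenue_by_region(orders, customer_region):
    """Filter to paid orders, join each to its region, and sum amounts per region."""
    totals = defaultdict(float)
    for order in orders:
        if order["status"] != "paid":  # filter: drop unpaid orders
            continue
        region = customer_region[order["customer_id"]]  # join: look up the region
        totals[region] += order["amount"]  # aggregate: sum per region
    return dict(totals)


totals = revenue_by_region(ORDERS, CUSTOMER_REGION)
print(totals)
```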
25
Pipeline Automation
Using tools like Cloud Composer to reduce manual pipeline management.
26
Log Analysis
Using Cloud Logging to debug pipeline failures or performance issues.
27
Monitoring Metrics
Tracking pipeline health (e.g., latency, errors) via Cloud Monitoring.
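To make "latency" and "errors" concrete, the sketch below computes an error rate and a nearest-rank 95th-percentile latency from a handful of run records and compares them to thresholds. It does not call the Cloud Monitoring API; the sample data, threshold values, and function names are all assumptions chosen for illustration.

```python
import math

# Hypothetical per-run records: (latency in seconds, succeeded?)
RUNS = [(12.0, True), (15.0, True), (11.0, False), (40.0, True), (13.0, True)]


def error_rate(runs):
    """Fraction of runs that failed."""
    return sum(1 for _, ok in runs if not ok) / len(runs)


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of values."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def check_health(runs, max_error_rate=0.1, max_p95_latency=30.0):
    """Return a list of alert messages for every threshold that is exceeded."""
    alerts = []
    rate = error_rate(runs)
    p95 = percentile([latency for latency, _ in runs], 95)
    if rate > max_error_rate:
        alerts.append(f"error rate {rate:.2f} exceeds {max_error_rate:.2f}")
    if p95 > max_p95_latency:
        alerts.append(f"p95 latency {p95:.1f}s exceeds {max_p95_latency:.1f}s")
    return alerts


alerts = check_health(RUNS)
for alert in alerts:
    print(alert)
```

In Cloud Monitoring the equivalent checks would be configured as alerting policies on pipeline metrics rather than computed by hand.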
28
Orchestration Tool Selection
Choosing among Cloud Composer, Workflows, and Dataproc Workflow Templates based on the use case.
29
Eventarc Triggers
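Rules that match events (e.g., Cloud Storage object changes) and deliver them to a destination such as Cloud Run or Workflows, which can then start pipeline components like Dataflow jobs.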
30
Pipeline Progress Monitoring
Observing real-time status and metrics in the Dataflow job UI.