Airflow Introduction

39 Terms

1
New cards

What is a DAG in Airflow, and what is the key characteristic implied by the word "Acyclic"?

A DAG, or Directed Acyclic Graph, is the core concept in Airflow. It represents a workflow.

  • Directed: Tasks have dependencies and flow in a specific direction (like a one-way street).

  • Acyclic: The graph must not have any loops or cycles. A task can never loop back to a previous task and create an infinite loop. This ensures the workflow will eventually complete.

Think of it as a recipe: you have individual steps (tasks) that must be executed in a specific order, and you never go back to re-mix the ingredients after baking.

2
New cards

What is a "Task" within an Airflow DAG?

A Task is a single, individual unit of work executed within a DAG. It represents one specific step in your overall workflow. For example, in a data pipeline, one task might be download_data, the next process_data, and a final one send_email_report. Each task has a unique identifier (task_id) within its DAG.

3
New cards

What are "Dependencies" in Airflow, and how do you define them?

Dependencies are the relationships between tasks that define their order of execution; they create the "directed" flow in the DAG. Rather than imperatively telling Airflow to "run task B after task A finishes," you declare that task B depends on task A, and the scheduler enforces the ordering. Dependencies are most commonly set with the bitshift operators >> and << (see the sketch after the example below).

  • Example: task_a >> task_b means "task_a must run and succeed before task_b can start."
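A minimal sketch of this style of dependency definition (the DAG id, task ids, and bash commands below are illustrative, not from the card):

from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

with DAG(dag_id="dependency_demo", start_date=datetime(2025, 1, 1),
         schedule="@daily", catchup=False):
    task_a = BashOperator(task_id="task_a", bash_command="echo 'A'")
    task_b = BashOperator(task_id="task_b", bash_command="echo 'B'")
    task_c = BashOperator(task_id="task_c", bash_command="echo 'C'")

    # task_a must succeed before task_b, and task_b before task_c
    task_a >> task_b >> task_c
    # The same chain written with the reversed operator:
    # task_c << task_b << task_a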

4
New cards

What is a "DAG Run", and how is it different from the DAG itself?

A DAG is the definition of your workflow—the blueprint or the recipe.
A DAG Run is a single, specific execution of that DAG for a given data interval (like a specific day). Each time the DAG is triggered (by a schedule or manually), Airflow creates a new DAG Run object to track the state (e.g., running, success, failed) of that particular instance.

Think of the DAG as the Cake Recipe and a DAG Run as The Cake you baked last Saturday for the party.

5
New cards

What is the "Scheduler" and what is its primary role?

The Scheduler is the master component of Airflow. Its job is to constantly monitor all DAGs and their schedules. When it's time for a DAG to run (e.g., the daily schedule has passed), the Scheduler creates a DAG Run instance and figures out which tasks are ready to be executed, then hands them off to the Executor.

6
New cards

What is the "Executor" and how does it work with the Scheduler?

The Executor is the component that handles the execution of tasks. Once the Scheduler decides a task is ready to run, the Executor picks it up and runs it. The most common executor, the LocalExecutor, runs tasks in parallel sub-processes on a single machine. More advanced executors like CeleryExecutor can distribute tasks across a cluster of worker machines.

7
New cards

What is the "TaskFlow API" (used in your code with @dag and @task), and what problem does it solve?

The TaskFlow API is a modern way to define DAGs and tasks using Python decorators (@dag, @task). It simplifies the process of writing DAGs by automatically managing dependencies between tasks and handling the passing of data (like XComs) between them. It makes an Airflow script look more like a regular, straightforward Python program, which is easier to write and read.
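A minimal sketch of the decorator style described here, assuming a throwaway DAG id and task names (a fuller example appears in cards 36-39):

from airflow.decorators import dag, task
from datetime import datetime

@dag(start_date=datetime(2025, 1, 1), schedule="@daily", catchup=False)
def taskflow_demo():

    @task
    def extract():
        # The return value is pushed to XCom automatically
        return {"value": 42}

    @task
    def load(payload):
        # The argument is pulled from the upstream task's XCom automatically
        print(payload["value"])

    load(extract())  # Calling the functions wires up the dependency

taskflow_demo()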

8
New cards

What is the "Metadata Database," and why is Airflow described as "stateful"?

The Metadata Database (like PostgreSQL or MySQL) is the brain of Airflow. It stores everything about the state of the system: all DAG definitions, the history and status of every DAG Run and Task, variables, connections, and more. Airflow is "stateful" because it doesn't just run tasks; it persistently tracks their state (e.g., success, failed, running, queued), which is why you can see the history of your pipelines in the web UI.

9
New cards

In Airflow, what is an "Operator" and what is its single, specific purpose?

An Operator is a pre-defined template for a single, atomic unit of work. It describes what should be done. Each operator is designed to perform one specific action. For example:

  • The BashOperator executes a Bash command.

  • The PythonOperator calls a Python function.

  • The EmailOperator sends an email.

Think of an operator as a class that says, "I know how to do this one job."

10
New cards

What is a "Task" in Airflow, and how is it related to an Operator?

A Task is an instance of an Operator within a DAG. It is the actual unit of execution in your workflow. When you instantiate an Operator inside your DAG code (e.g., my_task = BashOperator(...)), you are creating a Task. (A single execution of that task within a DAG Run is called a Task Instance.)

The Relationship: The Operator is the definition (the blueprint), and the Task is the instance (the specific unit in your workflow). You define tasks by using operators.
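A sketch of that relationship, assuming an illustrative DAG id, task ids, and bash commands:

from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

with DAG(dag_id="operator_vs_task", start_date=datetime(2025, 1, 1),
         schedule=None, catchup=False):
    # BashOperator is the blueprint; each instantiation below is a distinct Task
    say_hello = BashOperator(task_id="say_hello", bash_command="echo hello")
    say_bye = BashOperator(task_id="say_bye", bash_command="echo bye")

    say_hello >> say_bye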

11
New cards

Can you use a baking analogy to explain the relationship between a DAG, a Task, and an Operator?

Absolutely!

  • The DAG is the complete recipe for a cake, listing all steps in order.

  • The Operator is the type of instruction, like "Mix" or "Bake." It's a general concept.

  • The Task is a specific line in the recipe, like "Task 1: Mix dry ingredients" or "Task 2: Bake at 350°F for 30 minutes." It's a concrete instance of an instruction.

You use the general concept of "Mixing" (the Operator) to create the specific step "Mix dry ingredients" (the Task) within your recipe (the DAG).

12
New cards

How does the @task decorator relate to the concept of an Operator?

The @task decorator is a shortcut provided by the TaskFlow API. Under the hood, when you use @task, Airflow automatically creates a PythonOperator for you. It wraps your Python function and turns it into a task, saving you from having to manually instantiate PythonOperator(task_id=..., python_callable=...). It's a more Pythonic way to define tasks that execute Python functions.
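A sketch of that equivalence, with illustrative names for the DAG, tasks, and function:

from datetime import datetime
from airflow import DAG
from airflow.decorators import task
from airflow.operators.python import PythonOperator

def say_hello():
    print("hello")

with DAG(dag_id="decorator_equivalence", start_date=datetime(2025, 1, 1),
         schedule=None, catchup=False):
    # Classic style: wrap the function in a PythonOperator yourself
    hello_classic = PythonOperator(
        task_id="say_hello_classic",
        python_callable=say_hello,
    )

    # TaskFlow style: @task turns a function into an equivalent task
    @task
    def say_hello_taskflow():
        print("hello")

    hello_taskflow = say_hello_taskflow()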

13
New cards

What is the key philosophical difference between using a pre-built Operator (like BashOperator) and writing your own custom Python function with the @task decorator?

The difference is between reusing a specialized tool and building a custom tool.

  • Pre-built Operators: You use these for common, standardized actions (executing SQL, sending emails, running bash commands). They are robust, well-tested, and handle the logic for you.

  • @task (PythonOperator): You use this for custom business logic that is unique to your workflow. You write the Python code yourself, giving you maximum flexibility.

You choose based on the work being done: if a pre-built tool exists, use it; if not, build your own with @task.

14
New cards

The provided DAG uses the @dag decorator. What is its fundamental purpose, and how does it transform the function my_dag() beneath it?


from airflow.decorators import dag, task
from datetime import datetime

@dag(
    start_date=datetime(2025, 1, 1),
    schedule='@daily',
    description='my dag does that',
    tags=['team'],
    catchup=False
)
def my_dag():
    ...

The @dag decorator is a factory that converts a regular Python function into an Airflow DAG definition. Calling the function does not execute any task logic. Instead, when Airflow parses the Python file, the decorator registers the DAG's structure (schedule, tasks, dependencies) with the scheduler. The body of my_dag() acts as a declarative blueprint for the workflow.
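Note that, as in the fuller example in cards 36-39, the decorated function is normally called once at module level so that the DAG object is actually created and picked up:

# At the bottom of the DAG file: calling the decorated function
# builds the DAG object that the scheduler will register.
my_dag()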

15
New cards

In the DAG arguments, start_date=datetime(2025,1,1) is defined. Why is the start_date a critical parameter for Airflow's scheduling mechanism?

from airflow.decorators import dag, task
from datetime import datetime

@dag(
    start_date=datetime(2025, 1, 1),
    schedule='@daily',
    description='my dag does that',
    tags=['team'],
    catchup=False
)
def my_dag():
    ...

The start_date is the anchor point from which Airflow begins generating DAG Runs. It tells the scheduler: "Start scheduling this DAG for intervals on or after this date." Airflow creates one DAG Run for each completed schedule interval between the start_date and the present. Crucially, it must be a static datetime object (not datetime.now()), as a dynamic date would change on every parse, confusing the scheduler.

16
New cards

The DAG has schedule='@daily'. How does Airflow use this schedule, and what are some common alternatives you could use?

from airflow.decorators import dag, task
from datetime import datetime

@dag(
    start_date=datetime(2025, 1, 1),
    schedule='@daily',
    description='my dag does that',
    tags=['team'],
    catchup=False
)
def my_dag():
    ...

The schedule parameter defines the DAG's execution frequency using a CRON expression or preset. @daily means the DAG will run once at the end of each day (midnight). Airflow schedules a DAG Run to process the data for each completed interval.

  • Alternatives: @hourly, @weekly, @monthly, @yearly.

  • CRON expressions: '30 08 * * *' (Every day at 8:30 AM), '0 0 * * MON' (Every Monday at midnight).

  • None: Means the DAG will only be triggered manually.
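A sketch showing one of these CRON expressions used in place of the preset (the dag_id morning_report is illustrative):

from airflow.decorators import dag
from datetime import datetime

@dag(
    start_date=datetime(2025, 1, 1),
    schedule='30 08 * * *',  # every day at 8:30 AM
    catchup=False,
)
def morning_report():
    ...

morning_report()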

17
New cards

What is the practical purpose of the tags=['team'] parameter in a production Airflow environment?

from airflow.decorators import dag, task
from datetime import datetime

@dag(
    start_date=datetime(2025, 1, 1),
    schedule='@daily',
    description='my dag does that',
    tags=['team'],
    catchup=False
)
def my_dag():
    ...

Tags are metadata labels used for organization and filtering in the Airflow UI. In the "DAGs" view, you can click on a tag name to see all DAGs associated with it. This is extremely useful in large environments with hundreds of DAGs, allowing teams to quickly find their own workflows (e.g., tags=['marketing_team']) or DAGs of a specific type (e.g., tags=['data_validation']).

18
New cards

The parameter catchup=False is set. What problem does this setting prevent, and what would happen if it were set to True (or omitted, as True is the default)?

from airflow.decorators import dag, task
from datetime import datetime

@dag(
    start_date=datetime(2025, 1, 1),
    schedule='@daily',
    description='my dag does that',
    tags=['team'],
    catchup=False
)
def my_dag():
    ...

catchup=False prevents automatic "catch-up" runs for past intervals. If your DAG is deployed on Jan 10th with a start_date of Jan 1st, Airflow would, by default, create 9 DAG Runs (for the Jan 1st through Jan 9th intervals) all at once. catchup=False disables this behavior, so Airflow only schedules the run for the most recent completed interval. This is crucial for preventing unexpected load on resources.

19
New cards

The description field is set to 'my dag does that'. Where would this information be displayed and why is a good description important?

from airflow.decorators import dag, task
from datetime import datetime

@dag(
    start_date=datetime(2025, 1, 1),
    schedule='@daily',
    description='my dag does that',
    tags=['team'],
    catchup=False
)
def my_dag():
    ...

The description is displayed in the Airflow UI on the DAG Details page. A clear, concise description is vital for documentation and collaboration. It allows other engineers to quickly understand the DAG's purpose, its business logic, and what data sources or outputs it handles, without needing to decipher the code.

20
New cards

Looking at the imports, we see @task is imported but not used in this snippet. Once used inside the DAG function, what role does the @task decorator play in relation to the @dag decorator?

from airflow.decorators import dag, task
from datetime import datetime

@dag(
    start_date=datetime(2025, 1, 1),
    schedule='@daily',
    description='my dag does that',
    tags=['team'],
    catchup=False
)
def my_dag():
    ...

While the @dag decorator defines the overall workflow container, the @task decorator is used inside it to define its individual units of work. It converts a Python function into an Airflow Task. The @dag decorator collects these tasks and manages the dependencies between them to form the complete Directed Acyclic Graph (DAG).
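A sketch of how @task might be used inside this my_dag function (the task names and logic below are illustrative, not part of the original snippet):

from airflow.decorators import dag, task
from datetime import datetime

@dag(
    start_date=datetime(2025, 1, 1),
    schedule='@daily',
    description='my dag does that',
    tags=['team'],
    catchup=False
)
def my_dag():

    @task
    def extract():
        return [1, 2, 3]

    @task
    def transform(records):
        print(f"Got {len(records)} records")

    transform(extract())

my_dag()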

21
New cards

What is the purpose of airflow db check and when would you use it?

airflow db check tests the connection to Airflow's metadata database. It verifies that the database is reachable with the configured connection settings. (Checking that the schema exists and is at the expected migration version is handled by the separate airflow db check-migrations command.)

When to use:

  • After configuring a new database connection in airflow.cfg.

  • As a first troubleshooting step if the Airflow webserver or scheduler fails to start.

  • In a CI/CD pipeline to ensure database connectivity before deploying Airflow.

Example:
You've just changed your airflow.cfg to point from SQLite to a new PostgreSQL database. Before starting the scheduler, you run:

airflow db check

Why: To confirm Airflow can successfully connect to PostgreSQL. If it fails, you know the issue is with the database URL, network, or credentials, not your DAG code.

22
New cards

What does airflow db clean do and what is a practical use case?

airflow db clean purges old records from the metadata database tables (like task instances, logs, DAG runs) that are beyond a certain age. This is crucial for preventing the database from growing indefinitely and impacting performance.

When to use:

  • As a scheduled maintenance job (e.g., via a cron job) to keep the database size manageable.

  • When your Airflow UI is becoming slow, and you need to free up space.

Example:
You want to clean all task instances, logs, and DAG runs that are older than 30 days to improve performance.

airflow db clean --clean-before-timestamp "2024-11-01" --yes

Why: Your database is 50GB and slowing down. This command removes historical data you no longer need for daily operations, potentially reducing the database size to 10GB and speeding up the UI.

23
New cards

When would you use airflow db export-archived?

airflow db export-archived exports the rows that airflow db clean has archived (by default, db clean moves purged records into archive tables instead of deleting them outright) into files on disk, so you can keep a backup for compliance or auditing purposes before dropping the archived data.

When to use:

  • After a db clean operation, if you have a legal requirement to keep historical records.

  • To migrate old run data to a cold storage system.

Example:
Your company policy requires you to keep task history for 1 year, but you only need 3 months in the live database. At the end of each quarter, after running airflow db clean, you run:

airflow db export-archived --export-format csv --output-path /mnt/backups/airflow_archive_Q3/

Why: This writes the archived data to CSV files in the backup directory. Once the export succeeds, you can drop the archive tables from the live database (airflow db drop-archived), knowing the old data is preserved on disk.

24
New cards

What is the function of airflow db init and when is it critical to run it?

airflow db init creates the necessary tables, indexes, and schema in the connected metadata database. It bootstraps the database for a new Airflow installation.

When to use:

  • When setting up Airflow for the very first time.

  • After upgrading Airflow to a new major version that requires a database schema migration (you would often use airflow db upgrade instead, but init is for the first-time setup).

Example:
You've just installed Apache Airflow on a new server and configured it to use a blank PostgreSQL database named airflow_meta. You run:

airflow db init

Why: This command creates all the dag_run, task_instance, log, etc., tables in your empty PostgreSQL database. Without this, Airflow has nowhere to store its state and will fail to start.

25
New cards

What does airflow dags backfill do? Provide a specific example of its usage.

airflow dags backfill manually triggers DAG Runs for historical dates that were missed or need to be reprocessed. It uses the DAG's schedule to determine the intervals but ignores the catchup setting, creating a run for every interval in the specified date range.

When to use:

  • When you deploy a new DAG and need to process historical data.

  • To reprocess data for a past period after fixing a bug in your DAG logic.

Example:
You have a DAG named daily_sales_report that broke from January 10th to January 15th due to a bug. You fix the bug on January 16th and need to generate the reports for the missing days.

airflow dags backfill -s 2024-01-10 -e 2024-01-15 daily_sales_report

Why: This command creates and runs DAG Runs for Jan 10, 11, 12, 13, 14, and 15. It processes the historical data as if the DAG had been running correctly all along, filling in the "gap" in your data pipeline.

26
New cards

What problem does airflow dags reserialize solve?

airflow dags reserialize rebuilds the serialized representation of all your DAGs that Airflow stores in the database. The scheduler serializes DAGs and stores them to improve performance; if this cached data becomes corrupted or stale, the DAG may not appear correctly in the UI.

When to use:

  • When you've made changes to a DAG file but don't see the changes reflected in the Airflow UI.

  • If the UI shows a "DAG not found" error for a DAG that definitely exists in your files.

  • After restoring a database from a backup.

Example:
You renamed a task in your data_pipeline DAG from process_data to transform_data. You saved the file, but the Airflow UI still shows the old task name. You run:

airflow dags reserialize

Why: This forces Airflow to re-read all DAG files and update its internal serialized cache. After this, the UI should correctly display the new transform_data task.

27
New cards

What is the use of airflow dags list and how can its output be helpful?

airflow dags list prints a list of all DAGs that Airflow has detected in your DAGS_FOLDER. It's a quick CLI way to see which DAGs are available.

When to use:

  • To verify that a new DAG file you added is being parsed correctly by Airflow.

  • To get a quick overview of all available DAGs without opening the web UI.

Example:
You've just added a new file my_new_etl.py to your dags/ folder. You want to confirm Airflow can see it.

airflow dags list

Output:

dag_id
my_new_etl
daily_sales_report
data_pipeline

Why: The presence of my_new_etl in the list confirms that the file was parsed successfully and contains a valid DAG definition. If it's missing, you know there's a syntax or import error in the file.

28
New cards

When and why would you use airflow tasks test?

airflow tasks test runs a single task instance locally, ignoring task dependencies and bypassing the scheduler. It prints the log output to your console and does not record success or failure in the metadata database.

When to use:

  • During development, to quickly test and debug the logic of a new task without running the entire DAG.

  • To verify that a task works with specific input parameters or a specific execution date.

Example:
You've written a new Python function for a task named validate_data and you want to make sure it works correctly for a specific date.

airflow tasks test my_etl_dag validate_data 2024-01-01

Why: This command allows you to rapidly iterate and debug your validate_data task in isolation. It's much faster than triggering a full DAG Run and waiting for the scheduler. Since it doesn't write to the database, you can run it repeatedly without polluting the task history.


29
New cards

What is the Airflow REST API and what is its primary use case?

The Airflow REST API is a web service that provides a programmable interface to interact with your Airflow instance. Instead of using the CLI or the web UI, you can use standard HTTP requests (like GET, POST, PATCH) to trigger DAGs, check status, manage variables, and more.

Primary Use Case: Orchestrating Airflow from external systems. For example:

  • A CI/CD pipeline triggers a data quality DAG after a new data model is deployed.

  • An external data ingestion service triggers a DAG once a new file lands in cloud storage.

  • A monitoring tool checks the health of critical DAGs and alerts if they fail.

30
New cards

How do you access the REST API on your local Airflow instance?

By default, the local Airflow webserver runs on http://localhost:8080. The REST API endpoints are available under this base URL. The full path for an endpoint is typically: http://localhost:8080/api/v1/...

For example, to list all DAGs, you would call:
http://localhost:8080/api/v1/dags

31
New cards

What is the most common method for authenticating with the local Airflow REST API?

The most straightforward method for local development is Basic Authentication. You use the username and password you created for the Airflow webserver.

How it works: You provide your credentials in the Authorization header of the HTTP request. The client (like curl or Postman) will encode "username:password" in Base64.

curl -X GET "http://localhost:8080/api/v1/dags" \
     -H "Content-Type: application/json" \
     --user "admin:your_password"

32
New cards

How would you use the REST API to trigger a DAG run for a specific DAG named daily_data_processing?

You would send a POST request to the dagRuns endpoint for that DAG.

Endpoint: POST http://localhost:8080/api/v1/dags/daily_data_processing/dagRuns

You need to provide a JSON payload in the request body. The simplest payload is an empty object, which triggers a manual DAG Run with Airflow-generated defaults for the run ID and logical date.

curl -X POST "http://localhost:8080/api/v1/dags/daily_data_processing/dagRuns" \
     -H "Content-Type: application/json" \
     --user "admin:your_password" \
     -d '{}'

Why: You have an external script that detects a new batch of data is ready. Instead of a person manually clicking the "Trigger DAG" button in the UI, the script automatically calls this API endpoint to start the processing pipeline.
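The same trigger could be scripted in Python instead of curl; a sketch using the requests library, with the host, DAG name, and credentials assumed above:

import requests

# Trigger a run of daily_data_processing via the stable REST API (Airflow 2.x)
response = requests.post(
    "http://localhost:8080/api/v1/dags/daily_data_processing/dagRuns",
    auth=("admin", "your_password"),  # Basic Authentication
    json={},                          # empty payload: Airflow fills in defaults
    timeout=30,
)
response.raise_for_status()
run = response.json()
print(run["dag_run_id"], run["state"])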

33
New cards

How can you check the status of the latest DAG run for my_etl_pipeline using the REST API?

You would send a GET request to the DAG's dagRuns endpoint and inspect the most recent run.

Endpoint: GET http://localhost:8080/api/v1/dags/my_etl_pipeline/dagRuns

Practical curl Example:

curl -X GET "http://localhost:8080/api/v1/dags/my_etl_pipeline/dagRuns?limit=1&order_by=-execution_date" \
     -H "Content-Type: application/json" \
     --user "admin:your_password"

The JSON response contains a "dag_runs" list; check the "state" of the returned run (e.g., "success", "running", "failed").

Why: A dashboard application needs to display the real-time status of your key data pipelines. It polls this endpoint every minute to get the latest state and shows a green/red indicator.

34
New cards

How would you pause and unpause a DAG named experimental_dag via the REST API?

You use the PATCH endpoint for a specific DAG to update its properties, including is_paused.

Endpoint: PATCH http://localhost:8080/api/v1/dags/experimental_dag

To Pause (send is_paused=true):

curl -X PATCH "http://localhost:8080/api/v1/dags/experimental_dag" \
     -H "Content-Type: application/json" \
     --user "admin:your_password" \
     -d '{"is_paused": true}'

To Unpause (send is_paused=false):

curl -X PATCH "http://localhost:8080/api/v1/dags/experimental_dag" \
     -H "Content-Type: application/json" \
     --user "admin:your_password" \
     -d '{"is_paused": false}'

Why: You have a DAG that processes user behavior data. You want to automatically pause it during scheduled maintenance on your source database to prevent a flood of errors, and then unpause it once maintenance is complete, all from your central orchestration script.

35
New cards

What is a common tool to explore and test the Airflow REST API endpoints interactively on localhost:8080?

Postman or Insomnia are excellent GUI-based tools for this. You can easily set the base URL (http://localhost:8080/api/v1/), configure Basic Authentication with your admin credentials, and then try out different GET, POST, and PATCH requests to various endpoints like /dags, /variables, etc.

Why: Before writing a script that uses the API, you use Postman to manually trigger a DAG to ensure your authentication works and you understand the exact JSON structure the API expects and returns.

36
New cards

What is the overall purpose and workflow of this DAG?

from airflow.decorators import dag, task
from datetime import datetime
import random

@dag(
    dag_id='random_number_checker_taskflow',
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    description='A simple DAG to generate and check random numbers using TaskFlow API',
    catchup=False,
    tags=['training', 'taskflow']
)
def random_number_checker_taskflow():
    
    @task
    def generate_random_number():
        """Generate a random number between 1 and 100"""
        number = random.randint(1, 100)
        print(f"Generated random number: {number}")
        return number  # Automatically pushed to XCom by TaskFlow API
    
    @task
    def check_even_odd(number):
        """Check if the provided number is even or odd"""
        result = "even" if number % 2 == 0 else "odd"
        print(f"The number {number} is {result}.")
        return result
    
    # Define the workflow using function calls
    random_num = generate_random_number()
    check_result = check_even_odd(random_num)

# Instantiate the DAG
dag = random_number_checker_taskflow()

This DAG creates a simple data pipeline that:

  1. Generates a random number between 1-100

  2. Analyzes whether that number is even or odd

  3. Logs both the generated number and the analysis result

It's a demonstration DAG that shows how to pass data between tasks using Airflow's TaskFlow API.

37
New cards

What is the purpose of the @dag decorator and how does it differ from the traditional DAG definition?

from airflow.decorators import dag, task
from datetime import datetime
import random

@dag(
    dag_id='random_number_checker_taskflow',
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    description='A simple DAG to generate and check random numbers using TaskFlow API',
    catchup=False,
    tags=['training', 'taskflow']
)
def random_number_checker_taskflow():
    
    @task
    def generate_random_number():
        """Generate a random number between 1 and 100"""
        number = random.randint(1, 100)
        print(f"Generated random number: {number}")
        return number  # Automatically pushed to XCom by TaskFlow API
    
    @task
    def check_even_odd(number):
        """Check if the provided number is even or odd"""
        result = "even" if number % 2 == 0 else "odd"
        print(f"The number {number} is {result}.")
        return result
    
    # Define the workflow using function calls
    random_num = generate_random_number()
    check_result = check_even_odd(random_num)

# Instantiate the DAG
dag = random_number_checker_taskflow()

The @dag decorator transforms the Python function random_number_checker_taskflow() into an Airflow DAG definition. Instead of using the with DAG(...) as dag: context manager, the decorator automatically registers the function as a DAG when it is called at the bottom of the file with dag = random_number_checker_taskflow().

38
New cards

How does data flow from generate_random_number to check_even_odd without any explicit XCom code?

from airflow.decorators import dag, task
from datetime import datetime
import random

@dag(
    dag_id='random_number_checker_taskflow',
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    description='A simple DAG to generate and check random numbers using TaskFlow API',
    catchup=False,
    tags=['training', 'taskflow']
)
def random_number_checker_taskflow():
    
    @task
    def generate_random_number():
        """Generate a random number between 1 and 100"""
        number = random.randint(1, 100)
        print(f"Generated random number: {number}")
        return number  # Automatically pushed to XCom by TaskFlow API
    
    @task
    def check_even_odd(number):
        """Check if the provided number is even or odd"""
        result = "even" if number % 2 == 0 else "odd"
        print(f"The number {number} is {result}.")
        return result
    
    # Define the workflow using function calls
    random_num = generate_random_number()
    check_result = check_even_odd(random_num)

# Instantiate the DAG
dag = random_number_checker_taskflow()

The TaskFlow API handles XCom automatically:

  • When generate_random_number() returns the number, TaskFlow automatically pushes it to XCom behind the scenes

  • When check_even_odd(number) declares a parameter, TaskFlow automatically pulls the value from the upstream task's XCom and passes it as the function argument

  • The assignments random_num = generate_random_number() and check_even_odd(random_num) tell Airflow about both the data flow and the task dependency

No need for: ti.xcom_push(), ti.xcom_pull(), or **context parameters!

39
New cards

How are the task dependencies established in this DAG?
What would you see in the Airflow UI when this DAG runs successfully?

from airflow.decorators import dag, task
from datetime import datetime
import random

@dag(
    dag_id='random_number_checker_taskflow',
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    description='A simple DAG to generate and check random numbers using TaskFlow API',
    catchup=False,
    tags=['training', 'taskflow']
)
def random_number_checker_taskflow():
    
    @task
    def generate_random_number():
        """Generate a random number between 1 and 100"""
        number = random.randint(1, 100)
        print(f"Generated random number: {number}")
        return number  # Automatically pushed to XCom by TaskFlow API
    
    @task
    def check_even_odd(number):
        """Check if the provided number is even or odd"""
        result = "even" if number % 2 == 0 else "odd"
        print(f"The number {number} is {result}.")
        return result
    
    # Define the workflow using function calls
    random_num = generate_random_number()
    check_result = check_even_odd(random_num)

# Instantiate the DAG
dag = random_number_checker_taskflow()

Dependencies are created implicitly through the function calls and variable assignments:

random_num = generate_random_number()        # Task 1
check_result = check_even_odd(random_num)    # Task 2 (depends on Task 1)

Airflow analyzes this code and understands that:

  • check_even_odd requires the output of generate_random_number

  • Therefore, generate_random_number must run before check_even_odd

  • This creates the dependency chain: generate_random_number → check_even_odd

In the Graph View, you'd see:

  • Two task boxes: generate_random_number and check_even_odd

  • An arrow showing the dependency between them

  • Both tasks colored green (success)

In the Logs, you'd see something like:

# generate_random_number task logs:
Generated random number: 47

# check_even_odd task logs:  
The number 47 is odd.