Airflow Trigger Rules
Airflow Trigger Rules
This video builds upon previous discussions regarding task dependencies and relationships in Airflow, including sequential and parallel dependencies, task groups, and chains. The primary focus is on understanding and implementing different trigger rules within Airflow DAGs.
What are Trigger Rules?
Trigger rules define the conditions under which a particular task in Airflow should be executed. They determine when a task is considered ready to run based on the completion status of its upstream (parent) tasks.
Purpose: To control task execution flow, especially in scenarios where tasks need to run based on success, failure, or completion of upstream tasks, or even in more complex combinations.
Default Behavior (
all_success): By default, a task will only be triggered if all of its immediate upstream tasks have completed successfully. If any upstream task fails, is skipped, or remains in an uncompleted state other than success, the downstream task will not execute.
Defining Trigger Rules
Trigger rules are defined as an argument to the BaseOperator or Task instance for the downstream task. For example, to apply a trigger rule to a task named load_data_task, it would be defined as: load_data_task = SomeOperator(..., trigger_rule=TriggerRule.ALL_SUCCESS).
Types of Trigger Rules (with Examples)
Airflow provides several built-in trigger rules to handle various scenarios:
1. all_success (Default)
Condition: The task runs only if all its direct upstream tasks succeed.
Example (Code):
load_data_tasks = BashOperator(..., trigger_rule=TriggerRule.ALL_SUCCESS)(explicitly shown, though not strictly required as it's the default).Demonstration: A DAG with
fetch_sales_dataandfetch_customer_dataupstream toload_to_data_warehouse.If
fetch_sales_datasucceeds andfetch_customer_datafails,load_to_data_warehousewill not execute and will be marked as skipped or upstream failed, as not all upstream tasks succeeded.
2. all_failed
Condition: The task runs only if all its direct upstream tasks have failed.
Use Case: Ideal for triggering failure alerts or cleanup processes contingent on complete upstream failure.
Example (Code):
notify_failure_alert = BashOperator(..., trigger_rule=TriggerRule.ALL_FAILED).Demonstration: A DAG where
task_1andtask_2are upstream tonotify_failure_alert.Both
task_1andtask_2are configured to fail.Upon both tasks failing,
notify_failure_alertis triggered and executes.The DAG run might then be marked as complete, as the expected failure notification has occurred.
3. all_done
Condition: The task runs if all its direct upstream tasks have completed their execution, regardless of their final status (succeeded, failed, skipped).
Use Case: Suitable for tasks like archiving logs or final cleanup, which need to run irrespective of upstream task outcomes.
Example (Code):
archive_log = BashOperator(..., trigger_rule=TriggerRule.ALL_DONE).Demonstration: A DAG with
run_quality_checkandload_staging_dataupstream toarchive_log.If
run_quality_checksucceeds andload_staging_datafails.archive_logwill still trigger and execute because both upstream tasks have completed their run, even if one failed. Its purpose is to archive logs, whether they are success or failure logs.
4. always
Condition: The task always runs, regardless of the status or even execution of its direct upstream tasks. It is effectively an unconditional trigger.
Example (Code):
always_execute_task = BashOperator(..., trigger_rule=TriggerRule.ALWAYS).Demonstration: A task with
ALWAYStrigger rule will execute even if its upstream tasks are in a queued, running, or failed state, as long as it's its turn in the DAG's flow.
Other Trigger Rules Mentioned (but not explicitly demonstrated in detail):
at_least_one_failed: The task runs if at least one direct upstream task has failed.at_least_one_success: The task runs if at least one direct upstream task has succeeded.none_failed: The task runs if none of its direct upstream tasks have failed.none_skipped: The task runs if none of its direct upstream tasks have been skipped.dummy: This was mentioned as a trigger rule, but no specific explanation or example was given.
Complex DAG Example (complex_trigger_rule_dag)
A detailed DAG was presented to demonstrate the interaction of multiple trigger rules and dependencies in a composite scenario.
Initial Tasks:
starttriggersextract_users,extract_orders,extract_inventory.extract_ordersFailure: In the demonstration,extract_ordersfailed.validation_orders: This task has a direct dependency onextract_ordersand no explicit trigger rule, so it implicitly usesall_success. Sinceextract_ordersfailed,validation_ordersdid not execute.alert_validation: This task has aTriggerRule.ALL_FAILEDcondition for its upstream tasks. Since onlyextract_ordersfailed (and others succeeded or did not execute in its upstream context),alert_validationwas skipped.
skip_decision: This task was configured withTriggerRule.ALWAYS_SKIP(or similar construct, indicating it always gets skipped).merge_skip: This task had aTriggerRule.ALL_SKIPPEDrule. Its upstream tasks (skip_decisionand others possibly set to skip) were skipped, leading tomerge_skip's execution.load_failure_path: This task was skipped because it depended on a success status that was not met (due toextract_ordersfailure).notify_success: This task would execute if at least one success occurred in its upstream dependencies (e.g.,extract_userssucceeded).notify_failure: This task would execute if at least one failure occurred in its upstream dependencies.cleanup: This task was set toTriggerRule.ALWAYS, ensuring it executed regardless of any upstream success or failure, to perform essential cleanup operations.
This complex example highlights how trigger rules allow for dynamic and resilient data pipelines, enabling different execution paths based on the outcomes of upstream tasks.
Practical Implications
Dynamic Data Pipelines: Trigger rules are crucial for building adaptive DAGs that can respond intelligently to varying task outcomes (success, failure, skip).
Error Handling and Alerting: They are instrumental in setting up specific alerts for failures (
all_failed,at_least_one_failed) or ensuring essential cleanup tasks run irrespective of errors (all_done,always).Production Environments: In real-world production environments with hundreds of DAGs and tasks, a robust understanding and application of trigger rules are vital for creating stable, maintainable, and error-resilient data workflows.
Airflow Setup Notes
The demonstrations were carried out on a local Airflow setup using Docker containers. During the video, a temporary scheduler issue (last heartbeat received minutes ago) caused initial delays in task execution, emphasizing common operational challenges when running Airflow locally.