Airflow Trigger Rules

Airflow Trigger Rules

This video builds upon previous discussions regarding task dependencies and relationships in Airflow, including sequential and parallel dependencies, task groups, and chains. The primary focus is on understanding and implementing different trigger rules within Airflow DAGs.

What are Trigger Rules?

Trigger rules define the conditions under which a particular task in Airflow should be executed. They determine when a task is considered ready to run based on the completion status of its upstream (parent) tasks.

  • Purpose: To control task execution flow, especially in scenarios where tasks need to run based on success, failure, or completion of upstream tasks, or even in more complex combinations.

  • Default Behavior (all_success): By default, a task will only be triggered if all of its immediate upstream tasks have completed successfully. If any upstream task fails, is skipped, or remains in an uncompleted state other than success, the downstream task will not execute.

Defining Trigger Rules

Trigger rules are defined as an argument to the BaseOperator or Task instance for the downstream task. For example, to apply a trigger rule to a task named load_data_task, it would be defined as: load_data_task = SomeOperator(..., trigger_rule=TriggerRule.ALL_SUCCESS).

Types of Trigger Rules (with Examples)

Airflow provides several built-in trigger rules to handle various scenarios:

1. all_success (Default)
  • Condition: The task runs only if all its direct upstream tasks succeed.

  • Example (Code): load_data_tasks = BashOperator(..., trigger_rule=TriggerRule.ALL_SUCCESS) (explicitly shown, though not strictly required as it's the default).

  • Demonstration: A DAG with fetch_sales_data and fetch_customer_data upstream to load_to_data_warehouse.

    • If fetch_sales_data succeeds and fetch_customer_data fails, load_to_data_warehouse will not execute and will be marked as skipped or upstream failed, as not all upstream tasks succeeded.

2. all_failed
  • Condition: The task runs only if all its direct upstream tasks have failed.

  • Use Case: Ideal for triggering failure alerts or cleanup processes contingent on complete upstream failure.

  • Example (Code): notify_failure_alert = BashOperator(..., trigger_rule=TriggerRule.ALL_FAILED).

  • Demonstration: A DAG where task_1 and task_2 are upstream to notify_failure_alert.

    • Both task_1 and task_2 are configured to fail.

    • Upon both tasks failing, notify_failure_alert is triggered and executes.

    • The DAG run might then be marked as complete, as the expected failure notification has occurred.

3. all_done
  • Condition: The task runs if all its direct upstream tasks have completed their execution, regardless of their final status (succeeded, failed, skipped).

  • Use Case: Suitable for tasks like archiving logs or final cleanup, which need to run irrespective of upstream task outcomes.

  • Example (Code): archive_log = BashOperator(..., trigger_rule=TriggerRule.ALL_DONE).

  • Demonstration: A DAG with run_quality_check and load_staging_data upstream to archive_log.

    • If run_quality_check succeeds and load_staging_data fails.

    • archive_log will still trigger and execute because both upstream tasks have completed their run, even if one failed. Its purpose is to archive logs, whether they are success or failure logs.

4. always
  • Condition: The task always runs, regardless of the status or even execution of its direct upstream tasks. It is effectively an unconditional trigger.

  • Example (Code): always_execute_task = BashOperator(..., trigger_rule=TriggerRule.ALWAYS).

  • Demonstration: A task with ALWAYS trigger rule will execute even if its upstream tasks are in a queued, running, or failed state, as long as it's its turn in the DAG's flow.

Other Trigger Rules Mentioned (but not explicitly demonstrated in detail):
  • at_least_one_failed: The task runs if at least one direct upstream task has failed.

  • at_least_one_success: The task runs if at least one direct upstream task has succeeded.

  • none_failed: The task runs if none of its direct upstream tasks have failed.

  • none_skipped: The task runs if none of its direct upstream tasks have been skipped.

  • dummy: This was mentioned as a trigger rule, but no specific explanation or example was given.

Complex DAG Example (complex_trigger_rule_dag)

A detailed DAG was presented to demonstrate the interaction of multiple trigger rules and dependencies in a composite scenario.

  • Initial Tasks: start triggers extract_users, extract_orders, extract_inventory.

  • extract_orders Failure: In the demonstration, extract_orders failed.

    • validation_orders: This task has a direct dependency on extract_orders and no explicit trigger rule, so it implicitly uses all_success. Since extract_orders failed, validation_orders did not execute.

    • alert_validation: This task has a TriggerRule.ALL_FAILED condition for its upstream tasks. Since only extract_orders failed (and others succeeded or did not execute in its upstream context), alert_validation was skipped.

  • skip_decision: This task was configured with TriggerRule.ALWAYS_SKIP (or similar construct, indicating it always gets skipped).

  • merge_skip: This task had a TriggerRule.ALL_SKIPPED rule. Its upstream tasks (skip_decision and others possibly set to skip) were skipped, leading to merge_skip's execution.

  • load_failure_path: This task was skipped because it depended on a success status that was not met (due to extract_orders failure).

  • notify_success: This task would execute if at least one success occurred in its upstream dependencies (e.g., extract_users succeeded).

  • notify_failure: This task would execute if at least one failure occurred in its upstream dependencies.

  • cleanup: This task was set to TriggerRule.ALWAYS, ensuring it executed regardless of any upstream success or failure, to perform essential cleanup operations.

This complex example highlights how trigger rules allow for dynamic and resilient data pipelines, enabling different execution paths based on the outcomes of upstream tasks.

Practical Implications

  • Dynamic Data Pipelines: Trigger rules are crucial for building adaptive DAGs that can respond intelligently to varying task outcomes (success, failure, skip).

  • Error Handling and Alerting: They are instrumental in setting up specific alerts for failures (all_failed, at_least_one_failed) or ensuring essential cleanup tasks run irrespective of errors (all_done, always).

  • Production Environments: In real-world production environments with hundreds of DAGs and tasks, a robust understanding and application of trigger rules are vital for creating stable, maintainable, and error-resilient data workflows.

Airflow Setup Notes

The demonstrations were carried out on a local Airflow setup using Docker containers. During the video, a temporary scheduler issue (last heartbeat received 22 minutes ago) caused initial delays in task execution, emphasizing common operational challenges when running Airflow locally.