ADR 018 — Separate ELT and dbt DAGs via TriggerDagRunOperator

Status

Accepted

Context

As the CORE domain grew to cover all client rosters (26 clients, 58+ dbt models), the ELT DAGs began embedding Cosmos DbtTaskGroup with broad selectors (e.g., select=["tag:core"]). Astronomer Cosmos parses the full dbt project graph at Airflow DAG import time, not at runtime. Selecting a broad tag caused Cosmos to parse all matching models during DagBag load, consistently exceeding Airflow's 30-second DAG import budget:

AirflowTaskTimeout: DagBag import timeout after 30.0s

The DAG could not load at all — no tasks could run.

Why this happens

Cosmos renders at parse time, not at run time. Each time Airflow re-parses a DAG file (which it does periodically, not only on deploy), Cosmos re-parses the dbt project graph for every DbtTaskGroup in that file, so every ELT DAG carrying a DbtTaskGroup pays this cost on every reload. The cost scales linearly with the number of models selected. Broad domain-level tags (tag:core) are particularly dangerous because the set of models they match grows without bound as new clients onboard.
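To make the scaling argument concrete, here is a toy cost model. It assumes, for illustration only, that each parse pays a fixed overhead per DbtTaskGroup plus a constant cost per selected model. ParseCostModel and both coefficients are hypothetical placeholders, not measurements from this deployment; the only real number is the 30-second budget, which is Airflow's default `[core] dagbag_import_timeout`.

```python
from dataclasses import dataclass

# Airflow's default [core] dagbag_import_timeout, in seconds.
IMPORT_BUDGET_SECONDS = 30.0


@dataclass(frozen=True)
class ParseCostModel:
    """Hypothetical linear model of Cosmos render cost during one DAG file parse."""

    per_group_overhead_s: float  # assumed fixed cost per DbtTaskGroup (e.g. loading the project)
    per_model_s: float           # assumed incremental cost per selected dbt model

    def file_parse_seconds(self, models_per_group: list[int]) -> float:
        # Every DbtTaskGroup in the file is rendered on every parse, so the costs add up.
        return sum(self.per_group_overhead_s + self.per_model_s * n for n in models_per_group)

    def fits_budget(self, models_per_group: list[int]) -> bool:
        return self.file_parse_seconds(models_per_group) < IMPORT_BUDGET_SECONDS


# Placeholder coefficients, chosen only to illustrate the shape of the curve.
model = ParseCostModel(per_group_overhead_s=2.0, per_model_s=0.25)
print(model.fits_budget([58]))      # True: one group selecting 58 models (16.5 s)
print(model.fits_budget([58, 58]))  # False: two groups in one file, 58 models each (33.0 s)
print(model.fits_budget([]))        # True: an ELT DAG with no Cosmos task group costs nothing
```

Whatever the real coefficients are, any positive per-model cost makes the parse time unbounded in the number of selected models, so a selector that grows with client count eventually crosses the budget; an ELT DAG with no DbtTaskGroup sidesteps the term entirely.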

Secondary problem: incorrect model selection

The BI roster ELT DAG used select: "tag:core_bi_roster_wbp", which selected only the WBP models, so the other clients' BI roster models were not run by that DAG. Onboarding a new client required remembering to update the ELT DAG's Cosmos selector, a fragile and easy-to-miss step. The intended behavior is that every Bronze load for a pipeline triggers the same downstream dbt DAG, which always runs all models for that pipeline.

Decision

Separate ELT (Bronze loading) and dbt (Silver+Gold transformations) into independent Airflow DAGs connected by TriggerDagRunOperator.

Pattern

Each Bronze ELT DAG triggers a standalone dbt DAG after all Bronze loads complete:

core_gsheets_ph_hr_roster_daily  →  TriggerDagRunOperator  →  core_dbt_ph_roster
core_gsheets_bi_roster_daily     →  TriggerDagRunOperator  →  core_dbt_bi_roster

The TriggerDagRunOperator is configured with wait_for_completion=True, making the trigger synchronous — the ELT DAG does not complete until the dbt DAG finishes. This preserves the end-to-end success/failure signal needed for alerting and SLA tracking.
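The coupling that wait_for_completion=True provides can be modelled as a polling loop. The sketch below is a stdlib-only simplification written for illustration, not TriggerDagRunOperator's source; wait_for_dbt_run and DownstreamRunFailed are hypothetical names, and the allowed_states / failed_states arguments loosely mirror the operator parameters of the same names.

```python
import time
from typing import Callable, Iterable


class DownstreamRunFailed(RuntimeError):
    """Raised when the triggered dbt DAG run ends in a failed state."""


def wait_for_dbt_run(
    poll_state: Callable[[], str],
    allowed_states: Iterable[str] = ("success",),
    failed_states: Iterable[str] = ("failed",),
    poke_interval_s: float = 0.0,
    max_polls: int = 1_000,
) -> str:
    """Simplified model of a synchronous trigger: block until the dbt run finishes.

    Returning normally means the ELT-side trigger task succeeds; raising means it
    fails, which surfaces as a failed ELT DAG run.
    """
    allowed, failed = set(allowed_states), set(failed_states)
    for _ in range(max_polls):
        state = poll_state()
        if state in allowed:
            return state
        if state in failed:
            raise DownstreamRunFailed(f"dbt DAG run finished in state {state!r}")
        time.sleep(poke_interval_s)  # still queued or running: wait, then poll again
    raise TimeoutError(f"dbt DAG run not finished after {max_polls} polls")


# Hypothetical sequence of states observed for the triggered dbt DAG run.
ok_run = iter(["queued", "running", "success"])
print(wait_for_dbt_run(lambda: next(ok_run)))  # success
```

The important property is that the dbt run's terminal state decides the trigger task's outcome, which is what keeps a single end-to-end success/failure signal on the ELT DAG.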

The existing skip_dbt DAG param continues to work: when skip_dbt=True, the gate task raises AirflowSkipException, which propagates to trigger_dbt via the default all_success trigger rule, skipping the dbt DAG trigger.
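The skip path relies on how Airflow's all_success trigger rule treats finished upstream tasks: an upstream failure marks the task upstream_failed, and an upstream skip marks it skipped. The sketch below models that rule in plain Python. SkipTask stands in for airflow.exceptions.AirflowSkipException, gate/run_task/all_success are hypothetical names, and trigger_dbt is assumed to have only the gate as its upstream.

```python
from enum import Enum


class TaskState(str, Enum):
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"
    UPSTREAM_FAILED = "upstream_failed"


class SkipTask(Exception):
    """Stand-in for airflow.exceptions.AirflowSkipException."""


def gate(params: dict) -> None:
    # Mirrors the gate task: skip everything downstream when skip_dbt=True.
    if params.get("skip_dbt", False):
        raise SkipTask("skip_dbt=True")


def run_task(fn, *args) -> TaskState:
    """Run a callable and translate its outcome into a task state."""
    try:
        fn(*args)
    except SkipTask:
        return TaskState.SKIPPED
    except Exception:
        return TaskState.FAILED
    return TaskState.SUCCESS


def all_success(upstream: list[TaskState]) -> str:
    """Simplified all_success rule, evaluated once every upstream task has finished."""
    if any(s in (TaskState.FAILED, TaskState.UPSTREAM_FAILED) for s in upstream):
        return TaskState.UPSTREAM_FAILED.value
    if any(s is TaskState.SKIPPED for s in upstream):
        return TaskState.SKIPPED.value  # skip propagates: trigger_dbt never runs
    return "run"


print(all_success([run_task(gate, {"skip_dbt": True})]))   # skipped
print(all_success([run_task(gate, {"skip_dbt": False})]))  # run
```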

Naming convention

| Concept | Pattern | Example |
|---|---|---|
| ELT DAG (how data arrives) | {domain}_{source}_{dataset}_{freq} | core_gsheets_ph_hr_roster_daily |
| dbt DAG (what data serves) | {domain}_dbt_{concern} | core_dbt_ph_roster |
| dbt pipeline tag | {domain}_{concern} | core_ph_roster |
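Because the dbt DAG and the pipeline tag share {domain} and {concern}, the tag can be derived mechanically from the dbt DAG id. The sketch below checks ids against the naming table. The regexes, the lowercase-alphanumeric character class, the assumption that {domain} and {source} contain no underscores, and the accepted frequency values are all assumptions; parse_elt_dag_id and pipeline_tag_for are hypothetical helpers, not part of xo-foundry.

```python
import re

# Frequency suffixes other than "daily" are assumptions; the ADR's examples only use "daily".
ELT_DAG_RE = re.compile(
    r"^(?P<domain>[a-z0-9]+)_(?P<source>[a-z0-9]+)_(?P<dataset>[a-z0-9_]+)_(?P<freq>hourly|daily|weekly|monthly)$"
)
DBT_DAG_RE = re.compile(r"^(?P<domain>[a-z0-9]+)_dbt_(?P<concern>[a-z0-9_]+)$")


def parse_elt_dag_id(dag_id: str) -> dict[str, str]:
    """Split an ELT DAG id into {domain, source, dataset, freq}."""
    m = ELT_DAG_RE.match(dag_id)
    if m is None or m["source"] == "dbt":  # "dbt" as a source would collide with dbt DAG ids
        raise ValueError(f"not an ELT DAG id: {dag_id!r}")
    return m.groupdict()


def pipeline_tag_for(dbt_dag_id: str) -> str:
    """Derive the dbt pipeline tag ({domain}_{concern}) from a dbt DAG id."""
    m = DBT_DAG_RE.match(dbt_dag_id)
    if m is None:
        raise ValueError(f"not a dbt DAG id: {dbt_dag_id!r}")
    return f"{m['domain']}_{m['concern']}"


print(parse_elt_dag_id("core_gsheets_ph_hr_roster_daily"))
print(pipeline_tag_for("core_dbt_ph_roster"))  # core_ph_roster
```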

dbt DAGs are named by data concern, not by source. This supports the many-to-one trigger pattern: multiple ELT DAGs from different sources (e.g., gladly_qa_daily + gsheets_qa_daily) can trigger the same dbt DAG (warbyparker_dbt_qa) when their outputs feed the same models. Currently all pipelines are 1:1; the naming is forward-compatible with many:1.
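The many:1 pattern amounts to a trigger map keyed by ELT DAG id: each ELT DAG names exactly one dbt DAG, while a dbt DAG may be the target of several. The hypothetical helpers below (not part of xo-foundry) invert that map to show which dbt DAGs have more than one upstream trigger. The QA entries reuse the illustrative names from the paragraph above and do not describe DAGs that exist today.

```python
from collections import defaultdict


def upstreams_by_dbt_dag(trigger_map: dict[str, str]) -> dict[str, list[str]]:
    """Invert an ELT-DAG -> dbt-DAG trigger map into dbt DAG -> sorted upstream ELT DAGs."""
    grouped: defaultdict[str, list[str]] = defaultdict(list)
    for elt_dag, dbt_dag in trigger_map.items():
        grouped[dbt_dag].append(elt_dag)
    return {dbt_dag: sorted(elts) for dbt_dag, elts in sorted(grouped.items())}


def many_to_one(trigger_map: dict[str, str]) -> list[str]:
    """dbt DAGs that are triggered by more than one ELT DAG."""
    return [dbt for dbt, elts in upstreams_by_dbt_dag(trigger_map).items() if len(elts) > 1]


# Today's CORE wiring is 1:1.
current = {
    "core_gsheets_ph_hr_roster_daily": "core_dbt_ph_roster",
    "core_gsheets_bi_roster_daily": "core_dbt_bi_roster",
}
# Illustrative future wiring, using the example names from the text.
future_qa = {
    "gladly_qa_daily": "warbyparker_dbt_qa",
    "gsheets_qa_daily": "warbyparker_dbt_qa",
}
print(many_to_one(current))    # []
print(many_to_one(future_qa))  # ['warbyparker_dbt_qa']
```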

dbt model tagging

Each model belongs to a pipeline tag that scopes Cosmos model selection to exactly the models needed for that concern:

  • core_ph_roster — 3 models: ph_hr_roster_active (Silver), ph_hr_roster_inactive (Silver), dim_employee_ph (Gold)
  • core_bi_roster — 52 models: 26 Silver bi_roster_* + 26 Gold dim_employee_* (all clients except PH)

Models may carry both a concern tag and a client-specific tag (e.g., core_bi_roster_wbp) for targeted ad-hoc runs. The pipeline tag is what Cosmos uses in the standalone dbt DAG.
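The difference between the old client-specific selector and the pipeline tag can be shown with a simplified stand-in for dbt's tag: selection method. In this sketch DbtModel and select_by_tag are hypothetical, the WBP model names are inferred from the bi_roster_* / dim_employee_* patterns, and acme is an invented client used only to show onboarding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DbtModel:
    name: str
    tags: frozenset[str]


def select_by_tag(models: list[DbtModel], tag: str) -> list[str]:
    """Simplified stand-in for a dbt/Cosmos `tag:<tag>` selector."""
    return sorted(m.name for m in models if tag in m.tags)


MODELS = [
    # PH pipeline (model names from this ADR).
    DbtModel("ph_hr_roster_active", frozenset({"core_ph_roster"})),
    DbtModel("ph_hr_roster_inactive", frozenset({"core_ph_roster"})),
    DbtModel("dim_employee_ph", frozenset({"core_ph_roster"})),
    # BI pipeline: WBP carries both the concern tag and a client-specific tag.
    DbtModel("bi_roster_wbp", frozenset({"core_bi_roster", "core_bi_roster_wbp"})),
    DbtModel("dim_employee_wbp", frozenset({"core_bi_roster", "core_bi_roster_wbp"})),
    # A hypothetical newly onboarded client that only received the concern tag.
    DbtModel("bi_roster_acme", frozenset({"core_bi_roster"})),
    DbtModel("dim_employee_acme", frozenset({"core_bi_roster"})),
]

print(select_by_tag(MODELS, "core_bi_roster"))      # all four BI models, including acme
print(select_by_tag(MODELS, "core_bi_roster_wbp"))  # WBP only: the old selector misses acme
```

Selecting by the concern tag picks up the new client without any change to the DAG, while the client-specific tag remains available for targeted ad-hoc runs.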

View handling

rpt_master_roster is materialized as a view (materialized='view'). Once created in Snowflake, it reads live data from the Gold dims whenever it is queried, so it needs no daily dbt run. It carries only the rpt_master_roster tag and is excluded from both pipeline tags. It needs a dbt run only when its SQL definition changes, which is a deploy-time concern handled manually or via CI.
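Keeping views out of the pipeline tags could be broken by accident, for example by copying a tag block from a Gold model. A CI lint along these lines could guard it. This is a sketch under assumptions: views_in_pipelines and its input shape are hypothetical, the materialization shown for dim_employee_ph is assumed, and in practice the materializations and tags would be read from dbt's compiled artifacts (such as manifest.json) rather than a hand-written dict.

```python
# Pipeline tags defined by this ADR.
PIPELINE_TAGS = frozenset({"core_ph_roster", "core_bi_roster"})


def views_in_pipelines(models: dict[str, dict]) -> list[str]:
    """Return views that carry a pipeline tag and would therefore be rebuilt daily for no benefit."""
    return sorted(
        name
        for name, cfg in models.items()
        if cfg["materialized"] == "view" and PIPELINE_TAGS & set(cfg["tags"])
    )


# Hand-written stand-in for the relevant fields of dbt model configs.
models = {
    "rpt_master_roster": {"materialized": "view", "tags": ["rpt_master_roster"]},
    "dim_employee_ph": {"materialized": "table", "tags": ["core_ph_roster"]},  # materialization assumed
}
print(views_in_pipelines(models))  # []
```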

xo-foundry support

The trigger_dag field was added to DBTGlobalConfig in the YAML schema. It is mutually exclusive with select/exclude (validated at config load time). When set, the Jinja2 template emits a TriggerDagRunOperator block instead of a Cosmos DbtTaskGroup. This keeps ELT DAGs free of Cosmos imports entirely, eliminating the import-time parse cost.
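The config rule and the template branch can be sketched as follows. This is not xo-foundry's code: DBTGlobalConfig is modelled as a plain dataclass (the real schema class, types, and defaults may differ), render_dbt_block is a hypothetical stand-in for the Jinja2 template using plain string formatting, and the emitted TriggerDagRunOperator import path is the Airflow 2.x location. What it illustrates is that the trigger_dag branch emits no cosmos import at all.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DBTGlobalConfig:
    """Field names follow this ADR; the types, defaults, and dataclass form are assumptions."""

    select: list[str] = field(default_factory=list)
    exclude: list[str] = field(default_factory=list)
    trigger_dag: Optional[str] = None

    def __post_init__(self) -> None:
        # Runs when the config is loaded: a DAG either triggers a dbt DAG or embeds Cosmos.
        if self.trigger_dag and (self.select or self.exclude):
            raise ValueError("trigger_dag is mutually exclusive with select/exclude")


def render_dbt_block(cfg: DBTGlobalConfig) -> str:
    """Stand-in for the Jinja2 template branch, using plain string formatting."""
    if cfg.trigger_dag:
        # No cosmos import is emitted, so the ELT DAG file never parses the dbt project.
        return (
            "from airflow.operators.trigger_dagrun import TriggerDagRunOperator\n"
            "trigger_dbt = TriggerDagRunOperator(\n"
            '    task_id="trigger_dbt",\n'
            f"    trigger_dag_id={cfg.trigger_dag!r},\n"
            "    wait_for_completion=True,\n"
            ")\n"
        )
    return (
        "from cosmos import DbtTaskGroup, RenderConfig\n"
        "dbt = DbtTaskGroup(\n"
        '    group_id="dbt",\n'
        f"    render_config=RenderConfig(select={cfg.select!r}, exclude={cfg.exclude!r}),\n"
        "    # project_config / profile_config omitted in this sketch\n"
        ")\n"
    )


print(render_dbt_block(DBTGlobalConfig(trigger_dag="core_dbt_bi_roster")))
```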

Consequences

Easier:

  • ELT DAGs load instantly: no Cosmos parsing at import time.
  • Adding a new client only requires a new Silver model (with the core_bi_roster tag) and a new Gold model (same tag). The dbt DAG and YAML need no changes.
  • dbt DAGs can be triggered independently for debugging or backfills.
  • Clear failure attribution: Bronze failures and dbt transformation failures show up as separate DAG runs.
  • SLA tracking is preserved: TriggerDagRunOperator(wait_for_completion=True) propagates failure.

Harder:

  • More DAGs to manage in the Airflow UI (2 DAGs per pipeline instead of 1).
  • Triggering a full end-to-end backfill requires clearing both DAGs.
  • dbt DAGs with schedule=None cannot be triggered on a schedule without going through the ELT DAG.

Accepted trade-offs:

  • The visibility gained (clear Bronze vs. dbt separation) outweighs the extra DAG count.
  • The skip_dbt=True param provides the same bypass path as before.

Options Considered

1. Narrow the Cosmos select tag in each ELT DAG (rejected)

Would fix the current timeout but not the root cause. As models grow, any shared selector risks re-hitting the timeout. Also requires updating the ELT DAG every time a new client is added.

2. Airflow Datasets / data-aware scheduling (deferred)

Airflow Datasets allow dbt DAGs to declare Bronze table dependencies and trigger automatically when upstream datasets are updated. This would eliminate the explicit TriggerDagRunOperator wiring. Deferred because it requires Bronze loaders to emit dataset events and adds complexity. Can be adopted in a future migration once the pattern matures.

3. Increase Airflow DAG import timeout (rejected)

A workaround, not a fix. The timeout exists to prevent scheduler instability; increasing it would mask the problem and let it recur as more models are added.

4. Use ExecutionMode.LOCAL or ExecutionMode.VIRTUALENV in Cosmos (rejected)

Different execution modes change how tasks run, not how the graph is parsed at import time. RenderConfig parsing happens before any execution mode is relevant.

Scope

Applied to CORE pipelines (PH HR roster and BI roster). WBP and CND pipeline DAGs retain their embedded Cosmos DbtTaskGroups and are eligible for migration to this pattern in a future PR using the same trigger_dag YAML field.