ADR 018 — Separate ELT and dbt DAGs via TriggerDagRunOperator¶
Status¶
Accepted
Context¶
As the CORE domain grew to cover all client rosters (26 clients, 58+ dbt models), the ELT DAGs began
embedding Cosmos DbtTaskGroup with broad selectors (e.g., select=["tag:core"]). Astronomer Cosmos
parses the full dbt project graph at Airflow DAG import time, not at runtime. Selecting a broad
tag caused Cosmos to parse all matching models during DagBag load, consistently exceeding Airflow's
default 30-second DAG import timeout (dagbag_import_timeout). The DAG failed to import at all, so
none of its tasks could run.
Why this happens¶
Cosmos is a compile-time renderer. Every Airflow scheduler cycle that reloads DAGs re-parses the dbt
project graph for every DbtTaskGroup in every ELT DAG. The cost scales linearly with the number of
models selected. Broad domain-level tags (tag:core) are particularly dangerous because they grow
unbounded as new clients onboard.
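For illustration, a sketch of the old embedded pattern (project path and profile names are
hypothetical; the DAG id and selector come from this ADR):

```python
# Sketch of the old embedded pattern. Cosmos resolves the selector by
# parsing the dbt project graph at DAG *import* time, so a broad tag
# makes every scheduler DagBag reload pay for the full domain.
import pendulum
from airflow import DAG
from cosmos import DbtTaskGroup, ProfileConfig, ProjectConfig, RenderConfig

with DAG(
    dag_id="core_gsheets_bi_roster_daily",
    schedule="@daily",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
):
    # ... Bronze load tasks ...
    DbtTaskGroup(
        group_id="dbt_core",
        project_config=ProjectConfig("/opt/dbt/core"),  # hypothetical path
        profile_config=ProfileConfig(
            profile_name="core",  # hypothetical
            target_name="prod",
            profiles_yml_filepath="/opt/dbt/core/profiles.yml",
        ),
        # The broad domain tag is the problem: its match set grows
        # with every onboarded client.
        render_config=RenderConfig(select=["tag:core"]),
    )
```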
Secondary problem: incorrect model selection¶
The BI roster ELT DAG used select: "tag:core_bi_roster_wbp", selecting only WBP models. Adding
a new client required remembering to update the ELT DAG's Cosmos selector — a fragile, easy-to-miss
step. The correct behavior is: all Bronze loads trigger the same downstream dbt DAG, which
always runs all models for that pipeline.
Decision¶
Separate ELT (Bronze loading) and dbt (Silver+Gold transformations) into independent Airflow DAGs
connected by TriggerDagRunOperator.
Pattern¶
Each Bronze ELT DAG triggers a standalone dbt DAG after all Bronze loads complete:
core_gsheets_ph_hr_roster_daily → TriggerDagRunOperator → core_dbt_ph_roster
core_gsheets_bi_roster_daily → TriggerDagRunOperator → core_dbt_bi_roster
The TriggerDagRunOperator is configured with wait_for_completion=True, making the trigger
synchronous — the ELT DAG does not complete until the dbt DAG finishes. This preserves the
end-to-end success/failure signal needed for alerting and SLA tracking.
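A minimal sketch of the trigger wiring (the placeholder Bronze task and poke interval are
illustrative; the DAG ids come from the pattern above):

```python
import pendulum
from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

with DAG(
    dag_id="core_gsheets_bi_roster_daily",
    schedule="@daily",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
):
    # Stand-in for the real Bronze load tasks.
    bronze_loads = EmptyOperator(task_id="bronze_loads")

    trigger_dbt = TriggerDagRunOperator(
        task_id="trigger_dbt",
        trigger_dag_id="core_dbt_bi_roster",
        wait_for_completion=True,  # ELT run succeeds/fails with the dbt run
        poke_interval=60,          # seconds between downstream state checks
    )

    # The trigger fires only after all Bronze loads succeed.
    bronze_loads >> trigger_dbt
```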
The existing skip_dbt DAG param continues to work: when skip_dbt=True, the gate task raises
AirflowSkipException, which propagates to trigger_dbt via the default all_success trigger
rule, skipping the dbt DAG trigger.
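One plausible shape for the gate, continuing the sketch above (the real gate task may differ):

```python
from airflow.decorators import task
from airflow.exceptions import AirflowSkipException

@task
def skip_dbt_gate(**context):
    # Raising AirflowSkipException marks this task as skipped. Because
    # trigger_dbt keeps the default all_success trigger rule, the skip
    # cascades downstream and the dbt DAG is never triggered.
    if context["params"].get("skip_dbt", False):
        raise AirflowSkipException("skip_dbt=True")

# The gate sits between the loads and the trigger (replacing the direct edge):
bronze_loads >> skip_dbt_gate() >> trigger_dbt
```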
Naming convention¶
| Concept | Pattern | Example |
|---|---|---|
| ELT DAG (how data arrives) | {domain}_{source}_{dataset}_{freq} | core_gsheets_ph_hr_roster_daily |
| dbt DAG (what data serves) | {domain}_dbt_{concern} | core_dbt_ph_roster |
| dbt pipeline tag | {domain}_{concern} | core_ph_roster |
dbt DAGs are named by data concern, not by source. This supports the many-to-one trigger
pattern: multiple ELT DAGs from different sources (e.g., gladly_qa_daily + gsheets_qa_daily)
can trigger the same dbt DAG (warbyparker_dbt_qa) when their outputs feed the same models.
Currently all pipelines are one-to-one; the naming is forward-compatible with the many-to-one case.
dbt model tagging¶
Each model belongs to a pipeline tag that scopes Cosmos model selection to exactly the models needed for that concern:
- core_ph_roster (3 models): ph_hr_roster_active (Silver), ph_hr_roster_inactive (Silver), dim_employee_ph (Gold)
- core_bi_roster (52 models): 26 Silver bi_roster_* models + 26 Gold dim_employee_* models (all clients except PH)
Models may carry both a concern tag and a client-specific tag (e.g., core_bi_roster_wbp) for
targeted ad-hoc runs. The pipeline tag is what Cosmos uses in the standalone dbt DAG.
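The standalone dbt DAG then selects on the pipeline tag. A sketch, reusing the hypothetical
project path and profile names from the earlier example:

```python
import pendulum
from cosmos import DbtDag, ProfileConfig, ProjectConfig, RenderConfig

core_dbt_ph_roster = DbtDag(
    dag_id="core_dbt_ph_roster",
    schedule=None,  # triggered only by the upstream ELT DAG (or manually)
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
    project_config=ProjectConfig("/opt/dbt/core"),  # hypothetical path
    profile_config=ProfileConfig(
        profile_name="core",  # hypothetical
        target_name="prod",
        profiles_yml_filepath="/opt/dbt/core/profiles.yml",
    ),
    # Cosmos still parses at import time here, but the pipeline tag
    # bounds the cost to exactly the models in this concern.
    render_config=RenderConfig(select=["tag:core_ph_roster"]),
)
```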
View handling¶
rpt_master_roster is materialized as a view (materialized='view'): once created in Snowflake, it
reads live data from the Gold dims without a daily dbt run. It carries only the rpt_master_roster
tag and is excluded from both pipeline tags. It requires a dbt run only when its SQL definition
changes (a deploy-time concern, handled manually or via CI).
xo-foundry support¶
The trigger_dag field was added to DBTGlobalConfig in the YAML schema. It is mutually exclusive
with select/exclude (validated at config load time). When set, the Jinja2 template emits a
TriggerDagRunOperator block instead of a Cosmos DbtTaskGroup. This keeps ELT DAGs free of
Cosmos imports entirely, eliminating the import-time parse cost.
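Not the actual xo-foundry code, but a sketch of how the mutual exclusion might be enforced at
config load time, assuming a Pydantic-style schema:

```python
from typing import Optional

from pydantic import BaseModel, model_validator

class DBTGlobalConfig(BaseModel):
    select: Optional[list[str]] = None
    exclude: Optional[list[str]] = None
    trigger_dag: Optional[str] = None  # dag_id of the standalone dbt DAG

    @model_validator(mode="after")
    def _mutually_exclusive(self):
        # trigger_dag replaces the embedded Cosmos task group, so a
        # Cosmos selector alongside it would be meaningless.
        if self.trigger_dag and (self.select or self.exclude):
            raise ValueError("trigger_dag is mutually exclusive with select/exclude")
        return self
```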
Consequences¶
Easier:
- ELT DAGs load instantly — no Cosmos parsing at import time
- Adding a new client only requires: new Silver model (with core_bi_roster tag) + new Gold model
(same tag). The dbt DAG and YAML need no changes.
- dbt DAGs can be triggered independently for debugging or backfills
- Clear failure attribution: Bronze failure vs. dbt transformation failure are separate DAG runs
- SLA tracking is preserved — TriggerDagRunOperator(wait_for_completion=True) propagates failure
Harder:
- More DAGs to manage in the Airflow UI (2 DAGs per pipeline instead of 1)
- Triggering a full end-to-end backfill requires clearing both DAGs
- dbt DAGs have schedule=None, so they cannot run on their own schedule; they run only via the ELT DAG trigger or a manual run
Accepted trade-offs:
- The visibility gained (clear Bronze vs. dbt separation) outweighs the extra DAG count
- skip_dbt=True param provides the same bypass path as before
Options Considered¶
1. Narrow the Cosmos select tag in each ELT DAG (rejected)¶
Would fix the current timeout but not the root cause. As models grow, any shared selector risks re-hitting the timeout. Also requires updating the ELT DAG every time a new client is added.
2. Airflow Datasets / data-aware scheduling (deferred)¶
Airflow Datasets allow dbt DAGs to declare Bronze table dependencies and trigger automatically when
upstream datasets are updated. This would eliminate the explicit TriggerDagRunOperator wiring.
Deferred because it requires Bronze loaders to emit dataset events and adds complexity. Can be
adopted in a future migration once the pattern matures.
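For reference, a sketch of what the Datasets wiring would look like (the dataset URI is
hypothetical):

```python
import pendulum
from airflow import DAG, Dataset
from airflow.operators.empty import EmptyOperator

bronze_bi_roster = Dataset("snowflake://bronze/bi_roster")  # hypothetical URI

# The Bronze loader declares the dataset as an outlet...
with DAG(
    dag_id="core_gsheets_bi_roster_daily",
    schedule="@daily",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
):
    EmptyOperator(task_id="bronze_loads", outlets=[bronze_bi_roster])

# ...and the dbt DAG is scheduled on dataset updates, with no explicit trigger.
with DAG(
    dag_id="core_dbt_bi_roster",
    schedule=[bronze_bi_roster],
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
):
    EmptyOperator(task_id="dbt_placeholder")
```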
3. Increase Airflow DAG import timeout (rejected)¶
A workaround, not a fix. Timeout exists to prevent scheduler instability. Increasing it would mask the problem and allow it to recur as more models are added.
4. Use ExecutionMode.LOCAL or ExecutionMode.VIRTUALENV in Cosmos (rejected)¶
Different execution modes change how tasks run, not how the graph is parsed at import time.
RenderConfig parsing happens before any execution mode is relevant.
Scope¶
Applied to CORE pipelines (PH HR roster + BI roster). WBP and CND pipeline DAGs retain embedded
Cosmos and are eligible for migration to this pattern in a future PR using the same trigger_dag
YAML field.