Skip to content

ADR 011: Bronze Batch-Scoped Replace

Status: Accepted Date: 2026-02-17 Supersedes: ADR 006 - Bronze Truncate Force (partially — legacy pipelines still use truncate)

Context

Bronze tables in the new architecture (per-client DBs + dbt) were truncated on every pipeline run (TRUNCATE TABLE + COPY INTO). This meant Bronze only held the latest load's data. If dbt --full-refresh was executed on Silver incremental models, Silver would be rebuilt from Bronze — losing all historical data except today's load.

Decision

Introduce batch_replace as a new Bronze load strategy for snowflake_load pipelines:

DELETE FROM {table} WHERE BATCH_ID = '{ds}'
COPY INTO {table} FROM @stage ...
  • batch_replace: Deletes only the current batch's rows, then loads. Bronze accumulates historical data across batches.
  • truncate_insert: (Legacy default) Truncates the entire table, then loads. Bronze only has current batch.

Implementation

Component Change
dag_config.py Added batch_replace to SnowflakeSourceConfig.load_strategy Literal
snowflake_tasks.py copy_to_snowflake branches on load_strategy: DELETE WHERE BATCH_ID vs TRUNCATE
snowflake_load.py.j2 Passes source-level load_strategy override after **BASE_SNOWFLAKE_CONFIG
warbyparker-gladly-daily.yaml All 4 sources changed to batch_replace
legacy_elt.py.j2 Not modified — continues using truncate_insert from base config

Consequences

Positive

  • dbt --full-refresh on Silver models rebuilds correctly from full Bronze history
  • Idempotent: re-running the same batch date produces identical results
  • Backward compatible: default truncate_insert preserves legacy pipeline behavior

Negative

  • Bronze tables grow over time (mitigated by future retention policy)
  • DELETE is slightly slower than TRUNCATE (mitigated by future CLUSTER BY BATCH_ID)

Future Follow-ups

  • Bronze clustering: ALTER TABLE ... CLUSTER BY (BATCH_ID) for efficient deletes
  • Bronze retention: Snowflake scheduled task to purge records older than 180 days
  • Deprecate truncate_insert for new snowflake_load pipelines