Skip to content

ADR 011: Bronze Batch-Scoped Replace

Status: Accepted Date: 2026-02-17 Supersedes: ADR 006 - Bronze Truncate Force (partially — legacy pipelines still use truncate)

Context

Bronze tables in the new architecture (per-client DBs + dbt) were truncated on every pipeline run (TRUNCATE TABLE + COPY INTO). This meant Bronze only held the latest load's data. If dbt --full-refresh was executed on Silver incremental models, Silver would be rebuilt from Bronze — losing all historical data except today's load.

Decision

Three Bronze load strategies are supported. All must be declared explicitly in the YAML config — there is no default.

batch_replace:    DELETE FROM {table} WHERE BATCH_ID = '{ds}'; COPY INTO {table} ...
append:           COPY INTO {table} ... (no pre-cleanup)
truncate_insert:  TRUNCATE TABLE {table}; COPY INTO {table} ... (legacy only)
  • batch_replace: Deletes only the current batch's rows, then loads. Bronze accumulates historical data across batches. Standard for new snowflake_load pipelines.
  • append: No pre-COPY cleanup — COPY INTO accumulates rows. Use for pure-accumulation tables (e.g., weekly agent snapshots) where each run should simply add rows.
  • truncate_insert: ⚠️ Legacy — do not use in new pipelines. Truncates the entire table before loading. Bronze only holds the current batch. Valid only in legacy_elt pipelines.

Implementation

Component Change
dag_config.py Added batch_replace to SnowflakeSourceConfig.load_strategy Literal
snowflake_tasks.py copy_to_snowflake branches on load_strategy: DELETE WHERE BATCH_ID vs TRUNCATE
snowflake_load.py.j2 Passes source-level load_strategy override after **BASE_SNOWFLAKE_CONFIG
warbyparker-gladly-daily.yaml All 4 sources changed to batch_replace
legacy_elt.py.j2 Not modified — continues using truncate_insert from base config

Consequences

Positive

  • dbt --full-refresh on Silver models rebuilds correctly from full Bronze history
  • Idempotent: re-running the same batch date produces identical results
  • Backward compatible: default truncate_insert preserves legacy pipeline behavior

Negative

  • Bronze tables grow over time (mitigated by future retention policy)
  • DELETE is slightly slower than TRUNCATE (mitigated by future CLUSTER BY BATCH_ID)

Future Follow-ups

  • Bronze clustering: ALTER TABLE ... CLUSTER BY (BATCH_ID) for efficient deletes
  • Bronze retention: Snowflake scheduled task to purge records older than 180 days
  • Deprecate truncate_insert for new snowflake_load pipelines