# ADR 020 — Silver Incremental Merge Strategy
Status: Accepted
Date: 2026-04-14
Author: Data Engineering
## Context
All Silver models that read from batch_replace Bronze tables use materialized='incremental'
with incremental_strategy='append'. The append pattern uses a LEFT JOIN to find RECORD_KEYs
not yet in Silver and inserts only those rows:
```sql
{% if is_incremental() %}
new_records AS (
    SELECT bronze.*
    FROM bronze_source AS bronze
    LEFT JOIN {{ this }} AS silver ON bronze.RECORD_KEY = silver.RECORD_KEY
    WHERE silver.RECORD_KEY IS NULL  -- only new RECORD_KEYs
)
{% endif %}
```
This works correctly for normal daily operations. However, it has a silent failure mode when a Bronze batch_replace backfill corrects data for a day that is already in Silver.
## The Failure Mode
Bronze batch_replace allows historical correction of a specific BATCH_ID:
```sql
DELETE FROM BRONZE.TABLE WHERE BATCH_ID = '2026-01-13';
COPY INTO BRONZE.TABLE FROM @stage/corrected/2026-01-13/;
```
After this, Bronze contains the corrected row. The corrected row has:
- Same RECORD_KEY (derived from natural key columns such as AGENT_ID + date — stable by design)
- Different RECORD_HASH (derived from all data columns — changes when values change)
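The key/hash contract can be sketched in Python. Note this is an illustrative sketch only: the MD5-over-pipe-delimited-values scheme and the column names here are assumptions, not the project's actual derivation logic.

```python
import hashlib

def record_key(agent_id: str, date: str) -> str:
    # Surrogate key from natural key columns only: stable across corrections
    # because a backfill does not change AGENT_ID or the date.
    return hashlib.md5(f"{agent_id}|{date}".encode()).hexdigest()

def record_hash(row: dict) -> str:
    # Hash over all data columns in a fixed column order, so any value
    # change produces a different hash.
    payload = "|".join(str(row[col]) for col in sorted(row))
    return hashlib.md5(payload.encode()).hexdigest()

original  = {"AGENT_ID": "A1", "DATE": "2026-01-13", "HANDLED": 40}
corrected = {"AGENT_ID": "A1", "DATE": "2026-01-13", "HANDLED": 42}

# Same natural key -> same RECORD_KEY; changed value -> different RECORD_HASH.
assert record_key("A1", "2026-01-13") == record_key("A1", "2026-01-13")
assert record_hash(original) != record_hash(corrected)
```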
The append strategy's LEFT JOIN finds the existing RECORD_KEY in Silver and filters it out. The correction never reaches Silver. Stale data persists indefinitely with no error or warning.
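The silent drop is easy to demonstrate. A minimal Python simulation of the append strategy's anti-join (hypothetical key and hash values):

```python
# Silver already holds the stale row: RECORD_KEY -> RECORD_HASH.
silver = {"K1": "old_hash"}

# Bronze after batch_replace holds the corrected row: same key, new hash.
bronze = [("K1", "new_hash")]

# append strategy: keep only RECORD_KEYs absent from Silver.
new_records = [(key, h) for key, h in bronze if key not in silver]

# The corrected row is filtered out; nothing is inserted, no error raised.
assert new_records == []
```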
## Why append Was Used Originally
The append strategy was chosen for simplicity and because the original Gladly event tables
(contact timestamps, conversation timestamps, agent durations) are truly immutable — a contact
event at a given timestamp does not change. The failure mode was not relevant for those sources.
As new data sources are added with higher correction likelihood (daily aggregates, summaries, external feeds), the limitation becomes material.
## Decision
All new Silver incremental models must use incremental_strategy='merge' with RECORD_HASH
change detection. This is the canonical pattern going forward.
## The Merge Pattern
```sql
{{ config(
    materialized='incremental',
    unique_key='RECORD_KEY',
    incremental_strategy='merge',
    on_schema_change='sync_all_columns',
    tags=[...]
) }}

WITH bronze_source AS (
    SELECT * FROM {{ source('...', '...') }}
),

{% if is_incremental() %}
upsert_records AS (
    SELECT bronze.*
    FROM bronze_source AS bronze
    LEFT JOIN {{ this }} AS silver ON bronze.RECORD_KEY = silver.RECORD_KEY
    WHERE silver.RECORD_KEY IS NULL                   -- new records → INSERT
       OR bronze.RECORD_HASH != silver.RECORD_HASH   -- corrected records → UPDATE
)
{% else %}
upsert_records AS (SELECT * FROM bronze_source)
{% endif %}

SELECT ... FROM upsert_records
```
## How It Works
| Scenario | RECORD_KEY in Silver | RECORD_HASH match | Result |
|---|---|---|---|
| New record (normal daily load) | No | — | INSERT |
| Unchanged record (re-run) | Yes | Same | Skipped (filtered by WHERE) |
| Corrected record (batch_replace backfill) | Yes | Different | UPDATE (dbt MERGE) |
The WHERE clause pre-filters to only the rows dbt needs to act on, keeping the incremental run efficient. Unchanged records — the vast majority on any given day — never enter the MERGE.
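The three scenarios in the table reduce to one predicate: act on a row when its key is new or its hash differs. A small Python sketch of that filter (keys and hashes are hypothetical):

```python
# Existing Silver state: RECORD_KEY -> RECORD_HASH.
silver = {"K1": "h1", "K2": "h2"}

# Bronze candidate set: one unchanged row, one corrected row, one new row.
bronze = [("K1", "h1"), ("K2", "h2x"), ("K3", "h3")]

# Pre-filter mirroring the WHERE clause: new key OR changed hash.
upserts = [(key, h) for key, h in bronze
           if key not in silver or silver[key] != h]

# K1 unchanged -> skipped; K2 hash differs -> UPDATE; K3 new -> INSERT.
assert upserts == [("K2", "h2x"), ("K3", "h3")]
```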
## When table (Full Rebuild) Is Preferred
Use materialized='table' instead of incremental when:
- The source data changes retroactively on every run (e.g., Google Sheets where any row may be edited between runs)
- The Bronze table is small enough that full rebuild is cheap
- The model has no meaningful history to preserve independently of Bronze
Examples: wbp_gsheet_csat, wbp_gsheet_qa — these rebuild from the latest Bronze BATCH_ID.
## When --full-refresh Is Used
dbt run --full-refresh --select <model> bypasses the is_incremental() block entirely and
rebuilds Silver from all Bronze history. Use this for:
- Initial load of a new Silver model into an existing table
- Bulk correction after a systemic Bronze data quality incident
- Schema changes that on_schema_change='sync_all_columns' cannot handle automatically
## Existing append Models
Existing Silver models using the append strategy (wbp_gladly_contact_timestamps,
wbp_gladly_conversation_timestamps, wbp_gladly_agent_durations, wbp_gladly_work_sessions)
are not required to migrate. These models source from truly immutable event streams where
the failure mode does not apply. They may be migrated opportunistically when touched.
## Consequences
Positive:
- Corrections from Bronze batch_replace backfills automatically propagate to Silver
- No manual intervention required for typical data corrections
- Unchanged records are still skipped — incremental performance is preserved
- --full-refresh remains available as a manual escape hatch
Negative / Trade-offs:
- merge is slightly more compute-intensive than append on Snowflake (MERGE statement vs INSERT)
- In practice the cost difference is negligible for daily-grain summary tables where the
incremental candidate set is small