
ADR 020 — Silver Incremental Merge Strategy

Status: Accepted
Date: 2026-04-14
Author: Data Engineering


Context

All Silver models that read from batch_replace Bronze tables use materialized='incremental' with incremental_strategy='append'. The append pattern uses a LEFT JOIN to find RECORD_KEYs not yet in Silver and inserts only those rows:

{% if is_incremental() %}
new_records AS (
    SELECT bronze.*
    FROM bronze_source AS bronze
    LEFT JOIN {{ this }} AS silver ON bronze.RECORD_KEY = silver.RECORD_KEY
    WHERE silver.RECORD_KEY IS NULL   -- only new RECORD_KEYs
)
{% endif %}

This works correctly for normal daily operations. However, it has a silent failure mode when a Bronze batch_replace backfill corrects data for a day that is already in Silver.

The Failure Mode

Bronze batch_replace allows historical correction of a specific BATCH_ID:

DELETE FROM BRONZE.TABLE WHERE BATCH_ID = '2026-01-13';
COPY INTO BRONZE.TABLE FROM @stage/corrected/2026-01-13/;

After this, Bronze contains the corrected row. The corrected row has:

- Same RECORD_KEY (derived from natural key columns such as AGENT_ID + date — stable by design)
- Different RECORD_HASH (derived from all data columns — changes when values change)
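Per ADR 004, the two identifiers can be sketched roughly as below. This is a sketch only — the column names (AGENT_ID, EVENT_DATE, CONTACT_COUNT, HANDLE_TIME_SECS) are illustrative placeholders, not the actual schema:

```sql
-- Sketch only: illustrative column names, following the ADR 004 definitions.
-- RECORD_KEY hashes only the natural key columns, so a backfill that
-- corrects data values leaves it stable.
MD5(CONCAT_WS('|', AGENT_ID, TO_VARCHAR(EVENT_DATE)))  AS RECORD_KEY,

-- RECORD_HASH hashes every data column, so any corrected value
-- produces a different hash.
MD5(CONCAT_WS('|', AGENT_ID, TO_VARCHAR(EVENT_DATE),
    TO_VARCHAR(CONTACT_COUNT), TO_VARCHAR(HANDLE_TIME_SECS))) AS RECORD_HASH
```

The key/hash split is what makes change detection possible: the same logical record keeps its identity while its content fingerprint moves.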

The append strategy's LEFT JOIN finds the existing RECORD_KEY in Silver and filters it out. The correction never reaches Silver. Stale data persists indefinitely with no error or warning.
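The staleness is at least detectable after the fact. A hedged diagnostic sketch (BRONZE.TABLE and SILVER.TABLE are placeholders for a real table pair) that lists Silver rows whose Bronze counterpart now carries a different RECORD_HASH:

```sql
-- Sketch: find corrections that the append strategy silently dropped.
-- Replace BRONZE.TABLE / SILVER.TABLE with the actual table pair.
SELECT silver.RECORD_KEY,
       silver.RECORD_HASH AS stale_hash,
       bronze.RECORD_HASH AS corrected_hash
FROM SILVER.TABLE AS silver
JOIN BRONZE.TABLE AS bronze
  ON bronze.RECORD_KEY = silver.RECORD_KEY
WHERE bronze.RECORD_HASH != silver.RECORD_HASH;
```

Any rows returned are corrections sitting in Bronze that Silver never picked up.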

Why append Was Used Originally

The append strategy was chosen for simplicity and because the original Gladly event tables (contact timestamps, conversation timestamps, agent durations) are truly immutable — a contact event at a given timestamp does not change. The failure mode was not relevant for those sources.

As new data sources are added with higher correction likelihood (daily aggregates, summaries, external feeds), the limitation becomes material.


Decision

All new Silver incremental models must use incremental_strategy='merge' with RECORD_HASH change detection. This is the canonical pattern going forward.

The Merge Pattern

{{ config(
    materialized='incremental',
    unique_key='RECORD_KEY',
    incremental_strategy='merge',
    on_schema_change='sync_all_columns',
    tags=[...]
) }}

WITH bronze_source AS (
    SELECT * FROM {{ source('...', '...') }}
),

{% if is_incremental() %}
upsert_records AS (
    SELECT bronze.*
    FROM bronze_source AS bronze
    LEFT JOIN {{ this }} AS silver ON bronze.RECORD_KEY = silver.RECORD_KEY
    WHERE silver.RECORD_KEY IS NULL               -- new records → INSERT
       OR bronze.RECORD_HASH != silver.RECORD_HASH  -- corrected records → UPDATE
)
{% else %}
upsert_records AS (SELECT * FROM bronze_source)
{% endif %}

SELECT ... FROM upsert_records

How It Works

| Scenario | RECORD_KEY in Silver | RECORD_HASH match | Result |
|---|---|---|---|
| New record (normal daily load) | No | n/a | INSERT |
| Unchanged record (re-run) | Yes | Same | Skipped (filtered by WHERE) |
| Corrected record (batch_replace backfill) | Yes | Different | UPDATE (dbt MERGE) |

The WHERE clause pre-filters to only the rows dbt needs to act on, keeping the incremental run efficient. Unchanged records — the vast majority on any given day — never enter the MERGE.
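For reference, the statement dbt's merge strategy emits on Snowflake has roughly the shape below. This is an approximation, not verbatim dbt output — the temp-table name and column list are abbreviated:

```sql
-- Approximate shape of the dbt-generated MERGE; not verbatim output.
MERGE INTO silver_table AS DBT_INTERNAL_DEST
USING upsert_records_tmp AS DBT_INTERNAL_SOURCE
  ON DBT_INTERNAL_SOURCE.RECORD_KEY = DBT_INTERNAL_DEST.RECORD_KEY
WHEN MATCHED THEN UPDATE SET
    RECORD_HASH = DBT_INTERNAL_SOURCE.RECORD_HASH
    -- ... every other column ...
WHEN NOT MATCHED THEN INSERT (RECORD_KEY, RECORD_HASH /* , ... */)
    VALUES (DBT_INTERNAL_SOURCE.RECORD_KEY,
            DBT_INTERNAL_SOURCE.RECORD_HASH /* , ... */);
```

Because the model's WHERE clause already excluded unchanged rows, the MATCHED branch only ever fires for genuinely corrected records.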

When table (Full Rebuild) Is Preferred

Use materialized='table' instead of incremental when:

- The source data changes retroactively on every run (e.g., Google Sheets where any row may be edited between runs)
- The Bronze table is small enough that full rebuild is cheap
- The model has no meaningful history to preserve independently of Bronze

Examples: wbp_gsheet_csat, wbp_gsheet_qa — these rebuild from the latest Bronze BATCH_ID.
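A hedged sketch of that full-rebuild shape — the source name is a placeholder, not the actual sources.yml entry:

```sql
{{ config(materialized='table') }}

-- Rebuild entirely from the latest Bronze batch on every run.
SELECT *
FROM {{ source('bronze', 'gsheet_csat') }}   -- placeholder source name
QUALIFY BATCH_ID = MAX(BATCH_ID) OVER ()     -- keep only the newest batch
```

Since the whole table is rewritten each run, retroactive edits in the sheet propagate automatically and no change detection is needed.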

When --full-refresh Is Used

dbt run --full-refresh --select <model> bypasses the is_incremental() block entirely and rebuilds Silver from all Bronze history. Use this for:

- Initial load of a new Silver model into an existing table
- Bulk correction after a systemic Bronze data quality incident
- Schema changes that on_schema_change='sync_all_columns' cannot handle automatically

Existing append Models

Existing Silver models using the append strategy (wbp_gladly_contact_timestamps, wbp_gladly_conversation_timestamps, wbp_gladly_agent_durations, wbp_gladly_work_sessions) are not required to migrate. These models source from truly immutable event streams where the failure mode does not apply. They may be migrated opportunistically when touched.


Consequences

Positive:

- Corrections from Bronze batch_replace backfills automatically propagate to Silver
- No manual intervention required for typical data corrections
- Unchanged records are still skipped — incremental performance is preserved
- --full-refresh remains available as a manual escape hatch

Negative / Trade-offs:

- merge is slightly more compute-intensive than append on Snowflake (MERGE statement vs INSERT)
- In practice the cost difference is negligible for daily-grain summary tables where the incremental candidate set is small


References

  • ADR 004 — Defines RECORD_KEY and RECORD_HASH. RECORD_KEY is a stable natural-key-based identifier; RECORD_HASH is MD5 of all data columns for change detection.
  • ADR 011 — Defines the batch_replace Bronze loading pattern that enables historical corrections.