Skip to content

ELT Layer Architecture

This document defines the responsibilities, rules, and patterns for each layer in the XO-Data medallion architecture.

Layer Responsibilities

Layer Owner Purpose Enrichment? Filtering? Aggregation?
BRONZE xo-foundry Raw landing zone No No No
SILVER dbt Historical preservation No No No
GOLD Facts dbt Enriched granular data Yes Yes No
GOLD Dimensions dbt Current state entities N/A Optional No
GOLD Aggregates dbt Pre-aggregated metrics N/A Yes Yes
GOLD Reports dbt Consumption views N/A Optional Optional

BRONZE: Raw Landing Zone

Owner: xo-foundry (Airflow tasks) Tables: schemachange migrations

What BRONZE Does

  • Lands raw data from source systems (all VARCHAR columns)
  • Adds metadata columns for lineage (RECORD_KEY, RECORD_HASH, DATE_TO_WAREHOUSE, SOURCE_FILE, BATCH_ID, PIPELINE_RUN_ID)
  • Preserves source column names (standardized to UPPERCASE)

What BRONZE Does NOT Do

  • No type conversions
  • No business logic
  • No JOINs
  • No deduplication (each batch is idempotent via its own loading strategy)
  • No historical preservation in legacy truncate_insert pipelines (handled in Silver)

Loading Patterns

Two strategies are supported (see ADR 011):

batch_replace (preferred for new snowflake_load pipelines):

-- Delete current batch, then load — Bronze retains cross-batch history
DELETE FROM {CLIENT}_DB.BRONZE.{SOURCE}_{OBJECT} WHERE BATCH_ID = '{ds}';
COPY INTO {CLIENT}_DB.BRONZE.{SOURCE}_{OBJECT}
  FROM @stage_path FORCE = TRUE ON_ERROR = 'ABORT_STATEMENT';

truncate_insert (legacy legacy_elt pipelines only):

BEGIN TRANSACTION;
  TRUNCATE TABLE {CLIENT}_DB.BRONZE.{SOURCE}_{OBJECT};
  COPY INTO {CLIENT}_DB.BRONZE.{SOURCE}_{OBJECT}
    FROM @stage_path FORCE = TRUE ON_ERROR = 'ABORT_STATEMENT';
COMMIT;

Naming

{SOURCE}_{OBJECT} -- e.g., GLADLY_CONTACT_TIMESTAMPS, SPROUT_MESSAGES

SILVER: Historical Preservation

Owner: dbt Materialization: Incremental tables

What SILVER Does

  • Converts VARCHAR to proper data types (INT, TIMESTAMP, etc.)
  • Validates data quality (NOT NULL, unique keys)
  • Deduplicates on unique keys via incremental models
  • Preserves all historical records that Bronze discards

What SILVER Does NOT Do

  • No enrichment -- No JOINs with rosters, glossaries, or other reference data
  • No filtering -- All records preserved (including non-XO agents)
  • No aggregation -- Same grain as source data
  • No business logic -- Only type conversion and data quality

Why This Matters

Silver is the single source of truth for historical data. Because Bronze is truncated daily, Silver must preserve everything. Keeping Silver free of enrichment ensures:

  1. Reprocessing safety -- If a roster changes, only Gold needs to be rebuilt
  2. Auditability -- Silver exactly reflects what the source provided
  3. Flexibility -- New Gold models can be built without modifying Silver

Naming

{OBJECT} -- e.g., CONTACT_TIMESTAMPS, MESSAGES, INBOX_GLOSSARY

No prefix needed; the schema SILVER indicates the layer.

GOLD: Analytics-Ready (Four Types)

Owner: dbt Materialization: Tables (facts, dims, aggs) and Views (reports)

The Gold layer uses a four-type architecture per ADR 008.

Facts (fct_)

Enriched granular data at the same grain as Silver, enhanced with JOINs to reference data.

Aspect Detail
Grain Same as Silver (one row per event/record)
Enrichment JOINs with rosters, glossaries, dimensions
Filtering Filter to XO agents, active records, etc.
Aggregation None (same grain as source)
Materialization Table
SILVER.CONTACT_TIMESTAMPS
  + JOIN CORE_DB.SILVER.ROSTER_WARBYPARKER (agent names, teams)
  + JOIN SILVER.INBOX_GLOSSARY (inbox names, channels)
  + WHERE agent is XO employee
  = GOLD.fct_contacts

Dimensions (dim_)

Current state reference entities for downstream JOINs.

Aspect Detail
Content SCD Type 1 or Type 2 entities
Source Rosters, reference tables, lookups
Materialization Table

Examples: dim_agents, dim_customers, dim_inboxes

Aggregates (agg_)

Pre-aggregated metrics -- reusable building blocks that multiple reports can reference.

Aspect Detail
Grain Aggregated (daily, weekly, by agent, by channel, etc.)
Source Facts tables
Purpose Performance, reusability across reports
Materialization Table
Naming agg_{subject}_{grain}

Examples: agg_agent_daily, agg_channel_weekly, agg_inbox_hourly

Reports (rpt_)

End-user consumption layer -- Tableau-ready views with zero storage cost.

Aspect Detail
Content Final presentation of data for BI tools
Source Aggregates, facts, dimensions
Materialization View (zero storage)
Audience Analysts, Tableau dashboards, stakeholders

Examples: rpt_agent_dashboard, rpt_email_daily, rpt_csat_weekly

Reference Data Placement (ADR 010)

Data Type Location Reason
Employee rosters CORE_DB.SILVER.ROSTER_{CLIENT} Shared resource, XO-managed, used by all client Gold models
Client glossaries Client DB SILVER (e.g., WBP_DB.SILVER.INBOX_GLOSSARY) Client-specific, maintains cohesion within client DB
Shared glossaries CORE_DB.SILVER Channels, timezones, shared across all clients

Cross-database JOINs

Snowflake handles cross-database JOINs with negligible performance impact. Gold models in WBP_DB.GOLD can freely JOIN with CORE_DB.SILVER.ROSTER_WARBYPARKER.

Data Flow Through Layers

Source API
BRONZE (raw VARCHAR, truncated daily)
    │  dbt: type conversions, dedup
SILVER (typed, historical, no enrichment)
    │  dbt: JOINs with rosters/glossaries, filter to XO agents
GOLD Facts (enriched granular data)
    │  dbt: GROUP BY, aggregations
GOLD Aggregates (pre-aggregated metrics)
    │  dbt: SELECT + JOIN for presentation
GOLD Reports (views, zero storage, Tableau-ready)

Materialization Strategy

Object Materialization Storage Rebuild Cost
Bronze tables Physical tables Minimal (truncated daily) Low (re-extract)
Silver models Incremental tables Growing (historical) High (full refresh)
Gold facts Tables Moderate Medium
Gold dimensions Tables Small Low
Gold aggregates Tables Moderate Medium
Gold reports Views Zero None (always current)

Why reports are views

Report views (rpt_) have zero storage cost because they are SQL views on top of aggregates and facts. They are always up-to-date when the underlying tables are refreshed. This also means they can be modified without waiting for a dbt run.

Quick Reference

Layer Naming Enrichment Filtering Aggregation Materialization
BRONZE {SOURCE}_{OBJECT} No No No Table (truncated)
SILVER {OBJECT} No No No Incremental table
GOLD.fct_ fct_{object} Yes (JOINs) Yes No Table
GOLD.dim_ dim_{entity} N/A Optional No Table
GOLD.agg_ agg_{subject}_{grain} N/A Yes Yes Table
GOLD.rpt_ rpt_{purpose} N/A Optional Optional View