ELT Layer Architecture¶

This document defines the responsibilities, rules, and patterns for each layer in the XO-Data medallion architecture.

Layer Responsibilities¶

Layer	Owner	Purpose	Enrichment?	Filtering?	Aggregation?
BRONZE	xo-foundry	Raw landing zone	No	No	No
SILVER	dbt	Historical preservation	No	No	No
GOLD Facts	dbt	Enriched granular data	Yes	Yes	No
GOLD Dimensions	dbt	Current state entities	N/A	Optional	No
GOLD Aggregates	dbt	Pre-aggregated metrics	N/A	Yes	Yes
GOLD Reports	dbt	Consumption views	N/A	Optional	Optional

BRONZE: Raw Landing Zone¶

Owner: xo-foundry (Airflow tasks) Tables: schemachange migrations

What BRONZE Does¶

Lands raw data from source systems (all VARCHAR columns)
Adds metadata columns for lineage (RECORD_KEY, RECORD_HASH, DATE_TO_WAREHOUSE, SOURCE_FILE, BATCH_ID, PIPELINE_RUN_ID)
Preserves source column names (standardized to UPPERCASE)

What BRONZE Does NOT Do¶

No type conversions
No business logic
No JOINs
No deduplication (each batch is idempotent via its own loading strategy)
No historical preservation in legacy truncate_insert pipelines (handled in Silver)

Loading Patterns¶

Two strategies are supported (see ADR 011):

batch_replace (preferred for new snowflake_load pipelines):

-- Delete current batch, then load — Bronze retains cross-batch history
DELETE FROM {CLIENT}_DB.BRONZE.{SOURCE}_{OBJECT} WHERE BATCH_ID = '{ds}';
COPY INTO {CLIENT}_DB.BRONZE.{SOURCE}_{OBJECT}
  FROM @stage_path FORCE = TRUE ON_ERROR = 'ABORT_STATEMENT';

truncate_insert (legacy legacy_elt pipelines only):

BEGIN TRANSACTION;
  TRUNCATE TABLE {CLIENT}_DB.BRONZE.{SOURCE}_{OBJECT};
  COPY INTO {CLIENT}_DB.BRONZE.{SOURCE}_{OBJECT}
    FROM @stage_path FORCE = TRUE ON_ERROR = 'ABORT_STATEMENT';
COMMIT;

Naming¶

{SOURCE}_{OBJECT} -- e.g., GLADLY_CONTACT_TIMESTAMPS, SPROUT_MESSAGES

SILVER: Historical Preservation¶

Owner: dbt Materialization: Incremental tables

What SILVER Does¶

Converts VARCHAR to proper data types (INT, TIMESTAMP, etc.)
Validates data quality (NOT NULL, unique keys)
Deduplicates on unique keys via incremental models
Preserves all historical records that Bronze discards

What SILVER Does NOT Do¶

No enrichment -- No JOINs with rosters, glossaries, or other reference data
No filtering -- All records preserved (including non-XO agents)
No aggregation -- Same grain as source data
No business logic -- Only type conversion and data quality

Why This Matters¶

Silver is the single source of truth for historical data. Because Bronze is truncated daily, Silver must preserve everything. Keeping Silver free of enrichment ensures:

Reprocessing safety -- If a roster changes, only Gold needs to be rebuilt
Auditability -- Silver exactly reflects what the source provided
Flexibility -- New Gold models can be built without modifying Silver

Naming¶

{OBJECT} -- e.g., CONTACT_TIMESTAMPS, MESSAGES, INBOX_GLOSSARY

No prefix needed; the schema SILVER indicates the layer.

GOLD: Analytics-Ready (Four Types)¶

Owner: dbt Materialization: Tables (facts, dims, aggs) and Views (reports)

The Gold layer uses a four-type architecture per ADR 008.

Facts (`fct_`)¶

Enriched granular data at the same grain as Silver, enhanced with JOINs to reference data.

Aspect	Detail
Grain	Same as Silver (one row per event/record)
Enrichment	JOINs with rosters, glossaries, dimensions
Filtering	Filter to XO agents, active records, etc.
Aggregation	None (same grain as source)
Materialization	Table

SILVER.CONTACT_TIMESTAMPS
  + JOIN CORE_DB.SILVER.ROSTER_WARBYPARKER (agent names, teams)
  + JOIN SILVER.INBOX_GLOSSARY (inbox names, channels)
  + WHERE agent is XO employee
  = GOLD.fct_contacts

Dimensions (`dim_`)¶

Current state reference entities for downstream JOINs.

Aspect	Detail
Content	SCD Type 1 or Type 2 entities
Source	Rosters, reference tables, lookups
Materialization	Table

Examples: dim_agents, dim_customers, dim_inboxes

Aggregates (`agg_`)¶

Pre-aggregated metrics -- reusable building blocks that multiple reports can reference.

Aspect	Detail
Grain	Aggregated (daily, weekly, by agent, by channel, etc.)
Source	Facts tables
Purpose	Performance, reusability across reports
Materialization	Table
Naming	`agg_{subject}_{grain}`

Examples: agg_agent_daily, agg_channel_weekly, agg_inbox_hourly

Reports (`rpt_`)¶

End-user consumption layer -- Tableau-ready views with zero storage cost.

Aspect	Detail
Content	Final presentation of data for BI tools
Source	Aggregates, facts, dimensions
Materialization	View (zero storage)
Audience	Analysts, Tableau dashboards, stakeholders

Examples: rpt_agent_dashboard, rpt_email_daily, rpt_csat_weekly

Reference Data Placement (ADR 010)¶

Data Type	Location	Reason
Employee rosters	`CORE_DB.SILVER.ROSTER_{CLIENT}`	Shared resource, XO-managed, used by all client Gold models
Client glossaries	Client DB SILVER (e.g., `WBP_DB.SILVER.INBOX_GLOSSARY`)	Client-specific, maintains cohesion within client DB
Shared glossaries	`CORE_DB.SILVER`	Channels, timezones, shared across all clients

Cross-database JOINs

Snowflake handles cross-database JOINs with negligible performance impact. Gold models in WBP_DB.GOLD can freely JOIN with CORE_DB.SILVER.ROSTER_WARBYPARKER.

Data Flow Through Layers¶

Source API
    │
    ▼
BRONZE (raw VARCHAR, truncated daily)
    │  dbt: type conversions, dedup
    ▼
SILVER (typed, historical, no enrichment)
    │  dbt: JOINs with rosters/glossaries, filter to XO agents
    ▼
GOLD Facts (enriched granular data)
    │  dbt: GROUP BY, aggregations
    ▼
GOLD Aggregates (pre-aggregated metrics)
    │  dbt: SELECT + JOIN for presentation
    ▼
GOLD Reports (views, zero storage, Tableau-ready)

Materialization Strategy¶

Object	Materialization	Storage	Rebuild Cost
Bronze tables	Physical tables	Minimal (truncated daily)	Low (re-extract)
Silver models	Incremental tables	Growing (historical)	High (full refresh)
Gold facts	Tables	Moderate	Medium
Gold dimensions	Tables	Small	Low
Gold aggregates	Tables	Moderate	Medium
Gold reports	Views	Zero	None (always current)

Why reports are views

Report views (rpt_) have zero storage cost because they are SQL views on top of aggregates and facts. They are always up-to-date when the underlying tables are refreshed. This also means they can be modified without waiting for a dbt run.

Quick Reference¶

Layer	Naming	Enrichment	Filtering	Aggregation	Materialization
`BRONZE`	`{SOURCE}_{OBJECT}`	No	No	No	Table (truncated)
`SILVER`	`{OBJECT}`	No	No	No	Incremental table
`GOLD.fct_`	`fct_{object}`	Yes (JOINs)	Yes	No	Table
`GOLD.dim_`	`dim_{entity}`	N/A	Optional	No	Table
`GOLD.agg_`	`agg_{subject}_{grain}`	N/A	Yes	Yes	Table
`GOLD.rpt_`	`rpt_{purpose}`	N/A	Optional	Optional	View

Medallion Layers -- Snowflake implementation details
Naming Conventions -- Complete naming standards
ELT Pipeline Flow -- How data moves through stages
Architecture Decisions -- ADRs 006, 008, 009, 010

ELT Layer Architecture¶

Layer Responsibilities¶

BRONZE: Raw Landing Zone¶

What BRONZE Does¶

What BRONZE Does NOT Do¶

Loading Patterns¶

Naming¶

SILVER: Historical Preservation¶

What SILVER Does¶

What SILVER Does NOT Do¶

Why This Matters¶

Naming¶

GOLD: Analytics-Ready (Four Types)¶

Facts (fct_)¶

Dimensions (dim_)¶

Aggregates (agg_)¶

Reports (rpt_)¶

Reference Data Placement (ADR 010)¶

Data Flow Through Layers¶

Materialization Strategy¶

Quick Reference¶

Related Documentation¶

Facts (`fct_`)¶

Dimensions (`dim_`)¶

Aggregates (`agg_`)¶

Reports (`rpt_`)¶