ELT Layer Architecture¶
This document defines the responsibilities, rules, and patterns for each layer in the XO-Data medallion architecture.
Layer Responsibilities¶
| Layer | Owner | Purpose | Enrichment? | Filtering? | Aggregation? |
|---|---|---|---|---|---|
| BRONZE | xo-foundry | Raw landing zone | No | No | No |
| SILVER | dbt | Historical preservation | No | No | No |
| GOLD Facts | dbt | Enriched granular data | Yes | Yes | No |
| GOLD Dimensions | dbt | Current state entities | N/A | Optional | No |
| GOLD Aggregates | dbt | Pre-aggregated metrics | N/A | Yes | Yes |
| GOLD Reports | dbt | Consumption views | N/A | Optional | Optional |
BRONZE: Raw Landing Zone¶
Owner: xo-foundry (Airflow tasks) Tables: schemachange migrations
What BRONZE Does¶
- Lands raw data from source systems (all VARCHAR columns)
- Adds metadata columns for lineage (
RECORD_KEY,RECORD_HASH,DATE_TO_WAREHOUSE,SOURCE_FILE,BATCH_ID,PIPELINE_RUN_ID) - Preserves source column names (standardized to UPPERCASE)
What BRONZE Does NOT Do¶
- No type conversions
- No business logic
- No JOINs
- No deduplication (each batch is idempotent via its own loading strategy)
- No historical preservation in legacy
truncate_insertpipelines (handled in Silver)
Loading Patterns¶
Two strategies are supported (see ADR 011):
batch_replace (preferred for new snowflake_load pipelines):
-- Delete current batch, then load — Bronze retains cross-batch history
DELETE FROM {CLIENT}_DB.BRONZE.{SOURCE}_{OBJECT} WHERE BATCH_ID = '{ds}';
COPY INTO {CLIENT}_DB.BRONZE.{SOURCE}_{OBJECT}
FROM @stage_path FORCE = TRUE ON_ERROR = 'ABORT_STATEMENT';
truncate_insert (legacy legacy_elt pipelines only):
BEGIN TRANSACTION;
TRUNCATE TABLE {CLIENT}_DB.BRONZE.{SOURCE}_{OBJECT};
COPY INTO {CLIENT}_DB.BRONZE.{SOURCE}_{OBJECT}
FROM @stage_path FORCE = TRUE ON_ERROR = 'ABORT_STATEMENT';
COMMIT;
Naming¶
{SOURCE}_{OBJECT} -- e.g., GLADLY_CONTACT_TIMESTAMPS, SPROUT_MESSAGES
SILVER: Historical Preservation¶
Owner: dbt Materialization: Incremental tables
What SILVER Does¶
- Converts VARCHAR to proper data types (INT, TIMESTAMP, etc.)
- Validates data quality (NOT NULL, unique keys)
- Deduplicates on unique keys via incremental models
- Preserves all historical records that Bronze discards
What SILVER Does NOT Do¶
- No enrichment -- No JOINs with rosters, glossaries, or other reference data
- No filtering -- All records preserved (including non-XO agents)
- No aggregation -- Same grain as source data
- No business logic -- Only type conversion and data quality
Why This Matters¶
Silver is the single source of truth for historical data. Because Bronze is truncated daily, Silver must preserve everything. Keeping Silver free of enrichment ensures:
- Reprocessing safety -- If a roster changes, only Gold needs to be rebuilt
- Auditability -- Silver exactly reflects what the source provided
- Flexibility -- New Gold models can be built without modifying Silver
Naming¶
{OBJECT} -- e.g., CONTACT_TIMESTAMPS, MESSAGES, INBOX_GLOSSARY
No prefix needed; the schema SILVER indicates the layer.
GOLD: Analytics-Ready (Four Types)¶
Owner: dbt Materialization: Tables (facts, dims, aggs) and Views (reports)
The Gold layer uses a four-type architecture per ADR 008.
Facts (fct_)¶
Enriched granular data at the same grain as Silver, enhanced with JOINs to reference data.
| Aspect | Detail |
|---|---|
| Grain | Same as Silver (one row per event/record) |
| Enrichment | JOINs with rosters, glossaries, dimensions |
| Filtering | Filter to XO agents, active records, etc. |
| Aggregation | None (same grain as source) |
| Materialization | Table |
SILVER.CONTACT_TIMESTAMPS
+ JOIN CORE_DB.SILVER.ROSTER_WARBYPARKER (agent names, teams)
+ JOIN SILVER.INBOX_GLOSSARY (inbox names, channels)
+ WHERE agent is XO employee
= GOLD.fct_contacts
Dimensions (dim_)¶
Current state reference entities for downstream JOINs.
| Aspect | Detail |
|---|---|
| Content | SCD Type 1 or Type 2 entities |
| Source | Rosters, reference tables, lookups |
| Materialization | Table |
Examples: dim_agents, dim_customers, dim_inboxes
Aggregates (agg_)¶
Pre-aggregated metrics -- reusable building blocks that multiple reports can reference.
| Aspect | Detail |
|---|---|
| Grain | Aggregated (daily, weekly, by agent, by channel, etc.) |
| Source | Facts tables |
| Purpose | Performance, reusability across reports |
| Materialization | Table |
| Naming | agg_{subject}_{grain} |
Examples: agg_agent_daily, agg_channel_weekly, agg_inbox_hourly
Reports (rpt_)¶
End-user consumption layer -- Tableau-ready views with zero storage cost.
| Aspect | Detail |
|---|---|
| Content | Final presentation of data for BI tools |
| Source | Aggregates, facts, dimensions |
| Materialization | View (zero storage) |
| Audience | Analysts, Tableau dashboards, stakeholders |
Examples: rpt_agent_dashboard, rpt_email_daily, rpt_csat_weekly
Reference Data Placement (ADR 010)¶
| Data Type | Location | Reason |
|---|---|---|
| Employee rosters | CORE_DB.SILVER.ROSTER_{CLIENT} |
Shared resource, XO-managed, used by all client Gold models |
| Client glossaries | Client DB SILVER (e.g., WBP_DB.SILVER.INBOX_GLOSSARY) |
Client-specific, maintains cohesion within client DB |
| Shared glossaries | CORE_DB.SILVER |
Channels, timezones, shared across all clients |
Cross-database JOINs
Snowflake handles cross-database JOINs with negligible performance impact. Gold models in WBP_DB.GOLD can freely JOIN with CORE_DB.SILVER.ROSTER_WARBYPARKER.
Data Flow Through Layers¶
Source API
│
▼
BRONZE (raw VARCHAR, truncated daily)
│ dbt: type conversions, dedup
▼
SILVER (typed, historical, no enrichment)
│ dbt: JOINs with rosters/glossaries, filter to XO agents
▼
GOLD Facts (enriched granular data)
│ dbt: GROUP BY, aggregations
▼
GOLD Aggregates (pre-aggregated metrics)
│ dbt: SELECT + JOIN for presentation
▼
GOLD Reports (views, zero storage, Tableau-ready)
Materialization Strategy¶
| Object | Materialization | Storage | Rebuild Cost |
|---|---|---|---|
| Bronze tables | Physical tables | Minimal (truncated daily) | Low (re-extract) |
| Silver models | Incremental tables | Growing (historical) | High (full refresh) |
| Gold facts | Tables | Moderate | Medium |
| Gold dimensions | Tables | Small | Low |
| Gold aggregates | Tables | Moderate | Medium |
| Gold reports | Views | Zero | None (always current) |
Why reports are views
Report views (rpt_) have zero storage cost because they are SQL views on top of aggregates and facts. They are always up-to-date when the underlying tables are refreshed. This also means they can be modified without waiting for a dbt run.
Quick Reference¶
| Layer | Naming | Enrichment | Filtering | Aggregation | Materialization |
|---|---|---|---|---|---|
BRONZE |
{SOURCE}_{OBJECT} |
No | No | No | Table (truncated) |
SILVER |
{OBJECT} |
No | No | No | Incremental table |
GOLD.fct_ |
fct_{object} |
Yes (JOINs) | Yes | No | Table |
GOLD.dim_ |
dim_{entity} |
N/A | Optional | No | Table |
GOLD.agg_ |
agg_{subject}_{grain} |
N/A | Yes | Yes | Table |
GOLD.rpt_ |
rpt_{purpose} |
N/A | Optional | Optional | View |
Related Documentation¶
- Medallion Layers -- Snowflake implementation details
- Naming Conventions -- Complete naming standards
- ELT Pipeline Flow -- How data moves through stages
- Architecture Decisions -- ADRs 006, 008, 009, 010