Snowflake Medallion Architecture¶
XO-Data implements the Databricks Medallion Architecture pattern in Snowflake, organizing data into three progressive layers: BRONZE, SILVER, and GOLD.
Overview¶
The medallion architecture organizes data in layers to incrementally improve data quality as it flows through the platform.
┌─────────────────────────────────────────────────────────────┐
│ Source Systems (APIs, Files, Databases) │
│ Gladly API │ Sprout Social │ Gmail │ Google Sheets │ S3 │
└───────────────────────┬─────────────────────────────────────┘
│ ELT Pipeline (xo-foundry)
▼
┌─────────────────────────────────────────────────────────────┐
│ BRONZE Layer: Raw Data Landing Zone │
│ • All VARCHAR columns │
│ • Truncated daily (idempotent) │
│ • Six metadata columns for lineage │
│ • Managed by: schemachange + Airflow │
└───────────────────────┬─────────────────────────────────────┘
│ dbt transformations
▼
┌─────────────────────────────────────────────────────────────┐
│ SILVER Layer: Cleaned & Validated Data │
│ • Proper data types (INT, TIMESTAMP, etc.) │
│ • Historical preservation (no data loss) │
│ • No enrichment, no filtering, no aggregation │
│ • Managed by: dbt │
└───────────────────────┬─────────────────────────────────────┘
│ dbt transformations
▼
┌─────────────────────────────────────────────────────────────┐
│ GOLD Layer: Analytics-Ready (4 Types) │
│ • Facts: Enriched granular data (fct_) │
│ • Dimensions: Current state entities (dim_) │
│ • Aggregates: Pre-aggregated metrics (agg_) │
│ • Reports: Consumption views (rpt_) │
│ • Managed by: dbt │
└─────────────────────────────────────────────────────────────┘
Layer Comparison¶
| Layer | Purpose | Data Quality | Column Types | Loading | Managed By |
|---|---|---|---|---|---|
| BRONZE | Raw landing zone | As-is from source | All VARCHAR | batch_replace or truncate_insert |
schemachange + Airflow |
| SILVER | Historical preservation | Validated, typed | Proper types | dbt incremental | dbt |
| GOLD | Analytics & reporting | Enriched, aggregated | Business types | dbt table/view | dbt |
BRONZE Layer¶
Purpose¶
The BRONZE layer is the landing zone for raw data from source systems. It serves as:
- Temporary raw data staging (truncated daily)
- Source for Silver layer dbt models
- Audit point for data lineage via metadata columns
Characteristics¶
- All VARCHAR columns -- No type conversions, preserves raw data
- Idempotent loading --
batch_replaceortruncate_insertstrategy (see ADR 006, ADR 011) - Six metadata columns --
RECORD_KEY,RECORD_HASH,DATE_TO_WAREHOUSE,SOURCE_FILE,BATCH_ID,PIPELINE_RUN_ID - Source system naming -- Column names from source (standardized to UPPERCASE)
Table Naming Convention¶
{SOURCE_SYSTEM}_{OBJECT}
Examples:
- GLADLY_CONTACT_TIMESTAMPS
- GLADLY_CONVERSATION_TIMESTAMPS
- SPROUT_MESSAGES
- BAMBOOHR_EMPLOYEES
- GSHEETS_INBOX_GLOSSARY
Example BRONZE Table¶
CREATE TABLE WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS (
CONTACT_ID VARCHAR,
TIMESTAMP_FIELD VARCHAR,
EVENT_TYPE VARCHAR,
INITIATOR_ID VARCHAR,
MESSAGE_ID VARCHAR,
TARGET_AGENT_ID VARCHAR,
CREATED_AT VARCHAR,
-- Standard metadata columns
RECORD_KEY VARCHAR,
RECORD_HASH VARCHAR,
DATE_TO_WAREHOUSE TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
SOURCE_FILE VARCHAR,
BATCH_ID VARCHAR,
PIPELINE_RUN_ID VARCHAR
);
Loading Patterns¶
Bronze tables use one of two idempotent loading strategies.
batch_replace (new snowflake_load pipelines — preferred)¶
Introduced in ADR 011. Bronze retains history across batches. Used by all Warby Parker Gladly pipelines.
-- Delete only the current batch's rows, then load
DELETE FROM WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS
WHERE BATCH_ID = '2026-01-15';
COPY INTO WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS
FROM @xo_stage/warbyparker/contact_timestamps/full_refresh/2026-01-15/data.csv
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
FORCE = TRUE
ON_ERROR = 'ABORT_STATEMENT';
Why batch_replace?
- Bronze accumulates historical data across batches
dbt --full-refreshon Silver models can correctly rebuild from full history- Idempotent: re-running the same batch produces identical results
truncate_insert (legacy legacy_elt pipelines)¶
Legacy strategy from ADR 006. Bronze holds only the current batch. Silver is the only historical record.
BEGIN TRANSACTION;
TRUNCATE TABLE WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS;
COPY INTO WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS
FROM @xo_stage/warbyparker/...
FORCE = TRUE
ON_ERROR = 'ABORT_STATEMENT';
COMMIT;
See ADR 011 for the full decision and migration rationale.
Learn more about ELT pipeline flow →
SILVER Layer¶
Purpose¶
The SILVER layer provides cleaned, validated, and historically preserved data. It serves as:
- Single source of truth for historical data
- Proper data types and data quality enforcement
- Foundation for all Gold layer models
Characteristics¶
- Proper data types -- INT, TIMESTAMP, BOOLEAN, etc.
- Historical preservation -- No data is lost; Silver keeps the history Bronze discards
- No enrichment -- No JOINs with rosters or glossaries (that's Gold's job)
- No filtering -- All records preserved, even non-XO agents
- No aggregation -- Same grain as source data
Table Naming Convention¶
{OBJECT}
Examples:
- CONTACT_TIMESTAMPS
- CONVERSATION_TIMESTAMPS
- MESSAGES
- WORK_SESSIONS
- INBOX_GLOSSARY
No prefixes in Silver
Silver tables use plain object names. The schema SILVER already indicates the layer. See ADR 009.
Example SILVER Table¶
-- WBP_DB.SILVER.CONTACT_TIMESTAMPS
CREATE TABLE WBP_DB.SILVER.CONTACT_TIMESTAMPS (
CONTACT_ID VARCHAR NOT NULL,
TIMESTAMP_FIELD TIMESTAMP_NTZ NOT NULL,
EVENT_TYPE VARCHAR(50),
INITIATOR_ID VARCHAR,
MESSAGE_ID VARCHAR,
TARGET_AGENT_ID VARCHAR,
CREATED_AT TIMESTAMP_NTZ NOT NULL,
_DBT_UPDATED_AT TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);
Transformation Pattern (dbt)¶
-- models/silver/contact_timestamps.sql
{{
config(
materialized='incremental',
unique_key='CONTACT_ID',
on_schema_change='fail'
)
}}
SELECT
-- Type conversions
CONTACT_ID::VARCHAR AS CONTACT_ID,
TO_TIMESTAMP(TIMESTAMP_FIELD) AS TIMESTAMP_FIELD,
UPPER(TRIM(EVENT_TYPE)) AS EVENT_TYPE,
INITIATOR_ID::VARCHAR AS INITIATOR_ID,
MESSAGE_ID::VARCHAR AS MESSAGE_ID,
TARGET_AGENT_ID::VARCHAR AS TARGET_AGENT_ID,
TO_TIMESTAMP(CREATED_AT) AS CREATED_AT,
CURRENT_TIMESTAMP() AS _DBT_UPDATED_AT
FROM {{ source('bronze', 'gladly_contact_timestamps') }}
WHERE CONTACT_ID IS NOT NULL
{% if is_incremental() %}
AND CREATED_AT > (SELECT MAX(CREATED_AT) FROM {{ this }})
{% endif %}
GOLD Layer¶
Purpose¶
The GOLD layer provides analytics-ready data for consumption by BI tools and end users. It implements a four-type architecture per ADR 008.
Four-Type Architecture¶
| Type | Prefix | Purpose | Materialization | Storage |
|---|---|---|---|---|
| Facts | fct_ |
Enriched granular data | Table | Yes |
| Dimensions | dim_ |
Current state entities | Table | Yes |
| Aggregates | agg_ |
Pre-aggregated metrics | Table | Yes |
| Reports | rpt_ |
Consumption views | View | Zero |
Facts (fct_)¶
Same grain as Silver, enriched with JOINs (rosters, glossaries), filtered to relevant records (e.g., XO agents only).
-- models/gold/fct_contacts.sql
{{
config(materialized='table')
}}
SELECT
ct.CONTACT_ID,
ct.TIMESTAMP_FIELD,
ct.EVENT_TYPE,
ct.CREATED_AT,
r.AGENT_NAME,
r.TEAM,
g.INBOX_NAME,
g.CHANNEL_TYPE
FROM {{ ref('contact_timestamps') }} ct
LEFT JOIN {{ source('core_silver', 'roster_warbyparker') }} r
ON ct.TARGET_AGENT_ID = r.AGENT_ID
LEFT JOIN {{ ref('inbox_glossary') }} g
ON ct.INBOX_ID = g.INBOX_ID
WHERE r.AGENT_NAME IS NOT NULL -- Filter to XO agents
Dimensions (dim_)¶
Current state reference entities for downstream joins.
-- models/gold/dim_agents.sql
{{ config(materialized='table') }}
SELECT DISTINCT
AGENT_ID,
AGENT_NAME,
TEAM,
HIRE_DATE,
STATUS
FROM {{ source('core_silver', 'roster_warbyparker') }}
WHERE STATUS = 'ACTIVE'
Aggregates (agg_)¶
Pre-aggregated metrics -- reusable building blocks for reports.
-- models/gold/agg_agent_daily.sql
{{ config(materialized='table') }}
SELECT
DATE_TRUNC('day', ct.CREATED_AT) AS DATE,
ct.TARGET_AGENT_ID AS AGENT_ID,
r.AGENT_NAME,
COUNT(DISTINCT ct.CONTACT_ID) AS TOTAL_CONTACTS,
COUNT(*) AS TOTAL_EVENTS,
AVG(ct.HANDLE_TIME_MINUTES) AS AVG_HANDLE_TIME
FROM {{ ref('fct_contacts') }} ct
LEFT JOIN {{ ref('dim_agents') }} r ON ct.TARGET_AGENT_ID = r.AGENT_ID
GROUP BY 1, 2, 3
Reports (rpt_)¶
End-user consumption views -- Tableau-ready, zero storage cost.
-- models/gold/rpt_agent_dashboard.sql
{{ config(materialized='view') }}
SELECT
a.DATE,
a.AGENT_NAME,
a.TOTAL_CONTACTS,
a.AVG_HANDLE_TIME,
d.TEAM,
d.STATUS
FROM {{ ref('agg_agent_daily') }} a
LEFT JOIN {{ ref('dim_agents') }} d ON a.AGENT_ID = d.AGENT_ID
Database Structure¶
Per-Client Isolation¶
Each client has a dedicated database with three schemas:
WBP_DB (Warby Parker)
├── BRONZE
│ ├── GLADLY_CONTACT_TIMESTAMPS
│ ├── GLADLY_CONVERSATION_TIMESTAMPS
│ ├── GLADLY_WORK_SESSIONS
│ ├── GLADLY_AGENT_DURATIONS
│ └── SPROUT_MESSAGES
├── SILVER
│ ├── CONTACT_TIMESTAMPS
│ ├── CONVERSATION_TIMESTAMPS
│ ├── WORK_SESSIONS
│ └── INBOX_GLOSSARY
└── GOLD
├── fct_contacts
├── dim_agents
├── agg_agent_daily
└── rpt_agent_dashboard
CND_DB (Conde Nast)
├── BRONZE
│ ├── GLADLY_CONVERSATIONS
│ └── MEDALLIA_CSAT
├── SILVER
│ ├── CONVERSATIONS
│ └── CSAT_SURVEYS
└── GOLD
├── fct_conversations
└── rpt_email_daily
CORE_DB (Shared Infrastructure)
├── BRONZE
│ └── BAMBOOHR_EMPLOYEES
├── SILVER
│ ├── ROSTER_WARBYPARKER -- Employee rosters (shared resource)
│ ├── ROSTER_CONDENAST
│ └── SHARED_GLOSSARIES
└── GOLD
└── (cross-client dimensions)
Reference Data Placement (ADR 010)¶
| Data Type | Location | Reason |
|---|---|---|
| Employee rosters | CORE_DB.SILVER |
Shared resource, XO-managed |
| Client glossaries | Client DB (e.g., WBP_DB.SILVER.INBOX_GLOSSARY) |
Client-specific, maintains cohesion |
| Shared glossaries | CORE_DB.SILVER |
Channels, timezones used by all clients |
Environment Naming¶
| Environment | Database Name | Example |
|---|---|---|
| Canonical | {CLIENT}_DB |
WBP_DB |
| Development | {CLIENT}_DB_DEV |
WBP_DB_DEV |
Environment suffixes
The canonical name is {CLIENT}_DB. Development environments add _DEV via YAML environments configuration, not hardcoded into canonical names.
Data Flow Example¶
Warby Parker Gladly Pipeline¶
1. EXTRACT
Gladly API → S3 Ingest
s3://xo-ingest/warbyparker/contact_timestamps/2026-01-15/data.csv
2. STAGE
S3 Ingest → S3 Stage (copy-then-peek, standardize columns)
s3://xo-stage/warbyparker/contact_timestamps/full_refresh/2026-01-15/data.csv
3. LOAD TO BRONZE (TRUNCATE + FORCE)
S3 Stage → WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS
(All VARCHAR columns + metadata columns)
4. TRANSFORM TO SILVER
dbt: BRONZE.GLADLY_CONTACT_TIMESTAMPS → SILVER.CONTACT_TIMESTAMPS
(Type conversions, validations, historical preservation)
5. TRANSFORM TO GOLD
dbt: SILVER.CONTACT_TIMESTAMPS → GOLD.fct_contacts → GOLD.agg_agent_daily → GOLD.rpt_agent_dashboard
(Enrichment, aggregation, consumption views)
Best Practices¶
BRONZE Layer¶
- Idempotent loading -- Use
batch_replacefor newsnowflake_loadpipelines;truncate_insertfor legacy pipelines - Keep all columns as VARCHAR
- Include all six metadata columns (
RECORD_KEY,RECORD_HASH,DATE_TO_WAREHOUSE,SOURCE_FILE,BATCH_ID,PIPELINE_RUN_ID) - No business logic, no type conversions
- Prefer
batch_replaceto enable safe Silver full-refresh rebuilds
SILVER Layer¶
- Enforce proper data types
- Add NOT NULL constraints where applicable
- Deduplicate on unique keys
- Use incremental models for large tables
- No enrichment (no JOINs with rosters/glossaries)
- No filtering (preserve all records)
- No aggregation (same grain as source)
GOLD Layer¶
- Use the four-type system (fct_, dim_, agg_, rpt_)
- Facts enrich with roster/glossary JOINs and filter to XO agents
- Aggregates are reusable building blocks (not tied to one report)
- Reports are views (zero storage cost), Tableau-ready
- Document business logic thoroughly
Schema Management¶
BRONZE Layer¶
Managed by schemachange in apps/snowflake-schema/:
# Deploy schema changes
cd apps/snowflake-schema
schemachange deploy \
--config-folder . \
--vars '{"DB_NAME": "WBP_DB"}'
SILVER/GOLD Layers¶
Managed by dbt:
Monitoring & Observability¶
Data Quality Checks¶
-- Row counts by layer
SELECT 'BRONZE' AS LAYER, COUNT(*) FROM WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS
UNION ALL
SELECT 'SILVER', COUNT(*) FROM WBP_DB.SILVER.CONTACT_TIMESTAMPS
UNION ALL
SELECT 'GOLD', COUNT(*) FROM WBP_DB.GOLD.fct_contacts;
-- Freshness check
SELECT
MAX(DATE_TO_WAREHOUSE) AS LATEST_BRONZE_LOAD
FROM WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS;
Next Steps¶
- ELT Pipeline Flow -- How data moves through stages
- ELT Layer Architecture -- Layer responsibilities in depth
- Naming Conventions -- Complete naming standards
- Architecture Decisions -- ADRs that shaped this design
Related Documentation: