Skip to content

Snowflake Medallion Architecture

XO-Data implements the Databricks Medallion Architecture pattern in Snowflake, organizing data into three progressive layers: BRONZE, SILVER, and GOLD.

Overview

The medallion architecture organizes data in layers to incrementally improve data quality as it flows through the platform.

┌─────────────────────────────────────────────────────────────┐
│  Source Systems (APIs, Files, Databases)                    │
│  Gladly API │ Sprout Social │ Gmail │ Google Sheets │ S3   │
└───────────────────────┬─────────────────────────────────────┘
                        │ ELT Pipeline (xo-foundry)
┌─────────────────────────────────────────────────────────────┐
│  BRONZE Layer: Raw Data Landing Zone                        │
│  • All VARCHAR columns                                      │
│  • Truncated daily (idempotent)                             │
│  • Six metadata columns for lineage                         │
│  • Managed by: schemachange + Airflow                       │
└───────────────────────┬─────────────────────────────────────┘
                        │ dbt transformations
┌─────────────────────────────────────────────────────────────┐
│  SILVER Layer: Cleaned & Validated Data                     │
│  • Proper data types (INT, TIMESTAMP, etc.)                 │
│  • Historical preservation (no data loss)                   │
│  • No enrichment, no filtering, no aggregation              │
│  • Managed by: dbt                                          │
└───────────────────────┬─────────────────────────────────────┘
                        │ dbt transformations
┌─────────────────────────────────────────────────────────────┐
│  GOLD Layer: Analytics-Ready (4 Types)                      │
│  • Facts: Enriched granular data (fct_)                     │
│  • Dimensions: Current state entities (dim_)                │
│  • Aggregates: Pre-aggregated metrics (agg_)                │
│  • Reports: Consumption views (rpt_)                        │
│  • Managed by: dbt                                          │
└─────────────────────────────────────────────────────────────┘

Layer Comparison

Layer Purpose Data Quality Column Types Loading Managed By
BRONZE Raw landing zone As-is from source All VARCHAR batch_replace or truncate_insert schemachange + Airflow
SILVER Historical preservation Validated, typed Proper types dbt incremental dbt
GOLD Analytics & reporting Enriched, aggregated Business types dbt table/view dbt

BRONZE Layer

Purpose

The BRONZE layer is the landing zone for raw data from source systems. It serves as:

  • Temporary raw data staging (truncated daily)
  • Source for Silver layer dbt models
  • Audit point for data lineage via metadata columns

Characteristics

  • All VARCHAR columns -- No type conversions, preserves raw data
  • Idempotent loading -- batch_replace or truncate_insert strategy (see ADR 006, ADR 011)
  • Six metadata columns -- RECORD_KEY, RECORD_HASH, DATE_TO_WAREHOUSE, SOURCE_FILE, BATCH_ID, PIPELINE_RUN_ID
  • Source system naming -- Column names from source (standardized to UPPERCASE)

Table Naming Convention

{SOURCE_SYSTEM}_{OBJECT}

Examples:
- GLADLY_CONTACT_TIMESTAMPS
- GLADLY_CONVERSATION_TIMESTAMPS
- SPROUT_MESSAGES
- BAMBOOHR_EMPLOYEES
- GSHEETS_INBOX_GLOSSARY

Example BRONZE Table

CREATE TABLE WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS (
    CONTACT_ID VARCHAR,
    TIMESTAMP_FIELD VARCHAR,
    EVENT_TYPE VARCHAR,
    INITIATOR_ID VARCHAR,
    MESSAGE_ID VARCHAR,
    TARGET_AGENT_ID VARCHAR,
    CREATED_AT VARCHAR,
    -- Standard metadata columns
    RECORD_KEY VARCHAR,
    RECORD_HASH VARCHAR,
    DATE_TO_WAREHOUSE TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
    SOURCE_FILE VARCHAR,
    BATCH_ID VARCHAR,
    PIPELINE_RUN_ID VARCHAR
);

Loading Patterns

Bronze tables use one of two idempotent loading strategies.

batch_replace (new snowflake_load pipelines — preferred)

Introduced in ADR 011. Bronze retains history across batches. Used by all Warby Parker Gladly pipelines.

-- Delete only the current batch's rows, then load
DELETE FROM WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS
WHERE BATCH_ID = '2026-01-15';

COPY INTO WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS
FROM @xo_stage/warbyparker/contact_timestamps/full_refresh/2026-01-15/data.csv
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
FORCE = TRUE
ON_ERROR = 'ABORT_STATEMENT';

Why batch_replace?

  • Bronze accumulates historical data across batches
  • dbt --full-refresh on Silver models can correctly rebuild from full history
  • Idempotent: re-running the same batch produces identical results

truncate_insert (legacy legacy_elt pipelines)

Legacy strategy from ADR 006. Bronze holds only the current batch. Silver is the only historical record.

BEGIN TRANSACTION;
  TRUNCATE TABLE WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS;
  COPY INTO WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS
    FROM @xo_stage/warbyparker/...
    FORCE = TRUE
    ON_ERROR = 'ABORT_STATEMENT';
COMMIT;

See ADR 011 for the full decision and migration rationale.

Learn more about ELT pipeline flow →

SILVER Layer

Purpose

The SILVER layer provides cleaned, validated, and historically preserved data. It serves as:

  • Single source of truth for historical data
  • Proper data types and data quality enforcement
  • Foundation for all Gold layer models

Characteristics

  • Proper data types -- INT, TIMESTAMP, BOOLEAN, etc.
  • Historical preservation -- No data is lost; Silver keeps the history Bronze discards
  • No enrichment -- No JOINs with rosters or glossaries (that's Gold's job)
  • No filtering -- All records preserved, even non-XO agents
  • No aggregation -- Same grain as source data

Table Naming Convention

{OBJECT}

Examples:
- CONTACT_TIMESTAMPS
- CONVERSATION_TIMESTAMPS
- MESSAGES
- WORK_SESSIONS
- INBOX_GLOSSARY

No prefixes in Silver

Silver tables use plain object names. The schema SILVER already indicates the layer. See ADR 009.

Example SILVER Table

-- WBP_DB.SILVER.CONTACT_TIMESTAMPS
CREATE TABLE WBP_DB.SILVER.CONTACT_TIMESTAMPS (
    CONTACT_ID VARCHAR NOT NULL,
    TIMESTAMP_FIELD TIMESTAMP_NTZ NOT NULL,
    EVENT_TYPE VARCHAR(50),
    INITIATOR_ID VARCHAR,
    MESSAGE_ID VARCHAR,
    TARGET_AGENT_ID VARCHAR,
    CREATED_AT TIMESTAMP_NTZ NOT NULL,
    _DBT_UPDATED_AT TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);

Transformation Pattern (dbt)

-- models/silver/contact_timestamps.sql
{{
    config(
        materialized='incremental',
        unique_key='CONTACT_ID',
        on_schema_change='fail'
    )
}}

SELECT
    -- Type conversions
    CONTACT_ID::VARCHAR AS CONTACT_ID,
    TO_TIMESTAMP(TIMESTAMP_FIELD) AS TIMESTAMP_FIELD,
    UPPER(TRIM(EVENT_TYPE)) AS EVENT_TYPE,
    INITIATOR_ID::VARCHAR AS INITIATOR_ID,
    MESSAGE_ID::VARCHAR AS MESSAGE_ID,
    TARGET_AGENT_ID::VARCHAR AS TARGET_AGENT_ID,
    TO_TIMESTAMP(CREATED_AT) AS CREATED_AT,

    CURRENT_TIMESTAMP() AS _DBT_UPDATED_AT

FROM {{ source('bronze', 'gladly_contact_timestamps') }}

WHERE CONTACT_ID IS NOT NULL

{% if is_incremental() %}
    AND CREATED_AT > (SELECT MAX(CREATED_AT) FROM {{ this }})
{% endif %}

GOLD Layer

Purpose

The GOLD layer provides analytics-ready data for consumption by BI tools and end users. It implements a four-type architecture per ADR 008.

Four-Type Architecture

Type Prefix Purpose Materialization Storage
Facts fct_ Enriched granular data Table Yes
Dimensions dim_ Current state entities Table Yes
Aggregates agg_ Pre-aggregated metrics Table Yes
Reports rpt_ Consumption views View Zero

Facts (fct_)

Same grain as Silver, enriched with JOINs (rosters, glossaries), filtered to relevant records (e.g., XO agents only).

-- models/gold/fct_contacts.sql
{{
    config(materialized='table')
}}

SELECT
    ct.CONTACT_ID,
    ct.TIMESTAMP_FIELD,
    ct.EVENT_TYPE,
    ct.CREATED_AT,
    r.AGENT_NAME,
    r.TEAM,
    g.INBOX_NAME,
    g.CHANNEL_TYPE
FROM {{ ref('contact_timestamps') }} ct
LEFT JOIN {{ source('core_silver', 'roster_warbyparker') }} r
    ON ct.TARGET_AGENT_ID = r.AGENT_ID
LEFT JOIN {{ ref('inbox_glossary') }} g
    ON ct.INBOX_ID = g.INBOX_ID
WHERE r.AGENT_NAME IS NOT NULL  -- Filter to XO agents

Dimensions (dim_)

Current state reference entities for downstream joins.

-- models/gold/dim_agents.sql
{{ config(materialized='table') }}

SELECT DISTINCT
    AGENT_ID,
    AGENT_NAME,
    TEAM,
    HIRE_DATE,
    STATUS
FROM {{ source('core_silver', 'roster_warbyparker') }}
WHERE STATUS = 'ACTIVE'

Aggregates (agg_)

Pre-aggregated metrics -- reusable building blocks for reports.

-- models/gold/agg_agent_daily.sql
{{ config(materialized='table') }}

SELECT
    DATE_TRUNC('day', ct.CREATED_AT) AS DATE,
    ct.TARGET_AGENT_ID AS AGENT_ID,
    r.AGENT_NAME,
    COUNT(DISTINCT ct.CONTACT_ID) AS TOTAL_CONTACTS,
    COUNT(*) AS TOTAL_EVENTS,
    AVG(ct.HANDLE_TIME_MINUTES) AS AVG_HANDLE_TIME
FROM {{ ref('fct_contacts') }} ct
LEFT JOIN {{ ref('dim_agents') }} r ON ct.TARGET_AGENT_ID = r.AGENT_ID
GROUP BY 1, 2, 3

Reports (rpt_)

End-user consumption views -- Tableau-ready, zero storage cost.

-- models/gold/rpt_agent_dashboard.sql
{{ config(materialized='view') }}

SELECT
    a.DATE,
    a.AGENT_NAME,
    a.TOTAL_CONTACTS,
    a.AVG_HANDLE_TIME,
    d.TEAM,
    d.STATUS
FROM {{ ref('agg_agent_daily') }} a
LEFT JOIN {{ ref('dim_agents') }} d ON a.AGENT_ID = d.AGENT_ID

Database Structure

Per-Client Isolation

Each client has a dedicated database with three schemas:

WBP_DB (Warby Parker)
├── BRONZE
│   ├── GLADLY_CONTACT_TIMESTAMPS
│   ├── GLADLY_CONVERSATION_TIMESTAMPS
│   ├── GLADLY_WORK_SESSIONS
│   ├── GLADLY_AGENT_DURATIONS
│   └── SPROUT_MESSAGES
├── SILVER
│   ├── CONTACT_TIMESTAMPS
│   ├── CONVERSATION_TIMESTAMPS
│   ├── WORK_SESSIONS
│   └── INBOX_GLOSSARY
└── GOLD
    ├── fct_contacts
    ├── dim_agents
    ├── agg_agent_daily
    └── rpt_agent_dashboard

CND_DB (Conde Nast)
├── BRONZE
│   ├── GLADLY_CONVERSATIONS
│   └── MEDALLIA_CSAT
├── SILVER
│   ├── CONVERSATIONS
│   └── CSAT_SURVEYS
└── GOLD
    ├── fct_conversations
    └── rpt_email_daily

CORE_DB (Shared Infrastructure)
├── BRONZE
│   └── BAMBOOHR_EMPLOYEES
├── SILVER
│   ├── ROSTER_WARBYPARKER      -- Employee rosters (shared resource)
│   ├── ROSTER_CONDENAST
│   └── SHARED_GLOSSARIES
└── GOLD
    └── (cross-client dimensions)

Reference Data Placement (ADR 010)

Data Type Location Reason
Employee rosters CORE_DB.SILVER Shared resource, XO-managed
Client glossaries Client DB (e.g., WBP_DB.SILVER.INBOX_GLOSSARY) Client-specific, maintains cohesion
Shared glossaries CORE_DB.SILVER Channels, timezones used by all clients

Environment Naming

Environment Database Name Example
Canonical {CLIENT}_DB WBP_DB
Development {CLIENT}_DB_DEV WBP_DB_DEV

Environment suffixes

The canonical name is {CLIENT}_DB. Development environments add _DEV via YAML environments configuration, not hardcoded into canonical names.

Data Flow Example

Warby Parker Gladly Pipeline

1. EXTRACT
   Gladly API → S3 Ingest
   s3://xo-ingest/warbyparker/contact_timestamps/2026-01-15/data.csv

2. STAGE
   S3 Ingest → S3 Stage (copy-then-peek, standardize columns)
   s3://xo-stage/warbyparker/contact_timestamps/full_refresh/2026-01-15/data.csv

3. LOAD TO BRONZE (TRUNCATE + FORCE)
   S3 Stage → WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS
   (All VARCHAR columns + metadata columns)

4. TRANSFORM TO SILVER
   dbt: BRONZE.GLADLY_CONTACT_TIMESTAMPS → SILVER.CONTACT_TIMESTAMPS
   (Type conversions, validations, historical preservation)

5. TRANSFORM TO GOLD
   dbt: SILVER.CONTACT_TIMESTAMPS → GOLD.fct_contacts → GOLD.agg_agent_daily → GOLD.rpt_agent_dashboard
   (Enrichment, aggregation, consumption views)

Best Practices

BRONZE Layer

  • Idempotent loading -- Use batch_replace for new snowflake_load pipelines; truncate_insert for legacy pipelines
  • Keep all columns as VARCHAR
  • Include all six metadata columns (RECORD_KEY, RECORD_HASH, DATE_TO_WAREHOUSE, SOURCE_FILE, BATCH_ID, PIPELINE_RUN_ID)
  • No business logic, no type conversions
  • Prefer batch_replace to enable safe Silver full-refresh rebuilds

SILVER Layer

  • Enforce proper data types
  • Add NOT NULL constraints where applicable
  • Deduplicate on unique keys
  • Use incremental models for large tables
  • No enrichment (no JOINs with rosters/glossaries)
  • No filtering (preserve all records)
  • No aggregation (same grain as source)

GOLD Layer

  • Use the four-type system (fct_, dim_, agg_, rpt_)
  • Facts enrich with roster/glossary JOINs and filter to XO agents
  • Aggregates are reusable building blocks (not tied to one report)
  • Reports are views (zero storage cost), Tableau-ready
  • Document business logic thoroughly

Schema Management

BRONZE Layer

Managed by schemachange in apps/snowflake-schema/:

# Deploy schema changes
cd apps/snowflake-schema
schemachange deploy \
  --config-folder . \
  --vars '{"DB_NAME": "WBP_DB"}'

SILVER/GOLD Layers

Managed by dbt:

# Run dbt transformations
dbt run --select silver.*
dbt run --select gold.*

Monitoring & Observability

Data Quality Checks

-- Row counts by layer
SELECT 'BRONZE' AS LAYER, COUNT(*) FROM WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS
UNION ALL
SELECT 'SILVER', COUNT(*) FROM WBP_DB.SILVER.CONTACT_TIMESTAMPS
UNION ALL
SELECT 'GOLD', COUNT(*) FROM WBP_DB.GOLD.fct_contacts;

-- Freshness check
SELECT
    MAX(DATE_TO_WAREHOUSE) AS LATEST_BRONZE_LOAD
FROM WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS;

Next Steps


Related Documentation: