Snowflake Medallion Architecture¶

XO-Data implements the Databricks Medallion Architecture pattern in Snowflake, organizing data into three progressive layers: BRONZE, SILVER, and GOLD.

Overview¶

The medallion architecture organizes data in layers to incrementally improve data quality as it flows through the platform.

┌─────────────────────────────────────────────────────────────┐
│  Source Systems (APIs, Files, Databases)                    │
│  Gladly API │ Sprout Social │ Gmail │ Google Sheets │ S3   │
└───────────────────────┬─────────────────────────────────────┘
                        │ ELT Pipeline (xo-foundry)
                        ▼
┌─────────────────────────────────────────────────────────────┐
│  BRONZE Layer: Raw Data Landing Zone                        │
│  • All VARCHAR columns                                      │
│  • Truncated daily (idempotent)                             │
│  • Six metadata columns for lineage                         │
│  • Managed by: schemachange + Airflow                       │
└───────────────────────┬─────────────────────────────────────┘
                        │ dbt transformations
                        ▼
┌─────────────────────────────────────────────────────────────┐
│  SILVER Layer: Cleaned & Validated Data                     │
│  • Proper data types (INT, TIMESTAMP, etc.)                 │
│  • Historical preservation (no data loss)                   │
│  • No enrichment, no filtering, no aggregation              │
│  • Managed by: dbt                                          │
└───────────────────────┬─────────────────────────────────────┘
                        │ dbt transformations
                        ▼
┌─────────────────────────────────────────────────────────────┐
│  GOLD Layer: Analytics-Ready (4 Types)                      │
│  • Facts: Enriched granular data (fct_)                     │
│  • Dimensions: Current state entities (dim_)                │
│  • Aggregates: Pre-aggregated metrics (agg_)                │
│  • Reports: Consumption views (rpt_)                        │
│  • Managed by: dbt                                          │
└─────────────────────────────────────────────────────────────┘

Layer Comparison¶

Layer	Purpose	Data Quality	Column Types	Loading	Managed By
BRONZE	Raw landing zone	As-is from source	All VARCHAR	`batch_replace` or `truncate_insert`	schemachange + Airflow
SILVER	Historical preservation	Validated, typed	Proper types	dbt incremental	dbt
GOLD	Analytics & reporting	Enriched, aggregated	Business types	dbt table/view	dbt

BRONZE Layer¶

Purpose¶

The BRONZE layer is the landing zone for raw data from source systems. It serves as:

Temporary raw data staging (truncated daily)
Source for Silver layer dbt models
Audit point for data lineage via metadata columns

Characteristics¶

All VARCHAR columns -- No type conversions, preserves raw data
Idempotent loading -- batch_replace or truncate_insert strategy (see ADR 006, ADR 011)
Six metadata columns -- RECORD_KEY, RECORD_HASH, DATE_TO_WAREHOUSE, SOURCE_FILE, BATCH_ID, PIPELINE_RUN_ID
Source system naming -- Column names from source (standardized to UPPERCASE)

Table Naming Convention¶

{SOURCE_SYSTEM}_{OBJECT}

Examples:
- GLADLY_CONTACT_TIMESTAMPS
- GLADLY_CONVERSATION_TIMESTAMPS
- SPROUT_MESSAGES
- BAMBOOHR_EMPLOYEES
- GSHEETS_INBOX_GLOSSARY

Example BRONZE Table¶

CREATE TABLE WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS (
    CONTACT_ID VARCHAR,
    TIMESTAMP_FIELD VARCHAR,
    EVENT_TYPE VARCHAR,
    INITIATOR_ID VARCHAR,
    MESSAGE_ID VARCHAR,
    TARGET_AGENT_ID VARCHAR,
    CREATED_AT VARCHAR,
    -- Standard metadata columns
    RECORD_KEY VARCHAR,
    RECORD_HASH VARCHAR,
    DATE_TO_WAREHOUSE TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
    SOURCE_FILE VARCHAR,
    BATCH_ID VARCHAR,
    PIPELINE_RUN_ID VARCHAR
);

Loading Patterns¶

Bronze tables use one of two idempotent loading strategies.

`batch_replace` (new `snowflake_load` pipelines — preferred)¶

Introduced in ADR 011. Bronze retains history across batches. Used by all Warby Parker Gladly pipelines.

-- Delete only the current batch's rows, then load
DELETE FROM WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS
WHERE BATCH_ID = '2026-01-15';

COPY INTO WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS
FROM @xo_stage/warbyparker/contact_timestamps/full_refresh/2026-01-15/data.csv
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
FORCE = TRUE
ON_ERROR = 'ABORT_STATEMENT';

Why batch_replace?

Bronze accumulates historical data across batches
dbt --full-refresh on Silver models can correctly rebuild from full history
Idempotent: re-running the same batch produces identical results

`truncate_insert` (legacy `legacy_elt` pipelines)¶

Legacy strategy from ADR 006. Bronze holds only the current batch. Silver is the only historical record.

BEGIN TRANSACTION;
  TRUNCATE TABLE WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS;
  COPY INTO WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS
    FROM @xo_stage/warbyparker/...
    FORCE = TRUE
    ON_ERROR = 'ABORT_STATEMENT';
COMMIT;

See ADR 011 for the full decision and migration rationale.

Learn more about ELT pipeline flow →

SILVER Layer¶

Purpose¶

The SILVER layer provides cleaned, validated, and historically preserved data. It serves as:

Single source of truth for historical data
Proper data types and data quality enforcement
Foundation for all Gold layer models

Characteristics¶

Proper data types -- INT, TIMESTAMP, BOOLEAN, etc.
Historical preservation -- No data is lost; Silver keeps the history Bronze discards
No enrichment -- No JOINs with rosters or glossaries (that's Gold's job)
No filtering -- All records preserved, even non-XO agents
No aggregation -- Same grain as source data

Table Naming Convention¶

{OBJECT}

Examples:
- CONTACT_TIMESTAMPS
- CONVERSATION_TIMESTAMPS
- MESSAGES
- WORK_SESSIONS
- INBOX_GLOSSARY

No prefixes in Silver

Silver tables use plain object names. The schema SILVER already indicates the layer. See ADR 009.

Example SILVER Table¶

-- WBP_DB.SILVER.CONTACT_TIMESTAMPS
CREATE TABLE WBP_DB.SILVER.CONTACT_TIMESTAMPS (
    CONTACT_ID VARCHAR NOT NULL,
    TIMESTAMP_FIELD TIMESTAMP_NTZ NOT NULL,
    EVENT_TYPE VARCHAR(50),
    INITIATOR_ID VARCHAR,
    MESSAGE_ID VARCHAR,
    TARGET_AGENT_ID VARCHAR,
    CREATED_AT TIMESTAMP_NTZ NOT NULL,
    _DBT_UPDATED_AT TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);

Transformation Pattern (dbt)¶

-- models/silver/contact_timestamps.sql
{{
    config(
        materialized='incremental',
        unique_key='CONTACT_ID',
        on_schema_change='fail'
    )
}}

SELECT
    -- Type conversions
    CONTACT_ID::VARCHAR AS CONTACT_ID,
    TO_TIMESTAMP(TIMESTAMP_FIELD) AS TIMESTAMP_FIELD,
    UPPER(TRIM(EVENT_TYPE)) AS EVENT_TYPE,
    INITIATOR_ID::VARCHAR AS INITIATOR_ID,
    MESSAGE_ID::VARCHAR AS MESSAGE_ID,
    TARGET_AGENT_ID::VARCHAR AS TARGET_AGENT_ID,
    TO_TIMESTAMP(CREATED_AT) AS CREATED_AT,

    CURRENT_TIMESTAMP() AS _DBT_UPDATED_AT

FROM {{ source('bronze', 'gladly_contact_timestamps') }}

WHERE CONTACT_ID IS NOT NULL

{% if is_incremental() %}
    AND CREATED_AT > (SELECT MAX(CREATED_AT) FROM {{ this }})
{% endif %}

GOLD Layer¶

Purpose¶

The GOLD layer provides analytics-ready data for consumption by BI tools and end users. It implements a four-type architecture per ADR 008.

Four-Type Architecture¶

Type	Prefix	Purpose	Materialization	Storage
Facts	`fct_`	Enriched granular data	Table	Yes
Dimensions	`dim_`	Current state entities	Table	Yes
Aggregates	`agg_`	Pre-aggregated metrics	Table	Yes
Reports	`rpt_`	Consumption views	View	Zero

Facts (`fct_`)¶

Same grain as Silver, enriched with JOINs (rosters, glossaries), filtered to relevant records (e.g., XO agents only).

-- models/gold/fct_contacts.sql
{{
    config(materialized='table')
}}

SELECT
    ct.CONTACT_ID,
    ct.TIMESTAMP_FIELD,
    ct.EVENT_TYPE,
    ct.CREATED_AT,
    r.AGENT_NAME,
    r.TEAM,
    g.INBOX_NAME,
    g.CHANNEL_TYPE
FROM {{ ref('contact_timestamps') }} ct
LEFT JOIN {{ source('core_silver', 'roster_warbyparker') }} r
    ON ct.TARGET_AGENT_ID = r.AGENT_ID
LEFT JOIN {{ ref('inbox_glossary') }} g
    ON ct.INBOX_ID = g.INBOX_ID
WHERE r.AGENT_NAME IS NOT NULL  -- Filter to XO agents

Dimensions (`dim_`)¶

Current state reference entities for downstream joins.

-- models/gold/dim_agents.sql
{{ config(materialized='table') }}

SELECT DISTINCT
    AGENT_ID,
    AGENT_NAME,
    TEAM,
    HIRE_DATE,
    STATUS
FROM {{ source('core_silver', 'roster_warbyparker') }}
WHERE STATUS = 'ACTIVE'

Aggregates (`agg_`)¶

Pre-aggregated metrics -- reusable building blocks for reports.

-- models/gold/agg_agent_daily.sql
{{ config(materialized='table') }}

SELECT
    DATE_TRUNC('day', ct.CREATED_AT) AS DATE,
    ct.TARGET_AGENT_ID AS AGENT_ID,
    r.AGENT_NAME,
    COUNT(DISTINCT ct.CONTACT_ID) AS TOTAL_CONTACTS,
    COUNT(*) AS TOTAL_EVENTS,
    AVG(ct.HANDLE_TIME_MINUTES) AS AVG_HANDLE_TIME
FROM {{ ref('fct_contacts') }} ct
LEFT JOIN {{ ref('dim_agents') }} r ON ct.TARGET_AGENT_ID = r.AGENT_ID
GROUP BY 1, 2, 3

Reports (`rpt_`)¶

End-user consumption views -- Tableau-ready, zero storage cost.

-- models/gold/rpt_agent_dashboard.sql
{{ config(materialized='view') }}

SELECT
    a.DATE,
    a.AGENT_NAME,
    a.TOTAL_CONTACTS,
    a.AVG_HANDLE_TIME,
    d.TEAM,
    d.STATUS
FROM {{ ref('agg_agent_daily') }} a
LEFT JOIN {{ ref('dim_agents') }} d ON a.AGENT_ID = d.AGENT_ID

Database Structure¶

Per-Client Isolation¶

Each client has a dedicated database with three schemas:

WBP_DB (Warby Parker)
├── BRONZE
│   ├── GLADLY_CONTACT_TIMESTAMPS
│   ├── GLADLY_CONVERSATION_TIMESTAMPS
│   ├── GLADLY_WORK_SESSIONS
│   ├── GLADLY_AGENT_DURATIONS
│   └── SPROUT_MESSAGES
├── SILVER
│   ├── CONTACT_TIMESTAMPS
│   ├── CONVERSATION_TIMESTAMPS
│   ├── WORK_SESSIONS
│   └── INBOX_GLOSSARY
└── GOLD
    ├── fct_contacts
    ├── dim_agents
    ├── agg_agent_daily
    └── rpt_agent_dashboard

CND_DB (Conde Nast)
├── BRONZE
│   ├── GLADLY_CONVERSATIONS
│   └── MEDALLIA_CSAT
├── SILVER
│   ├── CONVERSATIONS
│   └── CSAT_SURVEYS
└── GOLD
    ├── fct_conversations
    └── rpt_email_daily

CORE_DB (Shared Infrastructure)
├── BRONZE
│   └── BAMBOOHR_EMPLOYEES
├── SILVER
│   ├── ROSTER_WARBYPARKER      -- Employee rosters (shared resource)
│   ├── ROSTER_CONDENAST
│   └── SHARED_GLOSSARIES
└── GOLD
    └── (cross-client dimensions)

Reference Data Placement (ADR 010)¶

Data Type	Location	Reason
Employee rosters	`CORE_DB.SILVER`	Shared resource, XO-managed
Client glossaries	Client DB (e.g., `WBP_DB.SILVER.INBOX_GLOSSARY`)	Client-specific, maintains cohesion
Shared glossaries	`CORE_DB.SILVER`	Channels, timezones used by all clients

Environment Naming¶

Environment	Database Name	Example
Canonical	`{CLIENT}_DB`	`WBP_DB`
Development	`{CLIENT}_DB_DEV`	`WBP_DB_DEV`

Environment suffixes

The canonical name is {CLIENT}_DB. Development environments add _DEV via YAML environments configuration, not hardcoded into canonical names.

Data Flow Example¶

Warby Parker Gladly Pipeline¶

1. EXTRACT
   Gladly API → S3 Ingest
   s3://xo-ingest/warbyparker/contact_timestamps/2026-01-15/data.csv

2. STAGE
   S3 Ingest → S3 Stage (copy-then-peek, standardize columns)
   s3://xo-stage/warbyparker/contact_timestamps/full_refresh/2026-01-15/data.csv

3. LOAD TO BRONZE (TRUNCATE + FORCE)
   S3 Stage → WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS
   (All VARCHAR columns + metadata columns)

4. TRANSFORM TO SILVER
   dbt: BRONZE.GLADLY_CONTACT_TIMESTAMPS → SILVER.CONTACT_TIMESTAMPS
   (Type conversions, validations, historical preservation)

5. TRANSFORM TO GOLD
   dbt: SILVER.CONTACT_TIMESTAMPS → GOLD.fct_contacts → GOLD.agg_agent_daily → GOLD.rpt_agent_dashboard
   (Enrichment, aggregation, consumption views)

Best Practices¶

BRONZE Layer¶

Idempotent loading -- Use batch_replace for new snowflake_load pipelines; truncate_insert for legacy pipelines
Keep all columns as VARCHAR
Include all six metadata columns (RECORD_KEY, RECORD_HASH, DATE_TO_WAREHOUSE, SOURCE_FILE, BATCH_ID, PIPELINE_RUN_ID)
No business logic, no type conversions
Prefer batch_replace to enable safe Silver full-refresh rebuilds

SILVER Layer¶

Enforce proper data types
Add NOT NULL constraints where applicable
Deduplicate on unique keys
Use incremental models for large tables
No enrichment (no JOINs with rosters/glossaries)
No filtering (preserve all records)
No aggregation (same grain as source)

GOLD Layer¶

Use the four-type system (fct_, dim_, agg_, rpt_)
Facts enrich with roster/glossary JOINs and filter to XO agents
Aggregates are reusable building blocks (not tied to one report)
Reports are views (zero storage cost), Tableau-ready
Document business logic thoroughly

Schema Management¶

BRONZE Layer¶

Managed by schemachange in apps/snowflake-schema/:

# Deploy schema changes
cd apps/snowflake-schema
schemachange deploy \
  --config-folder . \
  --vars '{"DB_NAME": "WBP_DB"}'

SILVER/GOLD Layers¶

Managed by dbt:

# Run dbt transformations
dbt run --select silver.*
dbt run --select gold.*

Monitoring & Observability¶

Data Quality Checks¶

-- Row counts by layer
SELECT 'BRONZE' AS LAYER, COUNT(*) FROM WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS
UNION ALL
SELECT 'SILVER', COUNT(*) FROM WBP_DB.SILVER.CONTACT_TIMESTAMPS
UNION ALL
SELECT 'GOLD', COUNT(*) FROM WBP_DB.GOLD.fct_contacts;

-- Freshness check
SELECT
    MAX(DATE_TO_WAREHOUSE) AS LATEST_BRONZE_LOAD
FROM WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS;

Next Steps¶

ELT Pipeline Flow -- How data moves through stages
ELT Layer Architecture -- Layer responsibilities in depth
Naming Conventions -- Complete naming standards
Architecture Decisions -- ADRs that shaped this design

Related Documentation:

Snowflake Medallion Architecture¶

Overview¶

Layer Comparison¶

BRONZE Layer¶

Purpose¶

Characteristics¶

Table Naming Convention¶

Example BRONZE Table¶

Loading Patterns¶

batch_replace (new snowflake_load pipelines — preferred)¶

truncate_insert (legacy legacy_elt pipelines)¶

SILVER Layer¶

Purpose¶

Characteristics¶

Table Naming Convention¶

Example SILVER Table¶

Transformation Pattern (dbt)¶

GOLD Layer¶

Purpose¶

Four-Type Architecture¶

Facts (fct_)¶

Dimensions (dim_)¶

Aggregates (agg_)¶

Reports (rpt_)¶

Database Structure¶

Per-Client Isolation¶

Reference Data Placement (ADR 010)¶

Environment Naming¶

Data Flow Example¶

Warby Parker Gladly Pipeline¶

Best Practices¶

BRONZE Layer¶

SILVER Layer¶

GOLD Layer¶

Schema Management¶

BRONZE Layer¶

SILVER/GOLD Layers¶

Monitoring & Observability¶

Data Quality Checks¶

Next Steps¶

`batch_replace` (new `snowflake_load` pipelines — preferred)¶

`truncate_insert` (legacy `legacy_elt` pipelines)¶

Facts (`fct_`)¶

Dimensions (`dim_`)¶

Aggregates (`agg_`)¶

Reports (`rpt_`)¶