Skip to content

Architecture Summary - XO-Data Platform

Last Updated: 2026-01-05 Status: ✅ Approved Quick Reference: Bronze → Silver → Gold Layer Architecture


Overview

XO-Data is a data engineering platform that orchestrates ELT pipelines from various sources (Gladly API, Google Sheets, Sprout Social) into Snowflake, using Airflow for orchestration and dbt for transformations.


Database Architecture

Per-Client Isolation

Each client has their own Snowflake database with all three layers:

WBP_DB (Warby Parker)          CND_DB (Condé Nast)          CORE_DB (Shared)
├── BRONZE                      ├── BRONZE                   ├── BRONZE
├── SILVER                      ├── SILVER                   ├── SILVER
└── GOLD                        └── GOLD                     └── GOLD

Benefit: Complete data isolation, simpler permissions, self-contained datasets.


Layer Architecture

BRONZE Layer (xo-foundry)

Purpose: Temporary raw landing zone

Characteristics: - Truncated daily (idempotent loads) - All VARCHAR (no type conversion) - Metadata added (RECORD_KEY, RECORD_HASH, DATE_TO_WAREHOUSE) - No transformations - Temporary (24-hour lifespan)

Naming: {SOURCE}_{OBJECT}

Example: WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS


SILVER Layer (dbt)

Purpose: Historical preservation (source of history)

Characteristics: - Permanent append-only or SCD Type 2 - Type conversion (VARCHAR → proper types) - Deduplication (by RECORD_KEY) - NO enrichment (no JOINs) - NO filtering (preserve all records) - NO aggregation (same grain as Bronze)

Naming: {OBJECT} (no prefix)

Example: WBP_DB.SILVER.CONTACT_TIMESTAMPS

Why no enrichment in Silver? - Performance (joins are expensive on large tables) - Maintainability (roster changes don't trigger re-processing) - Separation of concerns (Silver = preservation, Gold = business logic)


GOLD Layer (dbt) - Four Types

1. Facts (fct_ prefix)

Purpose: Enriched granular data (SOT for analysts)

Characteristics: - Same grain as Silver (row per event) - JOIN with employee rosters (CORE_DB) - JOIN with client glossaries - Filter to XO agents only - Materialized as table

Example: WBP_DB.GOLD.FCT_CONTACTS


2. Dimensions (dim_ prefix)

Purpose: Current state entities and reference data

Characteristics: - SCD Type 1 (latest) or Type 2 (historical) - Reference data for joins - Materialized as table

Example: WBP_DB.GOLD.DIM_AGENT


3. Aggregates (agg_ prefix)

Purpose: Pre-aggregated base metrics (building blocks)

Characteristics: - Single grain (agent-day, channel-hour) - Standard KPIs + client-specific KPIs - Reusable across multiple reports - Few or no JOINs - Materialized as table

Example: WBP_DB.GOLD.AGG_AGENT_DAILY


4. Reports (rpt_ prefix)

Purpose: End-user consumption layer (Tableau, analysts)

Characteristics: - JOINs aggregates with dimensions and targets - Calculated fields (variance, performance flags) - Tableau-optimized - Materialized as VIEW (zero storage cost) - Always fresh (computed on query)

Example: WBP_DB.GOLD.RPT_AGENT_DASHBOARD


Data Flow

┌─────────────────────────────────────────────────────────┐
│ xo-foundry (Airflow)                                    │
│ Extract → S3 → BRONZE                                   │
│   - Raw landing (truncated daily)                       │
│   - All VARCHAR, metadata added                         │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ dbt - SILVER Layer                                      │
│   - Historical preservation                             │
│   - Type conversion, deduplication                      │
│   - NO enrichment, NO filtering                         │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ dbt - GOLD FACTS                                        │
│   - Enriched with employee + glossary metadata          │
│   - Filtered to XO agents                               │
│   - SOT for analysts                                    │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ dbt - GOLD AGGREGATES                                   │
│   - Pre-aggregated metrics                              │
│   - Building blocks for reports                         │
│   - Reusable across dashboards                          │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ dbt - GOLD REPORTS (views)                              │
│   - Tableau-ready views                                 │
│   - JOINs with targets, variance calculations           │
│   - Zero storage cost                                   │
└─────────────────────────────────────────────────────────┘

Naming Conventions

Principle: Schema indicates layer, prefix indicates TYPE

Layer Prefix Example Meaning
BRONZE {SOURCE}_ GLADLY_CONTACTS Source system
SILVER None CONTACT_TIMESTAMPS Historical table
GOLD Facts fct_ fct_contacts Enriched facts
GOLD Dims dim_ dim_agent Dimensions
GOLD Aggs agg_ agg_agent_daily Aggregates
GOLD Reports rpt_ rpt_agent_dashboard Reports/views

Why semantic prefixes? - ✅ Not redundant (SILVER.CONTACT_TIMESTAMPS, not SILVER.SILVER_*) - ✅ Self-documenting (fct_contacts = fact table) - ✅ Industry standard (dbt Labs, GitLab, Kimball) - ✅ Consistent length (all 3-4 characters)


Reference Data Placement

Employee RostersCORE_DB.SILVER

CORE_DB.SILVER.ROSTER_WARBYPARKER
CORE_DB.SILVER.ROSTER_WARBYPARKER_HISTORY

Why CORE_DB? - Shared resource (XO manages, not clients) - Centralized maintenance - Source agnostic - No performance impact (cross-DB joins are fast)


Client GlossariesClient Databases

WBP_DB.SILVER.INBOX_GLOSSARY
WBP_DB.SILVER.QUEUE_GLOSSARY
CND_DB.SILVER.DISPOSITION_GLOSSARY

Why client DB? - Client-specific data stays with client - Simpler permissions - Self-contained datasets - No cross-DB joins for glossaries


Performance Strategy

What's Stored (Tables)

  • BRONZE (temporary, truncated daily)
  • SILVER (permanent historical)
  • GOLD Facts (enriched, granular)
  • GOLD Aggregates (pre-calculated metrics)

What's Virtual (Views)

  • GOLD Reports (zero storage cost)

Why This Works

  • Pre-aggregate expensive operations (millions → thousands of rows)
  • Store aggregates (fast base for queries)
  • Reports are views (always fresh, no storage)
  • Views query pre-aggregated data (fast)

Trade-off: Storage is cheap, compute is expensive in Snowflake.


Cross-Database Join Performance

Question: Does joining CORE_DB rosters with client facts hurt performance?

Answer: No - negligible impact (0-1% difference)

Why: - All databases in same account share storage layer - Query optimizer treats cross-DB joins identically - Micro-partition pruning works across databases - No data movement or network latency


Standard KPI Schema

All clients implement standard KPI columns in agg_*_daily tables:

Required Columns (all clients must have): - KPI_DATE, AGENT_ID, AGENT_EMAIL, AGENT_FULL_NAME - CONTACT_COUNT, CONVERSATION_COUNT, UNIQUE_CUSTOMER_COUNT - AVG_HANDLE_TIME, TOTAL_HANDLE_TIME - AVG_HOLD_TIME, TOTAL_HOLD_TIME - AVG_AHT - AVG_CSAT, AVG_QA_SCORE - AVG_FIRST_RESPONSE_TIME

Client-Specific Columns (vary by client): - Warby Parker: CHAT_COUNT, IVR_ENTRY_COUNT, AVG_PAYMENT_AMOUNT - Condé Nast: TOPIC_COUNT, ARTICLE_LINKS_SENT - etc.


Technology Stack

Layer Tool Purpose
Extraction xo-foundry (Airflow) Extract from APIs/Google Sheets
Landing Snowflake BRONZE Temporary raw storage
Transformation dbt SILVER → GOLD transformations
Orchestration Airflow (Astronomer) Schedule and monitor
Consumption Tableau Dashboards (query GOLD views)

Key Architecture Decisions (ADRs)

ADR Topic Status
001 Load strategy terminology ✅ Approved
004 Metadata columns and hashing ✅ Approved
006 Bronze truncate/force strategy ✅ Approved
008 Gold layer architecture ✅ Approved
009 Naming conventions ✅ Approved
010 Reference data placement ✅ Approved

Quick Reference

When to query each layer:

Layer Who Queries Use Case
BRONZE Never (xo-foundry only) Temporary landing
SILVER Power users (rarely) Historical raw data
GOLD Facts Data scientists, analysts Ad-hoc deep dives
GOLD Aggregates Power users Custom aggregations
GOLD Reports Business users, Tableau Standard dashboards

dbt folder structure:

models/
├── silver/
│   └── {client}/
│       ├── contact_timestamps.sql         → SILVER.CONTACT_TIMESTAMPS
│       └── inbox_glossary.sql             → SILVER.INBOX_GLOSSARY
└── gold/
    └── {client}/
        ├── facts/
        │   └── fct_contacts.sql           → GOLD.FCT_CONTACTS
        ├── dimensions/
        │   └── dim_agent.sql              → GOLD.DIM_AGENT
        ├── aggregates/
        │   └── agg_agent_daily.sql        → GOLD.AGG_AGENT_DAILY
        └── reports/
            └── rpt_agent_dashboard.sql    → GOLD.RPT_AGENT_DASHBOARD (view)

Further Reading