Architecture Summary - XO-Data Platform¶

Last Updated: 2026-01-05 Status: ✅ Approved Quick Reference: Bronze → Silver → Gold Layer Architecture

Overview¶

XO-Data is a data engineering platform that orchestrates ELT pipelines from various sources (Gladly API, Google Sheets, Sprout Social) into Snowflake, using Airflow for orchestration and dbt for transformations.

Database Architecture¶

Per-Client Isolation¶

Each client has their own Snowflake database with all three layers:

WBP_DB (Warby Parker)          CND_DB (Condé Nast)          CORE_DB (Shared)
├── BRONZE                      ├── BRONZE                   ├── BRONZE
├── SILVER                      ├── SILVER                   ├── SILVER
└── GOLD                        └── GOLD                     └── GOLD

Benefit: Complete data isolation, simpler permissions, self-contained datasets.

Layer Architecture¶

BRONZE Layer (xo-foundry)¶

Purpose: Temporary raw landing zone

Characteristics: - Truncated daily (idempotent loads) - All VARCHAR (no type conversion) - Metadata added (RECORD_KEY, RECORD_HASH, DATE_TO_WAREHOUSE) - No transformations - Temporary (24-hour lifespan)

Naming: {SOURCE}_{OBJECT}

Example: WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS

SILVER Layer (dbt)¶

Purpose: Historical preservation (source of history)

Characteristics: - Permanent append-only or SCD Type 2 - Type conversion (VARCHAR → proper types) - Deduplication (by RECORD_KEY) - NO enrichment (no JOINs) - NO filtering (preserve all records) - NO aggregation (same grain as Bronze)

Naming: {OBJECT} (no prefix)

Example: WBP_DB.SILVER.CONTACT_TIMESTAMPS

Why no enrichment in Silver? - Performance (joins are expensive on large tables) - Maintainability (roster changes don't trigger re-processing) - Separation of concerns (Silver = preservation, Gold = business logic)

GOLD Layer (dbt) - Four Types¶

1. Facts (`fct_` prefix)¶

Purpose: Enriched granular data (SOT for analysts)

Characteristics: - Same grain as Silver (row per event) - JOIN with employee rosters (CORE_DB) - JOIN with client glossaries - Filter to XO agents only - Materialized as table

Example: WBP_DB.GOLD.FCT_CONTACTS

2. Dimensions (`dim_` prefix)¶

Purpose: Current state entities and reference data

Characteristics: - SCD Type 1 (latest) or Type 2 (historical) - Reference data for joins - Materialized as table

Example: WBP_DB.GOLD.DIM_AGENT

3. Aggregates (`agg_` prefix)¶

Purpose: Pre-aggregated base metrics (building blocks)

Characteristics: - Single grain (agent-day, channel-hour) - Standard KPIs + client-specific KPIs - Reusable across multiple reports - Few or no JOINs - Materialized as table

Example: WBP_DB.GOLD.AGG_AGENT_DAILY

4. Reports (`rpt_` prefix)¶

Purpose: End-user consumption layer (Tableau, analysts)

Characteristics: - JOINs aggregates with dimensions and targets - Calculated fields (variance, performance flags) - Tableau-optimized - Materialized as VIEW (zero storage cost) - Always fresh (computed on query)

Example: WBP_DB.GOLD.RPT_AGENT_DASHBOARD

Data Flow¶

┌─────────────────────────────────────────────────────────┐
│ xo-foundry (Airflow)                                    │
│ Extract → S3 → BRONZE                                   │
│   - Raw landing (truncated daily)                       │
│   - All VARCHAR, metadata added                         │
└─────────────────────────────────────────────────────────┘
                        ↓
┌─────────────────────────────────────────────────────────┐
│ dbt - SILVER Layer                                      │
│   - Historical preservation                             │
│   - Type conversion, deduplication                      │
│   - NO enrichment, NO filtering                         │
└─────────────────────────────────────────────────────────┘
                        ↓
┌─────────────────────────────────────────────────────────┐
│ dbt - GOLD FACTS                                        │
│   - Enriched with employee + glossary metadata          │
│   - Filtered to XO agents                               │
│   - SOT for analysts                                    │
└─────────────────────────────────────────────────────────┘
                        ↓
┌─────────────────────────────────────────────────────────┐
│ dbt - GOLD AGGREGATES                                   │
│   - Pre-aggregated metrics                              │
│   - Building blocks for reports                         │
│   - Reusable across dashboards                          │
└─────────────────────────────────────────────────────────┘
                        ↓
┌─────────────────────────────────────────────────────────┐
│ dbt - GOLD REPORTS (views)                              │
│   - Tableau-ready views                                 │
│   - JOINs with targets, variance calculations           │
│   - Zero storage cost                                   │
└─────────────────────────────────────────────────────────┘

Naming Conventions¶

Principle: Schema indicates layer, prefix indicates TYPE¶

Layer	Prefix	Example	Meaning
BRONZE	`{SOURCE}_`	`GLADLY_CONTACTS`	Source system
SILVER	None	`CONTACT_TIMESTAMPS`	Historical table
GOLD Facts	`fct_`	`fct_contacts`	Enriched facts
GOLD Dims	`dim_`	`dim_agent`	Dimensions
GOLD Aggs	`agg_`	`agg_agent_daily`	Aggregates
GOLD Reports	`rpt_`	`rpt_agent_dashboard`	Reports/views

Why semantic prefixes? - ✅ Not redundant (SILVER.CONTACT_TIMESTAMPS, not SILVER.SILVER_*) - ✅ Self-documenting (fct_contacts = fact table) - ✅ Industry standard (dbt Labs, GitLab, Kimball) - ✅ Consistent length (all 3-4 characters)

Reference Data Placement¶

Employee Rosters → `CORE_DB.SILVER`¶

CORE_DB.SILVER.ROSTER_WARBYPARKER
CORE_DB.SILVER.ROSTER_WARBYPARKER_HISTORY

Why CORE_DB? - Shared resource (XO manages, not clients) - Centralized maintenance - Source agnostic - No performance impact (cross-DB joins are fast)

Client Glossaries → Client Databases¶

WBP_DB.SILVER.INBOX_GLOSSARY
WBP_DB.SILVER.QUEUE_GLOSSARY
CND_DB.SILVER.DISPOSITION_GLOSSARY

Why client DB? - Client-specific data stays with client - Simpler permissions - Self-contained datasets - No cross-DB joins for glossaries

Performance Strategy¶

What's Stored (Tables)¶

BRONZE (temporary, truncated daily)
SILVER (permanent historical)
GOLD Facts (enriched, granular)
GOLD Aggregates (pre-calculated metrics)

What's Virtual (Views)¶

GOLD Reports (zero storage cost)

Why This Works¶

Pre-aggregate expensive operations (millions → thousands of rows)
Store aggregates (fast base for queries)
Reports are views (always fresh, no storage)
Views query pre-aggregated data (fast)

Trade-off: Storage is cheap, compute is expensive in Snowflake.

Cross-Database Join Performance¶

Question: Does joining CORE_DB rosters with client facts hurt performance?

Answer: No - negligible impact (0-1% difference)

Why: - All databases in same account share storage layer - Query optimizer treats cross-DB joins identically - Micro-partition pruning works across databases - No data movement or network latency

Standard KPI Schema¶

All clients implement standard KPI columns in agg_*_daily tables:

Required Columns (all clients must have): - KPI_DATE, AGENT_ID, AGENT_EMAIL, AGENT_FULL_NAME - CONTACT_COUNT, CONVERSATION_COUNT, UNIQUE_CUSTOMER_COUNT - AVG_HANDLE_TIME, TOTAL_HANDLE_TIME - AVG_HOLD_TIME, TOTAL_HOLD_TIME - AVG_AHT - AVG_CSAT, AVG_QA_SCORE - AVG_FIRST_RESPONSE_TIME

Client-Specific Columns (vary by client): - Warby Parker: CHAT_COUNT, IVR_ENTRY_COUNT, AVG_PAYMENT_AMOUNT - Condé Nast: TOPIC_COUNT, ARTICLE_LINKS_SENT - etc.

Technology Stack¶

Layer	Tool	Purpose
Extraction	xo-foundry (Airflow)	Extract from APIs/Google Sheets
Landing	Snowflake BRONZE	Temporary raw storage
Transformation	dbt	SILVER → GOLD transformations
Orchestration	Airflow (Astronomer)	Schedule and monitor
Consumption	Tableau	Dashboards (query GOLD views)

Key Architecture Decisions (ADRs)¶

ADR	Topic	Status
001	Load strategy terminology	✅ Approved
004	Metadata columns and hashing	✅ Approved
006	Bronze truncate/force strategy	✅ Approved
008	Gold layer architecture	✅ Approved
009	Naming conventions	✅ Approved
010	Reference data placement	✅ Approved

Quick Reference¶

When to query each layer:¶

Layer	Who Queries	Use Case
BRONZE	Never (xo-foundry only)	Temporary landing
SILVER	Power users (rarely)	Historical raw data
GOLD Facts	Data scientists, analysts	Ad-hoc deep dives
GOLD Aggregates	Power users	Custom aggregations
GOLD Reports	Business users, Tableau	Standard dashboards

dbt folder structure:¶

models/
├── silver/
│   └── {client}/
│       ├── contact_timestamps.sql         → SILVER.CONTACT_TIMESTAMPS
│       └── inbox_glossary.sql             → SILVER.INBOX_GLOSSARY
│
└── gold/
    └── {client}/
        ├── facts/
        │   └── fct_contacts.sql           → GOLD.FCT_CONTACTS
        ├── dimensions/
        │   └── dim_agent.sql              → GOLD.DIM_AGENT
        ├── aggregates/
        │   └── agg_agent_daily.sql        → GOLD.AGG_AGENT_DAILY
        └── reports/
            └── rpt_agent_dashboard.sql    → GOLD.RPT_AGENT_DASHBOARD (view)