Architecture Summary - XO-Data Platform¶
Last Updated: 2026-01-05 Status: ✅ Approved Quick Reference: Bronze → Silver → Gold Layer Architecture
Overview¶
XO-Data is a data engineering platform that orchestrates ELT pipelines from various sources (Gladly API, Google Sheets, Sprout Social) into Snowflake, using Airflow for orchestration and dbt for transformations.
Database Architecture¶
Per-Client Isolation¶
Each client has their own Snowflake database with all three layers:
WBP_DB (Warby Parker) CND_DB (Condé Nast) CORE_DB (Shared)
├── BRONZE ├── BRONZE ├── BRONZE
├── SILVER ├── SILVER ├── SILVER
└── GOLD └── GOLD └── GOLD
Benefit: Complete data isolation, simpler permissions, self-contained datasets.
Layer Architecture¶
BRONZE Layer (xo-foundry)¶
Purpose: Temporary raw landing zone
Characteristics: - Truncated daily (idempotent loads) - All VARCHAR (no type conversion) - Metadata added (RECORD_KEY, RECORD_HASH, DATE_TO_WAREHOUSE) - No transformations - Temporary (24-hour lifespan)
Naming: {SOURCE}_{OBJECT}
Example: WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS
SILVER Layer (dbt)¶
Purpose: Historical preservation (source of history)
Characteristics: - Permanent append-only or SCD Type 2 - Type conversion (VARCHAR → proper types) - Deduplication (by RECORD_KEY) - NO enrichment (no JOINs) - NO filtering (preserve all records) - NO aggregation (same grain as Bronze)
Naming: {OBJECT} (no prefix)
Example: WBP_DB.SILVER.CONTACT_TIMESTAMPS
Why no enrichment in Silver? - Performance (joins are expensive on large tables) - Maintainability (roster changes don't trigger re-processing) - Separation of concerns (Silver = preservation, Gold = business logic)
GOLD Layer (dbt) - Four Types¶
1. Facts (fct_ prefix)¶
Purpose: Enriched granular data (SOT for analysts)
Characteristics: - Same grain as Silver (row per event) - JOIN with employee rosters (CORE_DB) - JOIN with client glossaries - Filter to XO agents only - Materialized as table
Example: WBP_DB.GOLD.FCT_CONTACTS
2. Dimensions (dim_ prefix)¶
Purpose: Current state entities and reference data
Characteristics: - SCD Type 1 (latest) or Type 2 (historical) - Reference data for joins - Materialized as table
Example: WBP_DB.GOLD.DIM_AGENT
3. Aggregates (agg_ prefix)¶
Purpose: Pre-aggregated base metrics (building blocks)
Characteristics: - Single grain (agent-day, channel-hour) - Standard KPIs + client-specific KPIs - Reusable across multiple reports - Few or no JOINs - Materialized as table
Example: WBP_DB.GOLD.AGG_AGENT_DAILY
4. Reports (rpt_ prefix)¶
Purpose: End-user consumption layer (Tableau, analysts)
Characteristics: - JOINs aggregates with dimensions and targets - Calculated fields (variance, performance flags) - Tableau-optimized - Materialized as VIEW (zero storage cost) - Always fresh (computed on query)
Example: WBP_DB.GOLD.RPT_AGENT_DASHBOARD
Data Flow¶
┌─────────────────────────────────────────────────────────┐
│ xo-foundry (Airflow) │
│ Extract → S3 → BRONZE │
│ - Raw landing (truncated daily) │
│ - All VARCHAR, metadata added │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ dbt - SILVER Layer │
│ - Historical preservation │
│ - Type conversion, deduplication │
│ - NO enrichment, NO filtering │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ dbt - GOLD FACTS │
│ - Enriched with employee + glossary metadata │
│ - Filtered to XO agents │
│ - SOT for analysts │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ dbt - GOLD AGGREGATES │
│ - Pre-aggregated metrics │
│ - Building blocks for reports │
│ - Reusable across dashboards │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ dbt - GOLD REPORTS (views) │
│ - Tableau-ready views │
│ - JOINs with targets, variance calculations │
│ - Zero storage cost │
└─────────────────────────────────────────────────────────┘
Naming Conventions¶
Principle: Schema indicates layer, prefix indicates TYPE¶
| Layer | Prefix | Example | Meaning |
|---|---|---|---|
| BRONZE | {SOURCE}_ |
GLADLY_CONTACTS |
Source system |
| SILVER | None | CONTACT_TIMESTAMPS |
Historical table |
| GOLD Facts | fct_ |
fct_contacts |
Enriched facts |
| GOLD Dims | dim_ |
dim_agent |
Dimensions |
| GOLD Aggs | agg_ |
agg_agent_daily |
Aggregates |
| GOLD Reports | rpt_ |
rpt_agent_dashboard |
Reports/views |
Why semantic prefixes?
- ✅ Not redundant (SILVER.CONTACT_TIMESTAMPS, not SILVER.SILVER_*)
- ✅ Self-documenting (fct_contacts = fact table)
- ✅ Industry standard (dbt Labs, GitLab, Kimball)
- ✅ Consistent length (all 3-4 characters)
Reference Data Placement¶
Employee Rosters → CORE_DB.SILVER¶
Why CORE_DB? - Shared resource (XO manages, not clients) - Centralized maintenance - Source agnostic - No performance impact (cross-DB joins are fast)
Client Glossaries → Client Databases¶
Why client DB? - Client-specific data stays with client - Simpler permissions - Self-contained datasets - No cross-DB joins for glossaries
Performance Strategy¶
What's Stored (Tables)¶
- BRONZE (temporary, truncated daily)
- SILVER (permanent historical)
- GOLD Facts (enriched, granular)
- GOLD Aggregates (pre-calculated metrics)
What's Virtual (Views)¶
- GOLD Reports (zero storage cost)
Why This Works¶
- Pre-aggregate expensive operations (millions → thousands of rows)
- Store aggregates (fast base for queries)
- Reports are views (always fresh, no storage)
- Views query pre-aggregated data (fast)
Trade-off: Storage is cheap, compute is expensive in Snowflake.
Cross-Database Join Performance¶
Question: Does joining CORE_DB rosters with client facts hurt performance?
Answer: No - negligible impact (0-1% difference)
Why: - All databases in same account share storage layer - Query optimizer treats cross-DB joins identically - Micro-partition pruning works across databases - No data movement or network latency
Standard KPI Schema¶
All clients implement standard KPI columns in agg_*_daily tables:
Required Columns (all clients must have):
- KPI_DATE, AGENT_ID, AGENT_EMAIL, AGENT_FULL_NAME
- CONTACT_COUNT, CONVERSATION_COUNT, UNIQUE_CUSTOMER_COUNT
- AVG_HANDLE_TIME, TOTAL_HANDLE_TIME
- AVG_HOLD_TIME, TOTAL_HOLD_TIME
- AVG_AHT
- AVG_CSAT, AVG_QA_SCORE
- AVG_FIRST_RESPONSE_TIME
Client-Specific Columns (vary by client):
- Warby Parker: CHAT_COUNT, IVR_ENTRY_COUNT, AVG_PAYMENT_AMOUNT
- Condé Nast: TOPIC_COUNT, ARTICLE_LINKS_SENT
- etc.
Technology Stack¶
| Layer | Tool | Purpose |
|---|---|---|
| Extraction | xo-foundry (Airflow) | Extract from APIs/Google Sheets |
| Landing | Snowflake BRONZE | Temporary raw storage |
| Transformation | dbt | SILVER → GOLD transformations |
| Orchestration | Airflow (Astronomer) | Schedule and monitor |
| Consumption | Tableau | Dashboards (query GOLD views) |
Key Architecture Decisions (ADRs)¶
| ADR | Topic | Status |
|---|---|---|
| 001 | Load strategy terminology | ✅ Approved |
| 004 | Metadata columns and hashing | ✅ Approved |
| 006 | Bronze truncate/force strategy | ✅ Approved |
| 008 | Gold layer architecture | ✅ Approved |
| 009 | Naming conventions | ✅ Approved |
| 010 | Reference data placement | ✅ Approved |
Quick Reference¶
When to query each layer:¶
| Layer | Who Queries | Use Case |
|---|---|---|
| BRONZE | Never (xo-foundry only) | Temporary landing |
| SILVER | Power users (rarely) | Historical raw data |
| GOLD Facts | Data scientists, analysts | Ad-hoc deep dives |
| GOLD Aggregates | Power users | Custom aggregations |
| GOLD Reports | Business users, Tableau | Standard dashboards |
dbt folder structure:¶
models/
├── silver/
│ └── {client}/
│ ├── contact_timestamps.sql → SILVER.CONTACT_TIMESTAMPS
│ └── inbox_glossary.sql → SILVER.INBOX_GLOSSARY
│
└── gold/
└── {client}/
├── facts/
│ └── fct_contacts.sql → GOLD.FCT_CONTACTS
├── dimensions/
│ └── dim_agent.sql → GOLD.DIM_AGENT
├── aggregates/
│ └── agg_agent_daily.sql → GOLD.AGG_AGENT_DAILY
└── reports/
└── rpt_agent_dashboard.sql → GOLD.RPT_AGENT_DASHBOARD (view)
Further Reading¶
- ELT Layer Architecture Best Practices - Detailed guide
- Cross-Database Joins & Reference Data Placement - Performance analysis
- Aggregates vs Views Naming - Detailed distinction
- Naming Convention Best Practices - Industry research
- dbt Development Workflow - How to develop dbt models