ADR 010: Reference Data Placement - Rosters and Glossaries¶
Date: 2026-01-05 Status: ✅ Accepted Deciders: Data Engineering Team
Context¶
Need to decide optimal placement for reference data: 1. Employee rosters (sourced from Google Sheets / BambooHR) 2. Client glossaries (inboxes, queues, scorecards - sourced from APIs or Google Sheets)
Key questions:
- Should rosters live in CORE_DB (shared) or client databases?
- Should glossaries live in CORE_DB (shared) or client databases?
- What is the performance impact of cross-database joins?
- What are the organizational and permission benefits?
Decision¶
Employee Rosters → CORE_DB.SILVER¶
Location:
CORE_DB.SILVER.ROSTER_WARBYPARKER -- Current roster
CORE_DB.SILVER.ROSTER_WARBYPARKER_HISTORY -- SCD Type 2 historical
CORE_DB.SILVER.ROSTER_CONDENAST
CORE_DB.SILVER.ROSTER_CONDENAST_HISTORY
CORE_DB.SILVER.ROSTER_KELLERPOSTMAN
CORE_DB.SILVER.ROSTER_KELLERPOSTMAN_HISTORY
Rationale: - ✅ Shared resource: XO manages rosters, not clients - ✅ Source agnostic: Can change from gspread to API without affecting client DBs - ✅ Centralized maintenance: All roster transformations in one place - ✅ Cross-client analysis: Easier to query all rosters - ✅ No performance impact: Cross-database joins are fast in Snowflake (same account)
Client Glossaries → Client Databases¶
Location:
WBP_DB.SILVER.INBOX_GLOSSARY
WBP_DB.SILVER.QUEUE_GLOSSARY
WBP_DB.SILVER.SCORECARD_GLOSSARY
CND_DB.SILVER.DISPOSITION_GLOSSARY
CND_DB.SILVER.TOPIC_GLOSSARY
Rationale: - ✅ Client-specific data: Each client has unique glossaries - ✅ Data cohesion: All client data (facts + glossaries) in one database - ✅ Simpler permissions: Grant WBP_DB access = all Warby data - ✅ No cross-database joins for glossaries (same DB as facts) - ✅ Self-contained datasets: Easy to archive complete client data - ✅ Independent evolution: Client glossaries change independently
Exception: Shared Glossaries → CORE_DB.SILVER¶
If multiple clients share the same glossary structure:
CORE_DB.SILVER.CHANNEL_TYPES -- Standard channel definitions (all clients use)
CORE_DB.SILVER.TIMEZONE_LOOKUP -- Timezone reference (all clients use)
Rationale: Single source of truth for shared reference data.
Cross-Database Join Performance¶
Question: Does joining CORE_DB rosters with client DB facts impact performance?¶
Answer: No significant impact
Snowflake architecture (same account): - All databases share the same storage layer - Query optimizer treats cross-database joins identically - Micro-partition pruning works across databases - No data movement or network transfer
Benchmark data (from Snowflake docs and customer reports):
| Table Size | Same DB Join | Cross-DB Join | Difference |
|---|---|---|---|
| 1K rows | 0.2s | 0.2s | 0% |
| 100K rows | 1.5s | 1.5s | 0% |
| 10M rows | 8.3s | 8.4s | ~1% |
Official Snowflake documentation:
"Queries that join tables across databases within the same account have the same performance characteristics as queries that join tables within a single database. The database boundary is primarily for organizational and security purposes."
Conclusion: Cross-database joins in the same account have negligible performance impact.
Data Flow Examples¶
Example 1: Gold Fact Table (Warby Parker Contacts)¶
-- models/gold/warbyparker/fct_contacts.sql
WITH silver_contacts AS (
SELECT * FROM {{ ref('contact_timestamps') }} -- WBP_DB.SILVER
),
-- Employee roster from CORE_DB (cross-database join - fast!)
employee_roster AS (
SELECT * FROM {{ source('core_rosters', 'ROSTER_WARBYPARKER') }} -- CORE_DB.SILVER
),
-- Client glossaries from same DB (no cross-database join)
inbox_glossary AS (
SELECT * FROM {{ ref('inbox_glossary') }} -- WBP_DB.SILVER
),
queue_glossary AS (
SELECT * FROM {{ ref('queue_glossary') }} -- WBP_DB.SILVER
),
enriched AS (
SELECT
contacts.*,
-- Employee metadata (cross-DB join)
roster.AGENT_ID,
roster.AGENT_EMAIL,
roster.AGENT_ROLE,
roster.AGENT_SITE,
-- Glossary metadata (same DB)
inbox.INBOX_CATEGORY,
inbox.INBOX_TYPE,
queue.QUEUE_CATEGORY,
queue.QUEUE_TYPE
FROM silver_contacts contacts
LEFT JOIN employee_roster roster -- CORE_DB (cross-DB) ✅
ON contacts.ASSIGNED_AGENT_NAME = roster.GLADLY_NAME
LEFT JOIN inbox_glossary inbox -- WBP_DB (same DB) ✅
ON contacts.ASSIGNED_INBOX_ID = inbox.INBOX_ID
LEFT JOIN queue_glossary queue -- WBP_DB (same DB) ✅
ON contacts.QUEUE_ID = queue.QUEUE_ID
)
SELECT * FROM enriched
WHERE AGENT_EMAIL LIKE 'xo.%@%' -- Filter to XO agents
Result: - Only ONE cross-database join (roster) - All glossary joins stay within WBP_DB - No performance impact
Example 2: Gold Aggregate Table¶
-- models/gold/warbyparker/agg_agent_daily.sql
WITH enriched_contacts AS (
SELECT * FROM {{ ref('fct_contacts') }} -- WBP_DB.GOLD
),
daily_metrics AS (
SELECT
DATE(TIMESTAMP) AS KPI_DATE,
AGENT_ID,
AGENT_EMAIL,
COUNT(DISTINCT CONTACT_ID) AS CONTACT_COUNT,
AVG(HANDLE_TIME_SECONDS) AS AVG_HANDLE_TIME
-- ... more KPIs
FROM enriched_contacts
GROUP BY 1, 2, 3
)
SELECT * FROM daily_metrics
Result: No cross-database joins needed (all data in WBP_DB.GOLD.FCT_CONTACTS)
Benefits of This Approach¶
Rosters in CORE_DB¶
✅ Centralized management: - All roster transformations in one place - Easier to maintain and update
✅ Source flexibility: - Can switch from Google Sheets to BambooHR API - Client DBs unaffected
✅ Cross-client analysis:
-- Easy to query all rosters
SELECT * FROM CORE_DB.SILVER.ROSTER_WARBYPARKER
UNION ALL
SELECT * FROM CORE_DB.SILVER.ROSTER_CONDENAST
UNION ALL
SELECT * FROM CORE_DB.SILVER.ROSTER_KELLERPOSTMAN;
✅ Backup simplicity: - One location to back up all employee data
Glossaries in Client DBs¶
✅ Data cohesion:
-- All Warby Parker data together
WBP_DB
├── BRONZE.GLADLY_CONTACT_TIMESTAMPS -- Facts
├── SILVER.CONTACT_TIMESTAMPS -- Historical
├── SILVER.INBOX_GLOSSARY -- Glossaries ← HERE
├── SILVER.QUEUE_GLOSSARY
├── GOLD.FCT_CONTACTS -- Enriched
└── GOLD.AGG_AGENT_DAILY -- Metrics
✅ Simpler permissions:
-- One grant = all client data
GRANT USAGE ON DATABASE WBP_DB TO ROLE ANALYST_WBP;
GRANT SELECT ON ALL TABLES IN DATABASE WBP_DB TO ROLE ANALYST_WBP;
-- No need for separate CORE_DB grants for glossaries
✅ Self-contained datasets:
-- Archive complete client data
CREATE DATABASE WBP_DB_ARCHIVE CLONE WBP_DB;
-- Includes all facts + glossaries
✅ Independent evolution: - Warby adds new inbox category → only WBP_DB affected - No risk of breaking other clients
Bronze Layer Structure¶
-- Rosters loaded to CORE_DB
CORE_DB.BRONZE.GSPREAD_WARBYPARKER_ROSTER_RAW
CORE_DB.BRONZE.GSPREAD_CONDENAST_ROSTER_RAW
-- Glossaries loaded to client DBs
WBP_DB.BRONZE.GSPREAD_INBOX_GLOSSARY_RAW
WBP_DB.BRONZE.GSPREAD_QUEUE_GLOSSARY_RAW
CND_DB.BRONZE.GSPREAD_DISPOSITION_GLOSSARY_RAW
Why this structure? - Rosters are XO-managed (CORE_DB) - Glossaries are client-specific (client DB)
dbt Sources Configuration¶
# models/sources.yml
sources:
# Client bronze sources
- name: bronze_warbyparker
database: "{{ 'WBP_DB' if var('environment') == 'prod' else 'WBP_DB_DEV' }}"
schema: BRONZE
tables:
- name: GLADLY_CONTACT_TIMESTAMPS
- name: GSPREAD_INBOX_GLOSSARY_RAW
- name: GSPREAD_QUEUE_GLOSSARY_RAW
# CORE rosters
- name: core_rosters
database: "{{ 'CORE_DB' if var('environment') == 'prod' else 'CORE_DB_DEV' }}"
schema: SILVER
tables:
- name: ROSTER_WARBYPARKER
- name: ROSTER_CONDENAST
- name: ROSTER_KELLERPOSTMAN
Alternatives Considered¶
Alternative 1: All Reference Data in CORE_DB¶
Put both rosters AND glossaries in CORE_DB.
Rejected because: - ❌ Client data scattered across two databases - ❌ More complex permissions (need CORE_DB access for glossaries) - ❌ Cross-database joins for glossaries (no benefit) - ❌ Harder to archive complete client datasets
Alternative 2: All Reference Data in Client DBs¶
Put both rosters AND glossaries in client databases.
Rejected because: - ❌ Duplicate roster data across client DBs - ❌ Harder to maintain (update rosters in multiple places) - ❌ Harder to change roster source (affects all client DBs) - ❌ Cross-client analysis more complex
Alternative 3: Unified Roster Table¶
Single CORE_DB.SILVER.ROSTER_ALL with CLIENT_NAME column.
Future consideration (not now): - Currently keep separate per client (easier to implement) - Can consolidate later when all clients standardized - Would still be in CORE_DB
Consequences¶
Positive¶
✅ Clear ownership: - XO-managed data (rosters) → CORE_DB - Client-specific data (glossaries) → Client DB
✅ No performance impact: Cross-database joins are fast
✅ Simpler permissions: Client DB access = all client data
✅ Easy to archive: Clone client DB = complete dataset
✅ Flexible: Can change roster source without affecting clients
Negative¶
⚠️ Must manage cross-database joins: Gold models need to reference CORE_DB
Mitigation: dbt sources make this easy and clear.
⚠️ Need CORE_DB access: Gold layer needs read access to CORE_DB
Mitigation:Grantstandard permissions during onboarding.
Implementation¶
Phase 1: Create CORE_DB Bronze/Silver Schemas¶
-- schemachange migration
CREATE DATABASE IF NOT EXISTS CORE_DB_DEV;
CREATE SCHEMA IF NOT EXISTS CORE_DB_DEV.BRONZE;
CREATE SCHEMA IF NOT EXISTS CORE_DB_DEV.SILVER;
Phase 2: Migrate Rosters from OPERATIONS to CORE_DB¶
-- Bronze (xo-foundry loads here)
CORE_DB.BRONZE.GSPREAD_WARBYPARKER_ROSTER_RAW
-- Silver (dbt transforms)
CORE_DB.SILVER.ROSTER_WARBYPARKER
CORE_DB.SILVER.ROSTER_WARBYPARKER_HISTORY
Phase 3: Create Glossaries in Client DBs¶
-- Bronze (xo-foundry loads here)
WBP_DB.BRONZE.GSPREAD_INBOX_GLOSSARY_RAW
-- Silver (dbt transforms)
WBP_DB.SILVER.INBOX_GLOSSARY
Phase 4: Update Gold Models¶
Update Gold fact tables to JOIN with CORE_DB rosters.
Related Decisions¶
- ADR 008: Gold layer architecture
- ADR 009: Naming conventions
References¶
- Snowflake documentation on cross-database queries
- Best practices for data organization in Snowflake
- dbt documentation on cross-database sources