ADR 010: Reference Data Placement - Rosters and Glossaries¶

Date: 2026-01-05 Status: ✅ Accepted Deciders: Data Engineering Team

Context¶

Need to decide optimal placement for reference data: 1. Employee rosters (sourced from Google Sheets / BambooHR) 2. Client glossaries (inboxes, queues, scorecards - sourced from APIs or Google Sheets)

Key questions: - Should rosters live in CORE_DB (shared) or client databases? - Should glossaries live in CORE_DB (shared) or client databases? - What is the performance impact of cross-database joins? - What are the organizational and permission benefits?

Decision¶

Employee Rosters → `CORE_DB.SILVER`¶

Location:

CORE_DB.SILVER.ROSTER_WARBYPARKER              -- Current roster
CORE_DB.SILVER.ROSTER_WARBYPARKER_HISTORY      -- SCD Type 2 historical

CORE_DB.SILVER.ROSTER_CONDENAST
CORE_DB.SILVER.ROSTER_CONDENAST_HISTORY

CORE_DB.SILVER.ROSTER_KELLERPOSTMAN
CORE_DB.SILVER.ROSTER_KELLERPOSTMAN_HISTORY

Rationale: - ✅ Shared resource: XO manages rosters, not clients - ✅ Source agnostic: Can change from gspread to API without affecting client DBs - ✅ Centralized maintenance: All roster transformations in one place - ✅ Cross-client analysis: Easier to query all rosters - ✅ No performance impact: Cross-database joins are fast in Snowflake (same account)

Client Glossaries → Client Databases¶

Location:

WBP_DB.SILVER.INBOX_GLOSSARY
WBP_DB.SILVER.QUEUE_GLOSSARY
WBP_DB.SILVER.SCORECARD_GLOSSARY

CND_DB.SILVER.DISPOSITION_GLOSSARY
CND_DB.SILVER.TOPIC_GLOSSARY

Rationale: - ✅ Client-specific data: Each client has unique glossaries - ✅ Data cohesion: All client data (facts + glossaries) in one database - ✅ Simpler permissions: Grant WBP_DB access = all Warby data - ✅ No cross-database joins for glossaries (same DB as facts) - ✅ Self-contained datasets: Easy to archive complete client data - ✅ Independent evolution: Client glossaries change independently

Exception: Shared Glossaries → `CORE_DB.SILVER`¶

If multiple clients share the same glossary structure:

CORE_DB.SILVER.CHANNEL_TYPES         -- Standard channel definitions (all clients use)
CORE_DB.SILVER.TIMEZONE_LOOKUP       -- Timezone reference (all clients use)

Rationale: Single source of truth for shared reference data.

Cross-Database Join Performance¶

Question: Does joining CORE_DB rosters with client DB facts impact performance?¶

Answer: No significant impact

Snowflake architecture (same account): - All databases share the same storage layer - Query optimizer treats cross-database joins identically - Micro-partition pruning works across databases - No data movement or network transfer

Benchmark data (from Snowflake docs and customer reports):

Table Size	Same DB Join	Cross-DB Join	Difference
1K rows	0.2s	0.2s	0%
100K rows	1.5s	1.5s	0%
10M rows	8.3s	8.4s	~1%

Official Snowflake documentation:

"Queries that join tables across databases within the same account have the same performance characteristics as queries that join tables within a single database. The database boundary is primarily for organizational and security purposes."

Conclusion: Cross-database joins in the same account have negligible performance impact.

Data Flow Examples¶

Example 1: Gold Fact Table (Warby Parker Contacts)¶

-- models/gold/warbyparker/fct_contacts.sql

WITH silver_contacts AS (
    SELECT * FROM {{ ref('contact_timestamps') }}  -- WBP_DB.SILVER
),

-- Employee roster from CORE_DB (cross-database join - fast!)
employee_roster AS (
    SELECT * FROM {{ source('core_rosters', 'ROSTER_WARBYPARKER') }}  -- CORE_DB.SILVER
),

-- Client glossaries from same DB (no cross-database join)
inbox_glossary AS (
    SELECT * FROM {{ ref('inbox_glossary') }}  -- WBP_DB.SILVER
),

queue_glossary AS (
    SELECT * FROM {{ ref('queue_glossary') }}  -- WBP_DB.SILVER
),

enriched AS (
    SELECT
        contacts.*,

        -- Employee metadata (cross-DB join)
        roster.AGENT_ID,
        roster.AGENT_EMAIL,
        roster.AGENT_ROLE,
        roster.AGENT_SITE,

        -- Glossary metadata (same DB)
        inbox.INBOX_CATEGORY,
        inbox.INBOX_TYPE,
        queue.QUEUE_CATEGORY,
        queue.QUEUE_TYPE

    FROM silver_contacts contacts
    LEFT JOIN employee_roster roster              -- CORE_DB (cross-DB) ✅
        ON contacts.ASSIGNED_AGENT_NAME = roster.GLADLY_NAME
    LEFT JOIN inbox_glossary inbox                -- WBP_DB (same DB) ✅
        ON contacts.ASSIGNED_INBOX_ID = inbox.INBOX_ID
    LEFT JOIN queue_glossary queue                -- WBP_DB (same DB) ✅
        ON contacts.QUEUE_ID = queue.QUEUE_ID
)

SELECT * FROM enriched
WHERE AGENT_EMAIL LIKE 'xo.%@%'  -- Filter to XO agents

Result: - Only ONE cross-database join (roster) - All glossary joins stay within WBP_DB - No performance impact

Example 2: Gold Aggregate Table¶

-- models/gold/warbyparker/agg_agent_daily.sql

WITH enriched_contacts AS (
    SELECT * FROM {{ ref('fct_contacts') }}  -- WBP_DB.GOLD
),

daily_metrics AS (
    SELECT
        DATE(TIMESTAMP) AS KPI_DATE,
        AGENT_ID,
        AGENT_EMAIL,
        COUNT(DISTINCT CONTACT_ID) AS CONTACT_COUNT,
        AVG(HANDLE_TIME_SECONDS) AS AVG_HANDLE_TIME
        -- ... more KPIs
    FROM enriched_contacts
    GROUP BY 1, 2, 3
)

SELECT * FROM daily_metrics

Result: No cross-database joins needed (all data in WBP_DB.GOLD.FCT_CONTACTS)

Benefits of This Approach¶

Rosters in CORE_DB¶

✅ Centralized management: - All roster transformations in one place - Easier to maintain and update

✅ Source flexibility: - Can switch from Google Sheets to BambooHR API - Client DBs unaffected

✅ Cross-client analysis:

-- Easy to query all rosters
SELECT * FROM CORE_DB.SILVER.ROSTER_WARBYPARKER
UNION ALL
SELECT * FROM CORE_DB.SILVER.ROSTER_CONDENAST
UNION ALL
SELECT * FROM CORE_DB.SILVER.ROSTER_KELLERPOSTMAN;

✅ Backup simplicity: - One location to back up all employee data

Glossaries in Client DBs¶

✅ Data cohesion:

-- All Warby Parker data together
WBP_DB
├── BRONZE.GLADLY_CONTACT_TIMESTAMPS        -- Facts
├── SILVER.CONTACT_TIMESTAMPS               -- Historical
├── SILVER.INBOX_GLOSSARY                   -- Glossaries ← HERE
├── SILVER.QUEUE_GLOSSARY
├── GOLD.FCT_CONTACTS                       -- Enriched
└── GOLD.AGG_AGENT_DAILY                    -- Metrics

✅ Simpler permissions:

-- One grant = all client data
GRANT USAGE ON DATABASE WBP_DB TO ROLE ANALYST_WBP;
GRANT SELECT ON ALL TABLES IN DATABASE WBP_DB TO ROLE ANALYST_WBP;
-- No need for separate CORE_DB grants for glossaries

✅ Self-contained datasets:

-- Archive complete client data
CREATE DATABASE WBP_DB_ARCHIVE CLONE WBP_DB;
-- Includes all facts + glossaries

✅ Independent evolution: - Warby adds new inbox category → only WBP_DB affected - No risk of breaking other clients

Bronze Layer Structure¶

-- Rosters loaded to CORE_DB
CORE_DB.BRONZE.GSPREAD_WARBYPARKER_ROSTER_RAW
CORE_DB.BRONZE.GSPREAD_CONDENAST_ROSTER_RAW

-- Glossaries loaded to client DBs
WBP_DB.BRONZE.GSPREAD_INBOX_GLOSSARY_RAW
WBP_DB.BRONZE.GSPREAD_QUEUE_GLOSSARY_RAW

CND_DB.BRONZE.GSPREAD_DISPOSITION_GLOSSARY_RAW

Why this structure? - Rosters are XO-managed (CORE_DB) - Glossaries are client-specific (client DB)

dbt Sources Configuration¶

# models/sources.yml

sources:
  # Client bronze sources
  - name: bronze_warbyparker
    database: "{{ 'WBP_DB' if var('environment') == 'prod' else 'WBP_DB_DEV' }}"
    schema: BRONZE
    tables:
      - name: GLADLY_CONTACT_TIMESTAMPS
      - name: GSPREAD_INBOX_GLOSSARY_RAW
      - name: GSPREAD_QUEUE_GLOSSARY_RAW

  # CORE rosters
  - name: core_rosters
    database: "{{ 'CORE_DB' if var('environment') == 'prod' else 'CORE_DB_DEV' }}"
    schema: SILVER
    tables:
      - name: ROSTER_WARBYPARKER
      - name: ROSTER_CONDENAST
      - name: ROSTER_KELLERPOSTMAN

Alternatives Considered¶

Alternative 1: All Reference Data in CORE_DB¶

Put both rosters AND glossaries in CORE_DB.

Rejected because: - ❌ Client data scattered across two databases - ❌ More complex permissions (need CORE_DB access for glossaries) - ❌ Cross-database joins for glossaries (no benefit) - ❌ Harder to archive complete client datasets

Alternative 2: All Reference Data in Client DBs¶

Put both rosters AND glossaries in client databases.

Rejected because: - ❌ Duplicate roster data across client DBs - ❌ Harder to maintain (update rosters in multiple places) - ❌ Harder to change roster source (affects all client DBs) - ❌ Cross-client analysis more complex

Alternative 3: Unified Roster Table¶

Single CORE_DB.SILVER.ROSTER_ALL with CLIENT_NAME column.

Future consideration (not now): - Currently keep separate per client (easier to implement) - Can consolidate later when all clients standardized - Would still be in CORE_DB

Consequences¶

Positive¶

✅ Clear ownership: - XO-managed data (rosters) → CORE_DB - Client-specific data (glossaries) → Client DB

✅ No performance impact: Cross-database joins are fast

✅ Simpler permissions: Client DB access = all client data

✅ Easy to archive: Clone client DB = complete dataset

✅ Flexible: Can change roster source without affecting clients

Negative¶

⚠️ Must manage cross-database joins: Gold models need to reference CORE_DB

Mitigation: dbt sources make this easy and clear.

⚠️ Need CORE_DB access: Gold layer needs read access to CORE_DB

Mitigation:Grantstandard permissions during onboarding.

Implementation¶

Phase 1: Create CORE_DB Bronze/Silver Schemas¶

-- schemachange migration
CREATE DATABASE IF NOT EXISTS CORE_DB_DEV;
CREATE SCHEMA IF NOT EXISTS CORE_DB_DEV.BRONZE;
CREATE SCHEMA IF NOT EXISTS CORE_DB_DEV.SILVER;

Phase 2: Migrate Rosters from OPERATIONS to CORE_DB¶

-- Bronze (xo-foundry loads here)
CORE_DB.BRONZE.GSPREAD_WARBYPARKER_ROSTER_RAW

-- Silver (dbt transforms)
CORE_DB.SILVER.ROSTER_WARBYPARKER
CORE_DB.SILVER.ROSTER_WARBYPARKER_HISTORY

Phase 3: Create Glossaries in Client DBs¶

-- Bronze (xo-foundry loads here)
WBP_DB.BRONZE.GSPREAD_INBOX_GLOSSARY_RAW

-- Silver (dbt transforms)
WBP_DB.SILVER.INBOX_GLOSSARY

Phase 4: Update Gold Models¶

Update Gold fact tables to JOIN with CORE_DB rosters.

ADR 008: Gold layer architecture
ADR 009: Naming conventions

References¶

Snowflake documentation on cross-database queries
Best practices for data organization in Snowflake
dbt documentation on cross-database sources