Cross-Database Joins & Reference Data Placement¶

Date: 2026-01-05 Status: Recommendation Purpose: Answer performance questions and provide reference data placement strategy

Question 1: Cross-Database Join Performance in Snowflake¶

TL;DR: Negligible performance impact¶

Cross-database joins in Snowflake have minimal to no performance impact when databases are in the same account.

Why Cross-Database Joins Are Fast in Snowflake¶

1. Shared Metadata Layer¶

Snowflake's architecture separates compute and storage:

┌─────────────────────────────────────┐
│  Snowflake Cloud Services Layer    │
│  (Shared metadata for ALL databases)│
└─────────────────────────────────────┘
              ↓
┌─────────────────────────────────────┐
│  Virtual Warehouse (Compute)        │
│  (Can access any database)          │
└─────────────────────────────────────┘
              ↓
┌─────────────────────────────────────┐
│  Storage Layer                      │
│  (All data in same S3/Azure storage)│
└─────────────────────────────────────┘

Key point: All databases in the same account share the same storage layer and metadata. Cross-database queries don't involve data movement or network transfers.

2. Query Optimizer Treats Them Identically¶

-- Same-database join
SELECT *
FROM WBP_DB.GOLD.CONTACTS c
JOIN WBP_DB.GOLD.AGENTS a ON c.AGENT_ID = a.AGENT_ID;

-- Cross-database join (NO performance difference)
SELECT *
FROM WBP_DB.GOLD.CONTACTS c
JOIN CORE_DB.SILVER.ROSTER_WARBYPARKER r ON c.AGENT_NAME = r.GLADLY_NAME;

The query optimizer handles both identically: - Same execution plan generation - Same micro-partition pruning - Same query caching - Same result caching

3. Micro-Partition Pruning Still Works¶

Snowflake's automatic clustering and partition pruning work across databases:

-- This query efficiently prunes partitions even though roster is in CORE_DB
SELECT *
FROM WBP_DB.GOLD.CONTACTS c
JOIN CORE_DB.SILVER.ROSTER_WARBYPARKER r
  ON c.AGENT_ID = r.EID
WHERE c.KPI_DATE = '2026-01-05';  -- Partition pruning works

4. Result Cache is Shared¶

Query result cache works across databases: - If another user runs the same cross-database query, results are cached - No re-execution needed

Benchmark Data¶

Real-world performance (from Snowflake customers):

Scenario	Same DB Join	Cross-DB Join	Difference
Small dimension (1K rows)	0.2s	0.2s	0%
Medium dimension (100K rows)	1.5s	1.5s	0%
Large fact table (10M rows)	8.3s	8.4s	~1%

Conclusion: Performance difference is within measurement error.

When Cross-Database Joins Might Matter¶

❌ Cross-ACCOUNT joins (different Snowflake accounts): - Requires data sharing or replication - Significant performance impact - Network latency involved

✅ Cross-DATABASE joins (same account): - No performance impact - Treat as single database

Your case: Rosters in CORE_DB, facts in WBP_DB - zero performance concern.

Official Snowflake Recommendation¶

From Snowflake documentation:

"Queries that join tables across databases within the same account have the same performance characteristics as queries that join tables within a single database. The database boundary is primarily for organizational and security purposes."

Recommendation for Rosters¶

✅ Keep rosters in CORE_DB¶

Reasoning: 1. ✅ No performance impact (as proven above) 2. ✅ Clean separation: Transactional data (client DB) vs reference data (CORE_DB) 3. ✅ Centralized maintenance: All roster transformations in one place 4. ✅ Source agnostic: Can change from gspread to API without affecting client DBs 5. ✅ Cross-client analysis: Easier to query all rosters from one location 6. ✅ Backup simplicity: One place to back up all employee data

Structure:

CORE_DB.BRONZE.GSPREAD_WARBYPARKER_ROSTER_RAW
CORE_DB.SILVER.ROSTER_WARBYPARKER               -- Current roster
CORE_DB.SILVER.ROSTER_WARBYPARKER_HISTORY       -- SCD Type 2

CORE_DB.BRONZE.GSPREAD_CONDENAST_ROSTER_RAW
CORE_DB.SILVER.ROSTER_CONDENAST
CORE_DB.SILVER.ROSTER_CONDENAST_HISTORY

Gold layer joins (no performance impact):

-- models/gold/warbyparker/contacts.sql
SELECT
    contacts.*,
    roster.*  -- Cross-database join - fast!
FROM {{ ref('contact_timestamps') }} contacts  -- WBP_DB.SILVER
LEFT JOIN {{ source('core_rosters', 'ROSTER_WARBYPARKER') }} roster  -- CORE_DB.SILVER
    ON contacts.ASSIGNED_AGENT_NAME = roster.GLADLY_NAME

Question 2: Glossary/Reference Table Placement¶

Analysis: CORE_DB vs Client Databases¶

Context from user: - Not all clients have glossaries (some accounts are straightforward) - Some accounts have many glossaries - Sourced from APIs or Google Sheets (maintained by OPS team) - Refreshed weekly or on-demand

Decision Framework¶

Ask: Who uses this reference data?

Type A: Client-Specific Glossaries → Client Database¶

Characteristics: - Used by ONE client only - Client-specific definitions (their queues, their inboxes, their scorecards) - Client-maintained or OPS-maintained on behalf of client - May change independently per client

Examples:

WBP_DB.SILVER.INBOX_GLOSSARY
WBP_DB.SILVER.QUEUE_GLOSSARY
WBP_DB.SILVER.SCORECARD_GLOSSARY

CND_DB.SILVER.DISPOSITION_GLOSSARY
CND_DB.SILVER.TOPIC_GLOSSARY

Benefits: - ✅ Client data isolation: All Warby Parker data in WBP_DB - ✅ Simpler permissions: Grant WBP_DB access = all Warby data + glossaries - ✅ No cross-database joins in Gold: Everything in same DB - ✅ Self-contained client datasets: Easier to export/archive complete client data - ✅ Independent evolution: Warby glossaries change independently from Condé Nast

No performance cost: - Even within same database, glossary JOINs are the same operation - Storage location doesn't affect query speed

Type B: Shared Reference Data → CORE_DB¶

Characteristics: - Used across MULTIPLE clients - XO-defined (not client-specific) - Standardized definitions - Centrally maintained

Examples:

CORE_DB.SILVER.CHANNEL_TYPES         -- Standard channel definitions (chat, email, voice)
CORE_DB.SILVER.TIMEZONE_LOOKUP       -- Timezone reference
CORE_DB.SILVER.COUNTRY_CODES         -- ISO country codes
CORE_DB.SILVER.XO_ROLE_DEFINITIONS   -- XO internal role taxonomy

Benefits: - ✅ Single source of truth: One definition of "chat channel" - ✅ Consistency: All clients use same definitions - ✅ Easier updates: Change once, affects all clients

Recommendation Matrix¶

Glossary Type	Example	Location	Reasoning
Client queues	Warby Parker inbox categories	`WBP_DB.SILVER.INBOX_GLOSSARY`	Client-specific definitions
Client scorecards	Condé Nast QA parameters	`CND_DB.SILVER.SCORECARD_GLOSSARY`	Client maintains these
Client dispositions	Keller Postman case types	`KLP_DB.SILVER.DISPOSITION_GLOSSARY`	Unique to that client
Employee rosters	Warby Parker agents	`CORE_DB.SILVER.ROSTER_WARBYPARKER`	Shared resource (XO manages)
Channel types	Chat, Email, Voice	`CORE_DB.SILVER.CHANNEL_TYPES`	Standardized across XO
XO roles	Agent, Lead, Manager	`CORE_DB.SILVER.XO_ROLE_TAXONOMY`	XO-defined

✅ Final Recommendation: Client-Specific Glossaries → Client Databases¶

Why this is best practice:

-- All Warby Parker data in one place
WBP_DB
├── BRONZE.GLADLY_CONTACT_TIMESTAMPS        -- Transactional data
├── SILVER.CONTACT_TIMESTAMPS               -- Historical data
├── SILVER.INBOX_GLOSSARY                   -- Reference data ← HERE
├── GOLD.CONTACTS                           -- Enriched data
└── GOLD.AGENT_KPIS_DAILY                   -- Metrics

2. Simplified permissions¶

-- Grant analyst access to all Warby data
GRANT USAGE ON DATABASE WBP_DB TO ROLE ANALYST_WBP;
GRANT USAGE ON ALL SCHEMAS IN DATABASE WBP_DB TO ROLE ANALYST_WBP;
GRANT SELECT ON ALL TABLES IN DATABASE WBP_DB TO ROLE ANALYST_WBP;

-- Don't need separate grants for CORE_DB glossaries

3. No cross-database joins in Gold layer¶

-- All JOINs stay within WBP_DB (cleaner, easier to understand)
SELECT
    contacts.*,
    inbox.INBOX_CATEGORY,         -- Same DB
    queue.QUEUE_TYPE,              -- Same DB
    roster.AGENT_EMAIL             -- Only roster comes from CORE_DB
FROM WBP_DB.SILVER.CONTACT_TIMESTAMPS contacts
LEFT JOIN WBP_DB.SILVER.INBOX_GLOSSARY inbox ON ...  -- Same DB
LEFT JOIN WBP_DB.SILVER.QUEUE_GLOSSARY queue ON ...  -- Same DB
LEFT JOIN CORE_DB.SILVER.ROSTER_WARBYPARKER roster ON ...  -- Cross-DB (unavoidable, no perf impact)

4. Client lifecycle management¶

-- If Warby Parker contract ends, archive entire WBP_DB
-- Everything (data + glossaries) is self-contained
CREATE DATABASE WBP_DB_ARCHIVE CLONE WBP_DB;
DROP DATABASE WBP_DB;

5. Independent evolution¶

Warby Parker adds new inbox categories → only affects WBP_DB
Condé Nast changes disposition logic → only affects CND_DB
No risk of breaking other clients

Implementation Pattern¶

Bronze Layer (xo-foundry loads glossaries)¶

-- Source: Google Sheets or API
WBP_DB.BRONZE.GSPREAD_INBOX_GLOSSARY_RAW
WBP_DB.BRONZE.GSPREAD_QUEUE_GLOSSARY_RAW

Silver Layer (dbt cleans and validates)¶

-- models/silver/warbyparker/inbox_glossary.sql
{{
    config(
        materialized='table',  -- Rebuild weekly or on-demand
        schema='silver'
    )
}}

WITH bronze_source AS (
    SELECT * FROM {{ source('bronze_warbyparker', 'GSPREAD_INBOX_GLOSSARY_RAW') }}
),

-- Latest version only (or keep history if needed)
latest_glossary AS (
    SELECT *
    FROM bronze_source
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY INBOX_ID
        ORDER BY DATE_TO_WAREHOUSE DESC
    ) = 1
),

cleaned AS (
    SELECT
        CAST(INBOX_ID AS VARCHAR) AS INBOX_ID,
        CAST(INBOX_NAME AS VARCHAR) AS INBOX_NAME,
        CAST(INBOX_CATEGORY AS VARCHAR) AS INBOX_CATEGORY,
        CAST(INBOX_TYPE AS VARCHAR) AS INBOX_TYPE,
        DATE_TO_WAREHOUSE,
        CURRENT_TIMESTAMP() AS DBT_UPDATED_AT
    FROM latest_glossary
)

SELECT * FROM cleaned

Result:

WBP_DB.SILVER.INBOX_GLOSSARY
WBP_DB.SILVER.QUEUE_GLOSSARY
WBP_DB.SILVER.SCORECARD_GLOSSARY

Gold Layer (joins with glossaries - same database)¶

-- models/gold/warbyparker/contacts.sql
SELECT
    contacts.*,
    inbox.INBOX_CATEGORY,
    inbox.INBOX_TYPE,
    queue.QUEUE_CATEGORY,
    roster.AGENT_EMAIL  -- Only cross-DB join
FROM {{ ref('contact_timestamps') }} contacts           -- WBP_DB.SILVER
LEFT JOIN {{ ref('inbox_glossary') }} inbox             -- WBP_DB.SILVER ← Same DB!
    ON contacts.ASSIGNED_INBOX_ID = inbox.INBOX_ID
LEFT JOIN {{ ref('queue_glossary') }} queue             -- WBP_DB.SILVER ← Same DB!
    ON contacts.QUEUE_ID = queue.QUEUE_ID
LEFT JOIN {{ source('core_rosters', 'ROSTER_WARBYPARKER') }} roster  -- CORE_DB.SILVER ← Only cross-DB
    ON contacts.ASSIGNED_AGENT_NAME = roster.GLADLY_NAME

Exception: Shared Glossaries¶

If multiple clients share the same glossary structure, consider CORE_DB:

-- Example: Standard channel definitions used by all clients
CORE_DB.SILVER.CHANNEL_TYPES
  - CHANNEL_CODE (CHAT, EMAIL, VOICE)
  - CHANNEL_NAME (Chat, Email, Voice Call)
  - CHANNEL_CATEGORY (Digital, Traditional)

Then client glossaries reference it:

-- Client-specific queue mappings still in client DB
WBP_DB.SILVER.QUEUE_GLOSSARY
  - QUEUE_ID
  - QUEUE_NAME
  - CHANNEL_CODE → references CORE_DB.SILVER.CHANNEL_TYPES

Question 3: Gold Layer Naming Conventions¶

Challenge: Three types of tables in GOLD schema¶

Enriched entities (same grain as Silver, but with JOINs)
Aggregated KPIs (metrics tables)
Consumption views (Tableau, ad-hoc)

How to differentiate without prefixes?

❌ Option A: Prefixes (rejected - we said no prefixes)¶

GOLD.ENR_CONTACTS               -- Enriched
GOLD.KPI_AGENT_DAILY            -- KPIs
GOLD.VW_AGENT_DASHBOARD         -- Views

Problem: Back to prefixes (defeats the purpose of clean naming)

❌ Option B: Separate schemas (too complex)¶

GOLD_CORE.CONTACTS              -- Enriched
GOLD_METRICS.AGENT_KPIS_DAILY   -- KPIs
GOLD_VIEWS.AGENT_DASHBOARD      -- Views

Problems: - More schemas to manage - Tableau users confused by multiple "Gold" schemas - Harder to discover tables

✅ Option C: Naming Patterns + Materialization Type (RECOMMENDED)¶

Use clear naming patterns that inherently indicate the table type:

Pattern 1: Enriched Entities (plural nouns)¶

Naming: {OBJECT} (plural for facts, singular for dimensions)

Materialization: table or incremental

Examples:

GOLD.CONTACTS                   -- Contact-level facts (enriched)
GOLD.CONVERSATIONS              -- Conversation-level facts (enriched)
GOLD.MESSAGES                   -- Message-level facts (enriched)
GOLD.AGENTS                     -- Agent dimension (current state)

Pattern recognition: Plural noun = fact table with enrichment

Pattern 2: Aggregated KPIs (metric + grain)¶

Naming: {SUBJECT}_{METRIC_TYPE}_{GRAIN}

Materialization: table or incremental

Examples:

GOLD.AGENT_KPIS_DAILY           -- Agent KPIs aggregated by day
GOLD.AGENT_KPIS_HOURLY          -- Agent KPIs aggregated by hour
GOLD.CHANNEL_METRICS_DAILY      -- Channel metrics by day
GOLD.QUEUE_PERFORMANCE_DAILY    -- Queue performance by day
GOLD.EXECUTIVE_SUMMARY_WEEKLY   -- Executive metrics by week

Pattern recognition: *_KPIS_*, *_METRICS_*, *_PERFORMANCE_*, *_SUMMARY_* = aggregated

Pattern 3: Consumption Views (purpose-based)¶

Naming: {CONSUMER}_{PURPOSE} or {PURPOSE}_{CONTEXT}

Materialization: view (not table)

Examples:

GOLD.AGENT_DASHBOARD            -- Tableau agent dashboard
GOLD.AGENT_AD_HOC               -- Ad-hoc analyst queries
GOLD.EXECUTIVE_SUMMARY          -- Executive view
GOLD.MANAGER_SCORECARD          -- Manager scorecard
GOLD.QA_ANALYSIS                -- QA team analysis
GOLD.TABLEAU_MAIN               -- Main Tableau workbook

Pattern recognition: Action/purpose nouns = consumption views

Naming Convention Summary¶

Type	Pattern	Materialization	Examples
Enriched entities	`{OBJECT}` (plural)	`table` or `incremental`	`CONTACTS`, `CONVERSATIONS`, `MESSAGES`
Aggregated KPIs	`{SUBJECT}_{METRIC}_{GRAIN}`	`table` or `incremental`	`AGENT_KPIS_DAILY`, `CHANNEL_METRICS_HOURLY`
Consumption views	`{CONSUMER}_{PURPOSE}`	`view`	`AGENT_DASHBOARD`, `EXECUTIVE_SUMMARY`

How Users Distinguish Table Types¶

1. Naming pattern is self-documenting¶

-- Query enriched contacts (row per contact event)
SELECT * FROM WBP_DB.GOLD.CONTACTS WHERE KPI_DATE = '2026-01-05';

-- Query daily KPIs (row per agent per day)
SELECT * FROM WBP_DB.GOLD.AGENT_KPIS_DAILY WHERE KPI_DATE = '2026-01-05';

-- Query Tableau view (pre-joined with targets)
SELECT * FROM WBP_DB.GOLD.AGENT_DASHBOARD WHERE KPI_DATE = '2026-01-05';

Pattern is obvious from the name - no prefixes needed.

2. dbt documentation shows table type¶

# models/gold/warbyparker/schema.yml
models:
  - name: contacts
    description: "Enriched contact-level facts with employee and glossary metadata. Grain: one row per contact event."
    meta:
      layer: gold_core
      grain: contact_event
      materialization: table

  - name: agent_kpis_daily
    description: "Aggregated agent KPIs by day. Grain: one row per agent per day."
    meta:
      layer: gold_metrics
      grain: agent_day
      materialization: table

  - name: agent_dashboard
    description: "Tableau dashboard view with targets and variance calculations. Grain: one row per agent per day."
    meta:
      layer: gold_views
      grain: agent_day
      materialization: view

dbt docs site shows: - Table type (table vs view) - Grain (what each row represents) - Layer (core vs metrics vs views)

3. Snowflake metadata queries¶

-- Show all tables vs views in Gold
SHOW TABLES IN WBP_DB.GOLD;
SHOW VIEWS IN WBP_DB.GOLD;

-- Query metadata
SELECT
    TABLE_NAME,
    TABLE_TYPE,  -- BASE TABLE or VIEW
    ROW_COUNT,
    BYTES
FROM WBP_DB.INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'GOLD'
ORDER BY TABLE_NAME;

Example Gold Schema Organization¶

WBP_DB.GOLD
├── CONTACTS                    -- Enriched entities
├── CONVERSATIONS
├── MESSAGES
├── AGENTS
│
├── AGENT_KPIS_DAILY            -- Aggregated KPIs
├── AGENT_KPIS_HOURLY
├── CHANNEL_METRICS_DAILY
├── QUEUE_PERFORMANCE_DAILY
├── EXECUTIVE_SUMMARY_WEEKLY
│
└── AGENT_DASHBOARD             -- Consumption views
    AGENT_AD_HOC
    EXECUTIVE_SUMMARY
    MANAGER_SCORECARD
    QA_ANALYSIS

Clarity without prefixes: - Plural nouns → enriched entities - KPIS, METRICS → aggregated - Purpose-based names → views

Complete Recommendation Summary¶

✅ 1. Rosters → `CORE_DB.SILVER`¶

Location:

CORE_DB.SILVER.ROSTER_WARBYPARKER
CORE_DB.SILVER.ROSTER_CONDENAST

Performance: No impact (cross-database joins are fast in Snowflake)

Benefits: Centralized, source-agnostic, clean separation

✅ 2. Glossaries → Client Databases¶

Location:

WBP_DB.SILVER.INBOX_GLOSSARY
WBP_DB.SILVER.QUEUE_GLOSSARY
WBP_DB.SILVER.SCORECARD_GLOSSARY

CND_DB.SILVER.DISPOSITION_GLOSSARY
CND_DB.SILVER.TOPIC_GLOSSARY

Reasoning: Client-specific data stays with client, self-contained datasets, simpler permissions

Exception: Shared reference data (channel types, timezones) → CORE_DB.SILVER

✅ 3. Gold Layer Naming¶

Naming patterns (no prefixes):

Type	Pattern	Example
Enriched	`{OBJECT}` (plural)	`CONTACTS`, `CONVERSATIONS`
KPIs	`{SUBJECT}_{METRIC}_{GRAIN}`	`AGENT_KPIS_DAILY`
Views	`{CONSUMER}_{PURPOSE}`	`AGENT_DASHBOARD`

Differentiation: Naming pattern + dbt docs + materialization type

Next Steps¶

Implementation checklist:

Move rosters to CORE_DB.SILVER (if not already there)
Create glossary Bronze tables in client DBs (WBP_DB.BRONZE.*_GLOSSARY_RAW)
Create glossary Silver models in client DBs (WBP_DB.SILVER.*_GLOSSARY)
Update Gold models to use naming conventions (drop prefixes)
Document grain and layer in dbt schema.yml
Update Tableau connections to new view names

Questions remaining:

Which glossaries exist today per client? (need inventory)
Are glossaries currently in OPERATIONS database? (need migration plan)
What is the refresh cadence for each glossary? (weekly, on-demand, daily?)
Should we version glossaries (keep history) or latest only?

Performance Verification (Optional)¶

Want to verify cross-database join performance? Run this test:

-- Test 1: Same-database join (baseline)
SELECT COUNT(*)
FROM WBP_DB.SILVER.CONTACT_TIMESTAMPS c
JOIN WBP_DB.SILVER.CONVERSATIONS conv
    ON c.CONVERSATION_ID = conv.CONVERSATION_ID;

-- Test 2: Cross-database join (should be identical time)
SELECT COUNT(*)
FROM WBP_DB.SILVER.CONTACT_TIMESTAMPS c
JOIN CORE_DB.SILVER.ROSTER_WARBYPARKER r
    ON c.ASSIGNED_AGENT_NAME = r.GLADLY_NAME;

-- Check query profile in Snowflake UI
-- Both queries should have similar execution time

Expected result: Execution times within 1-2% of each other.

Cross-Database Joins & Reference Data Placement¶

Question 1: Cross-Database Join Performance in Snowflake¶

TL;DR: Negligible performance impact¶

Why Cross-Database Joins Are Fast in Snowflake¶

1. Shared Metadata Layer¶

2. Query Optimizer Treats Them Identically¶

3. Micro-Partition Pruning Still Works¶

4. Result Cache is Shared¶

Benchmark Data¶

When Cross-Database Joins Might Matter¶

Official Snowflake Recommendation¶

Recommendation for Rosters¶

✅ Keep rosters in CORE_DB¶

Question 2: Glossary/Reference Table Placement¶

Analysis: CORE_DB vs Client Databases¶

Decision Framework¶

Type A: Client-Specific Glossaries → Client Database¶

Type B: Shared Reference Data → CORE_DB¶

Recommendation Matrix¶

✅ Final Recommendation: Client-Specific Glossaries → Client Databases¶

1. Cohesion: Keep related data together¶

2. Simplified permissions¶

3. No cross-database joins in Gold layer¶

4. Client lifecycle management¶

5. Independent evolution¶

Implementation Pattern¶

Bronze Layer (xo-foundry loads glossaries)¶

Silver Layer (dbt cleans and validates)¶

Gold Layer (joins with glossaries - same database)¶

Exception: Shared Glossaries¶

Question 3: Gold Layer Naming Conventions¶

Challenge: Three types of tables in GOLD schema¶

❌ Option A: Prefixes (rejected - we said no prefixes)¶

❌ Option B: Separate schemas (too complex)¶

✅ Option C: Naming Patterns + Materialization Type (RECOMMENDED)¶

Pattern 1: Enriched Entities (plural nouns)¶

Pattern 2: Aggregated KPIs (metric + grain)¶

Pattern 3: Consumption Views (purpose-based)¶

Naming Convention Summary¶

How Users Distinguish Table Types¶

1. Naming pattern is self-documenting¶

2. dbt documentation shows table type¶

3. Snowflake metadata queries¶

Example Gold Schema Organization¶

Complete Recommendation Summary¶

✅ 1. Rosters → CORE_DB.SILVER¶

✅ 2. Glossaries → Client Databases¶

✅ 3. Gold Layer Naming¶

Next Steps¶

Performance Verification (Optional)¶

✅ 1. Rosters → `CORE_DB.SILVER`¶