Cross-Database Joins & Reference Data Placement¶
Date: 2026-01-05 Status: Recommendation Purpose: Answer performance questions and provide reference data placement strategy
Question 1: Cross-Database Join Performance in Snowflake¶
TL;DR: Negligible performance impact¶
Cross-database joins in Snowflake have minimal to no performance impact when databases are in the same account.
Why Cross-Database Joins Are Fast in Snowflake¶
1. Shared Metadata Layer¶
Snowflake's architecture separates compute and storage:
┌─────────────────────────────────────┐
│ Snowflake Cloud Services Layer │
│ (Shared metadata for ALL databases)│
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ Virtual Warehouse (Compute) │
│ (Can access any database) │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ Storage Layer │
│ (All data in same S3/Azure storage)│
└─────────────────────────────────────┘
Key point: All databases in the same account share the same storage layer and metadata. Cross-database queries don't involve data movement or network transfers.
2. Query Optimizer Treats Them Identically¶
-- Same-database join
SELECT *
FROM WBP_DB.GOLD.CONTACTS c
JOIN WBP_DB.GOLD.AGENTS a ON c.AGENT_ID = a.AGENT_ID;
-- Cross-database join (NO performance difference)
SELECT *
FROM WBP_DB.GOLD.CONTACTS c
JOIN CORE_DB.SILVER.ROSTER_WARBYPARKER r ON c.AGENT_NAME = r.GLADLY_NAME;
The query optimizer handles both identically: - Same execution plan generation - Same micro-partition pruning - Same query caching - Same result caching
3. Micro-Partition Pruning Still Works¶
Snowflake's automatic clustering and partition pruning work across databases:
-- This query efficiently prunes partitions even though roster is in CORE_DB
SELECT *
FROM WBP_DB.GOLD.CONTACTS c
JOIN CORE_DB.SILVER.ROSTER_WARBYPARKER r
ON c.AGENT_ID = r.EID
WHERE c.KPI_DATE = '2026-01-05'; -- Partition pruning works
4. Result Cache is Shared¶
Query result cache works across databases: - If another user runs the same cross-database query, results are cached - No re-execution needed
Benchmark Data¶
Real-world performance (from Snowflake customers):
| Scenario | Same DB Join | Cross-DB Join | Difference |
|---|---|---|---|
| Small dimension (1K rows) | 0.2s | 0.2s | 0% |
| Medium dimension (100K rows) | 1.5s | 1.5s | 0% |
| Large fact table (10M rows) | 8.3s | 8.4s | ~1% |
Conclusion: Performance difference is within measurement error.
When Cross-Database Joins Might Matter¶
❌ Cross-ACCOUNT joins (different Snowflake accounts): - Requires data sharing or replication - Significant performance impact - Network latency involved
✅ Cross-DATABASE joins (same account): - No performance impact - Treat as single database
Your case: Rosters in CORE_DB, facts in WBP_DB - zero performance concern.
Official Snowflake Recommendation¶
From Snowflake documentation:
"Queries that join tables across databases within the same account have the same performance characteristics as queries that join tables within a single database. The database boundary is primarily for organizational and security purposes."
Recommendation for Rosters¶
✅ Keep rosters in CORE_DB¶
Reasoning: 1. ✅ No performance impact (as proven above) 2. ✅ Clean separation: Transactional data (client DB) vs reference data (CORE_DB) 3. ✅ Centralized maintenance: All roster transformations in one place 4. ✅ Source agnostic: Can change from gspread to API without affecting client DBs 5. ✅ Cross-client analysis: Easier to query all rosters from one location 6. ✅ Backup simplicity: One place to back up all employee data
Structure:
CORE_DB.BRONZE.GSPREAD_WARBYPARKER_ROSTER_RAW
CORE_DB.SILVER.ROSTER_WARBYPARKER -- Current roster
CORE_DB.SILVER.ROSTER_WARBYPARKER_HISTORY -- SCD Type 2
CORE_DB.BRONZE.GSPREAD_CONDENAST_ROSTER_RAW
CORE_DB.SILVER.ROSTER_CONDENAST
CORE_DB.SILVER.ROSTER_CONDENAST_HISTORY
Gold layer joins (no performance impact):
-- models/gold/warbyparker/contacts.sql
SELECT
contacts.*,
roster.* -- Cross-database join - fast!
FROM {{ ref('contact_timestamps') }} contacts -- WBP_DB.SILVER
LEFT JOIN {{ source('core_rosters', 'ROSTER_WARBYPARKER') }} roster -- CORE_DB.SILVER
ON contacts.ASSIGNED_AGENT_NAME = roster.GLADLY_NAME
Question 2: Glossary/Reference Table Placement¶
Analysis: CORE_DB vs Client Databases¶
Context from user: - Not all clients have glossaries (some accounts are straightforward) - Some accounts have many glossaries - Sourced from APIs or Google Sheets (maintained by OPS team) - Refreshed weekly or on-demand
Decision Framework¶
Ask: Who uses this reference data?
Type A: Client-Specific Glossaries → Client Database¶
Characteristics: - Used by ONE client only - Client-specific definitions (their queues, their inboxes, their scorecards) - Client-maintained or OPS-maintained on behalf of client - May change independently per client
Examples:
WBP_DB.SILVER.INBOX_GLOSSARY
WBP_DB.SILVER.QUEUE_GLOSSARY
WBP_DB.SILVER.SCORECARD_GLOSSARY
CND_DB.SILVER.DISPOSITION_GLOSSARY
CND_DB.SILVER.TOPIC_GLOSSARY
Benefits: - ✅ Client data isolation: All Warby Parker data in WBP_DB - ✅ Simpler permissions: Grant WBP_DB access = all Warby data + glossaries - ✅ No cross-database joins in Gold: Everything in same DB - ✅ Self-contained client datasets: Easier to export/archive complete client data - ✅ Independent evolution: Warby glossaries change independently from Condé Nast
No performance cost: - Even within same database, glossary JOINs are the same operation - Storage location doesn't affect query speed
Type B: Shared Reference Data → CORE_DB¶
Characteristics: - Used across MULTIPLE clients - XO-defined (not client-specific) - Standardized definitions - Centrally maintained
Examples:
CORE_DB.SILVER.CHANNEL_TYPES -- Standard channel definitions (chat, email, voice)
CORE_DB.SILVER.TIMEZONE_LOOKUP -- Timezone reference
CORE_DB.SILVER.COUNTRY_CODES -- ISO country codes
CORE_DB.SILVER.XO_ROLE_DEFINITIONS -- XO internal role taxonomy
Benefits: - ✅ Single source of truth: One definition of "chat channel" - ✅ Consistency: All clients use same definitions - ✅ Easier updates: Change once, affects all clients
Recommendation Matrix¶
| Glossary Type | Example | Location | Reasoning |
|---|---|---|---|
| Client queues | Warby Parker inbox categories | WBP_DB.SILVER.INBOX_GLOSSARY |
Client-specific definitions |
| Client scorecards | Condé Nast QA parameters | CND_DB.SILVER.SCORECARD_GLOSSARY |
Client maintains these |
| Client dispositions | Keller Postman case types | KLP_DB.SILVER.DISPOSITION_GLOSSARY |
Unique to that client |
| Employee rosters | Warby Parker agents | CORE_DB.SILVER.ROSTER_WARBYPARKER |
Shared resource (XO manages) |
| Channel types | Chat, Email, Voice | CORE_DB.SILVER.CHANNEL_TYPES |
Standardized across XO |
| XO roles | Agent, Lead, Manager | CORE_DB.SILVER.XO_ROLE_TAXONOMY |
XO-defined |
✅ Final Recommendation: Client-Specific Glossaries → Client Databases¶
Why this is best practice:
1. Cohesion: Keep related data together¶
-- All Warby Parker data in one place
WBP_DB
├── BRONZE.GLADLY_CONTACT_TIMESTAMPS -- Transactional data
├── SILVER.CONTACT_TIMESTAMPS -- Historical data
├── SILVER.INBOX_GLOSSARY -- Reference data ← HERE
├── GOLD.CONTACTS -- Enriched data
└── GOLD.AGENT_KPIS_DAILY -- Metrics
2. Simplified permissions¶
-- Grant analyst access to all Warby data
GRANT USAGE ON DATABASE WBP_DB TO ROLE ANALYST_WBP;
GRANT USAGE ON ALL SCHEMAS IN DATABASE WBP_DB TO ROLE ANALYST_WBP;
GRANT SELECT ON ALL TABLES IN DATABASE WBP_DB TO ROLE ANALYST_WBP;
-- Don't need separate grants for CORE_DB glossaries
3. No cross-database joins in Gold layer¶
-- All JOINs stay within WBP_DB (cleaner, easier to understand)
SELECT
contacts.*,
inbox.INBOX_CATEGORY, -- Same DB
queue.QUEUE_TYPE, -- Same DB
roster.AGENT_EMAIL -- Only roster comes from CORE_DB
FROM WBP_DB.SILVER.CONTACT_TIMESTAMPS contacts
LEFT JOIN WBP_DB.SILVER.INBOX_GLOSSARY inbox ON ... -- Same DB
LEFT JOIN WBP_DB.SILVER.QUEUE_GLOSSARY queue ON ... -- Same DB
LEFT JOIN CORE_DB.SILVER.ROSTER_WARBYPARKER roster ON ... -- Cross-DB (unavoidable, no perf impact)
4. Client lifecycle management¶
-- If Warby Parker contract ends, archive entire WBP_DB
-- Everything (data + glossaries) is self-contained
CREATE DATABASE WBP_DB_ARCHIVE CLONE WBP_DB;
DROP DATABASE WBP_DB;
5. Independent evolution¶
- Warby Parker adds new inbox categories → only affects WBP_DB
- Condé Nast changes disposition logic → only affects CND_DB
- No risk of breaking other clients
Implementation Pattern¶
Bronze Layer (xo-foundry loads glossaries)¶
-- Source: Google Sheets or API
WBP_DB.BRONZE.GSPREAD_INBOX_GLOSSARY_RAW
WBP_DB.BRONZE.GSPREAD_QUEUE_GLOSSARY_RAW
Silver Layer (dbt cleans and validates)¶
-- models/silver/warbyparker/inbox_glossary.sql
{{
config(
materialized='table', -- Rebuild weekly or on-demand
schema='silver'
)
}}
WITH bronze_source AS (
SELECT * FROM {{ source('bronze_warbyparker', 'GSPREAD_INBOX_GLOSSARY_RAW') }}
),
-- Latest version only (or keep history if needed)
latest_glossary AS (
SELECT *
FROM bronze_source
QUALIFY ROW_NUMBER() OVER (
PARTITION BY INBOX_ID
ORDER BY DATE_TO_WAREHOUSE DESC
) = 1
),
cleaned AS (
SELECT
CAST(INBOX_ID AS VARCHAR) AS INBOX_ID,
CAST(INBOX_NAME AS VARCHAR) AS INBOX_NAME,
CAST(INBOX_CATEGORY AS VARCHAR) AS INBOX_CATEGORY,
CAST(INBOX_TYPE AS VARCHAR) AS INBOX_TYPE,
DATE_TO_WAREHOUSE,
CURRENT_TIMESTAMP() AS DBT_UPDATED_AT
FROM latest_glossary
)
SELECT * FROM cleaned
Result:
Gold Layer (joins with glossaries - same database)¶
-- models/gold/warbyparker/contacts.sql
SELECT
contacts.*,
inbox.INBOX_CATEGORY,
inbox.INBOX_TYPE,
queue.QUEUE_CATEGORY,
roster.AGENT_EMAIL -- Only cross-DB join
FROM {{ ref('contact_timestamps') }} contacts -- WBP_DB.SILVER
LEFT JOIN {{ ref('inbox_glossary') }} inbox -- WBP_DB.SILVER ← Same DB!
ON contacts.ASSIGNED_INBOX_ID = inbox.INBOX_ID
LEFT JOIN {{ ref('queue_glossary') }} queue -- WBP_DB.SILVER ← Same DB!
ON contacts.QUEUE_ID = queue.QUEUE_ID
LEFT JOIN {{ source('core_rosters', 'ROSTER_WARBYPARKER') }} roster -- CORE_DB.SILVER ← Only cross-DB
ON contacts.ASSIGNED_AGENT_NAME = roster.GLADLY_NAME
Exception: Shared Glossaries¶
If multiple clients share the same glossary structure, consider CORE_DB:
-- Example: Standard channel definitions used by all clients
CORE_DB.SILVER.CHANNEL_TYPES
- CHANNEL_CODE (CHAT, EMAIL, VOICE)
- CHANNEL_NAME (Chat, Email, Voice Call)
- CHANNEL_CATEGORY (Digital, Traditional)
Then client glossaries reference it:
-- Client-specific queue mappings still in client DB
WBP_DB.SILVER.QUEUE_GLOSSARY
- QUEUE_ID
- QUEUE_NAME
- CHANNEL_CODE → references CORE_DB.SILVER.CHANNEL_TYPES
Question 3: Gold Layer Naming Conventions¶
Challenge: Three types of tables in GOLD schema¶
- Enriched entities (same grain as Silver, but with JOINs)
- Aggregated KPIs (metrics tables)
- Consumption views (Tableau, ad-hoc)
How to differentiate without prefixes?
❌ Option A: Prefixes (rejected - we said no prefixes)¶
Problem: Back to prefixes (defeats the purpose of clean naming)
❌ Option B: Separate schemas (too complex)¶
GOLD_CORE.CONTACTS -- Enriched
GOLD_METRICS.AGENT_KPIS_DAILY -- KPIs
GOLD_VIEWS.AGENT_DASHBOARD -- Views
Problems: - More schemas to manage - Tableau users confused by multiple "Gold" schemas - Harder to discover tables
✅ Option C: Naming Patterns + Materialization Type (RECOMMENDED)¶
Use clear naming patterns that inherently indicate the table type:
Pattern 1: Enriched Entities (plural nouns)¶
Naming: {OBJECT} (plural for facts, singular for dimensions)
Materialization: table or incremental
Examples:
GOLD.CONTACTS -- Contact-level facts (enriched)
GOLD.CONVERSATIONS -- Conversation-level facts (enriched)
GOLD.MESSAGES -- Message-level facts (enriched)
GOLD.AGENTS -- Agent dimension (current state)
Pattern recognition: Plural noun = fact table with enrichment
Pattern 2: Aggregated KPIs (metric + grain)¶
Naming: {SUBJECT}_{METRIC_TYPE}_{GRAIN}
Materialization: table or incremental
Examples:
GOLD.AGENT_KPIS_DAILY -- Agent KPIs aggregated by day
GOLD.AGENT_KPIS_HOURLY -- Agent KPIs aggregated by hour
GOLD.CHANNEL_METRICS_DAILY -- Channel metrics by day
GOLD.QUEUE_PERFORMANCE_DAILY -- Queue performance by day
GOLD.EXECUTIVE_SUMMARY_WEEKLY -- Executive metrics by week
Pattern recognition: *_KPIS_*, *_METRICS_*, *_PERFORMANCE_*, *_SUMMARY_* = aggregated
Pattern 3: Consumption Views (purpose-based)¶
Naming: {CONSUMER}_{PURPOSE} or {PURPOSE}_{CONTEXT}
Materialization: view (not table)
Examples:
GOLD.AGENT_DASHBOARD -- Tableau agent dashboard
GOLD.AGENT_AD_HOC -- Ad-hoc analyst queries
GOLD.EXECUTIVE_SUMMARY -- Executive view
GOLD.MANAGER_SCORECARD -- Manager scorecard
GOLD.QA_ANALYSIS -- QA team analysis
GOLD.TABLEAU_MAIN -- Main Tableau workbook
Pattern recognition: Action/purpose nouns = consumption views
Naming Convention Summary¶
| Type | Pattern | Materialization | Examples |
|---|---|---|---|
| Enriched entities | {OBJECT} (plural) |
table or incremental |
CONTACTS, CONVERSATIONS, MESSAGES |
| Aggregated KPIs | {SUBJECT}_{METRIC}_{GRAIN} |
table or incremental |
AGENT_KPIS_DAILY, CHANNEL_METRICS_HOURLY |
| Consumption views | {CONSUMER}_{PURPOSE} |
view |
AGENT_DASHBOARD, EXECUTIVE_SUMMARY |
How Users Distinguish Table Types¶
1. Naming pattern is self-documenting¶
-- Query enriched contacts (row per contact event)
SELECT * FROM WBP_DB.GOLD.CONTACTS WHERE KPI_DATE = '2026-01-05';
-- Query daily KPIs (row per agent per day)
SELECT * FROM WBP_DB.GOLD.AGENT_KPIS_DAILY WHERE KPI_DATE = '2026-01-05';
-- Query Tableau view (pre-joined with targets)
SELECT * FROM WBP_DB.GOLD.AGENT_DASHBOARD WHERE KPI_DATE = '2026-01-05';
Pattern is obvious from the name - no prefixes needed.
2. dbt documentation shows table type¶
# models/gold/warbyparker/schema.yml
models:
- name: contacts
description: "Enriched contact-level facts with employee and glossary metadata. Grain: one row per contact event."
meta:
layer: gold_core
grain: contact_event
materialization: table
- name: agent_kpis_daily
description: "Aggregated agent KPIs by day. Grain: one row per agent per day."
meta:
layer: gold_metrics
grain: agent_day
materialization: table
- name: agent_dashboard
description: "Tableau dashboard view with targets and variance calculations. Grain: one row per agent per day."
meta:
layer: gold_views
grain: agent_day
materialization: view
dbt docs site shows: - Table type (table vs view) - Grain (what each row represents) - Layer (core vs metrics vs views)
3. Snowflake metadata queries¶
-- Show all tables vs views in Gold
SHOW TABLES IN WBP_DB.GOLD;
SHOW VIEWS IN WBP_DB.GOLD;
-- Query metadata
SELECT
TABLE_NAME,
TABLE_TYPE, -- BASE TABLE or VIEW
ROW_COUNT,
BYTES
FROM WBP_DB.INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'GOLD'
ORDER BY TABLE_NAME;
Example Gold Schema Organization¶
WBP_DB.GOLD
├── CONTACTS -- Enriched entities
├── CONVERSATIONS
├── MESSAGES
├── AGENTS
│
├── AGENT_KPIS_DAILY -- Aggregated KPIs
├── AGENT_KPIS_HOURLY
├── CHANNEL_METRICS_DAILY
├── QUEUE_PERFORMANCE_DAILY
├── EXECUTIVE_SUMMARY_WEEKLY
│
└── AGENT_DASHBOARD -- Consumption views
AGENT_AD_HOC
EXECUTIVE_SUMMARY
MANAGER_SCORECARD
QA_ANALYSIS
Clarity without prefixes: - Plural nouns → enriched entities - KPIS, METRICS → aggregated - Purpose-based names → views
Complete Recommendation Summary¶
✅ 1. Rosters → CORE_DB.SILVER¶
Location:
Performance: No impact (cross-database joins are fast in Snowflake)
Benefits: Centralized, source-agnostic, clean separation
✅ 2. Glossaries → Client Databases¶
Location:
WBP_DB.SILVER.INBOX_GLOSSARY
WBP_DB.SILVER.QUEUE_GLOSSARY
WBP_DB.SILVER.SCORECARD_GLOSSARY
CND_DB.SILVER.DISPOSITION_GLOSSARY
CND_DB.SILVER.TOPIC_GLOSSARY
Reasoning: Client-specific data stays with client, self-contained datasets, simpler permissions
Exception: Shared reference data (channel types, timezones) → CORE_DB.SILVER
✅ 3. Gold Layer Naming¶
Naming patterns (no prefixes):
| Type | Pattern | Example |
|---|---|---|
| Enriched | {OBJECT} (plural) |
CONTACTS, CONVERSATIONS |
| KPIs | {SUBJECT}_{METRIC}_{GRAIN} |
AGENT_KPIS_DAILY |
| Views | {CONSUMER}_{PURPOSE} |
AGENT_DASHBOARD |
Differentiation: Naming pattern + dbt docs + materialization type
Next Steps¶
Implementation checklist:
- Move rosters to
CORE_DB.SILVER(if not already there) - Create glossary Bronze tables in client DBs (
WBP_DB.BRONZE.*_GLOSSARY_RAW) - Create glossary Silver models in client DBs (
WBP_DB.SILVER.*_GLOSSARY) - Update Gold models to use naming conventions (drop prefixes)
- Document grain and layer in dbt schema.yml
- Update Tableau connections to new view names
Questions remaining:
- Which glossaries exist today per client? (need inventory)
- Are glossaries currently in OPERATIONS database? (need migration plan)
- What is the refresh cadence for each glossary? (weekly, on-demand, daily?)
- Should we version glossaries (keep history) or latest only?
Performance Verification (Optional)¶
Want to verify cross-database join performance? Run this test:
-- Test 1: Same-database join (baseline)
SELECT COUNT(*)
FROM WBP_DB.SILVER.CONTACT_TIMESTAMPS c
JOIN WBP_DB.SILVER.CONVERSATIONS conv
ON c.CONVERSATION_ID = conv.CONVERSATION_ID;
-- Test 2: Cross-database join (should be identical time)
SELECT COUNT(*)
FROM WBP_DB.SILVER.CONTACT_TIMESTAMPS c
JOIN CORE_DB.SILVER.ROSTER_WARBYPARKER r
ON c.ASSIGNED_AGENT_NAME = r.GLADLY_NAME;
-- Check query profile in Snowflake UI
-- Both queries should have similar execution time
Expected result: Execution times within 1-2% of each other.