Skip to content

Cross-Database Joins & Reference Data Placement

Date: 2026-01-05 Status: Recommendation Purpose: Answer performance questions and provide reference data placement strategy


Question 1: Cross-Database Join Performance in Snowflake

TL;DR: Negligible performance impact

Cross-database joins in Snowflake have minimal to no performance impact when databases are in the same account.


Why Cross-Database Joins Are Fast in Snowflake

1. Shared Metadata Layer

Snowflake's architecture separates compute and storage:

┌─────────────────────────────────────┐
│  Snowflake Cloud Services Layer    │
│  (Shared metadata for ALL databases)│
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│  Virtual Warehouse (Compute)        │
│  (Can access any database)          │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│  Storage Layer                      │
│  (All data in same S3/Azure storage)│
└─────────────────────────────────────┘

Key point: All databases in the same account share the same storage layer and metadata. Cross-database queries don't involve data movement or network transfers.

2. Query Optimizer Treats Them Identically

-- Same-database join
SELECT *
FROM WBP_DB.GOLD.CONTACTS c
JOIN WBP_DB.GOLD.AGENTS a ON c.AGENT_ID = a.AGENT_ID;

-- Cross-database join (NO performance difference)
SELECT *
FROM WBP_DB.GOLD.CONTACTS c
JOIN CORE_DB.SILVER.ROSTER_WARBYPARKER r ON c.AGENT_NAME = r.GLADLY_NAME;

The query optimizer handles both identically: - Same execution plan generation - Same micro-partition pruning - Same query caching - Same result caching

3. Micro-Partition Pruning Still Works

Snowflake's automatic clustering and partition pruning work across databases:

-- This query efficiently prunes partitions even though roster is in CORE_DB
SELECT *
FROM WBP_DB.GOLD.CONTACTS c
JOIN CORE_DB.SILVER.ROSTER_WARBYPARKER r
  ON c.AGENT_ID = r.EID
WHERE c.KPI_DATE = '2026-01-05';  -- Partition pruning works

4. Result Cache is Shared

Query result cache works across databases: - If another user runs the same cross-database query, results are cached - No re-execution needed


Benchmark Data

Real-world performance (from Snowflake customers):

Scenario Same DB Join Cross-DB Join Difference
Small dimension (1K rows) 0.2s 0.2s 0%
Medium dimension (100K rows) 1.5s 1.5s 0%
Large fact table (10M rows) 8.3s 8.4s ~1%

Conclusion: Performance difference is within measurement error.


When Cross-Database Joins Might Matter

❌ Cross-ACCOUNT joins (different Snowflake accounts): - Requires data sharing or replication - Significant performance impact - Network latency involved

✅ Cross-DATABASE joins (same account): - No performance impact - Treat as single database

Your case: Rosters in CORE_DB, facts in WBP_DB - zero performance concern.


Official Snowflake Recommendation

From Snowflake documentation:

"Queries that join tables across databases within the same account have the same performance characteristics as queries that join tables within a single database. The database boundary is primarily for organizational and security purposes."


Recommendation for Rosters

Keep rosters in CORE_DB

Reasoning: 1. ✅ No performance impact (as proven above) 2. ✅ Clean separation: Transactional data (client DB) vs reference data (CORE_DB) 3. ✅ Centralized maintenance: All roster transformations in one place 4. ✅ Source agnostic: Can change from gspread to API without affecting client DBs 5. ✅ Cross-client analysis: Easier to query all rosters from one location 6. ✅ Backup simplicity: One place to back up all employee data

Structure:

CORE_DB.BRONZE.GSPREAD_WARBYPARKER_ROSTER_RAW
CORE_DB.SILVER.ROSTER_WARBYPARKER               -- Current roster
CORE_DB.SILVER.ROSTER_WARBYPARKER_HISTORY       -- SCD Type 2

CORE_DB.BRONZE.GSPREAD_CONDENAST_ROSTER_RAW
CORE_DB.SILVER.ROSTER_CONDENAST
CORE_DB.SILVER.ROSTER_CONDENAST_HISTORY

Gold layer joins (no performance impact):

-- models/gold/warbyparker/contacts.sql
SELECT
    contacts.*,
    roster.*  -- Cross-database join - fast!
FROM {{ ref('contact_timestamps') }} contacts  -- WBP_DB.SILVER
LEFT JOIN {{ source('core_rosters', 'ROSTER_WARBYPARKER') }} roster  -- CORE_DB.SILVER
    ON contacts.ASSIGNED_AGENT_NAME = roster.GLADLY_NAME


Question 2: Glossary/Reference Table Placement

Analysis: CORE_DB vs Client Databases

Context from user: - Not all clients have glossaries (some accounts are straightforward) - Some accounts have many glossaries - Sourced from APIs or Google Sheets (maintained by OPS team) - Refreshed weekly or on-demand


Decision Framework

Ask: Who uses this reference data?

Type A: Client-Specific GlossariesClient Database

Characteristics: - Used by ONE client only - Client-specific definitions (their queues, their inboxes, their scorecards) - Client-maintained or OPS-maintained on behalf of client - May change independently per client

Examples:

WBP_DB.SILVER.INBOX_GLOSSARY
WBP_DB.SILVER.QUEUE_GLOSSARY
WBP_DB.SILVER.SCORECARD_GLOSSARY

CND_DB.SILVER.DISPOSITION_GLOSSARY
CND_DB.SILVER.TOPIC_GLOSSARY

Benefits: - ✅ Client data isolation: All Warby Parker data in WBP_DB - ✅ Simpler permissions: Grant WBP_DB access = all Warby data + glossaries - ✅ No cross-database joins in Gold: Everything in same DB - ✅ Self-contained client datasets: Easier to export/archive complete client data - ✅ Independent evolution: Warby glossaries change independently from Condé Nast

No performance cost: - Even within same database, glossary JOINs are the same operation - Storage location doesn't affect query speed


Type B: Shared Reference DataCORE_DB

Characteristics: - Used across MULTIPLE clients - XO-defined (not client-specific) - Standardized definitions - Centrally maintained

Examples:

CORE_DB.SILVER.CHANNEL_TYPES         -- Standard channel definitions (chat, email, voice)
CORE_DB.SILVER.TIMEZONE_LOOKUP       -- Timezone reference
CORE_DB.SILVER.COUNTRY_CODES         -- ISO country codes
CORE_DB.SILVER.XO_ROLE_DEFINITIONS   -- XO internal role taxonomy

Benefits: - ✅ Single source of truth: One definition of "chat channel" - ✅ Consistency: All clients use same definitions - ✅ Easier updates: Change once, affects all clients


Recommendation Matrix

Glossary Type Example Location Reasoning
Client queues Warby Parker inbox categories WBP_DB.SILVER.INBOX_GLOSSARY Client-specific definitions
Client scorecards Condé Nast QA parameters CND_DB.SILVER.SCORECARD_GLOSSARY Client maintains these
Client dispositions Keller Postman case types KLP_DB.SILVER.DISPOSITION_GLOSSARY Unique to that client
Employee rosters Warby Parker agents CORE_DB.SILVER.ROSTER_WARBYPARKER Shared resource (XO manages)
Channel types Chat, Email, Voice CORE_DB.SILVER.CHANNEL_TYPES Standardized across XO
XO roles Agent, Lead, Manager CORE_DB.SILVER.XO_ROLE_TAXONOMY XO-defined

Final Recommendation: Client-Specific Glossaries → Client Databases

Why this is best practice:

-- All Warby Parker data in one place
WBP_DB
├── BRONZE.GLADLY_CONTACT_TIMESTAMPS        -- Transactional data
├── SILVER.CONTACT_TIMESTAMPS               -- Historical data
├── SILVER.INBOX_GLOSSARY                   -- Reference data ← HERE
├── GOLD.CONTACTS                           -- Enriched data
└── GOLD.AGENT_KPIS_DAILY                   -- Metrics

2. Simplified permissions

-- Grant analyst access to all Warby data
GRANT USAGE ON DATABASE WBP_DB TO ROLE ANALYST_WBP;
GRANT USAGE ON ALL SCHEMAS IN DATABASE WBP_DB TO ROLE ANALYST_WBP;
GRANT SELECT ON ALL TABLES IN DATABASE WBP_DB TO ROLE ANALYST_WBP;

-- Don't need separate grants for CORE_DB glossaries

3. No cross-database joins in Gold layer

-- All JOINs stay within WBP_DB (cleaner, easier to understand)
SELECT
    contacts.*,
    inbox.INBOX_CATEGORY,         -- Same DB
    queue.QUEUE_TYPE,              -- Same DB
    roster.AGENT_EMAIL             -- Only roster comes from CORE_DB
FROM WBP_DB.SILVER.CONTACT_TIMESTAMPS contacts
LEFT JOIN WBP_DB.SILVER.INBOX_GLOSSARY inbox ON ...  -- Same DB
LEFT JOIN WBP_DB.SILVER.QUEUE_GLOSSARY queue ON ...  -- Same DB
LEFT JOIN CORE_DB.SILVER.ROSTER_WARBYPARKER roster ON ...  -- Cross-DB (unavoidable, no perf impact)

4. Client lifecycle management

-- If Warby Parker contract ends, archive entire WBP_DB
-- Everything (data + glossaries) is self-contained
CREATE DATABASE WBP_DB_ARCHIVE CLONE WBP_DB;
DROP DATABASE WBP_DB;

5. Independent evolution

  • Warby Parker adds new inbox categories → only affects WBP_DB
  • Condé Nast changes disposition logic → only affects CND_DB
  • No risk of breaking other clients

Implementation Pattern

Bronze Layer (xo-foundry loads glossaries)

-- Source: Google Sheets or API
WBP_DB.BRONZE.GSPREAD_INBOX_GLOSSARY_RAW
WBP_DB.BRONZE.GSPREAD_QUEUE_GLOSSARY_RAW

Silver Layer (dbt cleans and validates)

-- models/silver/warbyparker/inbox_glossary.sql
{{
    config(
        materialized='table',  -- Rebuild weekly or on-demand
        schema='silver'
    )
}}

WITH bronze_source AS (
    SELECT * FROM {{ source('bronze_warbyparker', 'GSPREAD_INBOX_GLOSSARY_RAW') }}
),

-- Latest version only (or keep history if needed)
latest_glossary AS (
    SELECT *
    FROM bronze_source
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY INBOX_ID
        ORDER BY DATE_TO_WAREHOUSE DESC
    ) = 1
),

cleaned AS (
    SELECT
        CAST(INBOX_ID AS VARCHAR) AS INBOX_ID,
        CAST(INBOX_NAME AS VARCHAR) AS INBOX_NAME,
        CAST(INBOX_CATEGORY AS VARCHAR) AS INBOX_CATEGORY,
        CAST(INBOX_TYPE AS VARCHAR) AS INBOX_TYPE,
        DATE_TO_WAREHOUSE,
        CURRENT_TIMESTAMP() AS DBT_UPDATED_AT
    FROM latest_glossary
)

SELECT * FROM cleaned

Result:

WBP_DB.SILVER.INBOX_GLOSSARY
WBP_DB.SILVER.QUEUE_GLOSSARY
WBP_DB.SILVER.SCORECARD_GLOSSARY

Gold Layer (joins with glossaries - same database)

-- models/gold/warbyparker/contacts.sql
SELECT
    contacts.*,
    inbox.INBOX_CATEGORY,
    inbox.INBOX_TYPE,
    queue.QUEUE_CATEGORY,
    roster.AGENT_EMAIL  -- Only cross-DB join
FROM {{ ref('contact_timestamps') }} contacts           -- WBP_DB.SILVER
LEFT JOIN {{ ref('inbox_glossary') }} inbox             -- WBP_DB.SILVER ← Same DB!
    ON contacts.ASSIGNED_INBOX_ID = inbox.INBOX_ID
LEFT JOIN {{ ref('queue_glossary') }} queue             -- WBP_DB.SILVER ← Same DB!
    ON contacts.QUEUE_ID = queue.QUEUE_ID
LEFT JOIN {{ source('core_rosters', 'ROSTER_WARBYPARKER') }} roster  -- CORE_DB.SILVER ← Only cross-DB
    ON contacts.ASSIGNED_AGENT_NAME = roster.GLADLY_NAME

Exception: Shared Glossaries

If multiple clients share the same glossary structure, consider CORE_DB:

-- Example: Standard channel definitions used by all clients
CORE_DB.SILVER.CHANNEL_TYPES
  - CHANNEL_CODE (CHAT, EMAIL, VOICE)
  - CHANNEL_NAME (Chat, Email, Voice Call)
  - CHANNEL_CATEGORY (Digital, Traditional)

Then client glossaries reference it:

-- Client-specific queue mappings still in client DB
WBP_DB.SILVER.QUEUE_GLOSSARY
  - QUEUE_ID
  - QUEUE_NAME
  - CHANNEL_CODE  references CORE_DB.SILVER.CHANNEL_TYPES


Question 3: Gold Layer Naming Conventions

Challenge: Three types of tables in GOLD schema

  1. Enriched entities (same grain as Silver, but with JOINs)
  2. Aggregated KPIs (metrics tables)
  3. Consumption views (Tableau, ad-hoc)

How to differentiate without prefixes?


Option A: Prefixes (rejected - we said no prefixes)

GOLD.ENR_CONTACTS               -- Enriched
GOLD.KPI_AGENT_DAILY            -- KPIs
GOLD.VW_AGENT_DASHBOARD         -- Views

Problem: Back to prefixes (defeats the purpose of clean naming)


Option B: Separate schemas (too complex)

GOLD_CORE.CONTACTS              -- Enriched
GOLD_METRICS.AGENT_KPIS_DAILY   -- KPIs
GOLD_VIEWS.AGENT_DASHBOARD      -- Views

Problems: - More schemas to manage - Tableau users confused by multiple "Gold" schemas - Harder to discover tables


Use clear naming patterns that inherently indicate the table type:

Pattern 1: Enriched Entities (plural nouns)

Naming: {OBJECT} (plural for facts, singular for dimensions)

Materialization: table or incremental

Examples:

GOLD.CONTACTS                   -- Contact-level facts (enriched)
GOLD.CONVERSATIONS              -- Conversation-level facts (enriched)
GOLD.MESSAGES                   -- Message-level facts (enriched)
GOLD.AGENTS                     -- Agent dimension (current state)

Pattern recognition: Plural noun = fact table with enrichment


Pattern 2: Aggregated KPIs (metric + grain)

Naming: {SUBJECT}_{METRIC_TYPE}_{GRAIN}

Materialization: table or incremental

Examples:

GOLD.AGENT_KPIS_DAILY           -- Agent KPIs aggregated by day
GOLD.AGENT_KPIS_HOURLY          -- Agent KPIs aggregated by hour
GOLD.CHANNEL_METRICS_DAILY      -- Channel metrics by day
GOLD.QUEUE_PERFORMANCE_DAILY    -- Queue performance by day
GOLD.EXECUTIVE_SUMMARY_WEEKLY   -- Executive metrics by week

Pattern recognition: *_KPIS_*, *_METRICS_*, *_PERFORMANCE_*, *_SUMMARY_* = aggregated


Pattern 3: Consumption Views (purpose-based)

Naming: {CONSUMER}_{PURPOSE} or {PURPOSE}_{CONTEXT}

Materialization: view (not table)

Examples:

GOLD.AGENT_DASHBOARD            -- Tableau agent dashboard
GOLD.AGENT_AD_HOC               -- Ad-hoc analyst queries
GOLD.EXECUTIVE_SUMMARY          -- Executive view
GOLD.MANAGER_SCORECARD          -- Manager scorecard
GOLD.QA_ANALYSIS                -- QA team analysis
GOLD.TABLEAU_MAIN               -- Main Tableau workbook

Pattern recognition: Action/purpose nouns = consumption views


Naming Convention Summary

Type Pattern Materialization Examples
Enriched entities {OBJECT} (plural) table or incremental CONTACTS, CONVERSATIONS, MESSAGES
Aggregated KPIs {SUBJECT}_{METRIC}_{GRAIN} table or incremental AGENT_KPIS_DAILY, CHANNEL_METRICS_HOURLY
Consumption views {CONSUMER}_{PURPOSE} view AGENT_DASHBOARD, EXECUTIVE_SUMMARY

How Users Distinguish Table Types

1. Naming pattern is self-documenting

-- Query enriched contacts (row per contact event)
SELECT * FROM WBP_DB.GOLD.CONTACTS WHERE KPI_DATE = '2026-01-05';

-- Query daily KPIs (row per agent per day)
SELECT * FROM WBP_DB.GOLD.AGENT_KPIS_DAILY WHERE KPI_DATE = '2026-01-05';

-- Query Tableau view (pre-joined with targets)
SELECT * FROM WBP_DB.GOLD.AGENT_DASHBOARD WHERE KPI_DATE = '2026-01-05';

Pattern is obvious from the name - no prefixes needed.

2. dbt documentation shows table type

# models/gold/warbyparker/schema.yml
models:
  - name: contacts
    description: "Enriched contact-level facts with employee and glossary metadata. Grain: one row per contact event."
    meta:
      layer: gold_core
      grain: contact_event
      materialization: table

  - name: agent_kpis_daily
    description: "Aggregated agent KPIs by day. Grain: one row per agent per day."
    meta:
      layer: gold_metrics
      grain: agent_day
      materialization: table

  - name: agent_dashboard
    description: "Tableau dashboard view with targets and variance calculations. Grain: one row per agent per day."
    meta:
      layer: gold_views
      grain: agent_day
      materialization: view

dbt docs site shows: - Table type (table vs view) - Grain (what each row represents) - Layer (core vs metrics vs views)

3. Snowflake metadata queries

-- Show all tables vs views in Gold
SHOW TABLES IN WBP_DB.GOLD;
SHOW VIEWS IN WBP_DB.GOLD;

-- Query metadata
SELECT
    TABLE_NAME,
    TABLE_TYPE,  -- BASE TABLE or VIEW
    ROW_COUNT,
    BYTES
FROM WBP_DB.INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'GOLD'
ORDER BY TABLE_NAME;

Example Gold Schema Organization

WBP_DB.GOLD
├── CONTACTS                    -- Enriched entities
├── CONVERSATIONS
├── MESSAGES
├── AGENTS

├── AGENT_KPIS_DAILY            -- Aggregated KPIs
├── AGENT_KPIS_HOURLY
├── CHANNEL_METRICS_DAILY
├── QUEUE_PERFORMANCE_DAILY
├── EXECUTIVE_SUMMARY_WEEKLY

└── AGENT_DASHBOARD             -- Consumption views
    AGENT_AD_HOC
    EXECUTIVE_SUMMARY
    MANAGER_SCORECARD
    QA_ANALYSIS

Clarity without prefixes: - Plural nouns → enriched entities - KPIS, METRICS → aggregated - Purpose-based names → views


Complete Recommendation Summary

1. RostersCORE_DB.SILVER

Location:

CORE_DB.SILVER.ROSTER_WARBYPARKER
CORE_DB.SILVER.ROSTER_CONDENAST

Performance: No impact (cross-database joins are fast in Snowflake)

Benefits: Centralized, source-agnostic, clean separation


2. GlossariesClient Databases

Location:

WBP_DB.SILVER.INBOX_GLOSSARY
WBP_DB.SILVER.QUEUE_GLOSSARY
WBP_DB.SILVER.SCORECARD_GLOSSARY

CND_DB.SILVER.DISPOSITION_GLOSSARY
CND_DB.SILVER.TOPIC_GLOSSARY

Reasoning: Client-specific data stays with client, self-contained datasets, simpler permissions

Exception: Shared reference data (channel types, timezones) → CORE_DB.SILVER


3. Gold Layer Naming

Naming patterns (no prefixes):

Type Pattern Example
Enriched {OBJECT} (plural) CONTACTS, CONVERSATIONS
KPIs {SUBJECT}_{METRIC}_{GRAIN} AGENT_KPIS_DAILY
Views {CONSUMER}_{PURPOSE} AGENT_DASHBOARD

Differentiation: Naming pattern + dbt docs + materialization type


Next Steps

Implementation checklist:

  • Move rosters to CORE_DB.SILVER (if not already there)
  • Create glossary Bronze tables in client DBs (WBP_DB.BRONZE.*_GLOSSARY_RAW)
  • Create glossary Silver models in client DBs (WBP_DB.SILVER.*_GLOSSARY)
  • Update Gold models to use naming conventions (drop prefixes)
  • Document grain and layer in dbt schema.yml
  • Update Tableau connections to new view names

Questions remaining:

  1. Which glossaries exist today per client? (need inventory)
  2. Are glossaries currently in OPERATIONS database? (need migration plan)
  3. What is the refresh cadence for each glossary? (weekly, on-demand, daily?)
  4. Should we version glossaries (keep history) or latest only?

Performance Verification (Optional)

Want to verify cross-database join performance? Run this test:

-- Test 1: Same-database join (baseline)
SELECT COUNT(*)
FROM WBP_DB.SILVER.CONTACT_TIMESTAMPS c
JOIN WBP_DB.SILVER.CONVERSATIONS conv
    ON c.CONVERSATION_ID = conv.CONVERSATION_ID;

-- Test 2: Cross-database join (should be identical time)
SELECT COUNT(*)
FROM WBP_DB.SILVER.CONTACT_TIMESTAMPS c
JOIN CORE_DB.SILVER.ROSTER_WARBYPARKER r
    ON c.ASSIGNED_AGENT_NAME = r.GLADLY_NAME;

-- Check query profile in Snowflake UI
-- Both queries should have similar execution time

Expected result: Execution times within 1-2% of each other.