ADR 008: Gold Layer Architecture - Facts, Dimensions, Aggregates, Reports¶
Date: 2026-01-05 Status: ✅ Accepted Deciders: Data Engineering Team
Context¶
With Bronze and Silver layers established, we need to define the Gold layer architecture for enriched, aggregated, and consumption-ready data.
Key challenges: 1. Multiple types of models in Gold (enriched facts, dimensions, aggregates, reports) 2. Need clear separation between building blocks (reusable) and consumption layer (end-user facing) 3. Performance optimization (when to materialize vs use views) 4. Naming conventions that indicate model type without redundancy
Decision¶
Implement a four-type Gold layer architecture:
1. Facts (fct_ prefix)¶
- Purpose: Enriched granular data (same grain as Silver, but with JOINs)
- Materialization:
tableorincremental - Characteristics:
- Same grain as Silver (row per event/transaction)
- JOIN with employee rosters (CORE_DB)
- JOIN with glossaries (client DB or CORE_DB)
- Filter to relevant records (e.g., XO agents only)
- Source of Truth for ad-hoc analysis
Example:
GOLD.FCT_CONTACTS
- Contact-level facts
- All Silver columns + employee metadata + glossary metadata
- Filtered to XO agents
- Materialized as table
2. Dimensions (dim_ prefix)¶
- Purpose: Current state entities and reference data
- Materialization:
tableorincremental(SCD Type 1 or Type 2) - Characteristics:
- Current or historical state of entities
- Slowly changing dimensions
- Reference data for joins
Example:
3. Aggregates (agg_ prefix)¶
- Purpose: Pre-aggregated base metrics (building blocks for reports)
- Materialization:
tableorincremental - Characteristics:
- Single grain (agent-day, channel-hour, etc.)
- Standard KPI calculations
- Reusable across multiple reports
- Few or no JOINs
- Pre-calculated for performance
Example:
GOLD.AGG_AGENT_DAILY
- Row per agent per day
- Pre-aggregated from fct_contacts
- Standard KPIs + client-specific KPIs
- Used by multiple report views
4. Reports (rpt_ prefix)¶
- Purpose: End-user consumption layer (Tableau, analysts)
- Materialization:
view(virtual, zero storage) - Characteristics:
- JOINs aggregates with dimensions and targets
- Calculated fields (variance, performance flags)
- Tableau-optimized (all needed columns)
- Directly consumed by end users
- Always fresh (computed on query)
Example:
GOLD.RPT_AGENT_DASHBOARD
- Tableau agent dashboard
- JOINs: agg_agent_daily + dim_agent + targets
- Calculated: variance, performance flags
- View (no storage cost)
Rationale¶
Why separate Facts and Aggregates?¶
Facts (grain-level): - Needed for ad-hoc deep-dive analysis - Power users query these for custom analysis - Too slow to aggregate on-the-fly (millions of rows)
Aggregates (rolled-up): - Pre-calculated for dashboard performance - Reusable across multiple reports - Smaller data size (faster queries)
Why separate Aggregates and Reports?¶
Aggregates (building blocks): - Reusable base metrics - Single grain, minimal JOINs - Source data for multiple reports
Reports (consumption): - Customized per consumer (Tableau, analysts) - JOINs with current rosters/targets (always fresh) - Calculated fields (business logic) - Zero storage cost (views)
Why use views for Reports?¶
- ✅ Always fresh (roster and target changes immediately reflected)
- ✅ Zero storage cost
- ✅ Easy to modify (change view, all dashboards updated)
- ✅ Centralized business logic (not in Tableau)
Performance: Reports query pre-aggregated tables (fast), not raw facts.
Consequences¶
Positive¶
✅ Clear separation of concerns: - Facts = SOT for analysts - Aggregates = Building blocks - Reports = End-user consumption
✅ Performance optimization: - Pre-aggregate expensive operations - Views are fast (query pre-aggregated data)
✅ Cost efficiency: - Store only facts and aggregates - Reports are views (zero storage)
✅ Flexibility: - Modify reports without reprocessing data - Business logic in dbt (version controlled)
✅ Reusability: - One aggregate table → many report views
Negative¶
⚠️ More layers to understand: - Need to train users on which to query (facts vs agg vs reports) - Solution: Clear naming + documentation
⚠️ View performance depends on aggregate freshness: - If aggregate is stale, report shows stale data - Solution: Refresh aggregates frequently
Implementation¶
Data Flow:¶
BRONZE (xo-foundry)
└─ Raw landing (truncated daily)
↓
SILVER (dbt)
└─ Historical preservation (no enrichment)
↓
GOLD FACTS (dbt)
└─ Enriched entities (employee + glossary + filtering)
↓
GOLD AGGREGATES (dbt)
└─ Pre-aggregated metrics (building blocks)
↓
GOLD REPORTS (dbt)
└─ Consumption views (Tableau-ready)
Example for Warby Parker:¶
-- 1. FACT (enriched granular data)
GOLD.FCT_CONTACTS
- Materialized: table
- Grain: row per contact event
- Enrichment: employee + glossary metadata
- Filter: XO agents only
- Users: Power analysts
-- 2. AGGREGATE (base metrics)
GOLD.AGG_AGENT_DAILY
- Materialized: table
- Grain: row per agent per day
- Source: fct_contacts
- Reused by: 3+ report views
- Users: Power users, reports
-- 3. REPORT (consumption)
GOLD.RPT_AGENT_DASHBOARD
- Materialized: view
- Source: agg_agent_daily + roster + targets
- Calculated: variance, flags
- Users: Tableau, business users
Alternatives Considered¶
Alternative 1: Just Facts and Reports (no Aggregates)¶
Rejected because: - Reports would aggregate facts on-the-fly (slow) - No reusability (each report aggregates separately) - Higher compute costs
Alternative 2: Materialize Reports as Tables¶
Rejected because: - Storage cost (duplicate data) - Stale roster/target data - Harder to maintain (change requires rebuild)
Exception: Can materialize complex reports as tables if: - Very expensive joins (many tables) - Query performance critical - Stale data acceptable
Alternative 3: Single "Gold" layer (no subtypes)¶
Rejected because: - No clear purpose for each table - Users confused about what to query - Mixed concerns (enrichment + aggregation + presentation)
Related Decisions¶
- ADR 001: Load strategy terminology (full_refresh, incremental, historical)
- ADR 004: Metadata columns and hashing strategy
- ADR 006: Bronze truncate/force strategy
- ADR 009: Naming conventions for Gold layer (see below)
- ADR 010: Reference data placement (rosters, glossaries)
Notes¶
Standard KPI Schema: All clients should implement standard KPI columns in agg_*_daily tables:
- KPI_DATE, AGENT_ID, AGENT_EMAIL
- CONTACT_COUNT, CONVERSATION_COUNT
- AVG_HANDLE_TIME, AVG_AHT
- AVG_CSAT, AVG_QA_SCORE
Client-specific KPIs are added as additional columns.
View Naming: Use rpt_ prefix for reports (not v_) for consistency with fct_, dim_, agg_ (all 3-4 char prefixes).
dbt Materialization Config: