Skip to content

ADR 008: Gold Layer Architecture - Facts, Dimensions, Aggregates, Reports

Date: 2026-01-05 Status: ✅ Accepted Deciders: Data Engineering Team


Context

With Bronze and Silver layers established, we need to define the Gold layer architecture for enriched, aggregated, and consumption-ready data.

Key challenges: 1. Multiple types of models in Gold (enriched facts, dimensions, aggregates, reports) 2. Need clear separation between building blocks (reusable) and consumption layer (end-user facing) 3. Performance optimization (when to materialize vs use views) 4. Naming conventions that indicate model type without redundancy


Decision

Implement a four-type Gold layer architecture:

1. Facts (fct_ prefix)

  • Purpose: Enriched granular data (same grain as Silver, but with JOINs)
  • Materialization: table or incremental
  • Characteristics:
  • Same grain as Silver (row per event/transaction)
  • JOIN with employee rosters (CORE_DB)
  • JOIN with glossaries (client DB or CORE_DB)
  • Filter to relevant records (e.g., XO agents only)
  • Source of Truth for ad-hoc analysis

Example:

GOLD.FCT_CONTACTS
  - Contact-level facts
  - All Silver columns + employee metadata + glossary metadata
  - Filtered to XO agents
  - Materialized as table


2. Dimensions (dim_ prefix)

  • Purpose: Current state entities and reference data
  • Materialization: table or incremental (SCD Type 1 or Type 2)
  • Characteristics:
  • Current or historical state of entities
  • Slowly changing dimensions
  • Reference data for joins

Example:

GOLD.DIM_AGENT
  - Current agent roster
  - One row per active agent
  - SCD Type 1 (latest state)


3. Aggregates (agg_ prefix)

  • Purpose: Pre-aggregated base metrics (building blocks for reports)
  • Materialization: table or incremental
  • Characteristics:
  • Single grain (agent-day, channel-hour, etc.)
  • Standard KPI calculations
  • Reusable across multiple reports
  • Few or no JOINs
  • Pre-calculated for performance

Example:

GOLD.AGG_AGENT_DAILY
  - Row per agent per day
  - Pre-aggregated from fct_contacts
  - Standard KPIs + client-specific KPIs
  - Used by multiple report views


4. Reports (rpt_ prefix)

  • Purpose: End-user consumption layer (Tableau, analysts)
  • Materialization: view (virtual, zero storage)
  • Characteristics:
  • JOINs aggregates with dimensions and targets
  • Calculated fields (variance, performance flags)
  • Tableau-optimized (all needed columns)
  • Directly consumed by end users
  • Always fresh (computed on query)

Example:

GOLD.RPT_AGENT_DASHBOARD
  - Tableau agent dashboard
  - JOINs: agg_agent_daily + dim_agent + targets
  - Calculated: variance, performance flags
  - View (no storage cost)


Rationale

Why separate Facts and Aggregates?

Facts (grain-level): - Needed for ad-hoc deep-dive analysis - Power users query these for custom analysis - Too slow to aggregate on-the-fly (millions of rows)

Aggregates (rolled-up): - Pre-calculated for dashboard performance - Reusable across multiple reports - Smaller data size (faster queries)

Why separate Aggregates and Reports?

Aggregates (building blocks): - Reusable base metrics - Single grain, minimal JOINs - Source data for multiple reports

Reports (consumption): - Customized per consumer (Tableau, analysts) - JOINs with current rosters/targets (always fresh) - Calculated fields (business logic) - Zero storage cost (views)

Why use views for Reports?

  • ✅ Always fresh (roster and target changes immediately reflected)
  • ✅ Zero storage cost
  • ✅ Easy to modify (change view, all dashboards updated)
  • ✅ Centralized business logic (not in Tableau)

Performance: Reports query pre-aggregated tables (fast), not raw facts.


Consequences

Positive

Clear separation of concerns: - Facts = SOT for analysts - Aggregates = Building blocks - Reports = End-user consumption

Performance optimization: - Pre-aggregate expensive operations - Views are fast (query pre-aggregated data)

Cost efficiency: - Store only facts and aggregates - Reports are views (zero storage)

Flexibility: - Modify reports without reprocessing data - Business logic in dbt (version controlled)

Reusability: - One aggregate table → many report views

Negative

⚠️ More layers to understand: - Need to train users on which to query (facts vs agg vs reports) - Solution: Clear naming + documentation

⚠️ View performance depends on aggregate freshness: - If aggregate is stale, report shows stale data - Solution: Refresh aggregates frequently


Implementation

Data Flow:

BRONZE (xo-foundry)
  └─ Raw landing (truncated daily)
SILVER (dbt)
  └─ Historical preservation (no enrichment)
GOLD FACTS (dbt)
  └─ Enriched entities (employee + glossary + filtering)
GOLD AGGREGATES (dbt)
  └─ Pre-aggregated metrics (building blocks)
GOLD REPORTS (dbt)
  └─ Consumption views (Tableau-ready)

Example for Warby Parker:

-- 1. FACT (enriched granular data)
GOLD.FCT_CONTACTS
  - Materialized: table
  - Grain: row per contact event
  - Enrichment: employee + glossary metadata
  - Filter: XO agents only
  - Users: Power analysts

-- 2. AGGREGATE (base metrics)
GOLD.AGG_AGENT_DAILY
  - Materialized: table
  - Grain: row per agent per day
  - Source: fct_contacts
  - Reused by: 3+ report views
  - Users: Power users, reports

-- 3. REPORT (consumption)
GOLD.RPT_AGENT_DASHBOARD
  - Materialized: view
  - Source: agg_agent_daily + roster + targets
  - Calculated: variance, flags
  - Users: Tableau, business users

Alternatives Considered

Alternative 1: Just Facts and Reports (no Aggregates)

Rejected because: - Reports would aggregate facts on-the-fly (slow) - No reusability (each report aggregates separately) - Higher compute costs

Alternative 2: Materialize Reports as Tables

Rejected because: - Storage cost (duplicate data) - Stale roster/target data - Harder to maintain (change requires rebuild)

Exception: Can materialize complex reports as tables if: - Very expensive joins (many tables) - Query performance critical - Stale data acceptable

Alternative 3: Single "Gold" layer (no subtypes)

Rejected because: - No clear purpose for each table - Users confused about what to query - Mixed concerns (enrichment + aggregation + presentation)


  • ADR 001: Load strategy terminology (full_refresh, incremental, historical)
  • ADR 004: Metadata columns and hashing strategy
  • ADR 006: Bronze truncate/force strategy
  • ADR 009: Naming conventions for Gold layer (see below)
  • ADR 010: Reference data placement (rosters, glossaries)

Notes

Standard KPI Schema: All clients should implement standard KPI columns in agg_*_daily tables: - KPI_DATE, AGENT_ID, AGENT_EMAIL - CONTACT_COUNT, CONVERSATION_COUNT - AVG_HANDLE_TIME, AVG_AHT - AVG_CSAT, AVG_QA_SCORE

Client-specific KPIs are added as additional columns.

View Naming: Use rpt_ prefix for reports (not v_) for consistency with fct_, dim_, agg_ (all 3-4 char prefixes).

dbt Materialization Config:

# dbt_project.yml
models:
  gold:
    facts:
      +materialized: table
      +tags: ['gold', 'facts']
    dimensions:
      +materialized: table
      +tags: ['gold', 'dimensions']
    aggregates:
      +materialized: table
      +tags: ['gold', 'aggregates']
    reports:
      +materialized: view
      +tags: ['gold', 'reports']