ADR 008: Gold Layer Architecture - Facts, Dimensions, Aggregates, Reports¶

Date: 2026-01-05 Status: ✅ Accepted Deciders: Data Engineering Team

Context¶

With Bronze and Silver layers established, we need to define the Gold layer architecture for enriched, aggregated, and consumption-ready data.

Key challenges: 1. Multiple types of models in Gold (enriched facts, dimensions, aggregates, reports) 2. Need clear separation between building blocks (reusable) and consumption layer (end-user facing) 3. Performance optimization (when to materialize vs use views) 4. Naming conventions that indicate model type without redundancy

Decision¶

Implement a four-type Gold layer architecture:

1. Facts (`fct_` prefix)¶

Purpose: Enriched granular data (same grain as Silver, but with JOINs)
Materialization: table or incremental
Characteristics:
Same grain as Silver (row per event/transaction)
JOIN with employee rosters (CORE_DB)
JOIN with glossaries (client DB or CORE_DB)
Filter to relevant records (e.g., XO agents only)
Source of Truth for ad-hoc analysis

Example:

GOLD.FCT_CONTACTS
  - Contact-level facts
  - All Silver columns + employee metadata + glossary metadata
  - Filtered to XO agents
  - Materialized as table

2. Dimensions (`dim_` prefix)¶

Purpose: Current state entities and reference data
Materialization: table or incremental (SCD Type 1 or Type 2)
Characteristics:
Current or historical state of entities
Slowly changing dimensions
Reference data for joins

Example:

GOLD.DIM_AGENT
  - Current agent roster
  - One row per active agent
  - SCD Type 1 (latest state)

3. Aggregates (`agg_` prefix)¶

Purpose: Pre-aggregated base metrics (building blocks for reports)
Materialization: table or incremental
Characteristics:
Single grain (agent-day, channel-hour, etc.)
Standard KPI calculations
Reusable across multiple reports
Few or no JOINs
Pre-calculated for performance

Example:

GOLD.AGG_AGENT_DAILY
  - Row per agent per day
  - Pre-aggregated from fct_contacts
  - Standard KPIs + client-specific KPIs
  - Used by multiple report views

4. Reports (`rpt_` prefix)¶

Purpose: End-user consumption layer (Tableau, analysts)
Materialization: view (virtual, zero storage)
Characteristics:
JOINs aggregates with dimensions and targets
Calculated fields (variance, performance flags)
Tableau-optimized (all needed columns)
Directly consumed by end users
Always fresh (computed on query)

Example:

GOLD.RPT_AGENT_DASHBOARD
  - Tableau agent dashboard
  - JOINs: agg_agent_daily + dim_agent + targets
  - Calculated: variance, performance flags
  - View (no storage cost)

Rationale¶

Why separate Facts and Aggregates?¶

Facts (grain-level): - Needed for ad-hoc deep-dive analysis - Power users query these for custom analysis - Too slow to aggregate on-the-fly (millions of rows)

Aggregates (rolled-up): - Pre-calculated for dashboard performance - Reusable across multiple reports - Smaller data size (faster queries)

Why separate Aggregates and Reports?¶

Aggregates (building blocks): - Reusable base metrics - Single grain, minimal JOINs - Source data for multiple reports

Reports (consumption): - Customized per consumer (Tableau, analysts) - JOINs with current rosters/targets (always fresh) - Calculated fields (business logic) - Zero storage cost (views)

Why use views for Reports?¶

✅ Always fresh (roster and target changes immediately reflected)
✅ Zero storage cost
✅ Easy to modify (change view, all dashboards updated)
✅ Centralized business logic (not in Tableau)

Performance: Reports query pre-aggregated tables (fast), not raw facts.

Consequences¶

Positive¶

✅ Clear separation of concerns: - Facts = SOT for analysts - Aggregates = Building blocks - Reports = End-user consumption

✅ Performance optimization: - Pre-aggregate expensive operations - Views are fast (query pre-aggregated data)

✅ Cost efficiency: - Store only facts and aggregates - Reports are views (zero storage)

✅ Flexibility: - Modify reports without reprocessing data - Business logic in dbt (version controlled)

✅ Reusability: - One aggregate table → many report views

Negative¶

⚠️ More layers to understand: - Need to train users on which to query (facts vs agg vs reports) - Solution: Clear naming + documentation

⚠️ View performance depends on aggregate freshness: - If aggregate is stale, report shows stale data - Solution: Refresh aggregates frequently

Implementation¶

Data Flow:¶

BRONZE (xo-foundry)
  └─ Raw landing (truncated daily)
       ↓
SILVER (dbt)
  └─ Historical preservation (no enrichment)
       ↓
GOLD FACTS (dbt)
  └─ Enriched entities (employee + glossary + filtering)
       ↓
GOLD AGGREGATES (dbt)
  └─ Pre-aggregated metrics (building blocks)
       ↓
GOLD REPORTS (dbt)
  └─ Consumption views (Tableau-ready)

Example for Warby Parker:¶

-- 1. FACT (enriched granular data)
GOLD.FCT_CONTACTS
  - Materialized: table
  - Grain: row per contact event
  - Enrichment: employee + glossary metadata
  - Filter: XO agents only
  - Users: Power analysts

-- 2. AGGREGATE (base metrics)
GOLD.AGG_AGENT_DAILY
  - Materialized: table
  - Grain: row per agent per day
  - Source: fct_contacts
  - Reused by: 3+ report views
  - Users: Power users, reports

-- 3. REPORT (consumption)
GOLD.RPT_AGENT_DASHBOARD
  - Materialized: view
  - Source: agg_agent_daily + roster + targets
  - Calculated: variance, flags
  - Users: Tableau, business users

Alternatives Considered¶

Alternative 1: Just Facts and Reports (no Aggregates)¶

Rejected because: - Reports would aggregate facts on-the-fly (slow) - No reusability (each report aggregates separately) - Higher compute costs

Alternative 2: Materialize Reports as Tables¶

Rejected because: - Storage cost (duplicate data) - Stale roster/target data - Harder to maintain (change requires rebuild)

Exception: Can materialize complex reports as tables if: - Very expensive joins (many tables) - Query performance critical - Stale data acceptable

Alternative 3: Single "Gold" layer (no subtypes)¶

Rejected because: - No clear purpose for each table - Users confused about what to query - Mixed concerns (enrichment + aggregation + presentation)

ADR 001: Load strategy terminology (full_refresh, incremental, historical)
ADR 004: Metadata columns and hashing strategy
ADR 006: Bronze truncate/force strategy
ADR 009: Naming conventions for Gold layer (see below)
ADR 010: Reference data placement (rosters, glossaries)

Notes¶

Standard KPI Schema: All clients should implement standard KPI columns in agg_*_daily tables: - KPI_DATE, AGENT_ID, AGENT_EMAIL - CONTACT_COUNT, CONVERSATION_COUNT - AVG_HANDLE_TIME, AVG_AHT - AVG_CSAT, AVG_QA_SCORE

Client-specific KPIs are added as additional columns.

View Naming: Use rpt_ prefix for reports (not v_) for consistency with fct_, dim_, agg_ (all 3-4 char prefixes).

dbt Materialization Config:

# dbt_project.yml
models:
  gold:
    facts:
      +materialized: table
      +tags: ['gold', 'facts']
    dimensions:
      +materialized: table
      +tags: ['gold', 'dimensions']
    aggregates:
      +materialized: table
      +tags: ['gold', 'aggregates']
    reports:
      +materialized: view
      +tags: ['gold', 'reports']

ADR 008: Gold Layer Architecture - Facts, Dimensions, Aggregates, Reports¶

Context¶

Decision¶

1. Facts (fct_ prefix)¶

2. Dimensions (dim_ prefix)¶

3. Aggregates (agg_ prefix)¶

4. Reports (rpt_ prefix)¶

Rationale¶

Why separate Facts and Aggregates?¶

Why separate Aggregates and Reports?¶

Why use views for Reports?¶

Consequences¶

Positive¶

Negative¶

Implementation¶

Data Flow:¶

Example for Warby Parker:¶

Alternatives Considered¶

Alternative 1: Just Facts and Reports (no Aggregates)¶

Alternative 2: Materialize Reports as Tables¶

Alternative 3: Single "Gold" layer (no subtypes)¶

Related Decisions¶

Notes¶

1. Facts (`fct_` prefix)¶

2. Dimensions (`dim_` prefix)¶

3. Aggregates (`agg_` prefix)¶

4. Reports (`rpt_` prefix)¶