Skip to content

ADR 019 — Gold Layer Standardization: Naming, Structure, and KPI Conventions

Status

Accepted

Context

The Gold layer has grown organically during Warby Parker development. As we onboard Conde Nast and future clients, several inconsistencies will compound into architectural debt:

  1. Naming collisions: dbt model names must be globally unique. qa.sql in silver/warbyparker/ would collide with qa.sql in silver/condenast/. Current mitigation (Conde Nast uses maestro_ prefix) is ad-hoc.
  2. Aggregate inconsistency: Models like agg_agent_daily_contacts, agg_wbp_qa_daily, and agg_qa_condenast_daily use different naming patterns — client prefix position varies, grain position varies.
  3. Silver filtering: The Silver CSAT model filters out non-XO agents (tiffany.buckner, tu.%). This is a business rule in a layer that should only contain data quality rules.
  4. Fact filtering: Gold facts use INNER JOIN to roster, silently dropping all non-XO contacts. This makes ad-hoc analysis of the full dataset impossible without going back to Silver and re-joining.
  5. Aggregate sourcing: agg_agent_daily_contacts joins Silver models and roster directly, bypassing facts. agg_agent_performance_daily correctly sources from facts. These disagree on the architecture.
  6. Report proliferation: 13 report views for WBP, many being time-period variants of the same pattern (wow/mom/qoq).
  7. Undocumented int_ prefix: int_contacts_hold exists and is shared by two models, but intermediate models are not part of the documented naming convention.
  8. Project structure: layer/client (current) vs client/layer — the latter matches the database isolation model better.
  9. No shared macros: Roster enrichment, tenure bucketing, and QA pooled averages are copy-pasted across models.
  10. No cross-client KPI standardization: Each client's aggregates have different column names and structures. No mechanism for a unified business overview.

This ADR establishes the conventions that all current and future dbt models must follow.

Decision

1. Silver: No Business Filters

Silver is the typed, deduplicated, complete mirror of Bronze. The only acceptable filters are data quality filters:

Allowed in Silver Not Allowed in Silver
WHERE primary_key IS NOT NULL WHERE admin NOT IN ('tiffany.buckner')
Deduplication by RECORD_KEY Agent/role-based filtering
Removing structurally malformed rows ('∅' as an ID) Date range filtering

Business rules (XO-agent filtering, status filtering, exclusion lists) belong in Gold.

Migration: Move the CSAT exclusion filter from silver/warbyparker/csat.sql to fct_wbp_csat. The '∅' filter stays in Silver (it is garbage data, not a business rule).

2. Facts: XO-Only by Construction — Source Category Framework

Gold fact tables contain XO agent data only. There is no IS_XO_AGENT flag. The mechanism used to enforce XO-only varies by source type, captured by a source_category field in each model's config.meta.

The Three Cases

Case Source Type Silver Fact Join EID Test
1 XO-only source No filter LEFT JOIN roster not_null warn → roster gap alert
2 Mixed + in-source XO identifier DQ filter only ( removal) In-source filter in Gold before LEFT JOIN roster not_null warn → roster gap alert
3 Mixed, no in-source XO identifier No filter possible INNER JOIN roster Not applicable

Case 1 — XO-only source (e.g., GSheets QA Internal, Maestro/CND): The source tool is only used by XO agents. Silver is kept unfiltered. Gold uses LEFT JOIN to the roster so that a NULL EID surfaces as a data quality signal: the agent is in the source system but missing from the roster.

-- Case 1 pattern
left join {{ ref('bi_roster_warbyparker') }} r
    ON data.advisors_name = r.okta_user
-- NULL EID = agent in source but missing from roster (roster gap alert)

Case 2 — Mixed source with in-source XO identifier (e.g., GSheets CSAT with tu. prefix): The source has mixed agents, but XO agents can be identified by a field value prefix or pattern. Silver keeps only the DQ filter ( removal). The in-source identifier filter is applied in Gold before the LEFT JOIN:

-- Case 2 pattern — filter first, then join
from {{ ref('wbp_gsheet_csat') }} data
left join {{ ref('bi_roster_warbyparker') }} r
    ON data.ADMIN_CORRECTED = r.OKTA_USER
where data.ADMIN_CORRECTED NOT LIKE 'tu.%'
  AND data.ADMIN_CORRECTED != 'tiffany.buckner'
-- NULL EID after filter = XO-identified agent missing from roster (roster gap alert)

Case 3 — Mixed source, no in-source XO identifier (e.g., Gladly for WBP, GSheets QA Partner): The source has mixed agents and no field distinguishes XO from non-XO. The roster join itself is the XO filter: INNER JOIN keeps only agents whose ID appears in the roster.

-- Case 3 pattern
inner join {{ ref('bi_roster_warbyparker') }} r
    ON ce.AGENT_ID_FIRST_CLOSED = r.GLADLY_ID
-- Non-XO agents have no roster entry → excluded automatically
-- EID is always populated (INNER JOIN guarantee)

Schema.yml Convention

Each fact model records its case in config.meta:

config:
  meta:
    source_category: "1"                        # "1", "2", or "3"
    xo_filter_strategy: left_join_eid_test      # Case 1/2
    # or
    xo_filter_strategy: inner_join              # Case 3

Case 1 and 2 facts also carry a not_null EID test at warn severity:

columns:
  - name: EID
    tests:
      - not_null:
          config:
            severity: warn   # NULL EID = roster gap, not a pipeline failure

Downstream Aggregates

Aggregates sourcing from Case 1 or 2 facts must filter WHERE eid IS NOT NULL to exclude roster-gap rows from metrics. Aggregates sourcing from Case 3 facts need no EID filter (INNER JOIN guarantees EID is populated).

Why this replaces IS_XO_AGENT: The IS_XO_AGENT flag approach assumed that keeping non-XO rows in facts was always useful. In practice, non-XO rows are never used in metrics and create confusion at the aggregate layer. The source-category framework is more explicit: each model declares exactly how it achieves XO-only filtering, and the mechanism is self-documenting in both the SQL header comment and the schema.yml metadata.

3. Intermediate Models: int_ Prefix Formalized

Intermediate models are shared logic used by 2+ downstream Gold models. They are a recognized dbt pattern.

Attribute Rule
Prefix int_
Materialization ephemeral (compiled as inline subquery, no Snowflake object)
Source Silver models or other intermediates
Consumers Gold facts and aggregates only
Naming int_{client}_{description}
When to use Logic shared by 2+ models. If used by only 1 model, inline as CTE.

Example: int_wbp_contacts_hold (shared by fct_wbp_contacts and agg_wbp_contacts_daily).

4. Aggregates: Facts-Only Source, Daily Grain, Standard Contract

Source: Gold facts only. Never Silver, never Bronze, never roster directly. If an aggregate needs a column that doesn't exist in a fact, add it to the fact.

Grain: Daily is the only materialized aggregate grain. Weekly/monthly/quarterly rollups happen in report views via DATE_TRUNC. At current scale (~200 agents/client), Snowflake computes these rollups in milliseconds over daily aggregates.

Column contract: Each KPI domain (performance, QA, CSAT) has a standard set of columns that every client's aggregate must produce. NULLs for metrics the client doesn't have. This enables trivial UNION ALL in CORE_DB master views.

Naming: agg_{client}_{subject}_{grain} — grain is always the last token.

Store sums and counts, never averages: Downstream consumers compute avg = SUM(x) / SUM(count) at whatever grain they need. This prevents the "average of averages" error.

5. Reports: Views from Aggregates, Period Rollups

Rule Value
Materialization Always view
Source Gold aggregates only (not facts, not Silver)
Purpose Tableau-ready, presentation layer
Period rollups DATE_TRUNC('week', date) + GROUP BY in the view
Period-over-period LAG() window functions for WoW/MoM/QoQ deltas

Reports should not proliferate. If multiple reports differ only by time period, consider whether Tableau can handle the period selection via parameters instead of separate views.

5a. Report Grain Suffix and Period-over-Period Naming (Amendment 2026-04-24)

This section codifies the naming convention for the rpt_ layer. It is introduced in parallel with the agent KPI v2 architecture (productivity/performance split + account-level rpt views) and resolves the inconsistency that exists today, where some rpt views carry a grain suffix (rpt_wbp_email_daily, rpt_wbp_ai_agent_concurrency_daily) and others do not (rpt_wbp_agent_performance_channel, rpt_wbp_gladly_blended_ai_productivity).

Grain suffix is required

All new rpt_ views MUST carry a grain suffix as the last token:

Suffix Grain
_daily One row per day at the stated dimension grain
_weekly One row per week
_monthly One row per month
_quarterly One row per quarter

The suffix describes the time grain of the view. It is orthogonal to — and stacks with — the semantic purpose (_performance, _productivity) and dimension grain tokens (_channel, _account).

Example: rpt_wbp_account_performance_channel_daily decomposes as: rpt_ (layer) + wbp_ (client) + account_performance_ (purpose) + channel_ (dimension) + daily (time grain).

Period-over-period views use short-form suffixes

Period-over-period views (views that expose current-period vs prior-period comparisons as side-by-side columns via LAG()) MUST use the short-form suffix as the last token:

Suffix Meaning
_wow Week-over-week
_mom Month-over-month
_qoq Quarter-over-quarter
_yoy Year-over-year

These replace the long form (_week_over_week, _month_over_month, etc.) that appears in several grandfathered views. Short-form is industry-standard shorthand and reduces name length.

Distinction: period rollup vs period-over-period

These are two different things and must not be conflated:

  • Period rollup = same metrics, coarser grain. Rows = weeks or months. Suffix: _weekly, _monthly.
  • Period-over-period = comparison columns alongside rows at a base grain. Suffix: _wow, _mom, _qoq, _yoy.

A _weekly view has week_start_date as its date column and SUMs across days. A _wow view has week_start_date plus a prior_week_* column set for each metric and computes a delta.

Semantic categories encode grain-safety

Two semantic purposes now have explicit grain rules attached:

Category Typical grain Channel safe to sum?
performance date × eid × channel or date × channel Yes — channel-additive
productivity date × eid No — rate metrics denominated in total active time; never replicate across channel rows

The semantic category is part of the model name (_performance_, _productivity_). Authors and analysts can read the safety contract from the name alone.

Grandfathered views

The following existing views are grandfathered — they continue to operate without rename for Tableau-compatibility reasons. Renaming them is tracked as a separate migration ticket coordinated with Tableau owners, one view per PR:

  • rpt_wbp_agent_performance_channel (missing _daily suffix)
  • rpt_wbp_gladly_blended_ai_productivity (missing grain suffix)
  • rpt_wbp_agent_qa_trend (no grain or time suffix)
  • rpt_wbp_xoos_agent_performance (missing grain suffix)
  • rpt_wbp_ai_agent_qa_week_over_week, rpt_wbp_ai_agent_qa_month_over_month (long-form period)
  • rpt_wbp_ai_wave_qa_week_over_week, rpt_wbp_ai_wave_qa_month_over_month, rpt_wbp_ai_wave_qa_quarter_over_quarter
  • rpt_wbp_ai_qa_attributes_week_over_week, rpt_wbp_ai_qa_attributes_month_over_month
  • rpt_wbp_ai_vs_nonai_qa_mom, rpt_wbp_ai_vs_nonai_qa_wow, rpt_wbp_ai_vs_nonai_qa_qoq (short-form already compliant; only grain suffix missing)

Already-compliant examples (for reference): rpt_wbp_email_daily, rpt_wbp_ai_agent_concurrency_daily, rpt_wbp_ai_agents_durations_daily.

Rationale

  • Registry-ready naming: docs/reference/dbt-model-registry.md becomes greppable — "all daily WBP rpts" is one query.
  • Consumer clarity: Tableau authors reading the view list never have to ask "is this daily or weekly?"
  • Uniform with the aggregate layer: Section 4 already mandates grain as the last token on agg_ names. This amendment extends the same rule to rpt_.
  • Additivity guarantee: The performance/productivity semantic split in the name encodes whether rows are channel-additive. Analysts stop double-counting denominators.

6. Model Naming: Client Short Code Required

Every model name includes a 3-character client code. This is non-negotiable for a multi-tenant dbt project with shared entity names.

Client codes (registered, enforced):

Client Code
Warby Parker wbp
Conde Nast cnd
Core/Shared core

New clients get a 3-letter code assigned at onboarding.

Naming by layer:

Layer Pattern Example
Silver {client}_{object} wbp_contact_timestamps, cnd_maestro_answers
Gold Intermediate int_{client}_{description} int_wbp_contacts_hold
Gold Fact fct_{client}_{object} fct_wbp_contacts, fct_cnd_qa
Gold Dimension dim_{entity} dim_employee_warbyparker
Gold Aggregate agg_{client}_{subject}_{grain} agg_wbp_qa_daily, agg_cnd_qa_daily
Gold Report rpt_{client}_{purpose}_{grain} rpt_wbp_agent_productivity_daily
Core Report rpt_master_{purpose}_{grain} rpt_master_performance_daily

Grain is always the last token in both aggregate and report names (see Section 5a for the rpt grain suffix rule and grandfather list). Client code is always immediately after the layer prefix.

7. Project Structure: Domain-First (client/layer)

Restructure models/ from layer/client to client/layer:

models/
├── warbyparker/
│   ├── _warbyparker__sources.yml
│   ├── silver/
│   │   ├── schema.yml
│   │   └── wbp_*.sql
│   ├── gold/
│   │   ├── schema.yml
│   │   └── fct_wbp_*.sql, agg_wbp_*.sql, rpt_wbp_*.sql
│   └── intermediate/
│       └── int_wbp_*.sql
├── condenast/
│   ├── _condenast__sources.yml
│   ├── silver/
│   └── gold/
└── core/
    ├── _core__sources.yml
    ├── silver/
    └── gold/
        ├── dim_employee_*.sql
        └── rpt_master_*.sql        ← Cross-client UNION views

Why domain-first: - Isolation matches the database architecture (each client = separate Snowflake DB, no cross-client joins) - dbt run --select warbyparker runs the full client pipeline - Onboarding a new client = create one folder - Database set once per client in dbt_project.yml, not repeated per layer

dbt_project.yml pattern:

models:
  xo_medallion:
    warbyparker:
      +database: |
        {%- if target.name == "dev" -%} WBP_DB_DEV
        {%- elif target.name == "prod" -%} WBP_DB_PROD
        {%- else -%} invalid_environment
        {%- endif -%}
      silver:
        +materialized: incremental
        +schema: SILVER
        +unique_key: RECORD_KEY
        +incremental_strategy: append
        +on_schema_change: sync_all_columns
        +tags: ["silver", "warbyparker"]
      gold:
        +materialized: table
        +schema: GOLD
        +tags: ["gold", "warbyparker"]
      intermediate:
        +materialized: ephemeral
        +tags: ["intermediate", "warbyparker"]

8. Macros: Shared Patterns

Create shared macros in apps/dbt/xo-medallion/macros/ for patterns that repeat across 2+ models:

Macro Purpose Priority
roster_columns(alias) Standard roster column SELECT list (EID, name, direct_leader, site) P1
weekly_tenure_bucket(date_col, start_date_col) Weekly tenure bucket CASE statement P2
monthly_tenure_bucket(date_col, start_date_col) Monthly tenure bucket CASE statement P2
pooled_average(sum_a, count_a, sum_b, count_b) QA combined pooled average formula P3
ai_enrichment(agent_col, alias) JOIN to ai_agents seed + AGENT_TYPE derivation P4

9. CORE_DB Master Views: Standardized Column Contract

Each client's performance aggregate produces the same columns. The master view is a simple UNION ALL.

Standard aggregate column contract (example for agg_{client}_agent_performance_daily):

date                    DATE        -- Required
eid                     VARCHAR     -- Required
agent_name              VARCHAR     -- Required
direct_leader           VARCHAR     -- Required
site                    VARCHAR     -- Required
client                  VARCHAR     -- Required ('WarbyParker', 'CondeNast')
-- Performance metrics (NULL if client lacks this source)
contact_count           NUMBER
handle_time_sec         NUMBER
acw_sec                 NUMBER
hold_time_sec           NUMBER
talk_time_sec           NUMBER
-- Quality metrics
qa_score_sum            FLOAT
qa_count                NUMBER
-- Satisfaction metrics
csat_score_sum          FLOAT
csat_count              NUMBER
-- Attendance metrics
active_time_min         FLOAT
-- SLA metrics
sla_met_count           NUMBER
sla_eligible_count      NUMBER

Master view:

-- core/gold/rpt_master_agent_performance.sql
SELECT * FROM {{ ref('agg_wbp_agent_performance_daily') }}
UNION ALL
SELECT * FROM {{ ref('agg_cnd_agent_performance_daily') }}

Adding a new client = create their aggregate with the standard columns + add one UNION ALL line. If columns don't match, dbt compile fails — which is the intended safeguard.

10. KPI Registry and Client Glossary (Future Work)

A KPI registry and client-specific glossary mapping are needed but out of scope for this ADR. The work includes:

  • KPI Registry: A reference document (docs/reference/kpi-registry.md) listing every standard KPI with its canonical name, formula, unit, and grain.
  • Client Glossary Mapping: Per-client mapping from their terminology to the standard KPI name (e.g., Warby Parker calls it "Advisor Stars", standard is "CSAT Score").
  • Master KPI Definitions: SQL snippets or macros that compute each KPI from the standard aggregate columns, ensuring consistency across reports.
  • KPI Domain Classification: Performance (handle time, contacts, SLA), Quality (QA scores, attributes), Satisfaction (CSAT), Workforce Management (utilization, occupancy, logged-in time).

This will be addressed in a follow-up spike (Phase 6 of the execution plan).

Consequences

What gets easier

  • Adding a new client: create one folder, follow the naming convention, produce aggregates with the standard column contract, add UNION ALL to master views.
  • Debugging: Silver is complete; Gold facts are XO-only by documented construction; the source category comment in each model explains the filtering strategy.
  • Cross-client reporting: master views in CORE_DB are trivial UNION ALLs.
  • Code reuse: macros eliminate copy-paste of roster enrichment, tenure bucketing, QA averages.

What gets harder

  • Migration cost: Renaming every model and updating every ref() is a large, breaking change. Must be done in a single coordinated PR per phase.
  • Roster gap visibility: Case ½ facts emit NULL EID rows when an agent is present in the source but absent from the roster. These rows are excluded from aggregates by WHERE eid IS NOT NULL and surface as dbt warn-severity test failures — an intentional signal for ops/analysts to fix the roster.
  • Longer model names: fct_wbp_contacts vs fct_contacts. Minor readability cost, major safety gain.

What we give up

  • The ability to have "simple" model names like qa.sql or csat.sql without client prefix. This was never safe in a multi-tenant project.
  • Direct Silver-to-aggregate shortcuts. All aggregates must go through facts, even when the fact is a thin wrapper. This is the correct trade-off for lineage consistency.

Options Considered

Naming: source prefix vs client prefix

Source prefix (gladly_contact_timestamps) was rejected because multiple clients can share the same source (e.g., Google Sheets for CSAT). Client prefix is the only reliable disambiguation.

Project structure: layer/client (status quo)

Kept layer/client was rejected because it scatters a client's models across two top-level directories, doesn't match the database isolation model, and makes dbt run --select <client> impossible.

Aggregate grains: materialize weekly/monthly/quarterly

Rejected. At current scale (~200 agents/client), the daily aggregate has ~73K rows/year. Snowflake computes DATE_TRUNC + GROUP BY over this in milliseconds. Materializing coarser grains adds 3x maintenance for zero performance benefit. Revisit if scale exceeds 10K agents or 3+ years of history per client.

Facts: IS_XO_AGENT flag + LEFT JOIN (original Phase 3 design)

Rejected after Phase 3 design review. The flag approach assumes non-XO rows are useful in facts — in practice they are not and create confusion at the aggregate layer. The source-category framework (Section 2) is more explicit and eliminates the need to filter every aggregate.

Facts: filtered to XO only (status quo — INNER JOIN everywhere)

The original status quo was rejected because it lacks a systematic framework: some models used redundant WHERE EID IS NOT NULL guards, join types were inconsistent, and the filtering rationale was undocumented. The source-category framework formalizes the intent with consistent patterns per case.


Supersedes relevant sections of: ADR 008 (Gold layer types), ADR 009 (naming conventions for dbt models). Those ADRs remain active for non-dbt naming (S3 paths, DAGs, Bronze tables) but dbt model naming now follows this ADR.