ADR 019 — Gold Layer Standardization: Naming, Structure, and KPI Conventions¶

Status¶

Accepted

Context¶

The Gold layer has grown organically during Warby Parker development. As we onboard Conde Nast and future clients, several inconsistencies will compound into architectural debt:

Naming collisions: dbt model names must be globally unique. qa.sql in silver/warbyparker/ would collide with qa.sql in silver/condenast/. Current mitigation (Conde Nast uses maestro_ prefix) is ad-hoc.
Aggregate inconsistency: Models like agg_agent_daily_contacts, agg_wbp_qa_daily, and agg_qa_condenast_daily use different naming patterns — client prefix position varies, grain position varies.
Silver filtering: The Silver CSAT model filters out non-XO agents (tiffany.buckner, tu.%). This is a business rule in a layer that should only contain data quality rules.
Fact filtering: Gold facts use INNER JOIN to roster, silently dropping all non-XO contacts. This makes ad-hoc analysis of the full dataset impossible without going back to Silver and re-joining.
Aggregate sourcing: agg_agent_daily_contacts joins Silver models and roster directly, bypassing facts. agg_agent_performance_daily correctly sources from facts. These disagree on the architecture.
Report proliferation: 13 report views for WBP, many being time-period variants of the same pattern (wow/mom/qoq).
Undocumented int_ prefix: int_contacts_hold exists and is shared by two models, but intermediate models are not part of the documented naming convention.
Project structure: layer/client (current) vs client/layer — the latter matches the database isolation model better.
No shared macros: Roster enrichment, tenure bucketing, and QA pooled averages are copy-pasted across models.
No cross-client KPI standardization: Each client's aggregates have different column names and structures. No mechanism for a unified business overview.

This ADR establishes the conventions that all current and future dbt models must follow.

Decision¶

1. Silver: No Business Filters¶

Silver is the typed, deduplicated, complete mirror of Bronze. The only acceptable filters are data quality filters:

Allowed in Silver	Not Allowed in Silver
`WHERE primary_key IS NOT NULL`	`WHERE admin NOT IN ('tiffany.buckner')`
Deduplication by `RECORD_KEY`	Agent/role-based filtering
Removing structurally malformed rows (`'∅'` as an ID)	Date range filtering

Business rules (XO-agent filtering, status filtering, exclusion lists) belong in Gold.

Migration: Move the CSAT exclusion filter from silver/warbyparker/csat.sql to fct_wbp_csat. The '∅' filter stays in Silver (it is garbage data, not a business rule).

2. Facts: XO-Only by Construction — Source Category Framework¶

Gold fact tables contain XO agent data only. There is no IS_XO_AGENT flag. The mechanism used to enforce XO-only varies by source type, captured by a source_category field in each model's config.meta.

The Three Cases¶

Case	Source Type	Silver	Fact Join	EID Test
1	XO-only source	No filter	`LEFT JOIN` roster	`not_null` warn → roster gap alert
2	Mixed + in-source XO identifier	DQ filter only (`∅` removal)	In-source filter in Gold before `LEFT JOIN` roster	`not_null` warn → roster gap alert
3	Mixed, no in-source XO identifier	No filter possible	`INNER JOIN` roster	Not applicable

Case 1 — XO-only source (e.g., GSheets QA Internal, Maestro/CND): The source tool is only used by XO agents. Silver is kept unfiltered. Gold uses LEFT JOIN to the roster so that a NULL EID surfaces as a data quality signal: the agent is in the source system but missing from the roster.

-- Case 1 pattern
left join {{ ref('bi_roster_warbyparker') }} r
    ON data.advisors_name = r.okta_user
-- NULL EID = agent in source but missing from roster (roster gap alert)

Case 2 — Mixed source with in-source XO identifier (e.g., GSheets CSAT with tu. prefix): The source has mixed agents, but XO agents can be identified by a field value prefix or pattern. Silver keeps only the DQ filter (∅ removal). The in-source identifier filter is applied in Gold before the LEFT JOIN:

-- Case 2 pattern — filter first, then join
from {{ ref('wbp_gsheet_csat') }} data
left join {{ ref('bi_roster_warbyparker') }} r
    ON data.ADMIN_CORRECTED = r.OKTA_USER
where data.ADMIN_CORRECTED NOT LIKE 'tu.%'
  AND data.ADMIN_CORRECTED != 'tiffany.buckner'
-- NULL EID after filter = XO-identified agent missing from roster (roster gap alert)

Case 3 — Mixed source, no in-source XO identifier (e.g., Gladly for WBP, GSheets QA Partner): The source has mixed agents and no field distinguishes XO from non-XO. The roster join itself is the XO filter: INNER JOIN keeps only agents whose ID appears in the roster.

-- Case 3 pattern
inner join {{ ref('bi_roster_warbyparker') }} r
    ON ce.AGENT_ID_FIRST_CLOSED = r.GLADLY_ID
-- Non-XO agents have no roster entry → excluded automatically
-- EID is always populated (INNER JOIN guarantee)

Schema.yml Convention¶

Each fact model records its case in config.meta:

config:
  meta:
    source_category: "1"                        # "1", "2", or "3"
    xo_filter_strategy: left_join_eid_test      # Case 1/2
    # or
    xo_filter_strategy: inner_join              # Case 3

Case 1 and 2 facts also carry a not_null EID test at warn severity:

columns:
  - name: EID
    tests:
      - not_null:
          config:
            severity: warn   # NULL EID = roster gap, not a pipeline failure

Downstream Aggregates¶

Aggregates sourcing from Case 1 or 2 facts must filter WHERE eid IS NOT NULL to exclude roster-gap rows from metrics. Aggregates sourcing from Case 3 facts need no EID filter (INNER JOIN guarantees EID is populated).

Why this replaces IS_XO_AGENT: The IS_XO_AGENT flag approach assumed that keeping non-XO rows in facts was always useful. In practice, non-XO rows are never used in metrics and create confusion at the aggregate layer. The source-category framework is more explicit: each model declares exactly how it achieves XO-only filtering, and the mechanism is self-documenting in both the SQL header comment and the schema.yml metadata.

3. Intermediate Models: `int_` Prefix Formalized¶

Intermediate models are shared logic used by 2+ downstream Gold models. They are a recognized dbt pattern.

Attribute	Rule
Prefix	`int_`
Materialization	`ephemeral` (compiled as inline subquery, no Snowflake object)
Source	Silver models or other intermediates
Consumers	Gold facts and aggregates only
Naming	`int_{client}_{description}`
When to use	Logic shared by 2+ models. If used by only 1 model, inline as CTE.

Example: int_wbp_contacts_hold (shared by fct_wbp_contacts and agg_wbp_contacts_daily).

4. Aggregates: Facts-Only Source, Daily Grain, Standard Contract¶

Source: Gold facts only. Never Silver, never Bronze, never roster directly. If an aggregate needs a column that doesn't exist in a fact, add it to the fact.

Grain: Daily is the only materialized aggregate grain. Weekly/monthly/quarterly rollups happen in report views via DATE_TRUNC. At current scale (~200 agents/client), Snowflake computes these rollups in milliseconds over daily aggregates.

Column contract: Each KPI domain (performance, QA, CSAT) has a standard set of columns that every client's aggregate must produce. NULLs for metrics the client doesn't have. This enables trivial UNION ALL in CORE_DB master views.

Naming: agg_{client}_{subject}_{grain} — grain is always the last token.

Store sums and counts, never averages: Downstream consumers compute avg = SUM(x) / SUM(count) at whatever grain they need. This prevents the "average of averages" error.

5. Reports: Views from Aggregates, Period Rollups¶

Rule	Value
Materialization	Always `view`
Source	Gold aggregates only (not facts, not Silver)
Purpose	Tableau-ready, presentation layer
Period rollups	`DATE_TRUNC('week', date)` + `GROUP BY` in the view
Period-over-period	`LAG()` window functions for WoW/MoM/QoQ deltas

Reports should not proliferate. If multiple reports differ only by time period, consider whether Tableau can handle the period selection via parameters instead of separate views.

5a. Report Grain Suffix and Period-over-Period Naming (Amendment 2026-04-24)¶

This section codifies the naming convention for the rpt_ layer. It is introduced in parallel with the agent KPI v2 architecture (productivity/performance split + account-level rpt views) and resolves the inconsistency that exists today, where some rpt views carry a grain suffix (rpt_wbp_email_daily, rpt_wbp_ai_agent_concurrency_daily) and others do not (rpt_wbp_agent_performance_channel, rpt_wbp_gladly_blended_ai_productivity).

Grain suffix is required¶

All new rpt_ views MUST carry a grain suffix as the last token:

Suffix	Grain
`_daily`	One row per day at the stated dimension grain
`_weekly`	One row per week
`_monthly`	One row per month
`_quarterly`	One row per quarter

The suffix describes the time grain of the view. It is orthogonal to — and stacks with — the semantic purpose (_performance, _productivity) and dimension grain tokens (_channel, _account).

Example: rpt_wbp_account_performance_channel_daily decomposes as: rpt_ (layer) + wbp_ (client) + account_performance_ (purpose) + channel_ (dimension) + daily (time grain).

Period-over-period views use short-form suffixes¶

Period-over-period views (views that expose current-period vs prior-period comparisons as side-by-side columns via LAG()) MUST use the short-form suffix as the last token:

Suffix	Meaning
`_wow`	Week-over-week
`_mom`	Month-over-month
`_qoq`	Quarter-over-quarter
`_yoy`	Year-over-year

These replace the long form (_week_over_week, _month_over_month, etc.) that appears in several grandfathered views. Short-form is industry-standard shorthand and reduces name length.

Distinction: period rollup vs period-over-period¶

These are two different things and must not be conflated:

Period rollup = same metrics, coarser grain. Rows = weeks or months. Suffix: _weekly, _monthly.
Period-over-period = comparison columns alongside rows at a base grain. Suffix: _wow, _mom, _qoq, _yoy.

A _weekly view has week_start_date as its date column and SUMs across days. A _wow view has week_start_date plus a prior_week_* column set for each metric and computes a delta.

Semantic categories encode grain-safety¶

Two semantic purposes now have explicit grain rules attached:

Category	Typical grain	Channel safe to sum?
`performance`	`date × eid × channel` or `date × channel`	Yes — channel-additive
`productivity`	`date × eid`	No — rate metrics denominated in total active time; never replicate across channel rows

The semantic category is part of the model name (_performance_, _productivity_). Authors and analysts can read the safety contract from the name alone.

Grandfathered views¶

The following existing views are grandfathered — they continue to operate without rename for Tableau-compatibility reasons. Renaming them is tracked as a separate migration ticket coordinated with Tableau owners, one view per PR:

rpt_wbp_agent_performance_channel (missing _daily suffix)
rpt_wbp_gladly_blended_ai_productivity (missing grain suffix)
rpt_wbp_agent_qa_trend (no grain or time suffix)
rpt_wbp_xoos_agent_performance (missing grain suffix)
rpt_wbp_ai_agent_qa_week_over_week, rpt_wbp_ai_agent_qa_month_over_month (long-form period)
rpt_wbp_ai_wave_qa_week_over_week, rpt_wbp_ai_wave_qa_month_over_month, rpt_wbp_ai_wave_qa_quarter_over_quarter
rpt_wbp_ai_qa_attributes_week_over_week, rpt_wbp_ai_qa_attributes_month_over_month
rpt_wbp_ai_vs_nonai_qa_mom, rpt_wbp_ai_vs_nonai_qa_wow, rpt_wbp_ai_vs_nonai_qa_qoq (short-form already compliant; only grain suffix missing)

Already-compliant examples (for reference): rpt_wbp_email_daily, rpt_wbp_ai_agent_concurrency_daily, rpt_wbp_ai_agents_durations_daily.

Rationale¶

Registry-ready naming: docs/reference/dbt-model-registry.md becomes greppable — "all daily WBP rpts" is one query.
Consumer clarity: Tableau authors reading the view list never have to ask "is this daily or weekly?"
Uniform with the aggregate layer: Section 4 already mandates grain as the last token on agg_ names. This amendment extends the same rule to rpt_.
Additivity guarantee: The performance/productivity semantic split in the name encodes whether rows are channel-additive. Analysts stop double-counting denominators.

6. Model Naming: Client Short Code Required¶

Every model name includes a 3-character client code. This is non-negotiable for a multi-tenant dbt project with shared entity names.

Client codes (registered, enforced):

Client	Code
Warby Parker	`wbp`
Conde Nast	`cnd`
Core/Shared	`core`

New clients get a 3-letter code assigned at onboarding.

Naming by layer:

Layer	Pattern	Example
Silver	`{client}_{object}`	`wbp_contact_timestamps`, `cnd_maestro_answers`
Gold Intermediate	`int_{client}_{description}`	`int_wbp_contacts_hold`
Gold Fact	`fct_{client}_{object}`	`fct_wbp_contacts`, `fct_cnd_qa`
Gold Dimension	`dim_{entity}`	`dim_employee_warbyparker`
Gold Aggregate	`agg_{client}_{subject}_{grain}`	`agg_wbp_qa_daily`, `agg_cnd_qa_daily`
Gold Report	`rpt_{client}_{purpose}_{grain}`	`rpt_wbp_agent_productivity_daily`
Core Report	`rpt_master_{purpose}_{grain}`	`rpt_master_performance_daily`

Grain is always the last token in both aggregate and report names (see Section 5a for the rpt grain suffix rule and grandfather list). Client code is always immediately after the layer prefix.

7. Project Structure: Domain-First (`client/layer`)¶

Restructure models/ from layer/client to client/layer:

models/
├── warbyparker/
│   ├── _warbyparker__sources.yml
│   ├── silver/
│   │   ├── schema.yml
│   │   └── wbp_*.sql
│   ├── gold/
│   │   ├── schema.yml
│   │   └── fct_wbp_*.sql, agg_wbp_*.sql, rpt_wbp_*.sql
│   └── intermediate/
│       └── int_wbp_*.sql
├── condenast/
│   ├── _condenast__sources.yml
│   ├── silver/
│   └── gold/
└── core/
    ├── _core__sources.yml
    ├── silver/
    └── gold/
        ├── dim_employee_*.sql
        └── rpt_master_*.sql        ← Cross-client UNION views

Why domain-first: - Isolation matches the database architecture (each client = separate Snowflake DB, no cross-client joins) - dbt run --select warbyparker runs the full client pipeline - Onboarding a new client = create one folder - Database set once per client in dbt_project.yml, not repeated per layer

dbt_project.yml pattern:

models:
  xo_medallion:
    warbyparker:
      +database: |
        {%- if target.name == "dev" -%} WBP_DB_DEV
        {%- elif target.name == "prod" -%} WBP_DB_PROD
        {%- else -%} invalid_environment
        {%- endif -%}
      silver:
        +materialized: incremental
        +schema: SILVER
        +unique_key: RECORD_KEY
        +incremental_strategy: append
        +on_schema_change: sync_all_columns
        +tags: ["silver", "warbyparker"]
      gold:
        +materialized: table
        +schema: GOLD
        +tags: ["gold", "warbyparker"]
      intermediate:
        +materialized: ephemeral
        +tags: ["intermediate", "warbyparker"]

8. Macros: Shared Patterns¶

Create shared macros in apps/dbt/xo-medallion/macros/ for patterns that repeat across 2+ models:

Macro	Purpose	Priority
`roster_columns(alias)`	Standard roster column SELECT list (EID, name, direct_leader, site)	P1
`weekly_tenure_bucket(date_col, start_date_col)`	Weekly tenure bucket CASE statement	P2
`monthly_tenure_bucket(date_col, start_date_col)`	Monthly tenure bucket CASE statement	P2
`pooled_average(sum_a, count_a, sum_b, count_b)`	QA combined pooled average formula	P3
`ai_enrichment(agent_col, alias)`	JOIN to ai_agents seed + AGENT_TYPE derivation	P4

9. CORE_DB Master Views: Standardized Column Contract¶

Each client's performance aggregate produces the same columns. The master view is a simple UNION ALL.

Standard aggregate column contract (example for agg_{client}_agent_performance_daily):

date                    DATE        -- Required
eid                     VARCHAR     -- Required
agent_name              VARCHAR     -- Required
direct_leader           VARCHAR     -- Required
site                    VARCHAR     -- Required
client                  VARCHAR     -- Required ('WarbyParker', 'CondeNast')
-- Performance metrics (NULL if client lacks this source)
contact_count           NUMBER
handle_time_sec         NUMBER
acw_sec                 NUMBER
hold_time_sec           NUMBER
talk_time_sec           NUMBER
-- Quality metrics
qa_score_sum            FLOAT
qa_count                NUMBER
-- Satisfaction metrics
csat_score_sum          FLOAT
csat_count              NUMBER
-- Attendance metrics
active_time_min         FLOAT
-- SLA metrics
sla_met_count           NUMBER
sla_eligible_count      NUMBER

Master view:

-- core/gold/rpt_master_agent_performance.sql
SELECT * FROM {{ ref('agg_wbp_agent_performance_daily') }}
UNION ALL
SELECT * FROM {{ ref('agg_cnd_agent_performance_daily') }}

Adding a new client = create their aggregate with the standard columns + add one UNION ALL line. If columns don't match, dbt compile fails — which is the intended safeguard.

10. KPI Registry and Client Glossary (Future Work)¶

A KPI registry and client-specific glossary mapping are needed but out of scope for this ADR. The work includes:

KPI Registry: A reference document (docs/reference/kpi-registry.md) listing every standard KPI with its canonical name, formula, unit, and grain.
Client Glossary Mapping: Per-client mapping from their terminology to the standard KPI name (e.g., Warby Parker calls it "Advisor Stars", standard is "CSAT Score").
Master KPI Definitions: SQL snippets or macros that compute each KPI from the standard aggregate columns, ensuring consistency across reports.
KPI Domain Classification: Performance (handle time, contacts, SLA), Quality (QA scores, attributes), Satisfaction (CSAT), Workforce Management (utilization, occupancy, logged-in time).

This will be addressed in a follow-up spike (Phase 6 of the execution plan).

Consequences¶

What gets easier¶

Adding a new client: create one folder, follow the naming convention, produce aggregates with the standard column contract, add UNION ALL to master views.
Debugging: Silver is complete; Gold facts are XO-only by documented construction; the source category comment in each model explains the filtering strategy.
Cross-client reporting: master views in CORE_DB are trivial UNION ALLs.
Code reuse: macros eliminate copy-paste of roster enrichment, tenure bucketing, QA averages.

What gets harder¶

Migration cost: Renaming every model and updating every ref() is a large, breaking change. Must be done in a single coordinated PR per phase.
Roster gap visibility: Case ½ facts emit NULL EID rows when an agent is present in the source but absent from the roster. These rows are excluded from aggregates by WHERE eid IS NOT NULL and surface as dbt warn-severity test failures — an intentional signal for ops/analysts to fix the roster.
Longer model names: fct_wbp_contacts vs fct_contacts. Minor readability cost, major safety gain.

What we give up¶

The ability to have "simple" model names like qa.sql or csat.sql without client prefix. This was never safe in a multi-tenant project.
Direct Silver-to-aggregate shortcuts. All aggregates must go through facts, even when the fact is a thin wrapper. This is the correct trade-off for lineage consistency.

Options Considered¶

Naming: source prefix vs client prefix¶

Source prefix (gladly_contact_timestamps) was rejected because multiple clients can share the same source (e.g., Google Sheets for CSAT). Client prefix is the only reliable disambiguation.

Project structure: layer/client (status quo)¶

Kept layer/client was rejected because it scatters a client's models across two top-level directories, doesn't match the database isolation model, and makes dbt run --select <client> impossible.

Aggregate grains: materialize weekly/monthly/quarterly¶

Rejected. At current scale (~200 agents/client), the daily aggregate has ~73K rows/year. Snowflake computes DATE_TRUNC + GROUP BY over this in milliseconds. Materializing coarser grains adds 3x maintenance for zero performance benefit. Revisit if scale exceeds 10K agents or 3+ years of history per client.

Facts: IS_XO_AGENT flag + LEFT JOIN (original Phase 3 design)¶

Rejected after Phase 3 design review. The flag approach assumes non-XO rows are useful in facts — in practice they are not and create confusion at the aggregate layer. The source-category framework (Section 2) is more explicit and eliminates the need to filter every aggregate.

Facts: filtered to XO only (status quo — INNER JOIN everywhere)¶

The original status quo was rejected because it lacks a systematic framework: some models used redundant WHERE EID IS NOT NULL guards, join types were inconsistent, and the filtering rationale was undocumented. The source-category framework formalizes the intent with consistent patterns per case.

Supersedes relevant sections of: ADR 008 (Gold layer types), ADR 009 (naming conventions for dbt models). Those ADRs remain active for non-dbt naming (S3 paths, DAGs, Bronze tables) but dbt model naming now follows this ADR.