ADR 019 — Gold Layer Standardization: Naming, Structure, and KPI Conventions¶
Status¶
Accepted
Context¶
The Gold layer has grown organically during Warby Parker development. As we onboard Conde Nast and future clients, several inconsistencies will compound into architectural debt:
- Naming collisions: dbt model names must be globally unique.
qa.sqlinsilver/warbyparker/would collide withqa.sqlinsilver/condenast/. Current mitigation (Conde Nast usesmaestro_prefix) is ad-hoc. - Aggregate inconsistency: Models like
agg_agent_daily_contacts,agg_wbp_qa_daily, andagg_qa_condenast_dailyuse different naming patterns — client prefix position varies, grain position varies. - Silver filtering: The Silver CSAT model filters out non-XO agents (
tiffany.buckner,tu.%). This is a business rule in a layer that should only contain data quality rules. - Fact filtering: Gold facts use
INNER JOINto roster, silently dropping all non-XO contacts. This makes ad-hoc analysis of the full dataset impossible without going back to Silver and re-joining. - Aggregate sourcing:
agg_agent_daily_contactsjoins Silver models and roster directly, bypassing facts.agg_agent_performance_dailycorrectly sources from facts. These disagree on the architecture. - Report proliferation: 13 report views for WBP, many being time-period variants of the same pattern (wow/mom/qoq).
- Undocumented
int_prefix:int_contacts_holdexists and is shared by two models, but intermediate models are not part of the documented naming convention. - Project structure:
layer/client(current) vsclient/layer— the latter matches the database isolation model better. - No shared macros: Roster enrichment, tenure bucketing, and QA pooled averages are copy-pasted across models.
- No cross-client KPI standardization: Each client's aggregates have different column names and structures. No mechanism for a unified business overview.
This ADR establishes the conventions that all current and future dbt models must follow.
Decision¶
1. Silver: No Business Filters¶
Silver is the typed, deduplicated, complete mirror of Bronze. The only acceptable filters are data quality filters:
| Allowed in Silver | Not Allowed in Silver |
|---|---|
WHERE primary_key IS NOT NULL |
WHERE admin NOT IN ('tiffany.buckner') |
Deduplication by RECORD_KEY |
Agent/role-based filtering |
Removing structurally malformed rows ('∅' as an ID) |
Date range filtering |
Business rules (XO-agent filtering, status filtering, exclusion lists) belong in Gold.
Migration: Move the CSAT exclusion filter from silver/warbyparker/csat.sql to fct_wbp_csat. The '∅' filter stays in Silver (it is garbage data, not a business rule).
2. Facts: XO-Only by Construction — Source Category Framework¶
Gold fact tables contain XO agent data only. There is no IS_XO_AGENT flag. The mechanism used to enforce XO-only varies by source type, captured by a source_category field in each model's config.meta.
The Three Cases¶
| Case | Source Type | Silver | Fact Join | EID Test |
|---|---|---|---|---|
| 1 | XO-only source | No filter | LEFT JOIN roster |
not_null warn → roster gap alert |
| 2 | Mixed + in-source XO identifier | DQ filter only (∅ removal) |
In-source filter in Gold before LEFT JOIN roster |
not_null warn → roster gap alert |
| 3 | Mixed, no in-source XO identifier | No filter possible | INNER JOIN roster |
Not applicable |
Case 1 — XO-only source (e.g., GSheets QA Internal, Maestro/CND):
The source tool is only used by XO agents. Silver is kept unfiltered. Gold uses LEFT JOIN to the roster so that a NULL EID surfaces as a data quality signal: the agent is in the source system but missing from the roster.
-- Case 1 pattern
left join {{ ref('bi_roster_warbyparker') }} r
ON data.advisors_name = r.okta_user
-- NULL EID = agent in source but missing from roster (roster gap alert)
Case 2 — Mixed source with in-source XO identifier (e.g., GSheets CSAT with tu. prefix):
The source has mixed agents, but XO agents can be identified by a field value prefix or pattern. Silver keeps only the DQ filter (∅ removal). The in-source identifier filter is applied in Gold before the LEFT JOIN:
-- Case 2 pattern — filter first, then join
from {{ ref('wbp_gsheet_csat') }} data
left join {{ ref('bi_roster_warbyparker') }} r
ON data.ADMIN_CORRECTED = r.OKTA_USER
where data.ADMIN_CORRECTED NOT LIKE 'tu.%'
AND data.ADMIN_CORRECTED != 'tiffany.buckner'
-- NULL EID after filter = XO-identified agent missing from roster (roster gap alert)
Case 3 — Mixed source, no in-source XO identifier (e.g., Gladly for WBP, GSheets QA Partner):
The source has mixed agents and no field distinguishes XO from non-XO. The roster join itself is the XO filter: INNER JOIN keeps only agents whose ID appears in the roster.
-- Case 3 pattern
inner join {{ ref('bi_roster_warbyparker') }} r
ON ce.AGENT_ID_FIRST_CLOSED = r.GLADLY_ID
-- Non-XO agents have no roster entry → excluded automatically
-- EID is always populated (INNER JOIN guarantee)
Schema.yml Convention¶
Each fact model records its case in config.meta:
config:
meta:
source_category: "1" # "1", "2", or "3"
xo_filter_strategy: left_join_eid_test # Case 1/2
# or
xo_filter_strategy: inner_join # Case 3
Case 1 and 2 facts also carry a not_null EID test at warn severity:
columns:
- name: EID
tests:
- not_null:
config:
severity: warn # NULL EID = roster gap, not a pipeline failure
Downstream Aggregates¶
Aggregates sourcing from Case 1 or 2 facts must filter WHERE eid IS NOT NULL to exclude roster-gap rows from metrics. Aggregates sourcing from Case 3 facts need no EID filter (INNER JOIN guarantees EID is populated).
Why this replaces IS_XO_AGENT: The IS_XO_AGENT flag approach assumed that keeping non-XO rows in facts was always useful. In practice, non-XO rows are never used in metrics and create confusion at the aggregate layer. The source-category framework is more explicit: each model declares exactly how it achieves XO-only filtering, and the mechanism is self-documenting in both the SQL header comment and the schema.yml metadata.
3. Intermediate Models: int_ Prefix Formalized¶
Intermediate models are shared logic used by 2+ downstream Gold models. They are a recognized dbt pattern.
| Attribute | Rule |
|---|---|
| Prefix | int_ |
| Materialization | ephemeral (compiled as inline subquery, no Snowflake object) |
| Source | Silver models or other intermediates |
| Consumers | Gold facts and aggregates only |
| Naming | int_{client}_{description} |
| When to use | Logic shared by 2+ models. If used by only 1 model, inline as CTE. |
Example: int_wbp_contacts_hold (shared by fct_wbp_contacts and agg_wbp_contacts_daily).
4. Aggregates: Facts-Only Source, Daily Grain, Standard Contract¶
Source: Gold facts only. Never Silver, never Bronze, never roster directly. If an aggregate needs a column that doesn't exist in a fact, add it to the fact.
Grain: Daily is the only materialized aggregate grain. Weekly/monthly/quarterly rollups happen in report views via DATE_TRUNC. At current scale (~200 agents/client), Snowflake computes these rollups in milliseconds over daily aggregates.
Column contract: Each KPI domain (performance, QA, CSAT) has a standard set of columns that every client's aggregate must produce. NULLs for metrics the client doesn't have. This enables trivial UNION ALL in CORE_DB master views.
Naming: agg_{client}_{subject}_{grain} — grain is always the last token.
Store sums and counts, never averages: Downstream consumers compute avg = SUM(x) / SUM(count) at whatever grain they need. This prevents the "average of averages" error.
5. Reports: Views from Aggregates, Period Rollups¶
| Rule | Value |
|---|---|
| Materialization | Always view |
| Source | Gold aggregates only (not facts, not Silver) |
| Purpose | Tableau-ready, presentation layer |
| Period rollups | DATE_TRUNC('week', date) + GROUP BY in the view |
| Period-over-period | LAG() window functions for WoW/MoM/QoQ deltas |
Reports should not proliferate. If multiple reports differ only by time period, consider whether Tableau can handle the period selection via parameters instead of separate views.
5a. Report Grain Suffix and Period-over-Period Naming (Amendment 2026-04-24)¶
This section codifies the naming convention for the rpt_ layer. It is introduced in parallel with the agent KPI v2 architecture (productivity/performance split + account-level rpt views) and resolves the inconsistency that exists today, where some rpt views carry a grain suffix (rpt_wbp_email_daily, rpt_wbp_ai_agent_concurrency_daily) and others do not (rpt_wbp_agent_performance_channel, rpt_wbp_gladly_blended_ai_productivity).
Grain suffix is required¶
All new rpt_ views MUST carry a grain suffix as the last token:
| Suffix | Grain |
|---|---|
_daily |
One row per day at the stated dimension grain |
_weekly |
One row per week |
_monthly |
One row per month |
_quarterly |
One row per quarter |
The suffix describes the time grain of the view. It is orthogonal to — and stacks with — the semantic purpose (_performance, _productivity) and dimension grain tokens (_channel, _account).
Example: rpt_wbp_account_performance_channel_daily decomposes as:
rpt_ (layer) + wbp_ (client) + account_performance_ (purpose) + channel_ (dimension) + daily (time grain).
Period-over-period views use short-form suffixes¶
Period-over-period views (views that expose current-period vs prior-period comparisons as side-by-side columns via LAG()) MUST use the short-form suffix as the last token:
| Suffix | Meaning |
|---|---|
_wow |
Week-over-week |
_mom |
Month-over-month |
_qoq |
Quarter-over-quarter |
_yoy |
Year-over-year |
These replace the long form (_week_over_week, _month_over_month, etc.) that appears in several grandfathered views. Short-form is industry-standard shorthand and reduces name length.
Distinction: period rollup vs period-over-period¶
These are two different things and must not be conflated:
- Period rollup = same metrics, coarser grain. Rows = weeks or months. Suffix:
_weekly,_monthly. - Period-over-period = comparison columns alongside rows at a base grain. Suffix:
_wow,_mom,_qoq,_yoy.
A _weekly view has week_start_date as its date column and SUMs across days. A _wow view has week_start_date plus a prior_week_* column set for each metric and computes a delta.
Semantic categories encode grain-safety¶
Two semantic purposes now have explicit grain rules attached:
| Category | Typical grain | Channel safe to sum? |
|---|---|---|
performance |
date × eid × channel or date × channel |
Yes — channel-additive |
productivity |
date × eid |
No — rate metrics denominated in total active time; never replicate across channel rows |
The semantic category is part of the model name (_performance_, _productivity_). Authors and analysts can read the safety contract from the name alone.
Grandfathered views¶
The following existing views are grandfathered — they continue to operate without rename for Tableau-compatibility reasons. Renaming them is tracked as a separate migration ticket coordinated with Tableau owners, one view per PR:
rpt_wbp_agent_performance_channel(missing_dailysuffix)rpt_wbp_gladly_blended_ai_productivity(missing grain suffix)rpt_wbp_agent_qa_trend(no grain or time suffix)rpt_wbp_xoos_agent_performance(missing grain suffix)rpt_wbp_ai_agent_qa_week_over_week,rpt_wbp_ai_agent_qa_month_over_month(long-form period)rpt_wbp_ai_wave_qa_week_over_week,rpt_wbp_ai_wave_qa_month_over_month,rpt_wbp_ai_wave_qa_quarter_over_quarterrpt_wbp_ai_qa_attributes_week_over_week,rpt_wbp_ai_qa_attributes_month_over_monthrpt_wbp_ai_vs_nonai_qa_mom,rpt_wbp_ai_vs_nonai_qa_wow,rpt_wbp_ai_vs_nonai_qa_qoq(short-form already compliant; only grain suffix missing)
Already-compliant examples (for reference): rpt_wbp_email_daily, rpt_wbp_ai_agent_concurrency_daily, rpt_wbp_ai_agents_durations_daily.
Rationale¶
- Registry-ready naming:
docs/reference/dbt-model-registry.mdbecomes greppable — "all daily WBP rpts" is one query. - Consumer clarity: Tableau authors reading the view list never have to ask "is this daily or weekly?"
- Uniform with the aggregate layer: Section 4 already mandates grain as the last token on
agg_names. This amendment extends the same rule torpt_. - Additivity guarantee: The performance/productivity semantic split in the name encodes whether rows are channel-additive. Analysts stop double-counting denominators.
6. Model Naming: Client Short Code Required¶
Every model name includes a 3-character client code. This is non-negotiable for a multi-tenant dbt project with shared entity names.
Client codes (registered, enforced):
| Client | Code |
|---|---|
| Warby Parker | wbp |
| Conde Nast | cnd |
| Core/Shared | core |
New clients get a 3-letter code assigned at onboarding.
Naming by layer:
| Layer | Pattern | Example |
|---|---|---|
| Silver | {client}_{object} |
wbp_contact_timestamps, cnd_maestro_answers |
| Gold Intermediate | int_{client}_{description} |
int_wbp_contacts_hold |
| Gold Fact | fct_{client}_{object} |
fct_wbp_contacts, fct_cnd_qa |
| Gold Dimension | dim_{entity} |
dim_employee_warbyparker |
| Gold Aggregate | agg_{client}_{subject}_{grain} |
agg_wbp_qa_daily, agg_cnd_qa_daily |
| Gold Report | rpt_{client}_{purpose}_{grain} |
rpt_wbp_agent_productivity_daily |
| Core Report | rpt_master_{purpose}_{grain} |
rpt_master_performance_daily |
Grain is always the last token in both aggregate and report names (see Section 5a for the rpt grain suffix rule and grandfather list). Client code is always immediately after the layer prefix.
7. Project Structure: Domain-First (client/layer)¶
Restructure models/ from layer/client to client/layer:
models/
├── warbyparker/
│ ├── _warbyparker__sources.yml
│ ├── silver/
│ │ ├── schema.yml
│ │ └── wbp_*.sql
│ ├── gold/
│ │ ├── schema.yml
│ │ └── fct_wbp_*.sql, agg_wbp_*.sql, rpt_wbp_*.sql
│ └── intermediate/
│ └── int_wbp_*.sql
├── condenast/
│ ├── _condenast__sources.yml
│ ├── silver/
│ └── gold/
└── core/
├── _core__sources.yml
├── silver/
└── gold/
├── dim_employee_*.sql
└── rpt_master_*.sql ← Cross-client UNION views
Why domain-first:
- Isolation matches the database architecture (each client = separate Snowflake DB, no cross-client joins)
- dbt run --select warbyparker runs the full client pipeline
- Onboarding a new client = create one folder
- Database set once per client in dbt_project.yml, not repeated per layer
dbt_project.yml pattern:
models:
xo_medallion:
warbyparker:
+database: |
{%- if target.name == "dev" -%} WBP_DB_DEV
{%- elif target.name == "prod" -%} WBP_DB_PROD
{%- else -%} invalid_environment
{%- endif -%}
silver:
+materialized: incremental
+schema: SILVER
+unique_key: RECORD_KEY
+incremental_strategy: append
+on_schema_change: sync_all_columns
+tags: ["silver", "warbyparker"]
gold:
+materialized: table
+schema: GOLD
+tags: ["gold", "warbyparker"]
intermediate:
+materialized: ephemeral
+tags: ["intermediate", "warbyparker"]
8. Macros: Shared Patterns¶
Create shared macros in apps/dbt/xo-medallion/macros/ for patterns that repeat across 2+ models:
| Macro | Purpose | Priority |
|---|---|---|
roster_columns(alias) |
Standard roster column SELECT list (EID, name, direct_leader, site) | P1 |
weekly_tenure_bucket(date_col, start_date_col) |
Weekly tenure bucket CASE statement | P2 |
monthly_tenure_bucket(date_col, start_date_col) |
Monthly tenure bucket CASE statement | P2 |
pooled_average(sum_a, count_a, sum_b, count_b) |
QA combined pooled average formula | P3 |
ai_enrichment(agent_col, alias) |
JOIN to ai_agents seed + AGENT_TYPE derivation | P4 |
9. CORE_DB Master Views: Standardized Column Contract¶
Each client's performance aggregate produces the same columns. The master view is a simple UNION ALL.
Standard aggregate column contract (example for agg_{client}_agent_performance_daily):
date DATE -- Required
eid VARCHAR -- Required
agent_name VARCHAR -- Required
direct_leader VARCHAR -- Required
site VARCHAR -- Required
client VARCHAR -- Required ('WarbyParker', 'CondeNast')
-- Performance metrics (NULL if client lacks this source)
contact_count NUMBER
handle_time_sec NUMBER
acw_sec NUMBER
hold_time_sec NUMBER
talk_time_sec NUMBER
-- Quality metrics
qa_score_sum FLOAT
qa_count NUMBER
-- Satisfaction metrics
csat_score_sum FLOAT
csat_count NUMBER
-- Attendance metrics
active_time_min FLOAT
-- SLA metrics
sla_met_count NUMBER
sla_eligible_count NUMBER
Master view:
-- core/gold/rpt_master_agent_performance.sql
SELECT * FROM {{ ref('agg_wbp_agent_performance_daily') }}
UNION ALL
SELECT * FROM {{ ref('agg_cnd_agent_performance_daily') }}
Adding a new client = create their aggregate with the standard columns + add one UNION ALL line. If columns don't match, dbt compile fails — which is the intended safeguard.
10. KPI Registry and Client Glossary (Future Work)¶
A KPI registry and client-specific glossary mapping are needed but out of scope for this ADR. The work includes:
- KPI Registry: A reference document (
docs/reference/kpi-registry.md) listing every standard KPI with its canonical name, formula, unit, and grain. - Client Glossary Mapping: Per-client mapping from their terminology to the standard KPI name (e.g., Warby Parker calls it "Advisor Stars", standard is "CSAT Score").
- Master KPI Definitions: SQL snippets or macros that compute each KPI from the standard aggregate columns, ensuring consistency across reports.
- KPI Domain Classification: Performance (handle time, contacts, SLA), Quality (QA scores, attributes), Satisfaction (CSAT), Workforce Management (utilization, occupancy, logged-in time).
This will be addressed in a follow-up spike (Phase 6 of the execution plan).
Consequences¶
What gets easier¶
- Adding a new client: create one folder, follow the naming convention, produce aggregates with the standard column contract, add UNION ALL to master views.
- Debugging: Silver is complete; Gold facts are XO-only by documented construction; the source category comment in each model explains the filtering strategy.
- Cross-client reporting: master views in CORE_DB are trivial UNION ALLs.
- Code reuse: macros eliminate copy-paste of roster enrichment, tenure bucketing, QA averages.
What gets harder¶
- Migration cost: Renaming every model and updating every
ref()is a large, breaking change. Must be done in a single coordinated PR per phase. - Roster gap visibility: Case ½ facts emit NULL EID rows when an agent is present in the source but absent from the roster. These rows are excluded from aggregates by
WHERE eid IS NOT NULLand surface as dbtwarn-severity test failures — an intentional signal for ops/analysts to fix the roster. - Longer model names:
fct_wbp_contactsvsfct_contacts. Minor readability cost, major safety gain.
What we give up¶
- The ability to have "simple" model names like
qa.sqlorcsat.sqlwithout client prefix. This was never safe in a multi-tenant project. - Direct Silver-to-aggregate shortcuts. All aggregates must go through facts, even when the fact is a thin wrapper. This is the correct trade-off for lineage consistency.
Options Considered¶
Naming: source prefix vs client prefix¶
Source prefix (gladly_contact_timestamps) was rejected because multiple clients can share the same source (e.g., Google Sheets for CSAT). Client prefix is the only reliable disambiguation.
Project structure: layer/client (status quo)¶
Kept layer/client was rejected because it scatters a client's models across two top-level directories, doesn't match the database isolation model, and makes dbt run --select <client> impossible.
Aggregate grains: materialize weekly/monthly/quarterly¶
Rejected. At current scale (~200 agents/client), the daily aggregate has ~73K rows/year. Snowflake computes DATE_TRUNC + GROUP BY over this in milliseconds. Materializing coarser grains adds 3x maintenance for zero performance benefit. Revisit if scale exceeds 10K agents or 3+ years of history per client.
Facts: IS_XO_AGENT flag + LEFT JOIN (original Phase 3 design)¶
Rejected after Phase 3 design review. The flag approach assumes non-XO rows are useful in facts — in practice they are not and create confusion at the aggregate layer. The source-category framework (Section 2) is more explicit and eliminates the need to filter every aggregate.
Facts: filtered to XO only (status quo — INNER JOIN everywhere)¶
The original status quo was rejected because it lacks a systematic framework: some models used redundant WHERE EID IS NOT NULL guards, join types were inconsistent, and the filtering rationale was undocumented. The source-category framework formalizes the intent with consistent patterns per case.
Supersedes relevant sections of: ADR 008 (Gold layer types), ADR 009 (naming conventions for dbt models). Those ADRs remain active for non-dbt naming (S3 paths, DAGs, Bronze tables) but dbt model naming now follows this ADR.